-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serdect: safer bytes handling #1112
Conversation
Perhaps you can leave it alone for now and we can examine arrays separately in a followup. I'd also be curious to know which formats we currently test against this impacts and what the impact is, i.e. is this a mostly backwards compatible change or are there formats other than MessagePack where this is a breaking change. Note that if this is a breaking change it will take quite awhile (i.e. months) to roll out as |
It is a breaking change for CBOR because the tags for an array and a bytestring are different (see the change in the test). Bincode seems to be unaffected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also add an assert!()
in the proptest!
s checking that the length remains the same?
EDIT: I think we should also add 0xFF
to the test bytes of the other formats as well.
@tarcieri this is definitely a breaking change because CBOR's output is changed.
Also apparently CBOR exhibits the same issue as MessagePack (which is also why it did change it's output). I think we should definitely fix that for arrays as well.
Since it's a breaking change, please bump the Also please be aware since it's a breaking change that we can't update Alternatively consider adding new APIs and deprecating the old ones. |
Re: array changes, ugh, that just really really sucks. It is just very very unfortunate have to break everyone's serializations to add a completely redundant length prefix to work around deficient format designs/implementations. That's going to be a very ugly breaking change with wide-ranging impact, which will needlessly make a whole slew of messages longer, and all serializations that don't include the redundant length prefix incompatible with the newer, needlessly longer format. Is there really no better solution? It seems like an absolute last resort. Perhaps it's worth inquiring upstream for potential alternative solutions? Any of you volunteering to support and explain the changes to people who will inevitably ask why we're breaking all existing serializations of arrays? |
Also what's the solution for people who have existing message serializations? Are they just screwed and have to serialize everything? |
Just to clarify, imho it's not a deficiency in the implementations but in Serde itself. If we communicate we want a tuple, they will use their tuple encoding. Serde just doesn't offer a way to communicate homogeneous fixed-size sequences.
As pointed out above as far as I understand the implementations themselves can't do much about it. I was initially thinking they could detect that the tuple is homogeneous and use their array encoding ... but that's not right either, because some users require specific formats e.g. implementing a specific protocol, that then would break. It could otherwise be done by making it a non-standard option, e.g. CBOR/MessagePack could have an option to turn off varint encoding in specific cases. Otherwise it's up to Serde, which isn't gonna happen anytime soon I assume.
I'm happy to respond to issues and help with support (as long as it's Rust related).
I think so? But I guess that's the point of breaking changes?
I honestly don't see a good way around this. |
Since the array serializer under this scheme would lose the advantage of not having a length prefix, perhaps it should be deprecated (or made into a thin wrapper around a slice deserializer). |
Breaking changes can still have a migration path, and we're not really offering one here |
I think the argument that @fjarri made before is that the array serializer does bound checking, which the slice one does not.
I honestly can't think of one. It would require some sort of versioning support, which we don't have in place either. It's a bleak outlook I agree :/ |
Looking at Serde it actually did have support for this in the past but was removed in favor of tuples. Related:
In addition Serde also uses |
As far as I understand it's a deficiency of specific formats. For MessagePack at the very least, probably for CBOR too.
Yes, I would like something that I can use with arrays with the least amount of boilerplate. Ideally, something I can just put into
There won't be that many users affected after all, the only thing that breaks is CBOR + By the way, I did not want to cause all this frustration. I just wanted something I could use in place of https://github.com/nucypher/rust-umbral/blob/master/umbral-pre/src/serde_bytes.rs for my crates without copying it everywhere. I thought |
I guess we should try to fix it, including arrays, though it will take quite some time to get it rolled out |
I'm pretty sure this is incorrect, MessagePack only uses varint encoding because Serde communicates this array as a tuple, which MessagePack handles differently. The same can be said about CBOR, which functions as desired when we are using the proper encoding that CBOR does provide.
I consider this whole issue a bug. We misused Serde's API, is how I see it. So thanks for figuring this all out, reporting it and helping us understand it! |
How would you represent a fixed-size Because that's what I meant when I said that it's a deficiency of formats themselves. If you consider a length specifier to be acceptable for fixed-size arrays, then yes, it is a problem on the |
I assume with "length specifier" you mean varint encoding? That would be the "bin 8" format. Which is the format The problem is that So what I'm trying to get here is, that If you meant with length specifier just the length of the byte string ... then AFAIK MessagePack can't do that. |
No, the information about the length of the following array. Like, for a Messagepack
Well, it could do an array with a length specifier, and |
serdect/README.md
Outdated
The table below lists the crates `serdect` is tested against. | ||
❌ marks the cases for which the serialization is not constant-size for a given data size (due to the format limitations), and therefore will not be constant-time too. | ||
|
||
| Crate | `array` | `slice` | | ||
|--------------------------------------------------------------------|:--------:|:--------:| | ||
| [`bincode`](https://crates.io/crates/bincode) v1 | ✅ | ✅ | | ||
| [`ciborium`](https://crates.io/crates/ciborium) v0.2 | ❌ | ✅ | | ||
| [`rmp-serde`](https://crates.io/crates/rmp-serde) v0.2 | ❌ | ✅ | | ||
| [`serde-json-core`](https://crates.io/crates/serde-json-core) v0.5 | ✅ | ✅ | | ||
| [`serde-json`](https://crates.io/crates/serde-json) v1 | ✅ | ✅ | | ||
| [`toml`](https://crates.io/crates/toml) v0.7 | ✅ | ✅ | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are good to go to fix arrays in this PR as well, please correct me if I'm wrong @tarcieri, so this table could probably just list crates that we test against.
Well, Bincode doesn't actually use varint encoding by default, that's why it doesn't hit that problem. Additionally, Bincode's varint encoding doesn't affect So if you used Bincode with varint encoding, a Generally speaking you want to use varint encoding if you are dealing with individual integers, but not integer strings. Which is exactly what Serde doesn't expose, but both MessagePack and CBOR support (and Bincode would too). |
@fjarri are you still interested in pushing this PR over the finishing line? |
I am, but I am currently away on a vacation, and will only be near a
computer next Thursday. Perhaps this can be merged as is, and arrays
treated in a subsequent PR.
…On Fri, Jul 21, 2023 at 03:27 daxpedda ***@***.***> wrote:
@fjarri <https://github.com/fjarri> are you still interested in pushing
this PR over the finishing line?
According to tarcieri, #1112 (comment)
<#1112 (comment)>,
we have the green light to fix arrays as well now.
—
Reply to this email directly, view it on GitHub
<#1112 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAHQAB5DSWTN6333V4D4BDXRJDRNANCNFSM6AAAAAAZN7X3QE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I'd prefer to save this for the next breaking release cycle, so maybe you can finish it up before we merge (no rush) |
Since serde does not support uniform serialization of fixed-sized arrays.
Made the changes for the arrays. Not sure if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay!
Just a couple of documentation and test nits.
Would also like to see an assert!
in the proptests to make sure the length actually stays the same. Basically testing what this PR is fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I forgot to mention one last change I would like to see: move the Visitor
implementation into it's own file. Now that it's equally shared between slice and array implementation, it doesn't make sense that it's living in the slice
module.
Would also be interesting if we could move ExactLength
and UpperBound
into their respective modules.
a155af0
to
d8e666a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
PR #1112 contained breaking changes. To reflect that, this bumps the `serdect` version to a prerelease. Note that there is not a crate release associated with this bump. If there is a prerelease, it will be `v0.3.0-pre.0`.
PR #1112 contained breaking changes. To reflect that, this bumps the `serdect` version to a prerelease. Note that there is not a crate release associated with this bump. If there is a prerelease, it will be `v0.3.0-pre.0`.
pub fn deserialize_hex_or_bin<'de, D>(buffer: &mut [u8], deserializer: D) -> Result<&[u8], D::Error> | ||
pub fn deserialize_hex_or_bin<'de, D>(buffer: &mut [u8], deserializer: D) -> Result<(), D::Error> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fjarri running into problems with this change when upgrading this crate.
This API in particular is for deserializing variable-length data, however removing the &[u8]
return value means the amount of data that was actually deserialized was lost.
buffer
is intended to hold a backing buffer which is as large as the deserialized data can possibly be, but it may be smaller, and the return value is there to return the actual amount of data deserialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a tracking issue: #1322
See #1111 for the discussion leading to this PR.
Note: this is a breaking change in the ABI.
Previous behavior: byte slices were serialized with the default
serde
implementation for[T]
withT = u8
. For some formats like MessagePack or CBOR it leads to array serialization, and separateu8
values get treated as packed integers - that is, every value over 127 is prefixed (0xCC in MessagePack, 0x18 in CBOR). This leads to non-constant-time behavior, and also is counter-intuitive, because it results in variable serialized length for the byte slices of the same length.Changes:
serialized_bytes
in theslice
module, funneling to the specific "bytestring" pathways in corresponding formats, which don't do packing.slice::deserialize_hex_bin_vec()
changed accordingly. Note thatslice::deserialize_hex_or_bin()
deserialized from a bytestring already, so it was inconsistent with the serializer (which could be revealed in CBOR tests, if it was used there).In tests, I replaced the 0x0F value with 0xFF to check if the packing occurs. CBOR and MessagePack do packing, so their tests include the packing markers in the tests for array serialization, and the lack of constant-timeness is reflected in the table in Readme.