serdect: safer bytes handling #1112

fjarri · 2023-06-21T00:31:48Z

See #1111 for the discussion leading to this PR.

Note: this is a breaking change in the ABI.

Previous behavior: byte slices were serialized with the default serde implementation for [T] with T = u8. For some formats, like MessagePack or CBOR, this leads to array serialization, where each u8 value is treated as a packed integer: every value above a format-specific threshold gets a prefix byte (0xCC for values over 127 in MessagePack, 0x18 for values over 23 in CBOR). This leads to non-constant-time behavior and is also counter-intuitive, because byte slices of the same length end up with different serialized lengths.
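To make the packing concrete, here is a minimal hand-rolled sketch of the two MessagePack encodings in question. It uses only the standard library and is not the rmp-serde implementation; the function names `msgpack_array` and `msgpack_bin8` are made up for illustration, while the marker bytes follow the MessagePack specification.

```rust
/// Encode `bytes` as a MessagePack array of integers, one element per byte.
/// Only the "fixarray" and "array 16" headers are handled in this sketch.
fn msgpack_array(bytes: &[u8]) -> Vec<u8> {
    assert!(bytes.len() <= 0xFFFF, "sketch handles at most 65535 elements");
    let mut out = Vec::new();
    if bytes.len() <= 15 {
        // fixarray: the length lives in the low nibble of the header byte.
        out.push(0x90 | bytes.len() as u8);
    } else {
        // array 16: marker byte followed by a big-endian u16 length.
        out.push(0xDC);
        out.extend_from_slice(&(bytes.len() as u16).to_be_bytes());
    }
    for &b in bytes {
        if b > 0x7F {
            // "uint 8" marker: values over 127 cost one extra byte.
            out.push(0xCC);
        }
        out.push(b);
    }
    out
}

/// Encode `bytes` as a MessagePack "bin 8" byte string (at most 255 bytes).
fn msgpack_bin8(bytes: &[u8]) -> Vec<u8> {
    assert!(bytes.len() <= 0xFF, "bin 8 holds at most 255 bytes");
    let mut out = vec![0xC4, bytes.len() as u8];
    out.extend_from_slice(bytes);
    out
}
```

For a 16-byte input of all 0x00 the array form is 19 bytes long, for all 0xFF it grows to 35 bytes, while the bin 8 form is 18 bytes in both cases.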

Changes:

Use serialize_bytes() in the slice module, funneling into the dedicated "bytestring" pathways of the corresponding formats, which don't do packing.
slice::deserialize_hex_or_bin_vec() changed accordingly. Note that slice::deserialize_hex_or_bin() already deserialized from a bytestring, so it was inconsistent with the serializer (which the CBOR tests would have revealed, had it been used there).
Added a table with supported formats to README.md.

In tests, I replaced the 0x0F value with 0xFF to check whether packing occurs. CBOR and MessagePack do pack, so their array serialization tests include the packing markers, and the lack of constant-timeness is reflected in the table in the README.

tarcieri · 2023-06-21T00:49:58Z

> In tests, I replaced the 0x0F value with 0xFF to check if the packing occurs. Currently the CBOR array tests fail because of it.

Perhaps you can leave it alone for now and we can examine arrays separately in a followup.

I'd also be curious to know which formats we currently test against this impacts and what the impact is, i.e. is this a mostly backwards compatible change or are there formats other than MessagePack where this is a breaking change.

Note that if this is a breaking change it will take quite a while (i.e. months) to roll out, as serdect is a very deep dependency.

fjarri · 2023-06-21T04:56:41Z

It is a breaking change for CBOR because the tags for an array and a bytestring are different (see the change in the test). Bincode seems to be unaffected.
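The difference in tags can be illustrated with a small hand-rolled sketch of the two CBOR encodings. It uses only the standard library and is not the ciborium implementation; `cbor_head`, `cbor_array`, and `cbor_bytes` are hypothetical names, and the header layout follows the CBOR specification (major type 4 for arrays, major type 2 for byte strings). In CBOR the single-byte integer form covers values 0 through 23; larger u8 values get the 0x18 prefix.

```rust
/// Build a CBOR item header for `major` type with length `len` (up to 255).
fn cbor_head(major: u8, len: usize) -> Vec<u8> {
    assert!(len <= 0xFF, "sketch handles lengths up to 255 only");
    if len < 24 {
        // Short form: the length is stored in the low five bits.
        vec![(major << 5) | len as u8]
    } else {
        // Additional info 24: a one-byte length follows the header.
        vec![(major << 5) | 24, len as u8]
    }
}

/// Encode `bytes` as a CBOR array of unsigned integers (major type 4).
fn cbor_array(bytes: &[u8]) -> Vec<u8> {
    let mut out = cbor_head(4, bytes.len());
    for &b in bytes {
        if b >= 24 {
            // Values 24..=255 need the 0x18 prefix byte.
            out.push(0x18);
        }
        out.push(b);
    }
    out
}

/// Encode `bytes` as a CBOR byte string (major type 2).
fn cbor_bytes(bytes: &[u8]) -> Vec<u8> {
    let mut out = cbor_head(2, bytes.len());
    out.extend_from_slice(bytes);
    out
}
```

For the two-byte input [0x01, 0xFF], the array form is 82 01 18 FF and the byte string form is 42 01 FF, so the two are not interchangeable on the wire.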

daxpedda

Could we also add an assert!() in the proptest!s checking that the length remains the same?
EDIT: I think we should also add 0xFF to the test bytes of the other formats as well.

@tarcieri this is definitely a breaking change because CBOR's output is changed.
Also apparently CBOR exhibits the same issue as MessagePack (which is also why its output changed). I think we should definitely fix that for arrays as well.

serdect/README.md

serdect/src/slice.rs

serdect/tests/cbor.rs

serdect/tests/messagepack.rs

tarcieri · 2023-06-21T12:26:02Z

Since it's a breaking change, please bump the serdect version to v0.3.0-pre.

Also please be aware since it's a breaking change that we can't update elliptic-curve until elliptic-curve v0.14. That will also involve cutting a breaking crypto-bigint v0.6 release. All of that is several months away at least.

Alternatively consider adding new APIs and deprecating the old ones.

tarcieri · 2023-06-21T12:29:19Z

Re: array changes, ugh, that just really really sucks. It is just very very unfortunate to have to break everyone's serializations to add a completely redundant length prefix to work around deficient format designs/implementations.

That's going to be a very ugly breaking change with wide-ranging impact, which will needlessly make a whole slew of messages longer, and all serializations that don't include the redundant length prefix incompatible with the newer, needlessly longer format.

Is there really no better solution? It seems like an absolute last resort. Perhaps it's worth inquiring upstream for potential alternative solutions?

Any of you volunteering to support and explain the changes to people who will inevitably ask why we're breaking all existing serializations of arrays?

tarcieri · 2023-06-21T12:41:00Z

Also what's the solution for people who have existing message serializations? Are they just screwed and have to re-serialize everything?

daxpedda · 2023-06-21T12:58:02Z

> It is just very very unfortunate to have to break everyone's serializations to add a completely redundant length prefix to work around deficient format designs/implementations.

Just to clarify, imho it's not a deficiency in the implementations but in Serde itself. If we communicate we want a tuple, they will use their tuple encoding. Serde just doesn't offer a way to communicate homogeneous fixed-size sequences.

> Perhaps it's worth inquiring upstream for potential alternative solutions?

As pointed out above, as far as I understand, the implementations themselves can't do much about it. I was initially thinking they could detect that the tuple is homogeneous and use their array encoding ... but that's not right either, because some users require specific formats, e.g. when implementing a specific protocol, which would then break.

It could otherwise be done by making it a non-standard option, e.g. CBOR/MessagePack could have an option to turn off varint encoding in specific cases.

Otherwise it's up to Serde, which isn't gonna happen anytime soon I assume.

> Any of you volunteering to support and explain the changes to people who will inevitably ask why we're breaking all existing serializations of arrays?

I'm happy to respond to issues and help with support (as long as it's Rust related).

> Also what's the solution for people who have existing message serializations? Are they just screwed and have to re-serialize everything?

I think so? But I guess that's the point of breaking changes?

We could just not do this and leave a note in the Readme stating compatibility issues with certain protocols, e.g. CBOR and MessagePack.
Alternatively we could just offer two sets of APIs, but this doesn't really make a whole lot of sense because e.g. if you are using crypto-bigint, you can't really select which one to use. Unless we use crate features or some other not so viable workaround.

I honestly don't see a good way around this.
The only viable one is just to leave it incompatible with certain formats.

tarcieri · 2023-06-21T13:11:29Z

Since the array serializer under this scheme would lose the advantage of not having a length prefix, perhaps it should be deprecated (or made into a thin wrapper around a slice deserializer).

tarcieri · 2023-06-21T13:12:22Z

> I think so? But I guess that's the point of breaking changes?

Breaking changes can still have a migration path, and we're not really offering one here

daxpedda · 2023-06-21T13:14:39Z

> Since the array serializer under this scheme would lose the advantage of not having a length prefix, perhaps it should be deprecated (or made into a thin wrapper around a slice deserializer).

I think the argument that @fjarri made before is that the array serializer does bounds checking, which the slice one does not.

> > I think so? But I guess that's the point of breaking changes?

> Breaking changes can still have a migration path, and we're not really offering one here

I honestly can't think of one. It would require some sort of versioning support, which we don't have in place either.
Really the only migration users can do is just to deserialize all their existing stuff with the old version and serialize it back into the new version.
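The round-trip migration described above could look roughly like the following sketch for MessagePack payloads. It assumes the old payload is an array of small unsigned integers and the new one is a "bin 8" byte string; the helper names (`decode_legacy_array`, `encode_bin8`, `migrate`) are hypothetical, and only the standard library is used rather than serdect or rmp-serde.

```rust
/// Decode a MessagePack array of small unsigned integers back into raw bytes.
/// Handles only "fixarray"/"array 16" headers and "positive fixint"/"uint 8"
/// elements; anything else is rejected with `None`.
fn decode_legacy_array(input: &[u8]) -> Option<Vec<u8>> {
    let (&head, rest) = input.split_first()?;
    let (len, mut rest) = match head {
        0x90..=0x9F => ((head & 0x0F) as usize, rest),
        0xDC => {
            if rest.len() < 2 {
                return None;
            }
            (u16::from_be_bytes([rest[0], rest[1]]) as usize, &rest[2..])
        }
        _ => return None,
    };
    let mut out = Vec::with_capacity(len);
    for _ in 0..len {
        let (&b, tail) = rest.split_first()?;
        rest = match b {
            // positive fixint: the byte is the value itself.
            0x00..=0x7F => {
                out.push(b);
                tail
            }
            // uint 8: marker followed by the value.
            0xCC => {
                let (&v, tail) = tail.split_first()?;
                out.push(v);
                tail
            }
            _ => return None,
        };
    }
    // Reject trailing garbage after the last element.
    if rest.is_empty() { Some(out) } else { None }
}

/// Re-encode raw bytes as a MessagePack "bin 8" byte string.
fn encode_bin8(bytes: &[u8]) -> Option<Vec<u8>> {
    if bytes.len() > 0xFF {
        return None;
    }
    let mut out = vec![0xC4, bytes.len() as u8];
    out.extend_from_slice(bytes);
    Some(out)
}

/// Old format in, new format out.
fn migrate(old: &[u8]) -> Option<Vec<u8>> {
    encode_bin8(&decode_legacy_array(old)?)
}
```

In a real migration the decode step would be done with the old serdect version and the encode step with the new one; the sketch only shows the shape of the conversion.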

It's a bleak outlook I agree :/

daxpedda · 2023-06-21T13:47:48Z

Looking at Serde, it actually did have support for this in the past, but it was removed in favor of tuples.
Somebody hit the exact same problem as well: serde-rs/serde#2120 (comment).

fjarri · 2023-06-21T16:18:50Z

> Just to clarify, imho it's not a deficiency in the implementations but in Serde itself.

As far as I understand it's a deficiency of specific formats. For MessagePack at the very least, probably for CBOR too. serde can't do much about it.

> I think the argument that @fjarri made before is that the array serializer does bounds checking, which the slice one does not.

Yes, I would like something that I can use with arrays with the least amount of boilerplate. Ideally, something I can just put into a #[serde(with)] annotation, for an array, a vector, or a Box<[u8]> (this currently works for serializers but not for deserializers).

> It's a bleak outlook I agree :/

There won't be that many users affected after all, the only thing that breaks is CBOR + slice (in this PR specifically, assuming we don't change arrays to slice behavior). The other tested formats are unchanged.

By the way, I did not want to cause all this frustration. I just wanted something I could use in place of https://github.com/nucypher/rust-umbral/blob/master/umbral-pre/src/serde_bytes.rs for my crates without copying it everywhere. I thought serdect could be that after some adjustments, but if that does not work, I have no problems just making my own crate.

tarcieri · 2023-06-21T17:38:07Z

I guess we should try to fix it, including arrays, though it will take quite some time to get it rolled out

daxpedda · 2023-06-21T21:05:32Z

> > Just to clarify, imho it's not a deficiency in the implementations but in Serde itself.

> As far as I understand it's a deficiency of specific formats. For MessagePack at the very least, probably for CBOR too. serde can't do much about it.

I'm pretty sure this is incorrect, MessagePack only uses varint encoding because Serde communicates this array as a tuple, which MessagePack handles differently. The same can be said about CBOR, which functions as desired when we are using the proper encoding that CBOR does provide.

> By the way, I did not want to cause all this frustration. I just wanted something I could use in place of https://github.com/nucypher/rust-umbral/blob/master/umbral-pre/src/serde_bytes.rs for my crates without copying it everywhere. I thought serdect could be that after some adjustments, but if that does not work, I have no problems just making my own crate.

I consider this whole issue a bug. We misused Serde's API, is how I see it. So thanks for figuring this all out, reporting it and helping us understand it!

fjarri · 2023-06-21T21:09:58Z

> I'm pretty sure this is incorrect, MessagePack only uses varint encoding because Serde communicates this array as a tuple, which MessagePack handles differently.

How would you represent a fixed-size u8 array without a length specifier in MessagePack format?

Because that's what I meant when I said that it's a deficiency of formats themselves. If you consider a length specifier to be acceptable for fixed-size arrays, then yes, it is a problem on the serdect side and can be fixed. And I guess serde is at fault for not providing ways to say "this byte array is fixed-size" and convey that to format implementations, so that crates like bincode that support it could serialize them without the length specifier.

daxpedda · 2023-06-21T21:44:18Z

> How would you represent a fixed-size u8 array without a length specifier in MessagePack format?

I assume with "length specifier" you mean varint encoding? That would be the "bin 8" format, which is the format rmp-serde uses for serialize_bytes(). This format does not use any varint encoding.

The problem is that serialize_tuple() can't tell rmp-serde that this is a homogeneous array of bytes, which is why it chooses an inappropriate MessagePack format.
In the case of rmp-serde, it uses either the "fixarray", "array 16" or "array 32" format and then encodes every element with its own format, which in our case is u8. For encoding an individual u8 rmp-serde chooses either "positive fixint", "uint 8", "uint 16", "uint 32" or "uint 64", which is what creates the varint encoding.

So what I'm trying to get at here is that serialize_tuple() doesn't give rmp-serde the information it needs to make the best decision. Even though MessagePack doesn't have a dedicated format for fixed-size arrays, it has the "bin" format, which is the right format to use in this case.
If Serde had a serialize_fixed_bytes() function, MessagePack could do the right thing, which in this case is doing the same thing that serialize_bytes() does, because the MessagePack format can't optimize this type any further.

If you meant with length specifier just the length of the byte string ... then AFAIK MessagePack can't do that.
But it should still pick the best format for the task, which is still "bin" and not "array".

fjarri · 2023-06-21T21:48:18Z

> I assume with "length specifier" you mean varint encoding? That would be the "bin 8" format.

No, the information about the length of the following array. Like, for the MessagePack payload "DC0010000102030405060708090A0B0C0D0ECCFF", 0x10 is the length specifier. Bincode does not add that to fixed-size arrays.

> If Serde had a serialize_fixed_bytes() function, MessagePack could do the right thing

Well, it could do an array with a length specifier, and bincode could do the actual right thing because the format allows it. But yes, these are my thoughts too. It would be great if serde provided that possibility.

daxpedda · 2023-06-21T21:09:17Z

serdect/README.md

+The table below lists the crates `serdect` is tested against.
+&#x274c; marks the cases for which the serialization is not constant-size for a given data size (due to the format limitations), and therefore will not be constant-time too.
+
+| Crate                                                              | `array`  | `slice`  |
+|--------------------------------------------------------------------|:--------:|:--------:|
+| [`bincode`](https://crates.io/crates/bincode) v1                   | &#x2705; | &#x2705; |
+| [`ciborium`](https://crates.io/crates/ciborium) v0.2               | &#x274c; | &#x2705; |
+| [`rmp-serde`](https://crates.io/crates/rmp-serde) v0.2             | &#x274c; | &#x2705; |
+| [`serde-json-core`](https://crates.io/crates/serde-json-core) v0.5 | &#x2705; | &#x2705; |
+| [`serde-json`](https://crates.io/crates/serde-json) v1             | &#x2705; | &#x2705; |
+| [`toml`](https://crates.io/crates/toml) v0.7                       | &#x2705; | &#x2705; |
+


I think we are good to go to fix arrays in this PR as well, please correct me if I'm wrong @tarcieri, so this table could probably just list crates that we test against.

daxpedda · 2023-06-21T22:19:38Z

> > If Serde had a serialize_fixed_bytes() function, MessagePack could do the right thing

> Well, it could do an array with a length specifier, [..]

rmp-serde does make an array with a length specifier (in MessagePack all arrays use length specifiers), which is the problem, because then every element needs to be encoded separately, which introduces the varint encoding.

> [..], and bincode could do the actual right thing because the format allows it.

Well, Bincode doesn't actually use varint encoding by default, that's why it doesn't hit that problem. Additionally, Bincode's varint encoding doesn't affect u8s.

So if you used Bincode with varint encoding, a [u16; N] and used serialize_tuple(), you would have the exact same problem in Bincode. In this case Serde fails even further, because you can't communicate non-u8 slices at all, serialize_bytes() only takes &[u8].

Generally speaking you want to use varint encoding if you are dealing with individual integers, but not integer strings. Which is exactly what Serde doesn't expose, but both MessagePack and CBOR support (and Bincode would too).

serdect/Cargo.toml

daxpedda · 2023-07-21T09:27:39Z

@fjarri are you still interested in pushing this PR over the finish line?
According to tarcieri, #1112 (comment), we have the green light to fix arrays as well now.

fjarri · 2023-07-21T18:37:10Z

I am, but I am currently away on a vacation, and will only be near a computer next Thursday. Perhaps this can be merged as is, and arrays treated in a subsequent PR.

tarcieri · 2023-07-21T18:40:28Z

I'd prefer to save this for the next breaking release cycle, so maybe you can finish it up before we merge (no rush)

fjarri · 2023-08-02T06:07:32Z

Made the changes for the arrays. Not sure if LengthCheck makes sense, or I should have just copy-pasted the implementation for simplicity. Also note that deserialize for slices does not return the reference to the buffer anymore (since it's mutated in place anyway).
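One way to picture the length-check idea is a small policy trait with two implementations. This is only a sketch under assumptions: the names `LengthCheck`, `ExactLength`, and `UpperBound` come from this thread, but the trait method `accepts` and the helper `copy_checked` are hypothetical and are not taken from the serdect source.

```rust
/// Hypothetical policy trait: decides whether data of `data_len` bytes is
/// acceptable for a destination buffer of `buffer_len` bytes.
trait LengthCheck {
    fn accepts(buffer_len: usize, data_len: usize) -> bool;
}

/// Array-style policy: the data must fill the buffer exactly.
struct ExactLength;

/// Slice-style policy: the data may be shorter than the buffer.
struct UpperBound;

impl LengthCheck for ExactLength {
    fn accepts(buffer_len: usize, data_len: usize) -> bool {
        buffer_len == data_len
    }
}

impl LengthCheck for UpperBound {
    fn accepts(buffer_len: usize, data_len: usize) -> bool {
        data_len <= buffer_len
    }
}

/// Copy `data` into the front of `buffer` if the policy `T` allows it.
fn copy_checked<T: LengthCheck>(buffer: &mut [u8], data: &[u8]) -> Result<(), &'static str> {
    if !T::accepts(buffer.len(), data.len()) {
        return Err("unexpected length");
    }
    // Both policies guarantee data.len() <= buffer.len() at this point.
    buffer[..data.len()].copy_from_slice(data);
    Ok(())
}
```

With this shape the copying logic is written once and the array and slice entry points differ only in the policy type they pass.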

daxpedda

Sorry for the delay!

Just a couple of documentation and test nits.
Would also like to see an assert! in the proptests to make sure the length actually stays the same. Basically testing what this PR is fixing.
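The kind of check being asked for can be sketched without proptest by comparing the encoded lengths of the two extreme inputs. The helper `has_constant_length` and the toy encoders `packed` and `verbatim` are hypothetical stand-ins, not functions from serdect or its test suite.

```rust
/// Returns true if `encode` produces the same output length for the two
/// extreme inputs of length `len`: all 0x00 and all 0xFF.
fn has_constant_length(encode: impl Fn(&[u8]) -> Vec<u8>, len: usize) -> bool {
    let low = vec![0x00u8; len];
    let high = vec![0xFFu8; len];
    encode(low.as_slice()).len() == encode(high.as_slice()).len()
}

/// Toy encoder that "packs": values over 0x7F get a one-byte marker.
fn packed(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for &b in bytes {
        if b > 0x7F {
            out.push(0xCC);
        }
        out.push(b);
    }
    out
}

/// Toy encoder that copies the bytes verbatim.
fn verbatim(bytes: &[u8]) -> Vec<u8> {
    bytes.to_vec()
}
```

In a property test the same comparison would run against randomly generated inputs of equal length rather than just the two extremes.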

serdect/src/array.rs

serdect/tests/cbor.rs

serdect/tests/messagepack.rs

daxpedda

Actually I forgot to mention one last change I would like to see: move the Visitor implementation into its own file. Now that it's shared equally between the slice and array implementations, it doesn't make sense for it to live in the slice module.

Would also be interesting if we could move ExactLength and UpperBound into their respective modules.

daxpedda

Thank you!

PR #1112 contained breaking changes. To reflect that, this bumps the `serdect` version to a prerelease. Note that there is not a crate release associated with this bump. If there is a prerelease, it will be `v0.3.0-pre.0`.

tarcieri · 2024-01-09T18:45:40Z

serdect/src/slice.rs

-pub fn deserialize_hex_or_bin<'de, D>(buffer: &mut [u8], deserializer: D) -> Result<&[u8], D::Error>
+pub fn deserialize_hex_or_bin<'de, D>(buffer: &mut [u8], deserializer: D) -> Result<(), D::Error>


@fjarri running into problems with this change when upgrading this crate.

This API in particular is for deserializing variable-length data; however, removing the &[u8] return value means the amount of data that was actually deserialized is lost.

buffer is intended to be a backing buffer as large as the deserialized data can possibly be, but the data may be smaller, and the return value is there to report the actual amount of data deserialized.
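The role of the returned slice can be shown with a small standalone sketch. `fill_prefix` and `BufferTooSmall` are hypothetical names, and the function only mimics the buffer-handling part of such an API; it does not involve serde at all.

```rust
/// Error returned when the data does not fit into the backing buffer.
#[derive(Debug, PartialEq)]
struct BufferTooSmall;

/// Copy `decoded` into the front of `buffer` and return the filled prefix.
/// The length of the returned slice tells the caller how much was written.
fn fill_prefix<'a>(buffer: &'a mut [u8], decoded: &[u8]) -> Result<&'a [u8], BufferTooSmall> {
    let dst = buffer.get_mut(..decoded.len()).ok_or(BufferTooSmall)?;
    dst.copy_from_slice(decoded);
    Ok(&*dst)
}
```

With a `Result<(), _>` return type the caller would only see the full backing buffer and could not tell which part of it holds deserialized data.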

Made a tracking issue: #1322

fjarri force-pushed the serdect-bytes branch from c7d118c to 6b6ca2f Compare June 21, 2023 00:34

fjarri mentioned this pull request Jun 21, 2023

serdect: is it actually constant-time? #1111

Closed

fjarri force-pushed the serdect-bytes branch from 6b6ca2f to de53178 Compare June 21, 2023 06:16

daxpedda suggested changes Jun 21, 2023

fjarri added 3 commits June 21, 2023 11:52

serdect: test that no packing is happening for byte values

c7c1ef2

Fix CBOR test

7cd682b

serdect: use serialize_bytes() for serializing bytestrings

56c88f9

fjarri force-pushed the serdect-bytes branch from de53178 to cc18460 Compare June 21, 2023 18:54

fjarri force-pushed the serdect-bytes branch from cc18460 to 3466e57 Compare June 21, 2023 21:52

daxpedda reviewed Jun 21, 2023

fjarri force-pushed the serdect-bytes branch from 3466e57 to f9b4ff3 Compare June 21, 2023 21:55

baloo reviewed Jun 24, 2023

Add MessagePack tests

686787b

Add a table with tested formats

66a73b1

fjarri force-pushed the serdect-bytes branch from f9b4ff3 to 66a73b1 Compare June 25, 2023 06:31

Make array logic fall back to the slice logic

dbd314b

Since serde does not support uniform serialization of fixed-sized arrays.

daxpedda suggested changes Aug 5, 2023

Implement RFCs

d7cccf4

daxpedda approved these changes Aug 6, 2023

daxpedda suggested changes Aug 6, 2023

fjarri force-pushed the serdect-bytes branch 2 times, most recently from a155af0 to d8e666a Compare August 6, 2023 20:21

Move visitors to a shared module

4181e81

fjarri force-pushed the serdect-bytes branch from d8e666a to 4181e81 Compare August 6, 2023 20:23

daxpedda approved these changes Aug 7, 2023

Merge branch 'master' into serdect-bytes

89a47cd

tarcieri merged commit 57b7d02 into RustCrypto:master Oct 31, 2023
172 checks passed

tarcieri mentioned this pull request Oct 31, 2023

serdect: bump version to v0.3.0-pre #1243

Merged

fjarri deleted the serdect-bytes branch November 1, 2023 06:34

tarcieri mentioned this pull request Nov 1, 2023

hybrid-array: Added serde impls for Array RustCrypto/utils#979

Closed

tarcieri mentioned this pull request Nov 10, 2023

Remove serde support rozbb/rust-hpke#53

Merged

fjarri mentioned this pull request Nov 11, 2023

Reduce message sizes entropyxyz/synedrion#47

Merged

tarcieri reviewed Jan 9, 2024

tarcieri mentioned this pull request Jan 9, 2024

serdect: slice::deserialize_hex_or_bin needs to return the amount of deserialized data #1322

Closed

tarcieri mentioned this pull request Apr 1, 2024

add basic serde support for Scalar, G1, G2 with human readable encoding zkcrypto/bls12_381#125

Open
