Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(enum): u16/u32 discriminants #315

Closed
wants to merge 19 commits into from

Conversation

dzmitry-lahoda
Copy link
Contributor

@dzmitry-lahoda dzmitry-lahoda commented Oct 14, 2024

Added support for tag_width to support u16/u32 enums.

tag_width requires use_discriminant = true and repr(u8/u16/u32) to ensure backward and forward compatiblity.

See #315 (comment) for full design

@dzmitry-lahoda dzmitry-lahoda marked this pull request as ready for review October 14, 2024 18:05
@dzmitry-lahoda dzmitry-lahoda changed the title #draft feat(enum): u16 disriminator feat(enum): u16 discriminant Oct 14, 2024
@dzmitry-lahoda
Copy link
Contributor Author

dzmitry-lahoda commented Oct 14, 2024

@frol @dzmitry-lahoda - i have added u16 enum discriminant.

As for EnumExt I retained ability to compile against without change in code of caller.
But as you can observe generated code is ugly.

Alternative is delete EnumExt, so it would not make assumption about discriminant size (compatible with u32 in future).

@dzmitry-lahoda
Copy link
Contributor Author

Seems all tests work except snaps, do not change snaps until have some decision about EnumExt

Cargo.toml Outdated Show resolved Hide resolved
Copy link
Collaborator

@dj8yfo dj8yfo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see little point in generalizing just up to u16.
this has to be generalized all the way up to https://docs.rs/borsh/1.5.1/borsh/schema/type.DiscriminantValue.html .

so i8, u16/i16, u32/i32 and i64 have to be covered (all of these are Into<i64>)
Also taking into account tag_width in https://docs.rs/borsh/1.5.1/borsh/schema/enum.Definition.html#

Probably EnumExt can be left intact, but EnumExtI8, EnumExtU16, EnumExtI16,
EnumExtU32, EnumExtI32, EnumExtI64 can be added instead. Doing that with a macro won't be too space-consuming.

input
.attrs
.iter()
.find(|x| x.path() == REPR)?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, this behaviour should be controlled by borsh-specific item attribute, say #[borsh(enum_repr = "i16")],
not by repr, which is used to control memory layout in program, which is, strictly speaking, independent of how it's serialized for later transfer or saving to disk (which is what serialization is usually used for).
Say, an enum may not have #[repr(u16)] and have all discriminants withing u8 range, but still have #[borsh(enum_repr = "u16")] and use 2 bytes for tag serialization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we have borsh enum_repr, how should borsh handle this

fn main(){
    #[borsh(enum_repr = "u16")]
    #[repr(u32)]
    enum A {
        Q = 1u8,
    }
}

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may use u16 instead of "u16"? may just do borsh(rerp= instead of #[borsh(enum_repr` because it is on enum anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may it be like this?

    #[borsh(repr = u16)]
    enum A {
        Q = 1u8,
    }

and fail compile same way #repr fails because u8 was set (so borsh rerp interpreted types same way/algorithm) as native repr.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may it be like this?

looks good, repr should then check it annotates an enum item.

fails because u8 was set

well, one would've to change 1u8 to 1 to compile with borsh attribute.

Specifying Q = -10 would allow annotating with #[repr(i8)] and #[borsh(repr = i16)] at the same time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may it be like this?

Still, maybe naming it enum_tag will be better, rust's repr has other subattributes, not related to enums at all

@dj8yfo
Copy link
Collaborator

dj8yfo commented Oct 17, 2024

@dzmitry-lahoda , actually , as https://docs.rs/borsh/1.5.1/borsh/schema/enum.Definition.html has tag_width but doesn't have signed: bool field for Enum variant, and we do not want to break public api for this,
the number of variants should be narrowed: EnumExt (u8), EnumExtI16 (i16), EnumExtI32 (i32), EnumExtI64 (i64),
which should cover most, if not all, practical cases for ser/de of enums with borsh.
tag_width 1would mean u8, 2 - i16, 4 - i32, 8 - i64

@dj8yfo dj8yfo marked this pull request as draft October 17, 2024 11:39
@dzmitry-lahoda
Copy link
Contributor Author

EnumExt (u8), EnumExtI16 (i16), EnumExtI32 (i32), EnumExtI64 (i64)

What is reason have signed instead of unsigned? Can we have all UNsigned? We can convert all singed to "singed" byte by byte.

I am concerned because I do not recall cases when needed negative variants. We have errors enum in protobuf(blockchain domain) - only unsigned, windows winapi/COM return codes - unsigned, http codes - unsigned.

Do not recall places where need singed enums?

@dj8yfo
Copy link
Collaborator

dj8yfo commented Oct 17, 2024

Well, if signed discriminants had been useless, rust wouldn't have added support for them.

Doing it better really calls for adding that tag_signed field into Enum schema definition, but that's a breaking change we don't want to do (now). u8 -> i16 -> i32 -> i64 should cover most if not all use cases one might need in practice.
Adding tag_signed and i8/u16/u32/u64 support may be deffered to a later breakage.

@dj8yfo
Copy link
Collaborator

dj8yfo commented Oct 17, 2024

Alternatively ,existing tag_width can be used to encode sign without breakage: 1 means u8, 1 + 128 would mean i8, 2 would mean u16, 2 + 128 would mean i16, 4 would mean u32, 4 + 128 would mean i32, 8 + 128 would mean i64 . The derived implementation wouldn't accept u64 as it would be problematic to do Into<i64> (DiscriminantValue) in derive proc-macro, unlike other variants, but manual implementation would still be allowed, should someone absolutely need it.
@dzmitry-lahoda what do you think? This looks generic enough and achievable without breakage.

@dj8yfo
Copy link
Collaborator

dj8yfo commented Oct 18, 2024

Alternatively ,existing tag_width can be used to encode sign without breakage

giving an example in runnable playground , in case above phrasing was too confusing

fn main() {
    for int_type in ["u8", "i8", "i16", "u16", "i32", "u32", "i64"] {
        println!("int_type: {}", int_type);
        let tag_width_encoded = forward(int_type);

        let (width, signed) = reverse(tag_width_encoded);

        println!("tag_width: {}, signed: {}", width, signed);
        println!("########");
    }
}

fn forward(int_type: &str) -> u8 {
    let (width, signed): (u8, bool) = match int_type {
        "u8" => (1, false),
        "i8" => (1, true),
        "u16" => (2, false),
        "i16" => (2, true),
        "u32" => (4, false),
        "i32" => (4, true),
        "i64" => (8, true),
        "u64" | _ => panic!("unsupported"),
    };

    let signed_bit = if signed { 2u8.pow(7) } else { 0u8 };

    let tag_width_encoded = width | signed_bit;
    println!("width {}, tag_width_encoded {}", width, tag_width_encoded);
    tag_width_encoded
}

fn reverse(tag_width_encoded: u8) -> (u8, bool) {
    // ############### reverse direction for https://docs.rs/borsh/latest/borsh/schema/struct.BorshSchemaContainer.html#method.validate
    let signed = tag_width_encoded.leading_ones() > 0;

    let width = tag_width_encoded & !(2u8.pow(7));

    (width, signed)
}

@dzmitry-lahoda
Copy link
Contributor Author

Seems OK, supports unsigned and signed and handles backward compatibility. Will update PR to described pattern. Thank you

@dzmitry-lahoda
Copy link
Contributor Author

dzmitry-lahoda commented Oct 27, 2024

After attempt to handle via proposed way, couple observations:

  1. on current master negative discriminant nor more than 256 discriminates are supported
  2. rust discriminant values are tightly coupled with repr. specifically default value for big numbers is isize. if I want u32 for example, I had to set repr(u32).
  3. also rust uses repr limit to limit numeric value too https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=06eb3439953320352a18f7be5fc55b90

Said that, I think right code for now is next:

  1. No ext impl for non u8.
  2. No support for negative.
  3. Deep integration with repr as it was. Btw it is impossible to parse out of variants its limits, only rust knows limits. So best option is to use repr, variants.len and tag_width to see if any conflict.
  4. Schema integration.

All of these changes are backward compatble and as I think forward too.

I do not think proposed design with many traits per type nor suggested hack to support for negative are good solutions.

@dj8yfo
Copy link
Collaborator

dj8yfo commented Oct 28, 2024

on current master negative discriminant nor more than 256 discriminates are supported

negative discriminants are not supported at all on current master (with use_discriminant = true). Only consts/literals, that can be interpreted as u8.

i didn't really get the meaning of points 2. and 3. of the first numbered bullets block, but i created a playground https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=60fcba754c164f666dc3fc2e678d9bd6 to illustrate that same literal can be interpreted as different types depending on containing struct.

No ext impl for non u8.

i'll take a look at your implementation, and maybe scratch up an alternative one, and we'll choose a simpler one

No support for negative.

why not?

Deep integration with repr as it was.

i've created a playground https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=255d369b7ab66eda0dd0afb93ecb8b8a to showcase that repr may be u8 and borsh(enum_tag) - i16, and while the latter may be ineffient if there are no discriminants larger than u8, it may be a forward-looking choice, that accounts for appearance of variant XLater::D later, when repr changes to i16, but serialization of XEarlier::A, XEarlier::B, XEarlier::C doesn't change. All in all, it's a design decision, that i want to do, to stress that borsh(enum_tag) has a different meaning than repr, though both will be in sync in most of the cases.

proposed design with many traits per type

see above. there's no need to rush this, it'a complex feature

suggested hack to support for negative

hack is to express sign without breaking the schema, a separate tag_signed: bool field is clearly a better choice
for when 2.0 borsh is done, or if we need to do a pre<2.0 breakage for another, more important reason

@dzmitry-lahoda
Copy link
Contributor Author

dzmitry-lahoda commented Nov 3, 2024

@dj8yfo thank you for response.

, that i want to do, to stress that borsh(enum_tag) has a different meaning than repr, though both will be in sync in most of the cases.

I do not feel this is fully reasonable, i think in some cases, specifically in my case, they are tightly coupled and cannot be decoupled because of how rust works with enums. i feel that better statement is different meaning but overlapping-conflicting meaning.

In general,
seems things are complicated, specifically when doing backward compatible things.
Current, specifically case of repr(u16) and use_discriminant = true. I cannot interpret it as that borsh must serde into 2 bytes as is, because previous interpretation was u8, until some const A: u16 = 666; ... A = SOME_VALUE added - in this case borsh failed to compile.
So I cannot start from small values like 0,1,2 and assume I can gracefully go to values 256, 257,....

So I think that introducing tag_with in very limited narrow form seems will be backward, forward compatible and be human understandable. And covering cases I have outlined for positive errors/return/version flags.

Specifically, rule to be:
if tag_with added, then repr must be same size and unsigned type and use discriminant to be true.
This rule is backward and forward compatible as I think, so severely limiting, with possible behaviors to be allowed later.
In this case repr(unsigned) indicates that all values are unsinged by default in this case default value of tag_sign would be unsinged (until expcitly specified otherway, but not sure what that would mean, and yet that seems compatible with default users expectation and future compatible).

So:

#[borsh(use_discriminant = true)]
#[repr(u16)]

will serde into 1 byte.

But

#[borsh(use_discriminant = true, tag_width = 2)]
#[repr(u16)]

into 2 bytes.

Next will be errors (new tag_width meta added, leads to new checks - fine) (some of them may be just for now), build fail test cases:

#[borsh(use_discriminant = true, tag_width = 2)]
#[repr(i16)]
#[borsh(use_discriminant = false, tag_width = 2)]
#[repr(u16)]
#[borsh(use_discriminant = true, tag_width = 2)]
#[repr(u8)]
#[borsh(use_discriminant = true, tag_width = 2)]
#[repr(u32)] 
#[repr(C)] 
#[repr(transparent)]  
#[repr(usize)]
#[repr(isize)]

@dzmitry-lahoda dzmitry-lahoda changed the title feat(enum): u16 discriminant feat(enum): u16/u32 discriminants Nov 3, 2024
@dzmitry-lahoda dzmitry-lahoda marked this pull request as ready for review November 3, 2024 23:36
@dj8yfo
Copy link
Collaborator

dj8yfo commented Nov 8, 2024

@dzmitry-lahoda

pleaze update pr from master branch

I do not feel this is fully reasonable, i think in some cases, specifically in my case, they are tightly coupled and cannot be decoupled because of how rust works with enums. i feel that better statement is different meaning but overlapping-conflicting meaning.

i don't won't to reuse rust's repr attribute, which was made for other purpose by other people. it has a ton of other suboptions, which have no relation to current use case in borsh with enums.

please, do this as a separate attribute [borsh(enum_tag = i16)] there's no need whatsoever to mix it with repr(i16) logic.
if you managed to get it working with [borsh(tag_width = 2)] there's only a small step left to get it working
with [borsh(enum_tag = ...)]

if someone finds it impossible to specify literals for both [repr(i..)] and [borsh(i...)] so that it satisfies both types, then we may come up with a syntax to provide a per-variant enum discriminant [borsh(enum_discriminant = LITERAL)] override to have a distinction for syntax that is already used as mixed-purpose. but i doubt it that it is needed or will be needed

doing a mandatory per-variant [borsh(enum_discriminant = LITERAL)] from the very beginning would've been cleaner,
but it would've been much more verbose.

adding a once-per-item #[borsh(enum_tag = ...)] only for cases, when non-default non-u8 tag is needed, isn't that big a change to do. and those cases are supposed to be rare

@dzmitry-lahoda
Copy link
Contributor Author

dzmitry-lahoda commented Nov 8, 2024

I find enum_tag = i16 confusing. And need more specification to understand it.

Concerns

Let have example of enum where user have set repr(u16)

Why would reading variant of discriminant to read into i16 value as per borsh enum_tag type?
So user clearly knows that his in memory and in code representation is u16, and yet borsh in memory representation when discriminant is parsed would be i16. How comparison of i16 as parsed by borsh to be compared with repr u16 values? Is it to be casted/converted somehow(should it be try_from/into or into/from/as)?

From that, why at all enum_tag to be negative and be at all?
If type borsh reads from memory is disconnected/irrelevant/meaningless to what user uses in code(because of repr), than why at allow enum_tag can be negative? So there is no sense in having negative enum_tag at all. Because only reasonable read type is [u8; TAG_WIDTH]. This open possibility to have [u8; 3] for example.

So I think that enum_tag is useless. And there are 2 options.
Either borsh to use [u8; TAG_WIDTH] as data type returned from reading discriminant, or borsh to align with expected by user data type.

What if borsh has u16 and repr has u32? And what is repr has u32 and borsh has u16? What is expected behavior in all parts of borsh serde?

Is enum_tag required or optional? In what cases it is optional and in which required?

Proposal

There is tag_width only, no enum_tag nor signed.
tag_width can be any integer within [0..8].
tag_width can be used only if use_discriminant = true.

If u16 number as in Rust(repr) to be compared to [u8; 3]`, they compared bite by byte in le order. Zero bytes or absent bytes on any side to be considered zero bytes for that comparison.

If during ser, actual rust repr data is bigger than borsh placeholder, ser error.
If during de, actual borsh data is bigger than rust repr, de error.

@dzmitry-lahoda
Copy link
Contributor Author

dzmitry-lahoda commented Nov 8, 2024

@dj8yfo

please take a look into my comments.

I still think my current solution is best. As it avoids need to ANSWER all these questions now. It just forces repr to be in sync with width if new tag_width was set.

You may consider relax it later as needed, including removal any connection to repr.

But forcing tag_witdth to be same as repr, if you want to use new tag - is super safe backward and forward compatible way.

@dj8yfo
Copy link
Collaborator

dj8yfo commented Nov 8, 2024

I find enum_tag = i16 confusing. And need more specification to understand it.

that will be documented

@dj8yfo dj8yfo marked this pull request as draft November 8, 2024 15:22
@dzmitry-lahoda
Copy link
Contributor Author

I closing this PR for now. I think docs should be first, than impl should go as it is unclear what is expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants