Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - NodeToClientV_16 protocol changed unexpectedly #1130

Open
HeinrichApfelmus opened this issue May 31, 2024 · 5 comments
Open

[BUG] - NodeToClientV_16 protocol changed unexpectedly #1130

HeinrichApfelmus opened this issue May 31, 2024 · 5 comments
Labels
bug Something isn't working
Milestone

Comments

@HeinrichApfelmus
Copy link
Contributor

HeinrichApfelmus commented May 31, 2024

Internal/External
External

Summary

The NodeToClientV_16 protocol has changed unexpectedly between two released versions of the cardano-node: version 8.9.2 accepts queries that 8.11.0-sancho rejects, and vice versa.

Steps to reproduce
(Discovered while testing cardano-wallet against Sanchonet. Screenshot of an example query failure below.)

Expected behavior

Changes in protocol semantics should come with a new version, here NodeToClientV_17.

Downstream projects such as cardano-wallet are mainnet-ready and therefore need to be tested against cardano-node versions that are also mainnet-ready, in this case 8.9.2. However, in order to make Sanchonet more useful, it is also desirable to have cardano-wallet work on Sanchonet. However, having a node that is ready for mainnet and a different node that is compatible with Sanchonet now requires us to test against two different node versions — especially if their interface changes.

One can argue that NodeToClientV_16 is experimental and subject to change (that's why it can be disabled in the node configuration file). However, the point of version numbers is that any particular version is not subject to change, e.g. cardano-node-8.9.2 is not subject to change anymore, it only becomes obsolete. Still, there is the matter of granularity: In the case of the NodeToClientVersion type, it was probably felt that adding NodeToClientV_17 would spam the version number space for no good reason. Well, now there is a good reason.

To solve this, I have the feeling that it would be best if NodeToClientVersion simply tracks the version of the ouroboros-consensus library. When negotiating the protocol, a higher version of ouroboros-consensus can choose to explicitly support a lower version, e.g. ouroboros-consensus-0.18 can include code that works with ouroboros-consensus-0.15 as well. I believe that this would semi-automated the version tracking. However, this solution is also too good to be true: The block type may differ between versions of a dependency, which is not tracked in the version of ouroboros-consensus. 🤔 The version number of cardano-node would provide a good tracking reference (and in practice, it is used as such), but that is a downstream package, which imports the NodeToClientVersion type rather than exporting it.

System info

  • OS Name: NixOS, x86_64-linux.
  • Consensus version: ouroboros-consensus-0.14.0.0 vs ouroboros-consensus-0.18.0.0.

Screenshots and attachments

node-to-client

Additional context

This defect was introduced in #1015 .

The change to the semantics of the NodeToClientV_16 protocol happened here:
3fc6d28#diff-28148a724bec997ce56e6643032266b891d3e841987ec42462d88e29245f18ae

A better solution would have been to define CardanoNodeToClientVersion13 and to add a line

  supportedNodeToClientVersions _ = Map.fromList $
      [ 
      , (NodeToClientV_17, CardanoNodeToClientVersion13)
      ]
@HeinrichApfelmus HeinrichApfelmus added the bug Something isn't working label May 31, 2024
@amesgen
Copy link
Member

amesgen commented May 31, 2024

One can argue that NodeToClientV_16 is experimental and subject to change

This is indeed the case; experimental versions currently have no backwards-compatibility guarantees between different node versions. For example, various Conway queries changed all the time (added/removed/different payload) between different node releases, but they were gated behind an experimental NodeToClient version. In particular, it is not possible to use older cardano-cli versions with newer cardano-nodes for Conway-related functionality due to this.

The rationale here is that not changing experimental versions across node releases would come with a huge maintenance overhead: For example, the exact data or format for certain queries might change in any version, and it would introduce a large amount of "instant legacy" code in order to keep every version backwards-compatible with every intermediate iteration of experimental features that are developed over the course of multiple node releases. Most crucially, after the experimental version has been released, nobody will ever use these intermediate versions.

In contrast, we do strive to not break backwards-compatibility for non-experimental versions; in fact, #1015 has extra code to ensure just that.


If I understand you correctly, you would like to guarantee that experimental features work even when using an older cardano-wallet release in combination with newer cardano-node releases, right? Currently, the assumption is that when downstream consumers want to use experimental features, they need to use a version corresponding to the exact cardano-node release they are using. Would this be feasible for cardano-wallet?

@HeinrichApfelmus
Copy link
Contributor Author

HeinrichApfelmus commented May 31, 2024

If I understand you correctly, you would like to guarantee that experimental features work even when using an older cardano-wallet release in combination with newer cardano-node releases, right?

I'm not entirely sure.

Our original motivation was that we want to test cardano-wallet with a single version of the node. By our release-policy, this node must be mainnet-ready. If there is a newer node release which is not mainnet-ready and whose experimental features do not work with the existing version of cardano-wallet, we can't use it to test experimental features. The guarantee you mention would change that — but I think the problem are not the experimental features, but the fact that the released node is not mainnet-ready. This kind of defeats the purpose of experimental features — the entire node is experimental, including old features (here: "mainnet-readiness").

Most crucially, after the experimental version has been released, nobody will ever use these intermediate versions.

Yes, but that is ok. In our case, cardano-wallet expected a particular serialization format for NodeToClientV_16. It would be ok for a newer version of cardano-node, say 8.11.0, to offer both NodeToClientV_15 and NodeToClientV_17, but not NodeToClientV_16 anymore. This would mean that the old version of cardano-wallet doesn't work with the experimental features of cardano-node-8.10.0, but we would get early notification of that fact through version numbers. The supported range of version numbers does not have to be contiguous.

Currently, the assumption is that when downstream consumers want to use experimental features, they need to use a version corresponding to the exact cardano-node release they are using. Would this be feasible for cardano-wallet?

Yes, that is alright — what I'm asking for is to make that fact visible in the version of the feature, as opposed to some other version of the software that includes the feature. 🤔

@amesgen
Copy link
Member

amesgen commented May 31, 2024

Thanks, that makes a lot of sense, I think I now understand your use case better: You are interested in making the experimental version more expressive, by updating them when anything changes, but not keeping the previous behavior around 👍

Our original motivation was that we want to test cardano-wallet with a single version of the node. By our release-policy, this node must be mainnet-ready. If there is a newer node release which is not mainnet-ready and whose experimental features do not work with the existing version of cardano-wallet, we can't use it to test experimental features. The guarantee you mention would change that — but I think the problem are not the experimental features, but the fact that the released node is not mainnet-ready. This kind of defeats the purpose of experimental features — the entire node is experimental, including old features (here: "mainnet-readiness").

One thing that comes to mind here is to test stable features with one (mainnet-ready) version of the node, and the experimental features with another (potentially not mainnet-ready) version of the node.

In some sense, this is even necessary if you want to continue testing experimental features: Suppose we bumped the NodeToClient version to V_17 for the last node release and removed V_16. Then you would have gotten a better error message while testing (instead of the decoder failure), but you still couldn't test the experimental Conway features with your chosen cardano-node version if I understand correctly.

It would be ok for a newer version of cardano-node, say 8.11.0, to offer both NodeToClientV_15 and NodeToClientV_17, but not NodeToClientV_16 anymore. This would mean that the old version of cardano-wallet doesn't work with the experimental features of cardano-node-8.10.0, but we would get early notification of that fact through version numbers. The supported range of version numbers does not have to be contiguous.

That (or something in that line) sounds doable; maybe after a bit of elbow grease as it is currently a bit annoying/laborious to add new NodeToClient versions, but it doesn't have to be that way. Will come back to you on this 👍

@dnadales dnadales added this to the QX milestone Jun 5, 2024
@dnadales dnadales moved this to 🔖 Ready in Consensus Team Backlog Jun 5, 2024
@amesgen
Copy link
Member

amesgen commented Jun 5, 2024

The Consensus team just talked about this, we agreed that bumping the experimental version when there are breaking changes sounds feasible and useful for downstream consumers.

A sketch of how to do this would be to add a special constructor NodeToNode_Experimental to the end of NodeToClientVersion, with a payload (most primitively, an integer) that can be bumped every time we release a new version of the ouroboros-consensus-cardano package with breaking changes of any kind (changes to serializationqueries) to the NodeToClient protocols. Details to figure out are how to integrate this into the existing Handshake mini-protocol, which is owned by the Network team.

While this seems relatively straightforward, we do not yet know when we will be able to allocate some time to work on this.


FTR: Some variations in the idea above that were mentioned:

  • Just add more plain NodeToClientV_X versions. This is rather annoying, as the NodeToClientVersion ADT is defined upstream, in Network.

  • Introduce a separate "Consensus version" that is also negotiated at the beginning of every N2C connection; then we (Consensus) have complete control (in particular on how to resolve this issue) over how to modify it without having to do any coordination with Network.

    Also see [FEAT] - Separately negotiated version number for NTN and NTC block-specific payloads ouroboros-network#4770 as it touches a few similar points in the context of queries.

@HeinrichApfelmus
Copy link
Contributor Author

Thanks! 😊 🙏

While this seems relatively straightforward, we do not yet know when we will be able to allocate some time to work on this.

In both cases, with or without accurate versioning of the experimental protocols, downstream projects will have to test against two different node versions as long as the experimental node version is not mainnet-ready, and doing so will also alleviate the problem. I think it's ok to assign a lesser priority to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: 🔖 Ready
Development

No branches or pull requests

3 participants