Rolling deployments & maintaining quorum #227
-
With Khepri we have state in the application itself, so for deployments to work we have to ensure the data is moved to the new build every time. That raises many questions such as:
In general, I think it's good to have a discussion where we can share ideas and best practices on this topic.
Replies: 4 comments 4 replies
-
Can you please elaborate on the "not cluster aware" part? Khepri was built for deployments with 3, 5, 7 nodes, and so on. Single-node deployments of Khepri in production are something I see little value in. There is no such thing as a "cluster-aware one-node deployment". Multi-node deployments should follow a pretty standard rolling upgrade procedure.
-
Sure. By "not cluster aware" I mean the case where you don't set up a cluster (during deployment, the new build doesn't connect to the old one).
That's a pity. I can see it being useful if one starts with a single node and later grows it into a cluster. The advantage would be not having to rewrite the code. How many nodes are in the cluster should be handled by the DB itself and not be the client's concern, imo.
-
@a3kov: Are you talking about the Khepri library in general, or are you talking about RabbitMQ with Khepri enabled?
@a3kov there isn't really a whole lot of room for debate. This RabbitMQ upgrade guide explains how multi-node Raft-based systems are upgraded. In fact, most distributed systems are upgraded in a similar fashion: a minority, or a small minority (say, 1 or 3, depending on how many nodes you have in total), of nodes is upgraded and restarted at any given moment, then the next batch, then the next.
In RabbitMQ, there are CLI commands and HTTP API endpoints that allow you to check whether a node is quorum-critical, that is, whether restarting it would leave some quorum queues or streams without a quorum.
Nothing like that exists for Khepri, but you can quite easily steal the idea.
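
To make the "quorum-critical" idea concrete, here is a minimal sketch of the check in Python. It is not part of Khepri or RabbitMQ: the function names (`quorum_size`, `max_safe_restarts`, `is_quorum_critical`) are hypothetical, and the sets of cluster members and currently online nodes are assumed to come from whatever membership information your application already has.

```python
from __future__ import annotations


def quorum_size(cluster_size: int) -> int:
    """Smallest majority of a cluster with `cluster_size` voting members."""
    return cluster_size // 2 + 1


def max_safe_restarts(cluster_size: int) -> int:
    """How many members can be down at once while a majority stays online."""
    return cluster_size - quorum_size(cluster_size)


def is_quorum_critical(node: str, members: set[str], online: set[str]) -> bool:
    """True if taking `node` offline would leave fewer than a majority online.

    `members` and `online` are assumed inputs (hypothetical): the full set of
    voting members and the subset that is currently reachable.
    """
    remaining = (online & members) - {node}
    return len(remaining) < quorum_size(len(members))


if __name__ == "__main__":
    members = {"node1", "node2", "node3"}
    # All three nodes are up: restarting one still leaves a majority of two.
    print(is_quorum_critical("node1", members, members))
    # node3 is already down: restarting node1 would leave only node2.
    print(is_quorum_critical("node1", members, {"node1", "node2"}))
```

A deployment script could call such a check before stopping each node and wait (or abort) while it returns true. Note that for a single-node cluster the check is always true, since the only member is by definition required for quorum.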