Rolling deployments & maintaining quorum #227

a3kov · 2023-09-30T08:47:17Z

a3kov
Sep 30, 2023

With Khepri we have state in the application itself, so for deployments to work we have to ensure the data is moved to the new build every time. That raises many questions such as:

If our deployment is not cluster-aware, do we need to share files between releases and make sure two builds are not running at the same time?
In a cluster-aware deployment, how do we maintain quorum if we only have one node? Are we forced to run three BEAM instances even if we only have one hardware/VM node?
When doing a rolling deployment, how do we monitor that quorum is maintained while rolling out a new build to each node?

In general, I think it's good to have a discussion where we can share ideas and best practices on this topic.

Answered by michaelklishin

Oct 2, 2023


michaelklishin · 2023-10-02T14:37:21Z

michaelklishin
Oct 2, 2023
Maintainer

Can you please elaborate on the "not cluster aware" part? Khepri was built for deployments with 3, 5, 7 nodes, and so on.
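To illustrate why odd sizes such as 3, 5, and 7 are the usual choice, here is a minimal sketch of the majority arithmetic that Raft-based systems rely on. It is plain Python, not Khepri's API, and the function names are made up for this example.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still up."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 7):
        print(f"{n} members: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} down")
```

Under this arithmetic a single member tolerates no downtime at all, and an even-sized cluster tolerates no more failures than the odd size just below it (4 members tolerate 1, the same as 3).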

Single-node deployments of Khepri in production are something I see little value in. There is no such thing as a "cluster-aware one node deployment".

Multi-node deployments should follow a pretty standard rolling upgrade procedure.
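A rough sketch of what such a procedure could look like, assuming the deployment tooling supplies two hooks: `upgrade` (stop the node, install the new build, start it) and `wait_until_rejoined` (block until the member is back and caught up). Both hooks and all names here are hypothetical. Nothing below calls Khepri itself.

```python
from typing import Callable, Iterable, List


def plan_batches(nodes: Iterable[str], batch_size: int = 1) -> List[List[str]]:
    """Split nodes into restart batches that never take a majority down."""
    nodes = list(nodes)
    # Number of members that may be down at once while a majority stays up.
    max_down = len(nodes) - (len(nodes) // 2 + 1)
    if max_down < 1:
        raise ValueError("cannot restart any member without losing quorum")
    if not 1 <= batch_size <= max_down:
        raise ValueError(f"batch_size must be between 1 and {max_down}")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def rolling_upgrade(
    nodes: Iterable[str],
    upgrade: Callable[[str], None],
    wait_until_rejoined: Callable[[str], None],
    batch_size: int = 1,
) -> None:
    """Upgrade batch by batch, waiting for each batch before starting the next."""
    for batch in plan_batches(nodes, batch_size):
        for node in batch:
            upgrade(node)  # hypothetical hook supplied by the deployment tooling
        for node in batch:
            wait_until_rejoined(node)  # hypothetical hook; must block until healthy
```

With three nodes this restarts one node at a time. With one or two nodes `plan_batches` refuses, because any restart would take the majority down.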


michaelklishin · 2023-10-02T14:40:54Z

michaelklishin
Oct 2, 2023
Maintainer

@a3kov there isn't really a whole lot of room for debate. This RabbitMQ upgrade guide explains how multi-node Raft-based systems are upgraded. In fact, most distributed systems are upgraded in a similar fashion: only a minority of nodes (say, 1 or 3, depending on how many nodes you have in total) is upgraded and restarted at any given moment, then the next batch, then the next.

In RabbitMQ, there are CLI commands and HTTP API endpoints that let you check whether a node is quorum-critical, that is, whether restarting it would leave some quorum queues or streams without a quorum.
Nothing like that exists for Khepri, but you can quite easily steal the idea.
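As a sketch of how the quorum-critical idea could be borrowed: the check below only needs the full member list and the set of members currently reachable. How you obtain those two sets from your cluster is left out and is an assumption of this example. The function name is illustrative and is not an existing Khepri or RabbitMQ API.

```python
from typing import Iterable, Set


def is_quorum_critical(node: str, members: Iterable[str], online: Iterable[str]) -> bool:
    """True if stopping `node` would leave fewer than a majority of members online."""
    member_set: Set[str] = set(members)
    if node not in member_set:
        raise ValueError(f"{node!r} is not a cluster member")
    quorum = len(member_set) // 2 + 1
    # Members that would still be online once `node` is stopped.
    remaining = (set(online) & member_set) - {node}
    return len(remaining) < quorum
```

A deployment script could call this before restarting each node and wait, or abort, while it returns `True`. Note that the only member of a single-node cluster is always quorum-critical under this definition.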


a3kov · 2023-10-02T14:41:56Z

a3kov
Oct 2, 2023
Author

Sure. By "not cluster-aware" I mean a deployment where you don't set up a cluster: during deployment, the new build doesn't connect to the old one.

> Single-node deployments of Khepri in production are something I see little value in. There is no such thing as a "cluster-aware one node deployment".

That's a pity. I can see it being useful if one starts with a single node and later grows to a cluster. The advantage would be not having to rewrite the code. How many nodes are in the cluster should be handled by the DB itself and not be the client's concern, imo.

1 reply

michaelklishin Oct 2, 2023
Maintainer

Well, Khepri stores its data both on disk and in memory, so if you want to use a single-node deployment, you can. You just have to re-attach the volume where the data directory resides between deployments, and make sure that the directory path does not depend on any deployment-specific values, such as the hostname.
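One way to guard against a deployment-specific path is a small preflight check in the deployment script. The sketch below assumes the data directory is passed through an environment variable named `KHEPRI_DATA_DIR`. That name is invented for this example and is not something Khepri reads on its own. Handing the path to Khepri is not shown.

```python
import os
import socket
from pathlib import PurePosixPath


def resolve_data_dir(env=os.environ, hostname=None) -> PurePosixPath:
    """Return the configured data directory, rejecting deployment-specific paths."""
    hostname = hostname or socket.gethostname()
    raw = env.get("KHEPRI_DATA_DIR")  # hypothetical variable name
    if not raw:
        raise RuntimeError("KHEPRI_DATA_DIR is not set; refusing to guess a default")
    path = PurePosixPath(raw)
    if not path.is_absolute():
        # A relative path would resolve inside whichever release directory is current.
        raise RuntimeError(f"{raw!r} is relative and would move with each release")
    short = hostname.split(".")[0]
    # Heuristic substring check: it can also reject harmless paths that
    # merely happen to contain the host's short name.
    if short and short in raw:
        raise RuntimeError(f"{raw!r} contains the hostname {short!r}")
    return path
```

For example, `/var/lib/myapp/khepri` on a re-attached volume passes, while `data/khepri` or `/data/web-1/khepri` on a host named `web-1` is rejected.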

dumbbell · 2023-10-10T12:27:59Z

dumbbell
Oct 10, 2023
Maintainer

@a3kov: Are you talking about the Khepri library in general, or are you talking about RabbitMQ with Khepri enabled?

3 replies

a3kov Oct 10, 2023
Author

Khepri in general. The RabbitMQ case is covered by the RabbitMQ docs, as noted above.

a3kov Oct 10, 2023
Author

I'm interested in using it in Elixir apps. I understand that it's beta software at the moment, but I think the future is bright for this tech, especially with the corporate backing in RabbitMQ. I will be following the development.

michaelklishin Oct 10, 2023
Maintainer

RabbitMQ with Khepri enabled cannot behave meaningfully differently from standalone Khepri. So the approach outlined in RabbitMQ docs should work, or at the very least serve as a good starting point.
