JetStream's poor performance even with partitioning #5551
Replies: 8 comments 20 replies
-
Tagging @mtmk, who can take a look at the nats.net.v2 publisher code.
-
Do you use async publishing from the clients?
-
These numbers seem low. Are the streams replicated?
-
OK, if that is the case, then the dominant issue, since you use sync publishes, will be how long it takes the fastest follower to respond to the AppendEntry from the NRG (NATS Raft group). When you publish asynchronously we can batch multiple messages into AppendEntries; some batching can also occur with multiple sync publishers on different connections. What that means is that if the RTT between two servers is, say, 10 ms, then the theoretical maximum throughput for sync publishes (one message per AppendEntry) is about 100 msgs/s per synchronous publisher. In this case partitions are probably not doing anything, or very little.
On Mon, Jun 17, 2024 at 12:33 PM Anton Smolkov wrote:
> @mtmk Thanks for your response.
> As I answered below, I use synchronous publishing, but from 128 tasks in parallel, so it's equivalent to a batch size of 128 from your example.
> I tried to increase the number of in-flight parallel tasks, but with no success.
> I'll check it again tomorrow and publish the results here, but as you can see, even with the current parallelism level the NATS brokers' CPUs are fully utilized.
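A minimal sketch of the two publish patterns being discussed, using the nats.net.v2 client (this is not the code from the gist; subjects, message counts, and exact API names are illustrative and may differ between client versions):

```csharp
// Minimal sketch, not the code from the gist. Assumes NATS .NET v2 (nats.net.v2)
// with NATS.Client.Core / NATS.Client.JetStream; subjects and counts are hypothetical.
using System.Collections.Generic;
using System.Threading.Tasks;
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);
var payload = new byte[100]; // 100-byte messages, as in the test setup

// "Sync" pattern: await every ack before sending the next message.
// Each publish is one AppendEntry round trip, so throughput per task
// is bounded by the commit latency to the fastest follower.
for (var i = 0; i < 1_000; i++)
{
    var ack = await js.PublishAsync($"events.{i % 128}", payload);
    ack.EnsureSuccess();
}

// "Async" pattern: keep many publishes in flight and collect the acks afterwards,
// which gives the leader a chance to coalesce messages into fewer AppendEntries.
var inFlight = new List<Task<PubAckResponse>>();
for (var i = 0; i < 1_000; i++)
{
    inFlight.Add(js.PublishAsync($"events.{i % 128}", payload).AsTask());
}
foreach (var ack in await Task.WhenAll(inFlight))
{
    ack.EnsureSuccess();
}
```

In the second pattern, the number of acks allowed in flight plays roughly the same role as the 128 parallel tasks mentioned in the reply above.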
-
There is something not right about your setup, because those numbers are not very high for such a large deployment, and it's also strange that you manage to max out a total of 70 CPU cores on all your servers.
I would run some tests varying parameters to investigate: e.g. try memory storage for the streams, try with replicas=1, and try "nats bench" to see whether you get similar numbers or not.
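A rough sketch of one of the suggested variations, creating a comparison stream with memory storage and a single replica via nats.net.v2 (stream name and subject are made up; property names may differ slightly between client versions):

```csharp
// Sketch only: a throwaway stream with memory storage and replicas=1,
// to compare against the R3 file-backed streams. Name/subject are hypothetical.
using NATS.Client.Core;
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);

var config = new StreamConfig(name: "bench_memory_r1", subjects: new[] { "bench.>" })
{
    Storage = StreamConfigStorage.Memory, // instead of the default file storage
    NumReplicas = 1,                      // instead of replicas=3
};
await js.CreateStreamAsync(config);
```

Comparing a file-backed R3 stream against a memory-backed R1 stream separates disk persistence cost from Raft replication cost, which is the point of the suggested tests.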
-
Next step would be to schedule a call so we can triage your system.
-
Hi there.
During my performance evaluations of JetStream, I noticed subpar results in terms of publishing efficiency and scalability.
It appears that overall publishing performance is low and JetStream doesn't achieve linear scalability through partitioning.
Is this expected behavior, and could you suggest strategies to improve JetStream's performance?
Publishing code (just in case):
https://gist.github.com/AntonSmolkov/e561fa2bf92eaab9c1423c2f818754ed
Environment:
v1.25.4 (public cloud, Intel® Xeon® Gold 6230 (Cascade Lake))
NATS server: 2.10.16-alpine
NATS.Client.JetStream 2.2.2
Settings:
GOMEMLIMIT: 4000MiB
Message size: 100 bytes
[JetStream settings details omitted]
Results TL;DR
128 R3 JetStreams overall performance: ~26,000-43,500 RPS (depending on resources, see rounds below)
Core NATS pub/sub performance: ~7,620,000 RPS (Round 4)
Round 1
[Specs and graphs (NATS, Producer, RPS) omitted]
Result
Overall result across all broker and producer instances is about 26,000 RPS.
Broker instances use ~100% CPU and ~70% RAM (25 cores, 18 GiB).
Producer instances use ~70% of CPU and ~30% of RAM (7 CPU cores, 3 GiB).
nats stream report confirms that messages are distributed evenly.
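Rough arithmetic on the Round 1 numbers, assuming the 128 synchronous in-flight publishes are the limiting factor rather than broker CPU:

$$
\frac{26{,}000\ \text{msg/s}}{128\ \text{in-flight publishes}} \approx 203\ \text{msg/s per task}
\quad\Longrightarrow\quad
\text{per-publish round trip} \approx \frac{1\ \text{s}}{203} \approx 4.9\ \text{ms}
$$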
Round 2
[Specs and graphs (NATS, Producer, RPS) omitted]
Result
Overall result across all broker and producer instances is about 38,000 RPS.
Broker instances use ~100% CPU and ~70% RAM (50 cores, 33 GiB).
Producer instances use ~60% of CPU and ~25% of RAM (12 CPU cores, 4.6 GiB).
nats stream report confirms that messages are distributed evenly.
Round 3
[Specs and graphs (NATS, Producer, RPS) omitted]
Result
Overall result across all broker and producer instances is about 43,500 RPS.
Broker instances use ~100% CPU and ~70% RAM (70 cores, 46 GiB).
Producer instances use ~60% of CPU and ~25% of RAM (15 CPU cores, 6 GiB).
nats stream report confirms that messages are distributed evenly.
Round 4 (core pub/sub, for comparison)
(All JetStreams were deleted beforehand)
[Specs and graphs (NATS, Producer, RPS) omitted]
Result
Overall result across all broker and producer instances is about 7,620,000 (7.6 million!) RPS.
Broker instances use ~20% CPU and ~6% RAM (5 cores, 1.4 GiB).
Producer instances use ~100% of CPU and ~10% of RAM (20 CPU cores, 2 GiB).
Update: NATS Bench results
128 JetStream partitions were set up using this guide
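For readers unfamiliar with that guide, deterministic partitioning of this kind is typically configured with a server-side subject-mapping transform, roughly along these lines (subject names are hypothetical, not taken from the guide or the gist):

```
# nats-server config sketch (hypothetical subjects): hash wildcard token 1 into
# one of 128 partition numbers so that each partition prefix
# (events.0.> ... events.127.>) can be bound to its own stream.
mappings = {
  "events.*": "events.{{partition(128,1)}}.{{wildcard(1)}}"
}
```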
[Specs omitted]
Nats bench 1 (sync)
[Graphs omitted]
Result
Overall result across all broker instances is about 34,560 RPS.
Broker instances use ~100% CPU and ~40% RAM (70 cores, 29 GiB).
Producer instance (nats-box) uses ~60% of CPU and ~1% of RAM (3 CPU cores, 241 MiB).
Nats bench 2 (async)
[Graphs omitted]
Result
Overall result across all broker instances is about 318,720 RPS.
Broker instances use ~90% CPU and ~65% RAM (68 cores, 49 GiB).
Producer instance (nats-box) uses ~80% of CPU and ~25% of RAM (4 CPU cores, 1.35 GiB).
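Relating the two bench runs above shows how much the async mode gains from letting acks be batched:

$$
\frac{318{,}720\ \text{msg/s (async)}}{34{,}560\ \text{msg/s (sync)}} \approx 9.2\times
$$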