
Investigate context deadline exceeded with DragonflyDB #389

Open
FZambia opened this issue Jun 11, 2024 · 6 comments

FZambia commented Jun 11, 2024

There was a report in the community Telegram group:

"Hello. We have a problem with Centrifugo. We had Centrifugo v5.1.1, a Redis cluster, and k8s, and that configuration worked fine. We changed the Redis cluster to DragonflyDB on our staging and it worked fine too. After that we changed the Redis cluster to DragonflyDB on our prod, and we got a lot of errors: error updating presence for channel and error adding presence. We updated Centrifugo to v5.4.0 but the errors keep coming. How can we fix it?"

[screenshot of the reported errors]

docker.dragonflydb.io/dragonflydb/dragonfly:v1.19.0
Centrifugo v5.4.0

The current assumption is that benchmarks should help to reproduce this.


romange commented Jun 14, 2024

Please let us know how it goes


FZambia commented Jun 22, 2024

I was unable to reproduce this error with benchmarks; I tried on macOS (Docker) and Ubuntu 24.04 (both with and without Docker).

Some other findings:

Redis outperforms DragonflyDB by up to an order of magnitude in Centrifuge benchmarks. My explanation is that Centrifuge uses pipelining over a single connection, while DF, I guess, batches calls over uring, collecting them from different connections. This means requests coming through a single connection simply wait longer to be executed. That's actually not bad, since in practice we have many Centrifuge nodes, so a higher overall throughput could potentially still be achieved.

I quickly tried running 10 instances of the benchmark in parallel and saw that higher throughput can be achieved with DF in this case, which to me confirms the theory above. It was still far from Redis throughput though, and CPU usage was around 450% compared to Redis's 100%. 100% is the limit for Redis, but it's clear that Redis provides the best throughput it can on a single core.

Also, latencies are very unstable with DF when using several pipelining connections: running the same bench may yield 20k rps, then 100k rps, then 20k again. With Redis, latencies are stable and benchmark rps is always consistent.

So far I've only run the presence benchmarks; Centrifuge uses Lua for such requests. It's not very convenient to experiment at this point, as many manual tweaks were required, so it would be nice to automate the various bench conditions.


romange commented Jun 22, 2024

@FZambia, thank you for performing these tests. Can you instruct me on how to run a Centrifuge benchmark?
Pipelining indeed has an inherent delay in Dragonfly, because each request is possibly dispatched to another thread, and the connection waits for it to finish before dispatching the next one. Having said that, I would like to see whether we have some unexpected bottleneck with this use case.


FZambia commented Jun 23, 2024

Yep, let me prepare something suitable for reproducing various scenarios in a convenient way; right now it's not trivial.

But a benchmark which just uses a single connection with pipelining may be run like this after cloning:

docker compose up redis dragonflydb

Redis bench (uses 6379 port):

go test -run xxx -bench BenchmarkRedisAddPresence_ManyCh/rd_single -benchmem -tags integration

Same but with Dragonfly (uses different port - 7379):

go test -run xxx -bench BenchmarkRedisAddPresence_ManyCh/df_single -benchmem -tags integration

Go 1.21 or higher should be installed. These benches run many operations in parallel; all operations are then collected into a single pipeline.

I'll try to find and implement a simple way to run benches that utilize several pipelines instead of one, to quickly experiment with how throughput scales when adding more connections.


romange commented Jun 23, 2024

I will check it out, thanks!

Does Centrifuge usually open a single upstream connection from a single Centrifuge process?
DF becomes more efficient when multiple connections "talk" to it.


FZambia commented Jun 23, 2024

Usually yes, it uses a single pipeline. But that's what I was talking about in the comments above: I tried running multiple connections with separate pipelines and so far could not achieve good results. I only had super hacky bash scripts for experimenting, so I'll try to write a cleaner Go bench with more connections.
