
Consumer deletion can fail without throwing an error, causing spam creation of consumers #1331

Open
Callum-A opened this issue Oct 23, 2024 · 5 comments
Labels
defect Suspected defect such as a bug or regression

Comments

@Callum-A

Observed behavior

In our prod environment we've seen the consumer count tick up into the 20,000s, which OOMs one of the NATS servers and causes impact to our applications (some KVs are not replicated, which we will now resolve on our side). However, I noticed in the following code: https://github.com/nats-io/nats.rs/blob/main/async-nats/src/jetstream/consumer/pull.rs#L2265-L2270 that consumer deletion is allowed to fail silently, which can cause this death spiral of ever-increasing consumers.
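
To make the pattern concrete, here is a rough sketch of what I understand is happening (hypothetical names, not the actual code in pull.rs): the delete result is discarded, so a failed delete leaves the old consumer behind while a new one is still created.

```rust
use async_nats::jetstream::{self, consumer::{pull, Consumer}};

// Hypothetical sketch (not the actual code in pull.rs) of the pattern I mean:
// the old consumer's delete result is discarded, so a failed delete leaves an
// orphaned consumer behind while a brand-new one is still created.
async fn recreate_consumer(
    stream: &jetstream::stream::Stream,
    old_name: &str,
    config: pull::Config,
) -> Result<Consumer<pull::Config>, async_nats::Error> {
    // Best-effort cleanup: the error, if any, is silently dropped.
    let _ = stream.delete_consumer(old_name).await;

    // Creation can still succeed even when the delete above failed,
    // which is how orphaned consumers can accumulate.
    let consumer = stream.create_consumer(config).await?;
    Ok(consumer)
}
```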

If I am incorrect in my assumption of how this works, is there a way we can force KV consumers to use the same name so this death spiral can be avoided?

Expected behavior

If a consumer fails to delete, or at least fails to delete a few times in a row, the error should be propagated upwards to prevent this death spiral, which can OOM hosts.
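
Something along these lines is what I have in mind -- a rough sketch, not a patch against the crate, with made-up names and retry counts:

```rust
use std::time::Duration;
use async_nats::jetstream::stream::Stream;

// Rough sketch of the behaviour I'd expect: retry the delete a few times,
// and if it never succeeds, surface the error instead of silently creating
// yet another consumer.
async fn delete_consumer_with_retries(
    stream: &Stream,
    name: &str,
    attempts: usize,
) -> Result<(), async_nats::Error> {
    let mut last_err: Option<async_nats::Error> = None;
    for _ in 0..attempts {
        match stream.delete_consumer(name).await {
            Ok(_) => return Ok(()),
            Err(err) => {
                last_err = Some(err.into());
                tokio::time::sleep(Duration::from_millis(250)).await;
            }
        }
    }
    Err(last_err.expect("attempts should be non-zero"))
}
```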

Server and client version

Server: v2.10.11
Async NATS (Rust): 0.36.0

Host environment

Rocky Linux 9.3

Steps to reproduce

Hard to give concrete steps; you need to get into a state where:

Consumer deletion fails on cleanup.
Consumer creation succeeds.

@Callum-A Callum-A added the defect Suspected defect such as a bug or regression label Oct 23, 2024
@Jarema
Member

Jarema commented Oct 23, 2024

Hey.
Thanks for the report.

Unfortunately, we cannot reuse the same consumer name: if the old consumer still exists, it has advanced in sequence beyond the point the client is at, so it cannot simply be reused. This is how the ordered consumer (used by KV watchers) works - with disposable consumers and no acks.

Those consumers are ephemeral, so they will get cleaned up automatically.
I also do not see how create can succeed while delete fails. If the JetStream API is overloaded, statistically both should fail, and that should not lead to having 20k consumers.
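
For context, the watcher's consumer looks roughly like this (an illustration of the shape, not the crate's exact internals):

```rust
use std::time::Duration;
use async_nats::jetstream::consumer::{pull, AckPolicy, DeliverPolicy};

// Rough illustration of the kind of consumer a KV watcher creates:
// ephemeral (no durable name), no acks, and an inactivity threshold so the
// server cleans it up on its own.
fn watcher_like_config(unique_name: String) -> pull::Config {
    pull::Config {
        name: Some(unique_name), // freshly generated for every watcher
        durable_name: None,      // ephemeral
        deliver_policy: DeliverPolicy::LastPerSubject,
        ack_policy: AckPolicy::None,
        inactive_threshold: Duration::from_secs(300),
        ..Default::default()
    }
}
```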

Could you describe how you are using those consumers/watchers and share some code snippets?
I need to understand the case better.

Thanks!

@Callum-A
Author

Hey Jarema,

I'll circle back with more details when I'm back at work tomorrow.

As a side question: Is there any reason we can't use a durable pull consumer on a KV subject to get our values?

Thanks!

@Jarema
Member

Jarema commented Oct 23, 2024

When you're getting values with a watcher, you want to create a consumer, get the values, and delete the consumer. There is no point in reusing it later.

Btw, consumers are only used for watchers. Retrieving KV values via get or entry does not use consumers, but a direct get on the stream.
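
Roughly (bucket and key names are just examples):

```rust
use async_nats::jetstream;
use futures::StreamExt;

// get/entry do a direct get against the backing stream, while
// watch/watch_all create an ordered consumer under the hood.
async fn kv_example(jetstream: &jetstream::Context) -> Result<(), async_nats::Error> {
    let kv = jetstream.get_key_value("my-bucket").await?;

    // Direct get on the stream -- no consumer is created.
    let value = kv.get("some-key").await?;
    println!("current value: {:?}", value);

    // Creates an ordered, ephemeral consumer behind the scenes.
    let mut watcher = kv.watch_all().await?;
    while let Some(entry) = watcher.next().await {
        println!("update: {:?}", entry);
    }
    Ok(())
}
```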

@Callum-A
Author

I can confirm we're using a watcher to subscribe to updates; it's effectively getting a KV and then calling watch_all. I looked at the internals and saw the consumer config, and wondered if there would be any adverse effects to using a similar config but with a direct pull consumer with a given name.
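
Something like this is what I had in mind (bucket and consumer names are hypothetical):

```rust
use async_nats::jetstream::{self, consumer::{pull, AckPolicy, DeliverPolicy}};

// Hypothetical illustration of what I'm asking about: a pull consumer with a
// fixed durable name, filtered to the KV bucket's subjects, instead of the
// disposable consumer the watcher creates each time.
async fn named_kv_consumer(jetstream: &jetstream::Context) -> Result<(), async_nats::Error> {
    // KV buckets are backed by a stream named "KV_<bucket>".
    let stream = jetstream.get_stream("KV_my-bucket").await?;
    let _consumer = stream
        .create_consumer(pull::Config {
            durable_name: Some("my-kv-watcher".to_string()),
            filter_subject: "$KV.my-bucket.>".to_string(),
            deliver_policy: DeliverPolicy::LastPerSubject,
            ack_policy: AckPolicy::Explicit,
            ..Default::default()
        })
        .await?;
    Ok(())
}
```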

@Jarema
Member

Jarema commented Oct 24, 2024

I would like to understand what difference it would make in your opinion, as you need a new consumer when you create a new watcher, no matter if it's durable or not.

Watchers should not create 30k consumers, so I would focus rather on that.
