-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition in some metrics (counter and gauge are fine) #20
Comments
Yup, I am aware of these issues and have started work to resolve them. Here is one option for exdec samples: https://github.com/boundary/folsom/tree/exdec_gen_server I haven't merged this yet as I am not convinced this is the best way to go yet. |
If you're open to dependencies, a straightforward solution would be to use gproc. This would make it super easy to register worker processes with the name of the metric that they own and do the pub/sub. Not sure that making exdec a process is the best solution, although it looks like it might solve the problem in that particular place. |
Also, which of the notify interfaces in folsom_metrics / folsom_ets are actually used? folsom_metrics:notify/1 is the only one that's documented. |
Right, I was looking at that last week when I was writing the exdec_gen_server branch. Seemed like a decent way to go. |
folsom_metrics is the API module I prefer folks use. |
What I mean is that there are notify/1, notify/2, notify/3, and notify_existing_metric/3. notify/1 and notify/2 are essentially the same thing, and notify_existing_metric/3 has similar semantics. The comments lead me to believe that notify/2 is actually preferred, but the documentation only shows notify/1. notify_existing_metric/3 seems reasonable to have, but it's not clear if it's actually worth the trouble. notify/2 could surely be made faster by essentially passing down the result of get_info(Name) to notify/4 and not calling handler_exists at all. notify/3 would be a good thing to get rid of if you could, or at least clean up all the implementation bloat it causes. I could see more use cases for it if metrics were named by terms and not atoms. |
So here's what I was thinking last night, I started sketching it out last night but didn't finish it. Have a function that maps types to a way to create metrics from them. Sometimes the state will be meaningless, sometimes it will be a pid. If it was a behaviour I'd have it look like this:
Define a behaviour to encapsulate what a metric must do:
folsom_ets should change its #metric{} to include the state and the handler module returned by the constructor. An example module might look like this:
There would be a wrapper module that constructs a metric in its own gen_server process, this would itself be a metric that has State as its pid. Metrics that it wraps would have an init/1 to return an initial {ok, State}, and notify casts would return the new state for the gen_server (will often be faster than using ets since we are already serialized).
An example that uses this might look something like this:
|
This seems like a pretty reasonable solution. I assume we also want a metric supervisor to keep an eye on each new metric gen_server that we start up (supervisor:start_child/2). |
Yeah, a simple_one_for_one or something like that |
Test message, ignore me. |
Folsom has moved, please resubmit your issue at https://github.com/folsom-project Thanks! |
These metrics are ok:
These metrics can drop data due to race conditions (interleaved get_value / insert):
And this one can grow and shrink incorrectly (interleaved ets:info(…, size) / delete / insert):
There are some similar problems with folsom_sample_exdec, folsom_sample_none, folsom_sample_uniform but those problems would go away if folsom_metrics_histogram usage was serialized by key.
The most straightforward way out of this predicament is to set up a process per metric (of these types) for update serialization. The metrics are basically determined at compile time and their count should be smallish (hundreds maybe), so a fixed size pool isn't necessary. Most reads can still go directly to the table since it's updated atomically, but some of the histogram implementations may need some kind of get_values serialization.
The text was updated successfully, but these errors were encountered: