-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingester.QueryStream: Add support for ignoring context cancellation for chunk queriers #6408
Conversation
fb959b4
to
734d491
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine as a test just to allow us to see that original error, but wouldn't the underlying query still fail?
@treid314 you're right, I actually realized the same thing last night. The drawback is that we don't get at the original error, unless it's from the stream context, but we don't want to fail the query either. I will revert to passing an independent context ( |
In Ingester.QueryStream, ignore context cancellation wrt. chunk queriers, the way it worked prior to #6085. If the context is canceled though, log it with full diagnostics. Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
734d491
to
2cffe7a
Compare
pkg/ingester/ingester.go
Outdated
@@ -1625,6 +1627,8 @@ func (i *Ingester) QueryStream(req *client.QueryRequest, stream client.Ingester_ | |||
spanlog, ctx := spanlogger.NewWithLogger(stream.Context(), i.logger, "Ingester.QueryStream") | |||
defer spanlog.Finish() | |||
|
|||
deadline, deadlineSet := ctx.Deadline() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I would move it before the call to dumpContextError()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it into dumpContextError
since I don't see any downside to doing so (and the code gets simpler).
pkg/ingester/ingester.go
Outdated
} else { | ||
timeout = "not set" | ||
} | ||
level.Warn(spanlog).Log("msg", "query context error", "cause", context.Cause(ctx), "timeout", timeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The querier legitimately cancel the context once it reaches the quorum (e.g. send request to all ingester zones, cancel outstanding requests once reached successful answers from 2 out of 3 zones). I think this will generate a lot of noise in the logs. I suggest you to to put this debug info only in the tracing span (and not log it as well), if your purpose will be to investigate it through tracing anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit difficult to say before we try in practice (I'm more used to logs than traces), but obtaining this info from traces is probably enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pracucci I've changed to level.Debug
instead, since that what happens elsewhere in Ingester.QueryStream
, and it should cause only a trace to be emitted unless debug logs are enabled. That should be sufficient, or would it be too noisy even for debug logs?
@@ -2006,7 +2030,7 @@ func (i *Ingester) sendStreamingQuerySeries(ctx context.Context, q storage.Chunk | |||
|
|||
// Ensure no error occurred while iterating the series set. | |||
if err := ss.Err(); err != nil { | |||
return nil, 0, err | |||
return nil, 0, errors.Wrap(err, "iterating ChunkSeriesSet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I think calling it "iterating" is a bit misleading. I understand this is coming from the comment above, but in practice what's happening here is that we're "sending" not "iterating".
Same comment applies above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't actually looking at the comment, I was just relating to the fact that ss.Err()
returns any error having occurred during iteration of the ss
iterator. I don't understand how "sending" would be more descriptive, since any error from ss.Err()
would be from failed ChunkSeries
iteration, not sending. A failed send should result in an error returned from client.SendQueryStream
, no?
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
100c39e
to
1885ca5
Compare
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the understanding that this will be removed as soon as we understand the context cancelations better
…or chunk queriers (#6408) * Ingester.QueryStream: Add support for ignoring context cancellation for chunk queriers In Ingester.QueryStream, support ignoring of context cancellation wrt. chunk queriers, the way it worked prior to #6085. If the context is canceled though, (span) log it with full diagnostics. Also wrap some chunk querying errors. --------- Signed-off-by: Arve Knudsen <[email protected]> (cherry picked from commit f737a96)
…or chunk queriers (#6408) (#6440) * Ingester.QueryStream: Add support for ignoring context cancellation for chunk queriers In Ingester.QueryStream, support ignoring of context cancellation wrt. chunk queriers, the way it worked prior to #6085. If the context is canceled though, (span) log it with full diagnostics. Also wrap some chunk querying errors. --------- Signed-off-by: Arve Knudsen <[email protected]> (cherry picked from commit f737a96) Co-authored-by: Arve Knudsen <[email protected]>
since we found the root cause to be what grafana/mimir-prometheus#546 fixed, can we revert this PR @aknuds1 ? |
@dimitarvdimitrov I was planning on reverting this next week (have a reminder set), after we've tested that fix for a bit. Sound good? |
…lation for chunk queriers (#6408)" This reverts commit f737a96. Signed-off-by: Arve Knudsen <[email protected]>
…lation for chunk queriers (#6408)" (#6555) This reverts commit f737a96. Signed-off-by: Arve Knudsen <[email protected]>
What this PR does
In
Ingester.QueryStream
, if experimental flag-ingester.chunks-query-ignore-cancellation
is enabled, ignore context cancellation wrt. chunk queriers, the way it worked prior to #6085. If the context is canceled though (or the error returned from Prometheus stems from a context error), emit a debug log message/trace span with full diagnostics for troubleshooting.This is meant as a temporary "feature", while we troubleshoot unexpected
Ingester.QueryStream
cancellations.I attempted adding a test, but realized that if I call
Ingester.QueryStream
with a canceled stream context, the method fails independently of the querier since responding via the stream will fail. Therefore writing a test is really tricky, and would take some time (would need some sort of stubbing most likely).Which issue(s) this PR fixes or relates to
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]