-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MQE: Add support for histogram_quantile #9929
base: main
Are you sure you want to change the base?
Conversation
Also preps support for more classic histogram functions to come. (Will require some re-work, but the basics are there). Tidies up annotation tests and checks their results between engines. (Since sometimes we emit annotations with results, and sometimes the results are omitted when there is an annotation).
38d1a77
to
d293f3a
Compare
d89bbab
to
78fc811
Compare
78fc811
to
5abeee0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working my way through the implementation and will keep going tomorrow - I have some suggestions for the tests in the meantime
@@ -1836,11 +1836,82 @@ func (t *timeoutTestingQueryTracker) Close() error { | |||
return nil | |||
} | |||
|
|||
func TestAnnotations(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you've split this test in half?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like shorter tests and it felt like a good place to split it up since I wanted a function to just test the histogram annotations etc. Happy to rejoin though if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely against it, but it is going to cause a bunch of merge conflicts with the upcoming Prometheus 3 changes (which introduce a bunch of new annotations) unless I rebase those.
Perhaps we can keep things as they were in this PR and then split the test in a later PR?
pkg/streamingpromql/engine_test.go
Outdated
// If both results are available, compare them (sometimes we skip prometheus) | ||
if len(results) == 2 { | ||
// We do this extra comparison to ensure that we don't skip a series that may be outputted during a warning | ||
// or vice-versa where no result may be expected etc. | ||
testutils.RequireEqualResults(t, testCase.expr, results[0], results[1]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to compare the results when we're testing annotations? I'd expect the results to be exercised elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have the results, so we may as well compare them.
The main reason though is that some annotations stop the output, and some don't. We want to make sure we are doing that consistently.
Yes, this is likely checked elsewhere, but I'm always in favour of extra checks and balances personally. We also have some annotation tests that are difficult to check elsewhere so it may be easy to miss something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a test for the case where the buckets for an output series change over time (eg. at T=1, buckets are 1, 2 and 5, but at T=2, buckets are 1, 3 and 7).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by output series here sorry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry: let's say the input series are:
metric{env="test", le="1"}
metric{env="test", le="2"}
metric{env="test", le="3"}
metric{env="test", le="5"}
metric{env="test", le="7"}
Then these all map to the one output series, {env="test"}
.
c20de2b
to
016d546
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 🙂
I'd like to see some benchmark results for this.
memoryConsumptionTracker *limiting.MemoryConsumptionTracker | ||
timeRange types.QueryTimeRange | ||
|
||
Annotations *annotations.Annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this is exported?
type bucketGroup struct { | ||
groupedMetricName string // The metric name of the group. May be duplicate between groups. | ||
pointBuckets []buckets // Buckets for the grouped series at each step | ||
nativeHistograms *[]promql.HPoint // Histograms should only ever exist once per group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a pointer to a slice?
// Each series belongs to two groups | ||
seriesGroupPair := make([]*bucketGroup, 2) | ||
|
||
// Store the le label. If it doesn't exist, it'll be an empty string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter if le
is present but empty? (vs. not present)
lb := labels.NewBuilder(series.Labels) | ||
g.labels = lb.Labels() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just set g.labels = series.Labels
?
} | ||
|
||
type bucketGroup struct { | ||
groupedMetricName string // The metric name of the group. May be duplicate between groups. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we store the input series index, rather than the metric name? Then we can retrieve the metric name from innerSeriesMetricNames
when we need it.
if a.upperBound < b.upperBound { | ||
return -1 | ||
} | ||
if a.upperBound > b.upperBound { | ||
return +1 | ||
} | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] This could be simplified to something like a.upperBound - b.upperBound
, couldn't it?
} else { | ||
for _, f := range fPoints { | ||
pointIdx := h.timeRange.PointIndex(f.T) | ||
g.pointBuckets[pointIdx] = append( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to consider, might be something for a follow-up PR: what if we allocated g.pointBuckets[pointIdx]
once based on the expected number of buckets? As it stands, we'll keep appending to g.pointBuckets[pointIdx]
, which may require many expansions of the slice, with all the allocations and copying that entails.
We could assume that if there are any floats present at a point, then all buckets will be present at that point (which should hold true unless the bucket layout changes).
We could also then pre-sort the list of buckets by upperBound
, and then directly write to the correct bucket, reducing / eliminating any shuffling required when sorting in bucketQuantile
.
The only thing I'm not sure about is how we'd handle the case where some buckets aren't present (eg. because the bucket layout changed mid-query) - we'd need to keep track of which buckets are present somehow.
return math.NaN(), false, false | ||
} | ||
rank := q * observations | ||
b := sort.Search(len(buckets)-1, func(i int) bool { return buckets[i].count >= rank }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to use slices.BinarySearch
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any tests for the functions in this file that we can copy across from Prometheus as well?
return bucket.Lower + (bucket.Upper-bucket.Lower)*(rank/bucket.Count) | ||
} | ||
|
||
// coalesceBuckets merges buckets with the same upper bound. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does this happen in practice?
Also preps support for more classic histogram functions to come. (Will require some re-work, but the basics are there).
Tidies up annotation tests and checks their results between engines. (Since sometimes we emit annotations with results, and sometimes the results are omitted when there is an annotation).
What this PR does
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.