-
Notifications
You must be signed in to change notification settings - Fork 522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: limit tags and tag values search #4320
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd also like to get your thoughts on a different kind of parameter. perhaps it could be added here as well?
what about a parameter that stopped searching early if N strings were added to the combiner and none were new? so the combiner would keep a running count of the number of added strings since a new one was found and stop searching once we crossed threshold.
consider /api/v2/search/tag/resource.cluster/values?q=<some very expensive query>
. there are likely very few values for resource.cluster
but we still spend a lot of resources to exhaustively evaluate <some very expensive query>
. a parameter that allowed the ingester to just stop searching after N found values and nothing new could be a powerful way to dynamically circuit break.
}, | ||
} | ||
for _, tc := range tests { | ||
t.Run(tc.name, func(t *testing.T) { | ||
combiner := tc.factory(tc.limit) | ||
// TODO ADD LIMIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 :)
// MaxDataSize is calculated as the total length of the recorded strings. | ||
// MaxTagsPerScope controls how many tags can be added per scope. The intrinsic scope is unbounded. | ||
// For ease of use, maxDataSize=0 and maxTagsPerScope=0 are interpreted as unlimited. | ||
func NewScopedDistinctStringWithDiff(maxDataSize int, maxTagsPerScope uint32) *ScopedDistinctString { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the one concerning behavior in this PR is how it interacts with multiple scopes. if the user queries /api/v2/search/tags?limit=100
it's quite possible they don't have 100 resource tags, but have thousands of span level tags. the search would still be exhaustive b/c the 100 limit for the resource scope would never be hit.
can we come up with some rules about how to apply the limit 100 across all scopes? for instance maybe resource tags are always added first and then span tags next?
maybe we just don't add the limit parameter to the tag name endpoints? they tend to be far faster so the limit is less of a concern?
i know we discussed this behavior and agreed to this, but i'm struggling with it now that i'm looking through the PR. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that as well, if we split the limit between tags (in the same way the maxSize is working right now) we could end with an incomplete list of span tags and none of the resource tags. The way this limit works now cannot be worse than the current implementation but in the worst-case scenario it will end with an exhaustive search.
I'm not opposed of having a deterministic search by scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best pattern I can come up with for a tag names limit is to stop searching if ANY scope exceeds the limit instead of ALL scopes.
@@ -360,6 +360,8 @@ Parameters: | |||
Optional. Along with `end`, defines a time range from which tags should be returned. | |||
- `end = (unix epoch seconds)` | |||
Optional. Along with `start`, defines a time range from which tags should be returned. Providing both `start` and `end` includes blocks for the specified time range only. | |||
- `limit = (integer)` | |||
Optional. Limits the maximum number of tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional. Limits the maximum number of tags | |
Optional. Limits the maximum number of tags. |
@@ -385,6 +387,8 @@ Parameters: | |||
Optional. Along with `end` define a time range from which tags should be returned. | |||
- `end = (unix epoch seconds)` | |||
Optional. Along with `start` define a time range from which tags should be returned. Providing both `start` and `end` includes blocks for the specified time range only. | |||
- `limit = (integer)` | |||
Optional. Limits the maximum number of tags per scope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional. Limits the maximum number of tags per scope | |
Optional. Limits the maximum number of tags per scope. |
@@ -515,6 +519,8 @@ Parameters: | |||
Optional. Along with `end`, defines a time range from which tags should be returned. | |||
- `end = (unix epoch seconds)` | |||
Optional. Along with `start`, defines a time range from which tags should be returned. Providing both `start` and `end` includes blocks for the specified time range only. | |||
- `limit = (integer)` | |||
Optional. Limits the maximum number of tags values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional. Limits the maximum number of tags values | |
Optional. Limits the maximum number of tags values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updating the docs! I have a few minor suggestions to correct missing periods.
The more I think about this idea the more I want it:
The change here and the current max_bytes_per_tag_values_query protect us when a tag value query finds A LOT of matches. We currently have no protection for an expensive tag values query that find a very low number of values. |
What this PR does:
It adds a new parameter to control the number of tag values returned. This allows the requester to limit the request, speeding up the response.
For the
tag/values
it limits the max number of values per tag:http://localhost:3200/api/v2/search/tag/.http.status_code/values?limit=1
For the
tag
endpoint, the limit controls how many tags are returned per scope. Intrinsic scope is unbounded:http://localhost:3200/api/v2/search/tags?limit=2
Which issue(s) this PR fixes:
Fixes #4288
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]