Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: limit tags and tag values search #4320

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

javiermolinar
Copy link
Contributor

@javiermolinar javiermolinar commented Nov 13, 2024

What this PR does:
It adds a new parameter to control the number of tag values returned. This allows the requester to limit the request, speeding up the response.

For the tag/values it limits the max number of values per tag:
http://localhost:3200/api/v2/search/tag/.http.status_code/values?limit=1

For the tag endpoint, the limit controls how many tags are returned per scope. Intrinsic scope is unbounded:

http://localhost:3200/api/v2/search/tags?limit=2

{
  "scopes": [
    {
      "name": "event",
      "tags": [
        "exception.escape"
      ]
    },
    {
      "name": "intrinsic",
      "tags": [
        "duration",
        "event:name",
        "event:timeSinceStart",
        "instrumentation:name",
      ....
      ]
    },
    {
      "name": "link",
      "tags": [
        "k6.97IMph1UKjxwJmE"
      ]
    },
  ....
}

Which issue(s) this PR fixes:
Fixes #4288

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@javiermolinar javiermolinar changed the title limit tags and tag values search feat: limit tags and tag values search Nov 13, 2024
@javiermolinar javiermolinar marked this pull request as ready for review November 13, 2024 09:57
@javiermolinar javiermolinar added type/docs Improvements or additions to documentation enhancement New feature or request labels Nov 13, 2024
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd also like to get your thoughts on a different kind of parameter. perhaps it could be added here as well?

what about a parameter that stopped searching early if N strings were added to the combiner and none were new? so the combiner would keep a running count of the number of added strings since a new one was found and stop searching once we crossed threshold.

consider /api/v2/search/tag/resource.cluster/values?q=<some very expensive query>. there are likely very few values for resource.cluster but we still spend a lot of resources to exhaustively evaluate <some very expensive query>. a parameter that allowed the ingester to just stop searching after N found values and nothing new could be a powerful way to dynamically circuit break.

},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
combiner := tc.factory(tc.limit)
// TODO ADD LIMIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 :)

// MaxDataSize is calculated as the total length of the recorded strings.
// MaxTagsPerScope controls how many tags can be added per scope. The intrinsic scope is unbounded.
// For ease of use, maxDataSize=0 and maxTagsPerScope=0 are interpreted as unlimited.
func NewScopedDistinctStringWithDiff(maxDataSize int, maxTagsPerScope uint32) *ScopedDistinctString {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the one concerning behavior in this PR is how it interacts with multiple scopes. if the user queries /api/v2/search/tags?limit=100 it's quite possible they don't have 100 resource tags, but have thousands of span level tags. the search would still be exhaustive b/c the 100 limit for the resource scope would never be hit.

can we come up with some rules about how to apply the limit 100 across all scopes? for instance maybe resource tags are always added first and then span tags next?

maybe we just don't add the limit parameter to the tag name endpoints? they tend to be far faster so the limit is less of a concern?

i know we discussed this behavior and agreed to this, but i'm struggling with it now that i'm looking through the PR. wdyt?

Copy link
Contributor Author

@javiermolinar javiermolinar Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that as well, if we split the limit between tags (in the same way the maxSize is working right now) we could end with an incomplete list of span tags and none of the resource tags. The way this limit works now cannot be worse than the current implementation but in the worst-case scenario it will end with an exhaustive search.

I'm not opposed of having a deterministic search by scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best pattern I can come up with for a tag names limit is to stop searching if ANY scope exceeds the limit instead of ALL scopes.

@@ -360,6 +360,8 @@ Parameters:
Optional. Along with `end`, defines a time range from which tags should be returned.
- `end = (unix epoch seconds)`
Optional. Along with `start`, defines a time range from which tags should be returned. Providing both `start` and `end` includes blocks for the specified time range only.
- `limit = (integer)`
Optional. Limits the maximum number of tags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Optional. Limits the maximum number of tags
Optional. Limits the maximum number of tags.

@@ -385,6 +387,8 @@ Parameters:
Optional. Along with `end` define a time range from which tags should be returned.
- `end = (unix epoch seconds)`
Optional. Along with `start` define a time range from which tags should be returned. Providing both `start` and `end` includes blocks for the specified time range only.
- `limit = (integer)`
Optional. Limits the maximum number of tags per scope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Optional. Limits the maximum number of tags per scope
Optional. Limits the maximum number of tags per scope.

@@ -515,6 +519,8 @@ Parameters:
Optional. Along with `end`, defines a time range from which tags should be returned.
- `end = (unix epoch seconds)`
Optional. Along with `start`, defines a time range from which tags should be returned. Providing both `start` and `end` includes blocks for the specified time range only.
- `limit = (integer)`
Optional. Limits the maximum number of tags values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Optional. Limits the maximum number of tags values
Optional. Limits the maximum number of tags values.

Copy link
Contributor

@knylander-grafana knylander-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the docs! I have a few minor suggestions to correct missing periods.

@joe-elliott
Copy link
Member

The more I think about this idea the more I want it:

what about a parameter that stopped searching early if N strings were added to the combiner and none were new? so the combiner would keep a running count of the number of added strings since a new one was found and stop searching once we crossed threshold.

The change here and the current max_bytes_per_tag_values_query protect us when a tag value query finds A LOT of matches. We currently have no protection for an expensive tag values query that find a very low number of values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request type/docs Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a limit parameter to search tags and tag values endpoints
3 participants