feat(core): Add support for Tantivy based time series index #1852
Merged
Add build integration to build Rust libraries that can be used for native methods in a module:
* Opt-in plugin that enables Rust integration
* compile, test, lint, package steps
* Support for multi-arch builds, allowing the final jar to support more than one platform
No libraries are using this as of this commit; follow-ups will add Rust code usage.
In preparation for new index types, separate common logic from Lucene-specific logic. No functional changes are made as part of this commit; it only moves code around and refactors to accept multiple index implementations. In most cases this amounts to moving code out of PartKeyLuceneIndex and into PartKeyIndexRaw without further changes. In a few cases, code that was instantiating Lucene-specific types now uses callbacks or other approaches to allow instantiating index-specific types.
Commit 1 of adding Tantivy index support. This PR covers a small subset of the overall Tantivy logic to allow for easier partial review. The index code is not usable end to end until all parts are committed. This adds the basic Rust project skeleton and supports opening, applying schema, and closing the index. Many methods are unimplemented and will be added in follow-up PRs. End-to-end testing is not available in this PR because the index test suite requires ingestion and query support for verification.
Merge latest develop
This implements enough of the interface to add documents to the Tantivy index. This is part 2 of the series and is not usable end to end until additional PRs complete. Document addition and removal are covered in this PR. Documents are created by calling a single native method that takes many of the built-in fields, such as start and end time, as method parameters. Dynamic fields are passed to the Rust code via a simple binary encoding to reduce JNI overhead.
**Pull Request checklist**
- [X] The commit(s) message(s) follows the contribution [guidelines](CONTRIBUTING.md) ?
- [ ] Tests for the changes have been added (for bug fixes / features) ?
- [ ] Docs have been added / updated (for bug fixes / features) ?

**New behavior :** Part 2 of Tantivy index support. The functionality is still incomplete and not ready for usage yet.
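As an illustration of why a binary encoding helps here: packing every dynamic field into one byte buffer means the JVM side can hand over all fields in a single native call instead of one call per field. The sketch below is not the encoding used in this PR; the layout (a little-endian u16 length followed by UTF-8 bytes, for each name and each value) and the names `encode_fields`, `decode_fields`, and `read_str` are assumptions made up for the example.

```rust
/// Encode (name, value) pairs into one buffer.
/// Hypothetical layout per string: u16 little-endian length, then UTF-8 bytes.
fn encode_fields(fields: &[(&str, &str)]) -> Vec<u8> {
    let mut buf = Vec::new();
    for (name, value) in fields {
        for part in [name.as_bytes(), value.as_bytes()] {
            // This sketch only supports strings that fit in a u16 length prefix.
            assert!(part.len() <= u16::MAX as usize);
            buf.extend_from_slice(&(part.len() as u16).to_le_bytes());
            buf.extend_from_slice(part);
        }
    }
    buf
}

/// Read one length-prefixed string starting at `pos`, advancing `pos`.
/// Returns None if the buffer is truncated or the bytes are not valid UTF-8.
fn read_str<'a>(buf: &'a [u8], pos: &mut usize) -> Option<&'a str> {
    let len_bytes = buf.get(*pos..*pos + 2)?;
    let len = u16::from_le_bytes([len_bytes[0], len_bytes[1]]) as usize;
    *pos += 2;
    let bytes = buf.get(*pos..*pos + len)?;
    *pos += len;
    std::str::from_utf8(bytes).ok()
}

/// Decode the whole buffer back into borrowed (name, value) pairs.
fn decode_fields(buf: &[u8]) -> Option<Vec<(&str, &str)>> {
    let mut pos = 0;
    let mut out = Vec::new();
    while pos < buf.len() {
        let name = read_str(buf, &mut pos)?;
        let value = read_str(buf, &mut pos)?;
        out.push((name, value));
    }
    Some(out)
}
```

Decoding borrows directly from the input buffer, so no per-field allocation is needed on the Rust side; a truncated or malformed buffer yields `None` rather than a panic.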
This contains the Rust code needed to support all query patterns for FiloDB. This is part 3 of the series and is not usable end to end until additional PRs complete. This PR splits the logic into a common tantivy library, where collectors and query extensions are implemented, and the filodb library, where specific JVM bridge methods and business logic exist. The majority of this PR consists of custom collectors and caching to better match the performance of Lucene in the FiloDB use cases. The glue logic that connects these collectors to JVM methods is for the most part very straightforward: parse a query, run it, serialize results.
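To give a rough feel for what a custom collector does, the sketch below loosely mirrors the collect-per-segment-then-merge shape that search-engine collectors typically follow. It deliberately uses no Tantivy types, and it is not one of the collectors from this PR; `LimitedSegmentCollector`, `merge_results`, and the `(segment, doc)` pair representation are hypothetical names chosen for illustration.

```rust
/// Collects matching documents for one segment, stopping at `limit`.
/// Hypothetical stand-in for a per-segment collector.
struct LimitedSegmentCollector {
    segment: u32,
    limit: usize,
    docs: Vec<(u32, u32)>, // (segment ordinal, segment-local doc id)
}

impl LimitedSegmentCollector {
    fn new(segment: u32, limit: usize) -> Self {
        Self { segment, limit, docs: Vec::new() }
    }

    /// Called once per matching document; ignores matches past the limit.
    fn collect(&mut self, doc: u32) {
        if self.docs.len() < self.limit {
            self.docs.push((self.segment, doc));
        }
    }

    /// Hand back this segment's results once the segment is exhausted.
    fn harvest(self) -> Vec<(u32, u32)> {
        self.docs
    }
}

/// Combine per-segment results and apply the limit again globally.
fn merge_results(limit: usize, per_segment: Vec<Vec<(u32, u32)>>) -> Vec<(u32, u32)> {
    per_segment.into_iter().flatten().take(limit).collect()
}
```

Applying the limit inside each segment keeps the intermediate results bounded, which is one way a collector can avoid materializing every match before truncating.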
This contains the Scala side of query support. This completes the basic end to end flow, so it also includes regression and performance tests. There will be some small cleanup PRs after this for things like additional metrics, but with this PR the Tantivy index can be successfully activated via config.
Add two new metrics for commit processing time and cache hit rates. These will be used for monitoring as we roll out and turn the feature on for testing.
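For readers unfamiliar with how such metrics are usually gathered, here is a minimal sketch using only the Rust standard library. It is not the instrumentation added in this commit; `CacheStats` and `timed` are hypothetical names, and how the values are exported to a monitoring system is left out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Thread-safe hit/miss counters for a cache (hypothetical helper).
#[derive(Default)]
struct CacheStats {
    hits: AtomicU64,
    misses: AtomicU64,
}

impl CacheStats {
    /// Record the outcome of one cache lookup.
    fn record(&self, hit: bool) {
        let counter = if hit { &self.hits } else { &self.misses };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Fraction of lookups that were hits, or None before any lookup.
    fn hit_rate(&self) -> Option<f64> {
        let hits = self.hits.load(Ordering::Relaxed);
        let total = hits + self.misses.load(Ordering::Relaxed);
        if total == 0 {
            None
        } else {
            Some(hits as f64 / total as f64)
        }
    }
}

/// Run `work` and return its result together with the elapsed wall time,
/// e.g. to measure how long an index commit takes.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}
```

Relaxed atomics are enough here because the counters are independent and only read for reporting.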
…-9-10 feat(core): Merge latest develop
Previously the Rust code was only built with optimizations if a flag was passed during sbt invocation or if we were building multi-arch. This created confusing results where perf drifted because debug mode was used. After testing and development, release was found to be a suitable default: it supports incremental builds, includes symbols for debugging, and, while not as fast to build as debug, is still fast enough for an inner loop. You can still build in debug mode by setting rust.optimize=false in any case where it is needed during local development.
alextheimer
approved these changes
Sep 20, 2024
amolnayak311
approved these changes
Sep 20, 2024
Pull Request checklist
New behavior :
This change adds support for the Tantivy indexing library as an alternative to Lucene for time series indexing. In several cases Tantivy has been found to outperform Lucene, especially in memory usage and in the predictability of memory spikes.
This feature is opt-in via a configuration setting to avoid any unexpected changes during upgrade. For the moment only the raw time series index is supported. Downsample support may come in a future PR.
BREAKING CHANGES
This change requires a working Rust and C compiler to build, because the Tantivy code is written in Rust. README docs have been updated to reflect this.
There are no runtime breaking changes.