
feat(core): Add support for Tantivy based time series index #1852

Merged
merged 18 commits into develop
Sep 20, 2024

Conversation

rfairfax
Contributor

Pull Request checklist

  • The commit(s) message(s) follows the contribution guidelines ?
  • Tests for the changes have been added (for bug fixes / features) ?
  • Docs have been added / updated (for bug fixes / features) ?

New behavior :

This change adds support for the Tantivy indexing library as an alternative to Lucene for time series indexing. In several cases Tantivy has been found to outperform Lucene, especially in memory usage and the predictability of memory spikes.

This feature is opt-in via a configuration setting to avoid any unexpected changes during upgrade. For the moment only the raw time series index is supported. Downsample support may come in a future PR.

BREAKING CHANGES

This change requires a working Rust & C compiler to build, given that the Tantivy code is written in Rust. README docs have been updated to reflect this.

There are no runtime breaking changes.

Add build integration to build Rust libraries that can be used for native methods
in a module:
* Opt-in plugin that enables Rust integration
* Compile, test, lint, and package steps
* Support for multi-arch builds, allowing the final jar to support more than one platform

No libraries are using this as of this commit; follow-ups will add Rust code usage.

In preparation for new index types, separate common logic from Lucene-specific
logic.  No functional changes are made as part of this commit, just moving
code around and refactoring to accept multiple index implementations.

In most cases this amounts to moving code out of PartKeyLuceneIndex and into PartKeyIndexRaw
without further changes.  In a few cases, code that was instantiating Lucene-specific
types now uses callbacks or other approaches to allow instantiating index-specific
types.
Commit 1 of adding Tantivy index support.  The overall Tantivy work is broken down into a
series of smaller PRs to allow for easier partial review.  The index code is not
usable end to end until all parts are committed.

This adds the basic Rust project skeleton and supports opening the index, applying the schema, and closing the index.
Many methods are unimplemented and will be added in follow-up PRs.  End-to-end testing is not available
in this PR as the index test suite requires ingestion and query support for verification.
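As a rough sketch of what that skeleton can look like on the Rust side, the snippet below builds a schema and opens (or creates) an index using the tantivy 0.22 API. The field names are illustrative placeholders, not the actual FiloDB part key schema.

```rust
use std::path::Path;
use tantivy::directory::MmapDirectory;
use tantivy::schema::{Schema, FAST, STORED, STRING};
use tantivy::Index;

// Build a schema and open (or create) an index at the given path.
// Field names here are placeholders only.
fn open_or_create_index(path: &Path) -> tantivy::Result<Index> {
    let mut builder = Schema::builder();
    builder.add_text_field("part_key", STRING | STORED);
    builder.add_i64_field("start_time", FAST | STORED);
    builder.add_i64_field("end_time", FAST | STORED);
    let schema = builder.build();

    // MmapDirectory keeps the index on disk and memory-maps it rather than
    // holding it in a large heap-resident structure.
    let dir = MmapDirectory::open(path)?;
    Index::open_or_create(dir, schema)
}
```

On the JVM side this would presumably be reached through a native method returning an opaque handle, but that bridging code is not shown here.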
Merge latest develop
This implements enough of the interface to add documents to the Tantivy
index.  This is part 2 of the series and is not usable end to end until
additional PRs complete.

Document addition and removal are covered in this PR.  Documents are
created by calling a single native method that has many of the built-in fields,
such as start and end time, passed as method parameters.  Dynamic fields
are passed to the Rust code via a simple binary encoding to reduce
JNI overhead.
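To illustrate the idea, here is a minimal sketch of decoding such a buffer on the Rust side and appending the fields to a tantivy document (tantivy 0.22 API). The wire format shown, length-prefixed name/value pairs, is purely hypothetical; the PR's actual encoding is not described here.

```rust
use tantivy::schema::Schema;
use tantivy::TantivyDocument;

// Hypothetical wire format: repeated [u16 name_len][name utf8][u16 value_len][value utf8],
// little-endian. The point is only that one byte buffer crosses JNI instead of
// many individual string objects.
fn append_dynamic_fields(schema: &Schema, doc: &mut TantivyDocument, mut buf: &[u8]) -> Option<()> {
    while !buf.is_empty() {
        let (name, rest) = read_str(buf)?;
        let (value, rest) = read_str(rest)?;
        // Resolve the field by name in the schema and add it as text.
        let field = schema.get_field(name).ok()?;
        doc.add_text(field, value);
        buf = rest;
    }
    Some(())
}

// Read a u16 length prefix followed by that many UTF-8 bytes.
fn read_str(buf: &[u8]) -> Option<(&str, &[u8])> {
    let len = u16::from_le_bytes([*buf.first()?, *buf.get(1)?]) as usize;
    let bytes = buf.get(2..2 + len)?;
    Some((std::str::from_utf8(bytes).ok()?, &buf[2 + len..]))
}
```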

**Pull Request checklist**

- [X] The commit(s) message(s) follows the contribution [guidelines](CONTRIBUTING.md) ?
- [ ] Tests for the changes have been added (for bug fixes / features) ?
- [ ] Docs have been added / updated (for bug fixes / features) ?

**New behavior :**

Part 2 of Tantivy index support. The functionality is still incomplete and not yet ready for use.
This contains the Rust code needed to support all query patterns for FiloDB.  This is part 3
of the series and is not usable end to end until additional PRs complete.

This PR splits the logic into a common tantivy library, where collectors and query extensions
are implemented, and the filodb library, where the specific JVM bridge methods and business logic
live.

The majority of this PR is custom collectors and caching to better match the performance of
Lucene in the FiloDB use cases.  The glue logic that connects these collectors to JVM methods
is for the most part very straightforward: parse a query, run it, serialize results.
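For orientation, that flow with stock tantivy building blocks looks roughly like the sketch below (tantivy 0.22 API). The PR replaces TopDocs with custom collectors and caches; the field name and the result format here are illustrative only.

```rust
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::{DocAddress, Index};

// Parse a query, run it, and return something easy to serialize across JNI --
// the shape of the bridge glue described above. The real code builds queries
// from FiloDB column filters and uses custom collectors instead of TopDocs.
fn query_part_ids(index: &Index, query_str: &str) -> tantivy::Result<Vec<(u32, u32)>> {
    let reader = index.reader()?;
    let searcher = reader.searcher();

    // "part_key" is a placeholder default field, not the actual FiloDB schema.
    let default_field = index.schema().get_field("part_key")?;
    let parser = QueryParser::for_index(index, vec![default_field]);
    let query = parser.parse_query(query_str)?;

    // Stock collector; a custom collector can gather exactly the values the
    // caller needs without per-hit scoring and document fetches.
    let hits = searcher.search(&query, &TopDocs::with_limit(1_000))?;

    // Flatten doc addresses into plain integers for the JVM side.
    Ok(hits
        .into_iter()
        .map(|(_score, DocAddress { segment_ord, doc_id })| (segment_ord, doc_id))
        .collect())
}
```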
This contains the Scala side of query support.  This completes the basic end to
end flow, so it also includes regression and performance tests.

There will be some small cleanup PRs after this for things like additional metrics,
but with this PR the Tantivy index can be successfully activated via config.
Add two new metrics for commit processing time and cache hit rates.  These will
be used for monitoring as we roll out and turn the feature on for testing.
Previously the code only optimized if a flag was passed during sbt invocation
or if we were building multi-arch.  This created confusing results
where performance drifted because debug mode was being used.

After testing and development it was found that release is a suitable
default: it builds incrementally, includes symbols for debugging,
and while not as fast to build as debug mode, it is still fast enough for an inner loop.

You can still build with debug by setting rust.optimize=false for any
cases where it is needed during local development.
rfairfax merged commit 26ab573 into develop on Sep 20, 2024
1 check passed