Adding Starling Index to Knowhere #907

Open
aawang1999 opened this issue Oct 22, 2024 · 7 comments
@aawang1999

My development team is trying to add the Starling index to Knowhere. I understand that the process of adding indices is briefly outlined on the Milvus Deep Dive page (linked here), but I was wondering if more detailed instructions could be provided on how to modify the Knowhere code? Assistance would be greatly appreciated.

@liliu-z
Collaborator

liliu-z commented Oct 22, 2024

/assign @PwzXxm

@liliu-z
Collaborator

liliu-z commented Oct 22, 2024

My development team is trying to add the Starling index to Knowhere. I understand that the process of adding indices is briefly outlined on the Milvus Deep Dive page (linked here), but I was wondering if more detailed instructions could be provided on how to modify the Knowhere code? Assistance would be greatly appreciated.

@PwzXxm is the author of Starling; he can help with this.

@PwzXxm
Collaborator

PwzXxm commented Oct 22, 2024

Hi there, thanks for your interest in contributing to Knowhere. May I ask what the motivation is for adding Starling to Knowhere, so I can assist you better? Are you planning to add it to Milvus as well?

For adding an index to Knowhere alone, you might take a look at this example, which adds SCANN: https://github.com/zilliztech/knowhere/pull/1/files

Another feasible option might be to add parameters to the existing DiskANN index instead of adding a new index type to Knowhere.

@aawang1999
Author

Thanks for the information! We will definitely look into those.

Our team was experimenting with different vector indices and found Starling. Since Starling was created by Milvus engineers, we felt it would be appropriate to integrate it into Milvus and run experiments in terms of performance, accuracy, and stability.

Quick follow-up question: Once an index is added to Knowhere, what does the larger process for registering it in Milvus look like? Is there an analogous pull request like this? Thanks!

@PwzXxm
Collaborator

PwzXxm commented Oct 23, 2024

Our team was experimenting with different vector indices and found Starling. Since Starling was created by Milvus engineers, we felt it would be appropriate to integrate it into Milvus and run experiments in terms of performance, accuracy, and stability.

I was wondering what the use case is. I assume you have already checked out the other in-memory indices or DiskANN?

Quick follow-up question: Once an index is added to Knowhere, what does the larger process for registering it in Milvus look like? Is there an analogous pull request like this? Thanks!

Registering it in Milvus is not much work. See:
milvus-io/milvus#26099
milvus-io/milvus#27268
These PRs are quite old, and the param checks are moving into Knowhere, BTW.

@gpailetnet

Hi @PwzXxm, I'm on the same team as @aawang1999.

The use case is a high-performance index at large scale. The goal is to combine Starling's optimizations, which make it much faster than DiskANN, with disk-based scalability. Some questions from looking into the different code repositories:

  1. Are there any large differences between the DiskANN implementation in Knowhere and the version in the DiskANN GitHub repo? I know the latter supports built-in filtering and streaming updates through https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf and https://arxiv.org/pdf/2105.09613 respectively. I saw code for these features in Starling and was wondering whether they would cause any issues with the implementation in Knowhere. To my knowledge, Milvus accumulates points in an open segment and then builds the index once, after the segment is sealed, but I want to make sure I am not missing anything.
  2. Are there any other 'environmental' differences I should be aware of? DiskANN and Starling are set up to operate over the whole database, whereas in Milvus each segment has its own index, and I'd like to know what implementation constraints that imposes.

@PwzXxm
Collaborator

PwzXxm commented Nov 8, 2024

  1. Are there any large differences between the DiskANN implementation in Knowhere and the version in the DiskANN GitHub repo? I know the latter supports built-in filtering and streaming updates through https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf and https://arxiv.org/pdf/2105.09613 respectively. I saw code for these features in Starling and was wondering whether they would cause any issues with the implementation in Knowhere. To my knowledge, Milvus accumulates points in an open segment and then builds the index once, after the segment is sealed, but I want to make sure I am not missing anything.

Filtering is approached differently in Milvus than in Filtered-DiskANN. In Milvus, the filter condition is evaluated before the KNN search, so in Knowhere, DiskANN only sees a bitset stating which elements are valid. As for Fresh-DiskANN, it overlaps somewhat with our growing/sealed segment design, so we haven't updated the DiskANN in Knowhere for a while.
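
To make the pre-filtering idea concrete, here is a minimal sketch in Python. It is not Knowhere's code: the function name `filtered_knn`, the brute-force scan, and the convention that `True` means "valid" are all assumptions chosen for illustration. The point it shows is that the search routine never evaluates the filter expression; it only consults a per-offset bitset that was computed beforehand.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def filtered_knn(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    valid: Sequence[bool],
    k: int,
) -> List[Tuple[float, int]]:
    """Brute-force top-k that skips every offset whose validity bit is off.

    Assumption for this sketch: valid[i] is True when offset i passed the filter.
    """
    candidates = (
        (math.dist(query, vec), offset)
        for offset, vec in enumerate(vectors)
        if valid[offset]  # the search only sees the bitset, not the filter itself
    )
    return heapq.nsmallest(k, candidates)


vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
# Hypothetical result of evaluating a scalar filter before the search:
# offset 0 is excluded even though it is the exact nearest neighbor.
valid = [False, True, True, True]
result = filtered_knn((0.0, 0.0), vectors, valid, k=2)
print(result)  # [(1.0, 1), (2.0, 2)]
```

A graph-based index would apply the same check while traversing candidates rather than in a linear scan, but the contract with the caller is the same: one validity bit per segment offset.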

  2. Are there any other 'environmental' differences I should be aware of? DiskANN and Starling are set up to operate over the whole database, whereas in Milvus each segment has its own index, and I'd like to know what implementation constraints that imposes.

Indices operate at the segment level, so I think it would be fine. Keep in mind that the offsets/IDs at the segment level need to be preserved, so if you re-layout the data via Starling, a mapping is needed to return the correct IDs to Milvus.
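
The mapping requirement above can be sketched as follows. This is only an illustration in Python, not code from Starling or Knowhere: the class name `RelayoutMapping` and its methods are hypothetical, and it assumes the re-layout is a plain permutation of the segment's offsets.

```python
from typing import List, Sequence


class RelayoutMapping:
    """Tracks where each original segment offset ended up after reordering."""

    def __init__(self, new_to_old: Sequence[int]) -> None:
        # new_to_old[p] is the original segment offset stored at internal slot p.
        if sorted(new_to_old) != list(range(len(new_to_old))):
            raise ValueError("new_to_old must be a permutation of 0..n-1")
        self.new_to_old: List[int] = list(new_to_old)
        # Inverse table, useful for looking up a segment offset's internal slot.
        self.old_to_new: List[int] = [0] * len(self.new_to_old)
        for new_pos, old_offset in enumerate(self.new_to_old):
            self.old_to_new[old_offset] = new_pos

    def to_segment_offsets(self, internal_ids: Sequence[int]) -> List[int]:
        """Translate internal slots from a search back to segment offsets."""
        return [self.new_to_old[i] for i in internal_ids]


# Hypothetical layout: internal slot 0 holds original offset 2, and so on.
mapping = RelayoutMapping([2, 0, 3, 1])
internal_hits = [0, 3]  # what a search over the reordered data might return
segment_hits = mapping.to_segment_offsets(internal_hits)
print(segment_hits)  # [2, 1]
```

Under the same assumption, the inverse table `old_to_new` would also be the natural way to consult a validity bitset that is indexed by segment offset while traversing the reordered data.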
