Inconsistent number of matching hashes #165

Open
durrantmm opened this issue Sep 21, 2021 · 1 comment

Comments

@durrantmm

I wanted to try calculating Mash distances using my own code. I exported the hashes as integers for two .msh files using the mash info -d command. When I run mash dist on the two files, it reports that 3/1000 min-hashes match. When I do the calculation manually on the hash integers, I find that 9/1000 match. Any idea what might be going on?

@durrantmm
Author

Oh I see, you calculate the Jaccard similarity using a merge-sort approach. Couldn't you also just take the Jaccard similarity of the two hash sets directly?
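For anyone else comparing the two approaches, here is a minimal sketch of why they can disagree. It assumes (based on my reading, not confirmed by the maintainers in this thread) that the merge walks both sorted hash lists and stops once it has seen sketch-size distinct hashes from the union, so matches that fall outside that bottom portion of the union are not counted. The function names and the toy hash values are made up for illustration and are not part of Mash.

```python
def merge_shared(a, b, sketch_size):
    """Merge-style count: shared hashes among the smallest
    `sketch_size` distinct hashes of the union of a and b.
    Returns (shared, denominator)."""
    a = sorted(set(a))
    b = sorted(set(b))
    i = j = shared = seen = 0
    while seen < sketch_size and i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            # Same hash in both sketches: one shared element of the union.
            shared += 1
            i += 1
            j += 1
        seen += 1
    if seen < sketch_size:
        # One list ran out early; the leftovers of the other list
        # still belong to the union (assumed handling, capped at sketch_size).
        seen = min(sketch_size, seen + (len(a) - i) + (len(b) - j))
    return shared, seen


def set_shared(a, b):
    """Plain set intersection over the full sketches."""
    return len(set(a) & set(b))


# Toy sketches of size 5 (hypothetical hash values).
sketch_a = [1, 3, 5, 7, 20]
sketch_b = [2, 4, 6, 8, 20]

merged = merge_shared(sketch_a, sketch_b, sketch_size=5)
naive = set_shared(sketch_a, sketch_b)
print(merged, naive)
```

In the toy data the only common hash (20) is not among the five smallest hashes of the union, so the merge count is 0 while the plain intersection is 1. The same effect could explain seeing 3/1000 from mash dist and 9/1000 from a direct intersection.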
