Support Tabix #125
Comments
@knausb I do not think
The thing is, I do not see any input parameters for |
The index created by |
Yeah. Well, we need the tabix index - how to go from genomic coordinates to a chunk of data. |
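To make "genomic coordinates to a chunk of data" concrete, here is a minimal sketch in Python of the general idea, not of tabix itself. It assumes an uncompressed, coordinate-sorted VCF and records the byte offset of the first record in each fixed-size bin; the real tabix index works on BGZF-compressed blocks and uses a different binning scheme. The names `build_index`, `fetch`, `DEMO_VCF` and the `bin_size` parameter are hypothetical and exist only for this illustration.

```python
import io

# Tiny in-memory VCF used for the demonstration (hypothetical data).
DEMO_VCF = (
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\n"
    b"chr1\t100\t.\tA\tG\n"
    b"chr1\t1500\t.\tC\tT\n"
    b"chr1\t2500\t.\tG\tA\n"
    b"chr2\t300\t.\tT\tC\n"
)


def build_index(handle, bin_size=1000):
    """Map (chrom, bin) to the byte offset of the first record in that bin."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"#"):
            continue  # header lines are not indexed
        chrom, pos = line.split(b"\t", 2)[:2]
        key = (chrom.decode(), int(pos) // bin_size)
        index.setdefault(key, offset)  # keep only the first offset per bin
    return index


def fetch(handle, index, chrom, start, end, bin_size=1000):
    """Return records on chrom with start <= POS <= end (input must be sorted)."""
    bins = range(start // bin_size, end // bin_size + 1)
    offsets = [index[(chrom, b)] for b in bins if (chrom, b) in index]
    if not offsets:
        return []
    handle.seek(min(offsets))  # jump straight to the first candidate record
    hits = []
    for line in iter(handle.readline, b""):
        c, pos = line.split(b"\t", 2)[:2]
        if c.decode() != chrom or int(pos) > end:
            break  # sorted input: nothing further can match
        if int(pos) >= start:
            hits.append(line.decode().rstrip("\n"))
    return hits


index = build_index(io.BytesIO(DEMO_VCF))
records = fetch(io.BytesIO(DEMO_VCF), index, "chr1", 1000, 3000)
```

The index is built once in a single pass; after that, each query seeks to an offset and reads only the records in or near the requested region, which is the property the thread is asking for.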
Thanks for the clarification on the indexes. I think you're correct. I found Rhtslib, which I do not know how to use, but it looks like it may be a solution. |
Ah, I just contributed a pull request to samtools/htsjdk#1248 which is a Java version of htslib. I do not have experience with the C version, but if it is anything like the Java one, that's definitely a way to go. Except I looked at Rhtslib and it is literally nothing but a wrapper around htslib. It has no interface; I believe you have to write C code to link to it. So it's not super useful for a casual R developer. |
Thanks for the validation, that helps! I see Rhtslib at 1.7.0 while htslib is at 1.9.0. Does that matter? According to the documentation, it sounds like they would be amenable to incrementing Rhtslib. "not super useful for a casual R developer": yeah, it looks like there's going to be a learning curve. But I think it's the way to go. File IO is a bottleneck and one of the things that inspired me to create vcfR. vcfR currently uses |
@knausb So my understanding is that you wrote |
This package might be useful for interfacing with tabix: https://github.com/zhanxw/seqminer |
This is similar to issue #107.
We would like an interface where one can specify a chromosome and region and obtain only the parts of the VCF that correspond to this query. This would most likely mean supporting tabix-indexed VCF files. Reading the whole VCF is simply not going to happen.
The current skip and nrows parameters are not useful to us. To know those values, we would first have to index the VCF somehow - exactly the job for Tabix.
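As a rough illustration of the point about skip and nrows, the Python sketch below derives both values for a region. It assumes a coordinate-sorted VCF and assumes that the values count data rows after the header; how vcfR itself counts them is not stated in this thread. The function `skip_and_nrows` and the sample lines are hypothetical.

```python
def skip_and_nrows(lines, chrom, start, end):
    """Return (skip, nrows) for a region, counting data rows only.

    Assumes the input is sorted so that matching rows are contiguous.
    """
    skip = None
    nrows = 0
    row = 0  # index among data rows, header lines excluded
    for line in lines:
        if line.startswith("#"):
            continue
        c, pos = line.split("\t", 2)[:2]
        if c == chrom and start <= int(pos) <= end:
            if skip is None:
                skip = row  # rows before the first match get skipped
            nrows += 1
        row += 1
    return (skip or 0, nrows)


# Hypothetical sample data.
demo_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t1500\t.\tC\tT",
    "chr1\t2500\t.\tG\tA",
    "chr2\t300\t.\tT\tC",
]

region = skip_and_nrows(demo_lines, "chr1", 1000, 3000)
```

The loop has to visit every line to find the two numbers, so working them out costs a full pass over the file. That is the same cost the region query was supposed to avoid, which is why a precomputed index is needed.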