read.vcfR() coredump with empty lines in input #198
Comments
This appears to be the topic of #141. Is your report the same?
I don't think that it is the same problem. I looked at the file in the previous report, but it didn't have any empty lines. The problem I'm reporting is very simple to reproduce. If you add two empty lines after the #CHROM line, before the data section (in the example above, the lines which start with "L103"), it core dumps.
The file from issue #141 does include blank lines. The thread even reports line numbers. The thread explains that zero length records are not allowed in the VCF specification. My interpretation is that you do not have a file that complies with the VCF specification so it is unreasonable to expect software designed to work with valid VCF data to work with non-VCF files. The thread also includes a "work around" solution. Please review these materials and let me know if there is any reason to think that this issue is not a duplicate of #141. Thanks!
Sorry, you were right, this was the same issue. I thought the problem was the empty lines at the border between the header and the data, so I only looked at that area. I looked at the VCF 4.3 specification. I might have missed it, but I didn't find a statement that zero-length records/lines are not allowed. Are you referring to this statement: "Zero length fields are not allowed, a dot (“.”) must be used instead."? That says a dot must be used to indicate an empty column/field, right?
Thanks for double checking that! It's good to know we're seeing the same thing. The line you cite makes sense. The idea that one would have blank lines in a VCF file just sounds sloppy to me. They fulfill no purpose. So I'd consider it an issue with whatever created that file (my 2 cents).
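To make the distinction discussed above concrete (blank lines versus zero-length fields), here is a minimal Python sketch. It is not part of vcfR and not the workaround from #141; the function name `check_vcf_text` is hypothetical, and it assumes tab-delimited data lines.

```python
def check_vcf_text(text):
    """Return (blank_lines, empty_fields) for VCF text.

    blank_lines: 1-based numbers of empty or whitespace-only lines.
    empty_fields: (line_number, column_index) pairs for zero-length
    fields in data lines; column_index is 0-based.
    """
    blank_lines = []
    empty_fields = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "":
            blank_lines.append(number)
            continue
        if line.startswith("#"):
            continue  # meta-information and header lines are not checked
        # Assumption: data lines are tab-delimited.
        for column, field in enumerate(line.split("\t")):
            if field == "":
                empty_fields.append((number, column))
    return blank_lines, empty_fields


# Hypothetical, shortened input: two blank lines after the #CHROM line
# and one zero-length ID field on the last data line.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "\n"
    "\n"
    "L103\t475\tL103_475\tt\tc\t.\t.\tPR\n"
    "L105\t688\t\tc\tt\t.\t.\tPR\n"
)
blank, empty = check_vcf_text(example)
print("blank lines:", blank)
print("empty fields:", empty)
```

The two lists are kept separate on purpose: the sentence quoted from the specification covers zero-length fields, while this report concerns blank lines.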
Hi, I noticed that read.vcfR("test.vcf") causes a core dump (and R quits) with the following error:
/usr/include/c++/11/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = std::__cxx11::basic_string; _Alloc = std::allocator<std::__cxx11::basic_string >; std::vector<_Tp, _Alloc>::reference = std::__cxx11::basic_string&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
Aborted (core dumped)
This is caused by the statement:
Here is an example input VCF file which causes this problem (I'm also attaching it as a zipped text file; please unzip it to get the VCF file before using it).
##fileformat=VCFv4.2
##fileDate=20220321
##source=PLINKv1.90
##contig=<ID=L103,length=476>
##contig=<ID=L105,length=689>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Car_3A11_n_n Car_3B11_n_n Car_3C11_n_n


L103 475 L103_475 t c . . PR GT 0/0 0/0 0/0
L105 688 L105_688 c t . . PR GT 0/0 ./. 0/1
NOTE: there are two empty lines after the "#CHROM..." line (I added them by accident, which is how I discovered this issue). If there are 0 or 1 empty lines there, read.vcfR() works as intended. I checked whether the VCF specification has any rules about empty lines, but I couldn't find any. Still, a core dump doesn't seem like good behavior here, so I thought that you might want to deal with this issue.
Version info:
R version 4.12 on linux x86_64
vcfR version 1.12.0
Thank you,
Naoki
test.vcf.zip
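As a stopgap on the user side, the blank lines can be removed before the file is read. The following is a minimal Python sketch under that assumption. It is not necessarily the workaround described in #141, and the helper name `strip_blank_lines` and the file names are hypothetical.

```python
import os
import tempfile


def strip_blank_lines(src_path, dst_path):
    """Copy src_path to dst_path without blank lines; return the number dropped."""
    dropped = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip() == "":
                dropped += 1  # empty or whitespace-only line
                continue
            dst.write(line)
    return dropped


# Demonstration on a small stand-in file with two blank lines after #CHROM.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "test.vcf")
clean_path = os.path.join(workdir, "test.clean.vcf")
with open(raw_path, "w", encoding="utf-8") as handle:
    handle.write(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "\n"
        "\n"
        "L103\t475\tL103_475\tt\tc\t.\t.\tPR\n"
    )
removed = strip_blank_lines(raw_path, clean_path)
print("removed", removed, "blank lines")
```

The cleaned file can then be passed to read.vcfR() in R. Per the report above, a file with no empty lines after the #CHROM line is read as intended.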