When I run vcfR on a VCF file I have, I keep running into the same error when I try to extract the GT from the genotype section (Error in extract.gt(x = vcf, element = format_fields[i], as.numeric = coerce_numeric[i]) : ID column contains non-unique names). When I look at the head of the object it looks fine, but I can't seem to run any other commands on it. Can you guys help me with this?
```
[1] "***** Object of class 'vcfR' "
[1] " Meta section "
[1] "##fileformat=VCFv4.1"
[1] "##FILTER=<ID=PASS,Description="All filters passed">"
[1] "##filedate=2019.12.2"
[1] "##source=Minimac3"
[1] "##contig=<ID=1>"
[1] "##FILTER=<ID=GENOTYPED,Description="Marker was genotyped AND imputed">"
[1] "First 6 rows."
[1]
[1] " Fixed section "
     CHROM POS     ID            REF ALT QUAL FILTER
[1,] "8"   "11740" "rs531589080" "G" "A" NA   "PASS"
[2,] "8"   "11774" "rs143233250" "A" "T" NA   "PASS"
[3,] "8"   "11788" "rs564896271" "C" "T" NA   "PASS"
[4,] "8"   "11789" "rs527808609" "G" "A" NA   "PASS"
[5,] "8"   "11816" "rs75979472"  "T" "C" NA   "PASS"
[6,] "8"   "11879" "rs536257851" "A" "G" NA   "PASS"
[1]
[1] " Genotype section *****"
     FORMAT  dnl407754_icv
[1,] "GT:DS" "0|0:0.002"
[2,] "GT:DS" "1|1:1.208"
[3,] "GT:DS" "0|0:0.019"
[4,] "GT:DS" "0|0:0.007"
[5,] "GT:DS" "1|1:1.232"
[6,] "GT:DS" "0|0:0.009"
[1]
[1] "Unique GT formats:"
[1] "GT:DS"
```
I would upload the file, but its file type isn't supported.
Hi @bathycy, in the VCF specification v4.3, section 1.6.1, subsection "3. ID", it states that the ID column should contain unique identifiers for each variant, where available. I suspect your error occurs because your data include non-unique values in the ID column. This can be addressed as follows.
Here I've loaded an example data set and validated that the ID column is unique. Note that missing values (NA in R) are valid, so they are handled here as 'incomparables'. I then used rbind2() to add a non-unique variant and tested again to show that the ID column is now non-unique. The simplest path may be to omit the non-unique variants using the duplicated() function, as I have demonstrated. If you feel these duplicated variants are valuable, you may instead want to develop a workflow that identifies them and makes their IDs unique somehow, such as by adding a suffix (e.g., 1, 2, 3, or a, b, c, ...).
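The steps above could look roughly like the following sketch. It assumes the vcfR package and its bundled vcfR_test example data. To create the duplicate it repeats the first row with vcfR's `[` subsetting rather than calling rbind2(). The helper has_dup_ids() and the use of base R's make.unique() for the suffix option are my own illustrative choices, not part of the original reply.

```r
# Sketch: detect, drop, or rename variants whose ID is not unique.
# Assumes the vcfR package and its bundled 'vcfR_test' example data.
library(vcfR)

data(vcfR_test)
vcf <- vcfR_test

# Hypothetical helper: TRUE if any non-missing ID occurs more than once.
# incomparables = NA means missing IDs are never counted as duplicates.
has_dup_ids <- function(x) {
  any(duplicated(getID(x), incomparables = NA))
}

has_dup_ids(vcf)

# Create a non-unique ID by appending a copy of the first variant
# (row subsetting is used here in place of rbind2()).
vcf_dup <- vcf[c(seq_len(nrow(vcf@fix)), 1), ]
has_dup_ids(vcf_dup)

# Option 1: keep only the first occurrence of each ID.
keep <- !duplicated(getID(vcf_dup), incomparables = NA)
vcf_drop <- vcf_dup[keep, ]
has_dup_ids(vcf_drop)

# Option 2: keep every variant but make the IDs unique with a suffix
# (e.g. "rs123", "rs123_1"); missing IDs are left as NA.
ids <- getID(vcf_dup)
present <- !is.na(ids)
ids[present] <- make.unique(ids[present], sep = "_")
vcf_rename <- vcf_dup
vcf_rename@fix[, "ID"] <- ids
has_dup_ids(vcf_rename)

# With unique IDs, extract.gt() no longer hits the duplicate-ID check.
gt <- extract.gt(vcf_drop, element = "GT")
```

For your own data you would start from something like vcf <- read.vcfR("your_file.vcf.gz") (the file name is a placeholder). Option 1 loses the repeated rows, while option 2 keeps them, so choose based on whether the duplicated variants matter for your analysis.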
Please let me know if this resolves your issue. Thanks!
Brian