Understanding genetic_diff() number of alleles output #185
Comments
Hi @EveTC, I suspect the issue is how missing data are handled. I've built the following example based on your code above.
We see that position (POS) 549 is the third variant in the file, so we can extract the genotypes and query variant (row) three. We see two distinct genotypes, 0|0 and 1|1, both phased and diploid. They are diploid because there are two integer alleles for each genotype. They are phased because the alleles are delimited with "|" instead of "/". We see a total of three called genotypes even though we have 18 samples, which means the other 15 samples have missing genotypes. Because we have two of the 0|0 genotype, we have a total of four copies of the 0 allele. Because we have one of the 1|1 genotype, we have a total of two copies of the 1 allele. I believe adegenet handles missing data as another allelic state, but I suggest you consult its documentation. How to handle missing data is one of those important details that's easy to forget to pay attention to. Does that make sense?
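The allele arithmetic above can be checked with a small, self-contained sketch. This is plain Python, not vcfR's or adegenet's implementation; the function name `count_alleles`, the use of `None` for a missing genotype, and the genotype list itself (rebuilt from the counts described above: two 0|0, one 1|1, and 15 missing out of 18 samples) are assumptions made for illustration.

```python
from collections import Counter
import re


def count_alleles(genotypes):
    """Count allele copies across samples, skipping missing genotypes.

    Each genotype is a string such as "0|0" (phased) or "0/1" (unphased).
    A missing genotype is represented here as None (an assumption), and a
    missing allele inside a genotype as ".".
    """
    counts = Counter()
    for gt in genotypes:
        if gt is None:
            continue  # missing genotype: contributes no allele copies
        for allele in re.split(r"[|/]", gt):
            if allele != ".":
                counts[allele] += 1
    return counts


# Position 549 as described above: 18 samples, only three called genotypes.
genotypes = ["0|0", "0|0", "1|1"] + [None] * 15
counts = count_alleles(genotypes)

print(dict(counts))          # {'0': 4, '1': 2}
print(sum(counts.values()))  # 6
```

The point of the sketch is that the total is the number of allele copies across all called samples (two per diploid genotype), not the number of distinct allelic states at the locus, so it grows with the number of samples that have data.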
Hi Brian, Thank you for your explanation, I think I follow what you are saying. How does Thanks again,
Hi @EveTC ,
Hi Brian, Yes it does - thank you for your explanation. I have a better understanding now.
Hello
I am somewhat confused by the output of genetic_diff(), in particular the number of alleles in each population. Given the example data below:
Output:
To get more information about the object vcf, I converted it to a genind object.
So from this additional information I can see that there is a range of 1-5 alleles per locus and that the data is diploid.
Therefore, from my understanding, Supercontig_1.50:549 (CHROM:POS) has 2 alleles, and the reported number is 4 because 2 x 2 (diploid) = 4. Is this description correct?
If so, I am confused by my vcf output.
If my vcf has a maximum of 2 alleles per locus and the organism is diploid, then surely the maximum number of alleles should be 4? However, the number of alleles ranges from 0 to 40.
My apologies if this is a simplistic question; I am new to this sort of analysis.
Any advice to shed light on this would be greatly appreciated.
Thank you