SKAT/CMC: Missing covariates are not imputed, but dropped #144
Comments
Can you recode the covariate file and impute the missing covariates?
On Feb 6, 2023, at 2:26 PM, katyaorlova ***@***.***> wrote:
Thank you for creating and maintaining this software.
In the wiki, you state that
Note: Missing data in the covariate file can be labeled by any non-numeric value (e.g. NA). They will be automatically imputed to the mean value in the data file.
However, samples with missing covariates are simply dropped from my analysis, per the .log file when running SKAT, CMC, FamCMC, FamSKAT:
[WARN] Total [ 63 ] samples are dropped from VCF file due to missing covariate.
How can I ensure that samples with missing covariates are not dropped?
For reference, here's a simplified version of my code when running FamSKAT + FamCMC:
rvtest --inVcf exons.vcf.gz --pheno phenos.txt --pheno-name dft --freqUpper 0.01 --impute drop --covar cov.txt --covar-name AgeAtExam,Sex,V7,V8,V9,WV,ChipNum,CohortNum,PC1_C12,PC2_C12,PC3_C12 --geneFile refFlat_hg19.txt.gz --burden famcmc --kernel famskat --kinship C1C2.kinship --numThread 3 --out output;
(Note, I tried removing the --impute drop flag, which prevents imputation of missing genotypes, but this doesn't alter covariate dropping)
Thank you in advance,
Katya
Yes, thank you for the quick reply; I ended up doing just that. I mostly wrote to double-check whether a different issue in my code was causing this, but it sounds like dropping samples with NA covariates is the default behavior.
Here's the code if anyone wants to save time:
`cols_to_impute <- c("V7", "V8", "V9")`
`for (col_name in cols_to_impute) {`
Thanks again,
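For anyone who wants to do the same recoding outside R, here is a minimal Python sketch of mean imputation on a covariate file before passing it to `--covar`. It is not the code from the post above (that snippet is incomplete here) and it is not part of rvtests. It assumes a whitespace-delimited file with a single header row; the `fid`/`iid` columns in the example are hypothetical, and `impute_means` and `parse_value` are illustrative names. Only `V7` and `V8` come from this thread. Mean imputation may not be appropriate for categorical covariates such as `Sex` or `ChipNum`, so the columns to impute are passed explicitly.

```python
import math


def parse_value(token):
    """Return the token as a float, or None if it is non-numeric (e.g. NA)."""
    try:
        value = float(token)
    except ValueError:
        return None
    # float("nan") parses successfully, so treat it as missing as well.
    return None if math.isnan(value) else value


def impute_means(text, columns):
    """Replace missing entries in the named columns with that column's mean.

    `text` is the content of a whitespace-delimited covariate file with a
    header row (an assumed layout). Returns the rewritten file content.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = [line.split() for line in lines[1:]]

    for name in columns:
        idx = header.index(name)
        observed = [v for v in (parse_value(r[idx]) for r in rows) if v is not None]
        if not observed:
            raise ValueError(f"column {name!r} has no observed values")
        mean = sum(observed) / len(observed)
        for r in rows:
            if parse_value(r[idx]) is None:
                r[idx] = format(mean, ".6g")

    return "\n".join(" ".join(fields) for fields in [header] + rows) + "\n"


# Hypothetical three-sample covariate file with one missing value per column.
example = """\
fid iid V7 V8
f1 s1 1.0 NA
f2 s2 NA 4.0
f3 s3 3.0 2.0
"""
print(impute_means(example, ["V7", "V8"]))
```

With the imputed file written back to disk, no sample should carry a non-numeric covariate in those columns, so the "dropped from VCF file due to missing covariate" warning should no longer apply to them.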