Annotation problem #295

Open
allatarr opened this issue Sep 26, 2024 · 6 comments

@allatarr

Hello,
I was able to install megSAP on our server, and after a successful test run with the provided panel data I am now trying to analyze WES data. Everything runs smoothly until the VcfAnnotateFromVcf step. The annotation takes an incredible amount of time (we killed the process after 75 hours). At the beginning it creates the file _annotateFromVcf.vcf, but then there is no progress.
I am providing the log file from the Friday-to-Monday run and an htop view from today's run.

analyze_20240920140712.log
Screenshot from 2024-09-26 12-46-30

Thank you in advance for your help,
Lukáš

@marc-sturm
Member

Hi Lukáš,

I noticed that the annotation databases are not synced to the tmp drive of the server.
Do you have "copy_dbs_to_local_data" set to true in the settings.ini?
You probably should use that option to speed up the analysis.

However, this should not cause the problem.
I suspect that VcfAnnotateFromVcf has problems with 60 threads.
Can you please run the pipeline with the options: -steps vc -annotation_only -threads 6?

Running the pipeline with 60 threads makes little sense anyway.
Some of the tools are not multi-threaded or do not scale well with the number of threads.
We typically use 6 threads for a sample but analyze several samples in parallel.
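
The "few threads per sample, several samples at once" pattern can be sketched with xargs. This is only an illustration: ANALYZE_CMD and run_samples_in_parallel are hypothetical names, and ANALYZE_CMD stands in for whatever command launches the pipeline for one sample on your system.

```shell
# Sketch: process several samples concurrently, each limited to a few threads.
# ANALYZE_CMD is a placeholder (assumption): a shell snippet that receives the
# sample name as $1 and the thread count as $2.

run_samples_in_parallel() {
  local jobs="$1" threads="$2"
  shift 2
  # One sample name per line; xargs keeps at most $jobs commands running.
  printf '%s\n' "$@" |
    xargs -P "$jobs" -I{} sh -c "${ANALYZE_CMD:?set ANALYZE_CMD first}" _ {} "$threads"
}

# Example with a harmless stand-in command:
#   ANALYZE_CMD='echo "analyzing $1 with $2 threads"'
#   run_samples_in_parallel 4 6 sampleA sampleB sampleC
```

With 4 jobs of 6 threads each, at most 24 threads are busy at a time, which tends to use a large server better than a single 60-thread run.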

Best,
Marc

@allatarr
Author

Dear Marc,
thank you for your response. I've tried both suggestions (6 threads, syncing the databases), but the problem persists. I left the annotation running over the weekend, and in almost 3 days only a 15 MB file containing a few annotated variants was created in the folder. So something is happening, but it is incredibly slow. Where could the problem be?
Best,
Lukáš

@marc-sturm
Member

Can you share the config file passed to VcfAnnotateFromVcf, the input VCF and the produced output VCF?
I suspect that the storage containing the annotation source files is slow.

Best,
Marc

@allatarr
Author

Of course - I have sent the link to your email.
Best,
Lukáš

@marc-sturm
Member

I suspect that one of the annotation databases from the config file is corrupt or not indexed correctly.
Here is the config:


/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/gnomAD_genome_v4.1_GRCh38.vcf.gz	gnomADg	AC,AF,Hom,Hemi,Het,Wt,AFR_AF,AMR_AF,EAS_AF,NFE_AF,SAS_AF		true
/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/gnomAD_genome_v3.1.mito_GRCh38.vcf.gz	gnomADm	AF_hom		true
/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/clinvar_20240805_converted_GRCh38.vcf.gz	CLINVAR	DETAILS	ID
/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/CADD_SNVs_1.7_GRCh38.vcf.gz	CADD	CADD=SNV	
/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/CADD_InDels_1.7_GRCh38.vcf.gz	CADD	CADD=INDEL	
/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/REVEL_1.3.vcf.gz		REVEL	
/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/AlphaMissense_hg38.vcf.gz		ALPHAMISSENSE	
/data2/megSAP/tmp/local_ngs_data//ensembl-vep-dbs/spliceai_scores_2024_08_26_GRCh38.vcf.gz		SpliceAI	
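
A quick sanity check of the listed sources might look like the sketch below. It assumes that column 1 of each tab-separated config line is the path to a bgzipped VCF and that its tabix index sits next to it as <file>.tbi or <file>.csi; the function name check_annotation_sources is made up for this example. Note that gzip -t reads the whole file, so it can take a long time on the large gnomAD and CADD files.

```shell
# Sketch: verify that every annotation source in a config file exists,
# has an index next to it, and passes a gzip integrity test.
# Assumption: column 1 of each tab-separated line is the source path.

check_annotation_sources() {
  local config="$1" path rest bad=0
  local tab
  tab="$(printf '\t')"
  while IFS="$tab" read -r path rest || [ -n "$path" ]; do
    [ -z "$path" ] && continue
    if [ ! -s "$path" ]; then
      printf 'MISSING_OR_EMPTY\t%s\n' "$path"; bad=1
    elif [ ! -e "$path.tbi" ] && [ ! -e "$path.csi" ]; then
      printf 'NO_INDEX\t%s\n' "$path"; bad=1
    elif ! gzip -t "$path" 2>/dev/null; then
      # bgzip output is valid multi-member gzip, so gzip -t can test it.
      printf 'CORRUPT\t%s\n' "$path"; bad=1
    else
      printf 'OK\t%s\n' "$path"
    fi
  done < "$config"
  return "$bad"
}

# Usage (the config path is a placeholder):
#   check_annotation_sources /path/to/annotation_config.tsv
```

A source reported as OK here can still have a broken or outdated index, so this only rules out the most obvious problems.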

Can you please try running VcfAnnotateFromVcf with each line of the config file separately and track the runtime for each line?

/data2/megSAP//data/tools/ngs-bits/bin//VcfAnnotateFromVcf -config_file [modified_config_line_xxx] -in /data2/megSAP/tmp/megSAP_user_root/an_vep_beCPbz_mes.vcf -out test_out_line_[xxx] -threads 6
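
The per-line test can be automated roughly as follows. This is a sketch, not part of megSAP: the function name time_each_config_line, the TOOL variable and the output file names are made up here, and only the flags from the command above (-config_file, -in, -out, -threads) are used.

```shell
# Sketch: run the annotation tool once per config line and report the
# wall-clock time of each run. TOOL defaults to the binary path quoted in
# this thread; override it if your installation differs.

time_each_config_line() {
  local config="$1" input_vcf="$2" outdir="$3" threads="${4:-6}"
  local tool="${TOOL:-/data2/megSAP//data/tools/ngs-bits/bin//VcfAnnotateFromVcf}"
  local n=0 line start end rc tab
  tab="$(printf '\t')"
  mkdir -p "$outdir"
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    n=$((n + 1))
    # Write a one-line config containing only this annotation source.
    printf '%s\n' "$line" > "$outdir/config_line_$n.tsv"
    start=$(date +%s)
    rc=0
    "$tool" -config_file "$outdir/config_line_$n.tsv" -in "$input_vcf" \
            -out "$outdir/test_out_line_$n.vcf" -threads "$threads" \
            > "$outdir/line_$n.log" 2>&1 < /dev/null || rc=$?
    end=$(date +%s)
    # Report: line number, source path (column 1), seconds, exit code.
    printf '%s\t%s\t%ss\texit=%s\n' "$n" "${line%%"$tab"*}" "$((end - start))" "$rc"
  done < "$config"
}

# Usage (paths are placeholders):
#   time_each_config_line annotation_config.tsv input.vcf per_line_test 6
```

Because a broken source may run for days, it can help to start the lines in separate terminals or to wrap the call in coreutils `timeout` so that one slow line does not block the rest.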

@allatarr
Author

allatarr commented Oct 3, 2024

Dear Marc,
thank you very much. You were right: there is a problem with the CADD annotation. I'll try to download and process it again.
Best,
Lukáš
