Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identify somatic HLA mutation from SpecHLA #23

Open
LongpanUPC opened this issue Feb 2, 2024 · 6 comments
Open

Identify somatic HLA mutation from SpecHLA #23

LongpanUPC opened this issue Feb 2, 2024 · 6 comments

Comments

@LongpanUPC
Copy link

Hi,
Thank you for developing this tool which is quite easy to use!
I am wondering if SpecHLA could also be used to detect somatic HLA mutations since we have paired tumor and normal WES or RNAseq data. Also I found some HLA_XXX.vcf.gz/HLA_XXX.rephase.vcf.gz/HLA_XXX.specHap.phased.vcf in the output. I don't know if SpecHLA already did this things. And maybe I just need to compare the results from tumor with that from normal to get the distinct varaints?
Could you give some comments about this and guide us how to identify the somatic (or germline if possible) HLA mutations using SpecHLA if it works?
thank you very much!
Long

@wshuai294
Copy link
Collaborator

Hello Long,

Thanks for using this method. SpecHLA does not support detecting somatic HLA mutations directly. However, as you said, the somatic mutations can be obtained by comparing the variants between tumor and normal samples. HLA_XXX.rephase.vcf.gz records the phased variants (i.e., haplotypes). To achieve this goal, we can first align the haplotypes between tumor and normal samples by counting the shared genotypes at all variant loci. Second, just identify the difference between the aligned haplotypes from tumor and normal samples. The loci where the difference occurs could be the somatic variants.

I hope it can help. Please let me know if there is any other issue.

best,
shuai

@LongpanUPC
Copy link
Author

Thank you, shuai!
I tried one case. However, it seems that it doesn't work well. In the normal tissue, I find only 10 variants while there are 35 variants from paired tumor. I found the average depth in the normal is lower than that in the tumor (less than 15 in N and average >30 in tumor). So the coverage greatly affected the findings.
By the way, do you have a plan to include HLA mutation detection as an additional pipeline in SpecHLA?
best,
Long

@wshuai294
Copy link
Collaborator

Hi Long,

To examine the reliability of the inferred somatic mutations, it is suggested to observe the Bam file $outdir/*realign.bam using IGV or samtools tview. However, by default, SpecHLA will delete the bam files. So, to maintain the bam file, we need to delete the statement bash $dir/../clear_output.sh $outdir/ in the 353-th line of SpecHLA.sh. The reference file is $db/ref/hla.ref.extend.fa.

I believe it would be useful to include an HLA mutation detection pipeline. I plan to add a script to identify the sample-specific variants given two samples. I think this may be useful. Do you have any suggestions?

best,
shuai

@LongpanUPC
Copy link
Author

Hi shuai,
fully agree. Then only using SpecHLA, one tools to get all the HLA infomation (genotype, mutations and LOH) is wonderfully, especially for cancer genomics. Taking into account the effect of coverage, it will be better to also report the total depth both in two samples (i.e, tumor and normal) to allow us to know if this variant is true mutation rather than a spurious mutation due to lower coverage in one of the samples. Besides, may be also include some qulity score or features if possible.
By the way, I would like to try once you develope this script.
thank you!
Long

@LongpanUPC
Copy link
Author

Hi Shuai,
I kindly ask the progress of script to detect the somatic mutations between two samples. Hope it will not bother you.
Before your code is released, I also did some try according to your suggestion. I directly used mutect2 to detect somatic HLA mutations. The code was as followed:
gatk Mutect2
-R $db/ref/hla.ref.extend.fa
-I /pathtoresults/XXXtumor.realign.sort.bam
-I /pathtoresults/XXXnormal.realign.sort.bam
--normal-sample XXXnormal
-O /pathtoresults/HLA.somatic.vcf.gz
And it works. We found some mutations. However, now I don't know how to annotate these variants since they use custome reference genome. My aims to map these variants to hg38 ref genome and get standard protein_change. something like that.
Do you have some suggestions? or maybe waiting for your next release?

thank you!
Long

@wshuai294
Copy link
Collaborator

wshuai294 commented Mar 3, 2024

Hello,

Sorry for not getting back to you sooner because of the Lunar New Year holiday. Your script seems very good for somatic mutation detection. As for converting the locus position, we can use the method CrossMap. CrossMap reply on the chain file, which can be obtained using the method chainNet and axtChain based on the alignment of $db/ref/hla.ref.extend.fa and hg38.

An alternative solution is to extract the surrounding sequence (e.g. 50 bp) of each somatic mutation and map the sequence to the hg38 using Minimap2.

Please let me know if it works.

best,
shuai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants