xTea on plant species #20
Hi, the current repeat library is prepared only for human. You need to prepare a library for the plant species you want to work on. Generally, you need to know which type (family) of repeats you are working on, their consensus sequences, the reference genome of the species, and the repeat annotation (from RepeatMasker or other tools). With these, we can generate the library for the plant species and run xTea on the alignments. I haven't tried this before. I think it should work, but preparing the library will take extra effort. |
Thanks! We will give it a try and see how it goes. |
It would be really helpful to have a guide on how to generate a repeat library for non-human species. @agolicz did you manage to run xTea on plants? Could you please share how it went? |
Unfortunately I have not had the time yet. Hopefully at the beginning of next year. |
I put a readme for preparing the repeat library here: https://github.com/parklab/xTea/tree/master/xtea/rep_lib_prep. Would you like to give it a try? Note that xTea only works for TE insertions of known types, and it needs the RepeatMasker annotation of the TEs you are interested in, as well as a consensus sequence. |
Would it also be possible to add the FASTA header format (>xxx) for the TE library and the RepeatMasker command to be used with that library? |
What do you mean by fasta header format? For which file? |
For the file:
I just wanted to confirm that xTea expects the same. For running pipelines like that on non-model species it is very helpful to have toy datasets, so we can ensure everything is formatted as expected. Some formatting conventions are not the same for human/animal/plant genomics. |
You only need to extract the TE type you want to work on (each type separately). For example, if you have a repeatmasker output for the whole genome named When generate the I am thinking of having a script to automatically generate this, but it's not easy to have a fixed mode. Different species/TEs have different lengths and different IDs (some use a customized set). |
Ok, thanks, that makes sense. I will give it a try in January. |
Hello, @agolicz! Have you successfully used xTea on plants yet? If so, how did it go? Thanks. |
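As a rough illustration of extracting one TE type from a whole-genome RepeatMasker annotation, here is a minimal sketch. It assumes the standard whitespace-separated RepeatMasker `.out` layout (repeat name in column 10, class/family in column 11); the function name and the sample rows are made up for illustration and are not part of xTea.

```python
def filter_rmsk_by_type(lines, te_type):
    """Yield RepeatMasker .out rows whose repeat name or class/family matches te_type.

    Assumption: standard .out columns, i.e. fields[9] is the repeat name
    and fields[10] is the class/family (for example "LINE/L1").
    """
    for line in lines:
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        repeat_name, repeat_class = fields[9], fields[10]
        if repeat_name == te_type or te_type in repeat_class.split("/"):
            yield line.rstrip("\n")


# Hypothetical sample rows in RepeatMasker .out layout.
sample = [
    "   SW   perc perc perc  query     position in query    matching  repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family",
    "",
    " 1504  1.3  0.4  1.3  chr1  10001  10468  (248945954)  +  (TAACCC)n  Simple_repeat  1  463  (0)  1",
    "  239 29.4  1.9  1.0  chr1  11678  11780  (248944642)  C  MER5B  DNA/hAT-Charlie  (74)  104  1  2",
    " 9000  2.0  0.1  0.1  chr1  20000  26100  (248930322)  +  L1HS  LINE/L1  1  6101  (0)  3",
]

l1_rows = list(filter_rmsk_by_type(sample, "L1"))
print(len(l1_rows))
```

For a real file, the same function can be fed an open file handle, and the matching rows written to a per-type `.out` file, one TE type at a time.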
Hello, And I don't see -P option in the xtea help page as well. What does -P stand for here? Thanks! |
Could you try again by replacing |
I tried the following and got this error: Traceback (most recent call last): |
I am trying to prepare xTea repeat library using the chm13 genome. |
@zhuxf-lab it's based on length for the active Human retrotransposons. For example, L1, I set >5900bp as full length. |
Hi, I tried using >5900bp as the cutoff for full-length L1. I ran hg38 first to see whether I could reproduce the provided hg38 rep_lib_annotation data. The result I got was much larger than the provided annotation file. For example, the hg38_FL_L1_flanks.fa file I got is 53MB (using -e 100), while hg38_FL_L1_flanks_3k.fa in the provided rep_lib_annotation data is 2MB. My command is below; any idea what is incorrect? The hg38 reference genome and the repeatmasker output file are both from UCSC. python x_TEA_main.py -P -K -p ./ -r hg38.fa -a hg38.fa_L1_full_length.out -o hg38.fa_L1_full_length_with_flank_e100.fa -e 100 Also, is it reasonable to set the cutoffs for full-length Alu, SVA, and HERV to 250bp, 1900bp, and 8900bp? It would be super helpful if you could kindly add chm13 to the rep_lib_annotation data. Thank you! |
@zhuxf-lab I moved your question to a new issue #50, I'll work on it asap. |
@zhuxf-lab while I am working on this issue, the size difference (53M vs 2M) is because I only select L1HS (reported active L1) rather than all the L1 subfamilies.
For SVA, I set 700bp. |
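The two selection rules mentioned above (a length cutoff, and keeping only the L1HS subfamily) could be sketched as follows. This is only an illustration: the cutoffs 5900 and 700 are the ones quoted in this thread, but measuring length as the genomic span of the hit (end minus begin plus one) is an assumption, and the function and dictionary names are hypothetical.

```python
# Cutoffs quoted in this thread for human; other species/TEs need their own values.
MIN_FULL_LENGTH = {"L1": 5900, "SVA": 700}


def is_full_length(line, te_type, subfamily=None, cutoffs=MIN_FULL_LENGTH):
    """Return True if a RepeatMasker .out row passes the length and subfamily filters.

    Assumptions: fields[5] and fields[6] are the begin/end positions in the
    genome, fields[9] is the repeat (subfamily) name, and "length" means the
    genomic span of the annotated copy.
    """
    fields = line.split()
    span = int(fields[6]) - int(fields[5]) + 1
    if subfamily is not None and fields[9] != subfamily:
        return False
    return span > cutoffs[te_type]


# Hypothetical rows: a long L1HS copy, a long L1PA2 copy, and a short L1HS fragment.
long_l1hs = "9000 2.0 0.1 0.1 chr1 20000 26100 (100) + L1HS LINE/L1 1 6101 (0) 3"
long_l1pa2 = "8000 5.0 0.2 0.3 chr2 50000 56050 (100) + L1PA2 LINE/L1 1 6051 (0) 4"
short_l1hs = "1200 1.0 0.0 0.0 chr3 70000 70900 (100) C L1HS LINE/L1 (0) 6101 5201 5"

rows = [long_l1hs, long_l1pa2, short_l1hs]
all_long_l1 = [r for r in rows if is_full_length(r, "L1")]
only_l1hs = [r for r in rows if is_full_length(r, "L1", subfamily="L1HS")]
print(len(all_long_l1), len(only_l1hs))
```

Restricting to one subfamily shrinks the set of copies, which is consistent with the size difference described above between keeping all L1 subfamilies and keeping only L1HS.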
Ok, Thanks! |
I'm very curious to know if the process for the custom repeat library is available now. |
@bismarck1008 it should work |
xtea -P -K -p ./ -r path-of-reference-genome.fa -a path-to-rep-lib-folder/full-length-TE-type_rmsk.out -o path-output-folder/TE_copies_with_flank.fa -e 100 |
try |
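Since each repeat type needs its own run, one way to keep the invocations consistent is to build the command line in a small script. The sketch below only assembles and prints the command quoted above; the flags are copied verbatim from this thread, while the helper name and all file names are hypothetical placeholders. It does not run xTea.

```python
import shlex


def build_flank_command(reference, rmsk_out, output_fa, e=100, workdir="./"):
    """Assemble the command quoted in this thread as an argument list.

    The flags (-P -K -p -r -a -o -e) are copied from the thread; their exact
    semantics should be checked against the xTea readme.
    """
    return [
        "xtea", "-P", "-K",
        "-p", workdir,
        "-r", reference,
        "-a", rmsk_out,
        "-o", output_fa,
        "-e", str(e),
    ]


# Hypothetical per-type input and output names, one run per TE type.
commands = []
for te in ("L1", "SVA"):
    cmd = build_flank_command(
        "reference.fa",
        f"{te}_full_length_rmsk.out",
        f"{te}_copies_with_flank.fa",
    )
    commands.append(cmd)
    print(shlex.join(cmd))
```

Each printed line can be inspected before running, or passed to `subprocess.run` once the inputs are in place.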
Other species have elements other than L1, Alu, SVA and HERV. Would xTea identify them? If so, which value should be used for the -y parameter? Thanks |
@adriludwig , use "-y 32". Here is a readme: https://github.com/parklab/xTea/tree/master/xtea/rep_lib_prep (at the bottom). It's not convenient as you can only run one repeat type at a time. I'll try to write up a new version/wrapper for this. |
Thanks very much, @simoncchu. We are currently using mobster, but we would also like to test other tools. So I'll keep an eye on xTea updates. |
Respected experts! I'm new to this topic and would like to know whether a repeat library is currently available for plants, or whether there is any update on using xTea for transposable element analysis in plants. |
Hi,
We wanted to try xTea on a plant species. Is that possible or does it require a human reference?
Agnieszka