built-in adaptor trimming #146
Comments
Does it actually change the results significantly? And how much extra time does it need? |
My guess is that it doesn't significantly change the results right now. However, as Illumina read lengths increase it might become more of a problem. More specifically, I don't think it's possible to increase the insert size without disrupting bridge amplification (perhaps not with patterned flow cells), so as read lengths increase, the number of reads containing adapter sequence could increase. Also, isn't trimming just generally recommended as good practice? I'll try to get around to timing the differences at some point. |
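To make the read-length argument above concrete, here is a minimal sketch of the arithmetic: a read runs into adapter whenever the insert is shorter than the read. The function names and the insert sizes are hypothetical and chosen only for illustration; they are not measurements from any library.

```python
from typing import Sequence


def adapter_bases(read_length: int, insert_length: int) -> int:
    """Bases at the 3' end of a read that extend past the insert into adapter."""
    return max(0, read_length - insert_length)


def fraction_with_adapter(read_length: int, insert_lengths: Sequence[int]) -> float:
    """Fraction of fragments whose insert is shorter than the read length."""
    if not insert_lengths:
        return 0.0
    hits = sum(1 for n in insert_lengths if adapter_bases(read_length, n) > 0)
    return hits / len(insert_lengths)


# Hypothetical insert sizes in bp, for illustration only.
example_inserts = [80, 120, 200, 300, 450]

if __name__ == "__main__":
    for read_length in (50, 100, 150, 250):
        frac = fraction_with_adapter(read_length, example_inserts)
        print(f"read length {read_length}: fraction with adapter = {frac}")
```

With a fixed insert-size distribution, the fraction can only stay the same or grow as the read length goes up, which is the concern raised in the comment.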
I like the implementation - fastp seems like a good package/reliable
dependency and we can specify the exact trimming sequence in the config
file.
Though, I do have some doubts whether it's entirely necessary.
On one hand, a recent report on bioRxiv
<https://www.biorxiv.org/content/10.1101/833962v1> showed that trimming
is not needed for RNA-seq data if local aligners are used. This happens
because the adapter part of a read would form a separate alignment (or,
most likely, null/non-unique alignment) which won't affect counting.
On the other hand, pairtools parse can be too smart for its own good - we
take into account the number and relative order of alignments in a read,
such that an adapter at the 3' end can effectively convert a "pair" into a
"walk". For this reason, trimming can actually improve results in extreme
cases.
On the third hand, trimming modifies sequences, so that the final BAMs
won't contain raw sequences anymore. This may screw over people who would
store sequencing data in BAMs as opposed to FASTQ. Tbh, I know exactly zero
labs who do that (DCIC rely on their own pipeline).
So, all in all, it's slightly complicated, but I think making it optional
is not a bad idea. George, let me know if you're interested in making a PR
- I could help with advice, if needed!
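As a rough illustration of the "pair" versus "walk" point above, here is a toy sketch. It is not the actual logic of pairtools parse; the `classify` and `trim` helpers, the segment labels and the one-segment-per-side rule are all simplifying assumptions made only to show how an extra 3' segment could change the call.

```python
from typing import List

ADAPTER = "adapter"  # placeholder label for a read segment derived from adapter


def classify(read1_segments: List[str], read2_segments: List[str]) -> str:
    """Toy rule: exactly one segment per side is a 'pair', anything more is a 'walk'."""
    if not read1_segments or not read2_segments:
        return "unmapped"
    if len(read1_segments) == 1 and len(read2_segments) == 1:
        return "pair"
    return "walk"


def trim(segments: List[str]) -> List[str]:
    """Drop adapter-derived segments, mimicking what trimming would do upstream."""
    return [s for s in segments if s != ADAPTER]


# Hypothetical read pair: side 1 has a genomic segment followed by adapter.
read1 = ["chr1:1000", ADAPTER]
read2 = ["chr1:50000"]

if __name__ == "__main__":
    print("untrimmed:", classify(read1, read2))
    print("trimmed:  ", classify(trim(read1), trim(read2)))
```

In this toy model the untrimmed read pair is called a walk and the trimmed one a pair; whether the real parser behaves this way depends on how the adapter segment actually aligns.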
(In reply to George Spracklin's comment above, Fri, 3 Jan 2020 at 16:22.)
|
I cannot agree completely with the statement that "the adapter part of a read would form a separate alignment (or, most likely, null/non-unique alignment) which won't affect counting." First, in the case of a multi-mapping read, the presence of the adapter might easily force the mapping to one particular genomic site, although the real location is unknown. Second, the paper from Liao & Shi relies on only ~1000 genes quantified by RT-PCR, which might not include cases with multiple mappings. This result cannot be easily transferred to the mapping of whole-genome data in Hi-C-related methods, where we certainly have many more locations than ~1000 genes. Third, Hi-C-related methods with complex ligation procedures are emerging, and they sometimes require adapter trimming, e.g. Hi-CO: https://doi.org/10.1016/j.cell.2018.12.014 , MARGI: https://dx.doi.org/10.1016%2Fj.cub.2017.01.011 |
This feature would make life easier (i.e. filtering out low-quality reads and reads containing adapter). @golobor
This has worked for me so far (using fastp).
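For anyone wanting to try the same thing, below is a minimal sketch of how an optional fastp step could be assembled from a config. The config keys (`in1`, `trim_adapters`, `adapter_r1`, ...), the file names and the `build_fastp_command` helper are hypothetical and not part of any existing pipeline; only the fastp options themselves (`-i/-I/-o/-O`, `--adapter_sequence`, `--adapter_sequence_r2`, `--disable_adapter_trimming`) are real. The adapter sequence is an example value, so substitute the one for your library kit.

```python
import shlex
from typing import Dict, List


def build_fastp_command(cfg: Dict) -> List[str]:
    """Build a paired-end fastp command line from a (hypothetical) config dict."""
    cmd = [
        "fastp",
        "-i", cfg["in1"], "-I", cfg["in2"],    # input FASTQ files (side 1, side 2)
        "-o", cfg["out1"], "-O", cfg["out2"],  # output FASTQ files
    ]
    if not cfg.get("trim_adapters", True):
        # Trimming is optional: switch off fastp's adapter trimming entirely.
        cmd.append("--disable_adapter_trimming")
        return cmd
    # Pass explicit adapter sequences only if the config specifies them.
    if cfg.get("adapter_r1"):
        cmd += ["--adapter_sequence", cfg["adapter_r1"]]
    if cfg.get("adapter_r2"):
        cmd += ["--adapter_sequence_r2", cfg["adapter_r2"]]
    return cmd


# Hypothetical config; file names and adapter value are placeholders.
example_cfg = {
    "in1": "sample_1.fastq.gz", "in2": "sample_2.fastq.gz",
    "out1": "trimmed_1.fastq.gz", "out2": "trimmed_2.fastq.gz",
    "trim_adapters": True,
    "adapter_r1": "AGATCGGAAGAGC",
    "adapter_r2": "AGATCGGAAGAGC",
}

if __name__ == "__main__":
    # Print the command rather than running it; pass the list to subprocess.run to execute.
    print(shlex.join(build_fastp_command(example_cfg)))
```

Keeping the step behind a `trim_adapters` switch leaves the untrimmed path available for people who want the final BAMs to retain raw sequences.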