built-in adaptor trimming #146

gspracklin · 2019-12-31T17:54:57Z

This feature would make life easier (i.e. filtering out low-quality reads and reads containing adaptor sequence). @golobor

This has worked for me so far (using fastp).
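
For reference, a paired-end fastp call of this kind could be assembled roughly as in the sketch below. The helper name `build_fastp_cmd`, the file names, the quality threshold, and the example adapter sequence are illustrative assumptions, not the command actually used here; only the fastp flags themselves are taken from fastp's documented options.

```python
import shlex


def build_fastp_cmd(in1, in2, out1, out2,
                    adapter_r1=None, adapter_r2=None,
                    min_phred=15, threads=4):
    """Build an argv list for paired-end trimming with fastp.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = [
        "fastp",
        "--in1", in1, "--in2", in2,
        "--out1", out1, "--out2", out2,
        "--qualified_quality_phred", str(min_phred),
        "--thread", str(threads),
    ]
    if adapter_r1 is None and adapter_r2 is None:
        # No adapter given: ask fastp to auto-detect adapters for paired-end data.
        cmd.append("--detect_adapter_for_pe")
    else:
        # Adapter sequences given explicitly (e.g. read from a config file).
        if adapter_r1:
            cmd += ["--adapter_sequence", adapter_r1]
        if adapter_r2:
            cmd += ["--adapter_sequence_r2", adapter_r2]
    return cmd


# Example with made-up file names and a common Illumina adapter prefix.
example = build_fastp_cmd(
    "sample_1.fastq.gz", "sample_2.fastq.gz",
    "trimmed_1.fastq.gz", "trimmed_2.fastq.gz",
    adapter_r1="AGATCGGAAGAGC", adapter_r2="AGATCGGAAGAGC",
)
print(shlex.join(example))
```

The resulting list can be passed to `subprocess.run(example, check=True)` once fastp is installed.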

Phlya · 2019-12-31T18:17:49Z

Does it actually change the results significantly? And how much extra time does it need?

gspracklin · 2020-01-03T15:22:38Z

My guess is that it doesn't significantly change the results right now. However, as Illumina read lengths increase it might become more of a problem. More specifically, I don't think it's possible to increase the insert size without disrupting bridge amplification (though perhaps that is not the case with patterned flow cells), so as read lengths increase, the number of reads containing adapter sequence could increase. Also, isn't trimming just generally recommended as good practice?

I'll try to get around to timing the differences at some point.
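
The read-length argument above can be made concrete with a little arithmetic: a read runs into the adapter whenever the insert is shorter than the read. The sketch below uses made-up insert sizes purely for illustration; the function names are hypothetical and no real library data is involved.

```python
def adapter_bases(read_len, insert_len):
    """Number of read bases that extend past the insert into the adapter."""
    return max(0, read_len - insert_len)


def fraction_with_adapter(insert_sizes, read_len):
    """Fraction of fragments whose insert is shorter than the read length."""
    hits = sum(1 for size in insert_sizes if size < read_len)
    return hits / len(insert_sizes)


# Made-up insert sizes (bp); a real library would have its own distribution.
inserts = [80, 120, 160, 250, 300, 400, 450, 500]

for read_len in (50, 100, 150, 250):
    frac = fraction_with_adapter(inserts, read_len)
    print(f"read length {read_len}: {frac:.3f} of reads contain adapter")
```

With the insert sizes held fixed, the affected fraction can only grow as the read length grows, which is the concern raised here.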

golobor · 2020-01-09T09:48:30Z

I like the implementation - fastp seems like a good package and a reliable dependency, and we can specify the exact trimming sequence in the config file. Though, I do have some doubts about whether it's entirely necessary.

On one hand, a recent preprint on bioRxiv (https://www.biorxiv.org/content/10.1101/833962v1) showed that trimming is not needed for RNA-seq data if local aligners are used. This happens because the adapter part of a read would form a separate alignment (or, most likely, a null/non-unique alignment), which won't affect counting.

On the other hand, pairtools parse can be too smart for its own good - we take into account the number and relative order of alignments in a read, such that an adapter at the 3' end can effectively convert a "pair" into a "walk". For this reason, trimming can actually improve results in extreme cases.

On the third hand, trimming modifies sequences, so the final BAMs won't contain raw sequences anymore. This may screw over people who store sequencing data in BAMs as opposed to FASTQ. Tbh, I know exactly zero labs who do that (DCIC rely on their own pipeline).

So, all in all, it's slightly complicated, but I think making it optional is not a bad idea. George, let me know if you're interested in making a PR - I could help with advice, if needed!
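
To illustrate the "pair" versus "walk" concern, here is a deliberately simplified toy model. It is NOT the actual logic of pairtools parse: the `Segment` class, the `classify` rule (more than one alignment segment on either side means "walk"), and the coordinates are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One alignment segment within a read (toy representation)."""
    start: int   # first read position covered by the segment
    end: int     # one past the last read position covered
    source: str  # "genome" or "adapter" (known here only because it is a toy)


def classify(side1, side2):
    """Toy rule: one segment per side is a 'pair', anything else a 'walk'."""
    return "pair" if len(side1) == 1 and len(side2) == 1 else "walk"


def trim_adapter(side):
    """Drop adapter-derived segments, mimicking trimming before alignment."""
    return [seg for seg in side if seg.source != "adapter"]


# Side 1: 110 genomic bases followed by 40 bases of adapter at the 3' end.
side1 = [Segment(0, 110, "genome"), Segment(110, 150, "adapter")]
# Side 2: a single clean genomic alignment.
side2 = [Segment(0, 150, "genome")]

print("untrimmed:", classify(side1, side2))
print("trimmed:  ", classify(trim_adapter(side1), trim_adapter(side2)))
```

In this toy setup the extra 3' segment is what flips the classification, and removing it restores a simple pair.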

agalitsyna · 2020-02-20T13:47:28Z

I cannot agree completely with the statement that "the adapter part of a read would form a separate alignment (or, most likely, null/non-unique alignment) which won't affect counting."

First, in the case of a multi-mapping read, the presence of the adapter might easily force the mapping to one particular genomic site, although the real location is unknown.

Second, the paper from Liao & Shi relies on only ~1000 genes quantified by RT-PCR, which might not include cases with multiple mappings. This result cannot be easily transferred to the mapping of whole-genome data in Hi-C-related methods, where we certainly have many more locations than ~1000 genes.

Third, Hi-C-related methods with complex ligation procedures are emerging, and they sometimes require adapter trimming, e.g. Hi-CO (https://doi.org/10.1016/j.cell.2018.12.014) and MARGI (https://dx.doi.org/10.1016%2Fj.cub.2017.01.011).
It would be great to account for "pair-oriented" methods like these.

agalitsyna mentioned this issue on Nov 8, 2022 in "distiller 0.3.5 roadmap #182" (open, 6 tasks)
