built-in adaptor trimming #146

Open
gspracklin opened this issue Dec 31, 2019 · 4 comments
Comments

@gspracklin
Member

This feature would make life easier (i.e. filtering out low-quality reads and reads containing adapter sequence). @golobor

This has worked for me so far (using fastp).

Screenshot 2019-12-31 12 49 49
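
For readers who cannot see the screenshot, a pre-trimming step of this kind might look like the sketch below. This is an assumption, not the command shown in the screenshot: the file names are hypothetical, the helper `build_fastp_command` is made up for illustration, and only the documented fastp options `-i`, `-I`, `-o`, `-O` and `--detect_adapter_for_pe` are used.

```python
def build_fastp_command(r1_in, r2_in, r1_out, r2_out):
    """Build a paired-end fastp command line as a list of arguments.

    Hypothetical helper: the file names are placeholders and the option
    set is an assumption, not the command used in the comment above.
    """
    return [
        "fastp",
        "-i", r1_in,    # read 1 input FASTQ
        "-I", r2_in,    # read 2 input FASTQ
        "-o", r1_out,   # read 1 trimmed output
        "-O", r2_out,   # read 2 trimmed output
        "--detect_adapter_for_pe",  # adapter auto-detection for paired-end data
    ]


if __name__ == "__main__":
    cmd = build_fastp_command(
        "sample_R1.fastq.gz", "sample_R2.fastq.gz",
        "sample_R1.trimmed.fastq.gz", "sample_R2.trimmed.fastq.gz",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

The trimmed output files would then be given to the mapper in place of the raw FASTQ files.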

@Phlya
Member

Phlya commented Dec 31, 2019

Does it actually change the results significantly? And how much extra time does it need?

@gspracklin
Member Author

My guess is that it doesn't significantly change the results right now. However, as Illumina read lengths increase, it might become more of a problem. More specifically, I don't think it's possible to increase the insert size without disrupting bridge amplification (though perhaps that's not an issue with patterned flow cells), so as read lengths increase, the number of reads containing adapter sequence could increase. Also, isn't trimming just generally recommended as good practice?
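
The read-length argument can be illustrated with a small sketch. It assumes a read runs into the adapter whenever it is longer than the insert; the insert sizes below are made-up numbers, not measurements from any library.

```python
def adapter_bases(read_length, insert_size):
    """Number of bases at the 3' end of a read that come from the adapter."""
    return max(0, read_length - insert_size)


def fraction_with_adapter(insert_sizes, read_length):
    """Fraction of fragments whose insert is shorter than the read length."""
    if not insert_sizes:
        return 0.0
    hits = sum(1 for size in insert_sizes if size < read_length)
    return hits / len(insert_sizes)


if __name__ == "__main__":
    inserts = [80, 120, 160, 250, 400]  # hypothetical insert sizes in bp
    for read_length in (50, 100, 150, 250):
        print(read_length, fraction_with_adapter(inserts, read_length))
```

With the insert-size distribution held fixed, the fraction can only grow as the read length grows, which is the concern raised above.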

I'll try to get around to timing the differences at some point.

@golobor
Member

golobor commented Jan 9, 2020 via email

@agalitsyna
Member

I cannot agree completely with the statement that "the adapter part of a read would form a separate alignment (or, most likely, null/non-unique alignment) which won't affect counting."

First, in the case of a multi-mapping read, the presence of the adapter might easily force the mapping to one particular genomic site, even though the real location is unknown.

Second, the paper by Liao & Shi relies on only ~1000 genes quantified by RT-PCR, which might not include cases with multiple mappings. This result cannot be easily transferred to the mapping of whole-genome data in Hi-C-related methods, where we certainly have many more locations than ~1000 genes.

Third, Hi-C-related methods with complex ligation procedures are emerging, and they sometimes require adapter trimming, e.g. Hi-CO: https://doi.org/10.1016/j.cell.2018.12.014 , MARGI: https://dx.doi.org/10.1016%2Fj.cub.2017.01.011
It would be great to account for "pair-oriented" methods like these.
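
To make the request concrete, here is a minimal sketch of what 3' adapter trimming does to a single read. It is an illustration only: it uses exact matching, whereas real trimmers such as fastp tolerate mismatches and can use paired-end overlap, and the function name `trim_3prime_adapter` and the `min_overlap` default are hypothetical.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove an exact-match 3' adapter (or a partial adapter at the read end).

    min_overlap must be at least 1; it is the shortest adapter prefix that
    is trimmed when the adapter is cut off by the end of the read.
    """
    # Case 1: the full adapter occurs inside the read; cut at its start.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adapter; try longest first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # start of the common Illumina TruSeq adapter
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTT", adapter))
    print(trim_3prime_adapter("ACGTACGTAGATC", adapter))
    print(trim_3prime_adapter("ACGTACGT", adapter))
```

For methods with internal linkers or bridge adapters, a single 3' pass like this would not be enough, since the adapter can sit in the middle of the read between the two ligated fragments.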
