multiple single cell samples #211

wanghlv · 2024-07-10T04:04:32Z

Hi, thanks for writing such a complete man page! I have a quick question: I have 6 samples in total, all of them single-cell Nanopore libraries. I'd like both the transcript and gene quantification to be per cell (from the CB tag) and per sample.
Could I use --read_group file_name:tag:CB, or should I supply a file, like --read_group file:READ_TO_BARCODE_Samples.TSV:0:2?

READ_TO_BARCODE_Samples.TSV would look like the example below: the first column is the read ID, the second is the cell barcode, and the third is the sample, correct? However, I'm not sure whether read IDs are unique across all 6 of my samples.

12a5c9c3-2b73-49c0-a3fd-22d2c10832e2_0 AATCAGGAGTGAACGA Sample1
b6e8c102-e1e2-4155-bc28-7dbb5a34c857_0 CCAGCTGCATGAGCAG Sample2
...

I'm currently running it as follows:
isoquant.py -d ont -r ${FA} --complete_genedb --genedb ${GTF} \
  --bam ${s1bam} ${s2bam} ${s3bam} ${s4bam} ${s5bam} ${s6bam} \
  -o IQ_all --prefix IQ_all -l s1 s2 s3 s4 s5 s6 \
  --sqanti_output --check_canonical --count_exons --bam_tags \
  -t 24 --genedb_output \
  --model_construction_strategy default_ont \
  --report_canonical auto --read_group tag:CB

Alternatively, I was thinking of adding a new tag to my BAM files that combines the cell barcode with an appended sample ID, like AATCAGGAGTGAACGAs1, CCAGCTGCATGAGCAGs2, ... However, I haven't found a good way to do that because I have a lot of reads across the whole experiment. Thank you so much for your suggestions.

Best, Hsiao-Lin

andrewprzh · 2024-07-14T21:53:56Z

Dear @wanghlv

Thanks for the feedback!

> Could I use --read_group file_name:tag:CB, or should I supply a file, like --read_group file:READ_TO_BARCODE_Samples.TSV:0:2?

I think both ways give identical results, although using read tags may save memory, since in that case IsoQuant won't load the entire barcode table into memory.

Unfortunately, the current version of IsoQuant can only group counts by one factor at a time: either the barcode or the sample. So if you want both, I guess you'll need to perform two runs.
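
As a rough illustration of the two-run idea, the sketch below only builds two command lines that differ in their `--read_group` value and output folder. It uses flags that appear in this thread; the helper name `isoquant_cmd`, the file names, and the output folder names are hypothetical placeholders, and `file_name` as a standalone grouping value is an assumption to check against the IsoQuant documentation.

```python
import shlex


def isoquant_cmd(bams, labels, out_dir, read_group,
                 reference="reference.fa", genedb="genes.gtf"):
    """Build an IsoQuant argument list (hypothetical helper, paths are placeholders)."""
    return [
        "isoquant.py", "-d", "ont",
        "-r", reference,
        "--genedb", genedb, "--complete_genedb",
        "--bam", *bams,
        "-l", *labels,
        "-o", out_dir,
        "--read_group", read_group,
    ]


bams = ["s1.bam", "s2.bam"]   # placeholder BAM paths
labels = ["s1", "s2"]

# Run 1: counts grouped per cell barcode (CB tag).
per_cell = isoquant_cmd(bams, labels, "IQ_per_cell", "tag:CB")
# Run 2: counts grouped per input file, i.e. per sample (assumed option value).
per_sample = isoquant_cmd(bams, labels, "IQ_per_sample", "file_name")

# Print the commands instead of executing them.
print(shlex.join(per_cell))
print(shlex.join(per_sample))
```

The commands are printed rather than executed so they can be reviewed first; any other options from the original command would need to be added to both runs.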

> However, I'm not sure whether read IDs are unique across all 6 of my samples.

I highly doubt ONT reads can have identical IDs.
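
If you would rather verify this than assume it, a minimal sketch along these lines could check the tables directly. It assumes one tab-separated read-to-barcode table per sample with the read ID in the first column; the function name `duplicated_read_ids` is made up for illustration.

```python
import csv
from collections import Counter


def duplicated_read_ids(tsv_paths):
    """Return the sorted read IDs that occur more than once across all tables."""
    counts = Counter()
    for path in tsv_paths:
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if row:  # skip blank lines
                    counts[row[0]] += 1
    return sorted(read_id for read_id, n in counts.items() if n > 1)
```

An empty result means every read ID is unique across the inputs. Note that this keeps all read IDs in memory, so for very large experiments a sort-based approach on disk may be more practical.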

> Alternatively, I was thinking of adding a new tag to my BAM files that combines the cell barcode with an appended sample ID, like AATCAGGAGTGAACGAs1, CCAGCTGCATGAGCAGs2, ... However, I haven't found a good way to do that because I have a lot of reads across the whole experiment. Thank you so much for your suggestions.

Adding a new tag would require creating new BAM files, so it's probably easier to create a new TSV table.

P.S. The new version 3.4.2 should be more efficient in terms of RAM consumption, so it's better to update if possible.
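
One way such a table could be produced is sketched below. It assumes the three-column, tab-separated layout described in the question (read ID, cell barcode, sample) and writes a two-column table whose second column joins barcode and sample. The function name `write_combined_groups` and the `_` separator are illustrative choices, not anything IsoQuant prescribes.

```python
import csv


def write_combined_groups(in_path, out_path, sep="_"):
    """Write (read_id, barcode<sep>sample) rows; return the number of rows written."""
    written = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in csv.reader(src, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or incomplete lines
            read_id, barcode, sample = row[0], row[1], row[2]
            writer.writerow([read_id, f"{barcode}{sep}{sample}"])
            written += 1
    return written
```

Following the `file:TABLE:0:2` pattern from the question, the resulting table would presumably be passed as `--read_group file:COMBINED.TSV:0:1` (read ID in column 0, combined group in column 1). Each barcode and sample pair then forms one group, so per-cell counts stay separated by sample within a single run.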

Best
Andrey

wanghlv · 2024-07-15T20:15:42Z

Thank you for all the info and suggestions, and yes, 3.4.2 is so much better at using RAM! Would you recommend an efficient cell barcode and UMI processing tool to run before using IsoQuant for mapping, for single-cell Nanopore data? Also, since my single-cell data includes UMIs, how would you factor them into the quantification to avoid double-counting PCR duplicates?
Thanks so much again
Hsiao-Lin

andrewprzh · 2024-07-16T08:00:28Z

@wanghlv

Currently, I'm using barcode calling and PCR de-duplication tools of my own (https://github.com/ablab/IsoQuant/tree/sc_v3). They are not released yet, but at some point they will become part of IsoQuant too. If you are eager to test them, please contact me via email :)

There are also some pipelines available, such as
https://github.com/nf-core/scnanoseq (also uses IsoQuant)
https://github.com/epi2me-labs/wf-single-cell
Both list the tools they use for barcode calling / PCR de-duplication. However, I have not tried any of those yet.
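
For intuition only, the basic idea behind UMI-based de-duplication can be sketched as counting distinct UMIs per cell and gene rather than counting reads. This is not the method used by the IsoQuant sc_v3 branch or by the pipelines above; it assumes each read already has a barcode, a UMI, and a gene assignment, matches UMIs exactly, and uses a made-up function name `umi_counts`.

```python
from collections import defaultdict


def umi_counts(assignments):
    """Count distinct UMIs per (barcode, gene).

    assignments: iterable of (read_id, barcode, umi, gene) tuples.
    Reads sharing barcode, gene, and UMI are treated as PCR duplicates.
    """
    seen = defaultdict(set)
    for _read_id, barcode, umi, gene in assignments:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("r1", "AATC", "UMI1", "geneA"),
    ("r2", "AATC", "UMI1", "geneA"),  # same barcode, gene, and UMI as r1
    ("r3", "AATC", "UMI2", "geneA"),
    ("r4", "CCAG", "UMI1", "geneA"),
]
print(umi_counts(reads))
```

Dedicated tools typically also tolerate sequencing errors in the UMI, which matters for Nanopore reads; exact matching as shown here would overcount in that case.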

Hope that helps.

Best
Andrey

vasikara17 · 2024-09-05T09:19:29Z

Hello, I have a similar issue that I posted yesterday! In my case I have one BAM file that contains all the conditions. Could you elaborate on running IsoQuant twice with different tags? How can I keep both the barcode and the condition information?
Best,
VK

andrewprzh · 2024-09-12T12:14:28Z

Replied in #234

andrewprzh added the "question" label (Further information is requested) Jul 14, 2024

andrewprzh mentioned this issue Sep 12, 2024

Single cell data one library with different conditions #234

Open

andrewprzh closed this as completed Sep 25, 2024
