Help replicating results from another paper #453

NikoLichi · 2024-11-14T13:34:49Z

Dear Bambu team,

First of all, I have to say that I really like this tool! and your guidance is very helpful!

Now, my issue: I am comparing bambu vs FLAIR using the published data in this article. https://www.nature.com/articles/s41467-024-48615-4#Fig1 . I am trying to reproduce the figure 1E using bambu and Squanti3.

A colleague ran the alignment part and FLAIR and I am running Bambu. We are not even close to replicating their results. Interestingly for Bambu, there are no novel transcripts/genes reported, not even forcing NDR, i.e., 0.3
. how is this possible?
For FLAIR
we are getting some novel transcripts but still need to get similar results.

How is it possible that bambu is not getting any novel transcript? could I do something to enhance my code? I am adding the bam file I am using for this. The reference we are using the human gencode.v38 files.
DRR465296.aligned.corrected_all_corrected_chr21.bam.gz

My code looks like:

library(bambu)

BAMlist <- "DRR465296.aligned.corrected_all_corrected_chr21.bam"
fa.file <- "GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
gtf.file <-  "gencode.v38.annotation.gtf"
bambuAnnotations <- prepareAnnotations(gtf.file)

se = bambu(reads = BAMlist, annotations = bambuAnnotations, genome = fa.file, ncore = 14)
seGene <- transcriptToGeneExpression(se)

writeBambuOutput(se, path = "MY_PATH", prefix = "fPMBC_DefNDR_NewRef_")

Thanks for any help here!
Kind regards,
Niko

The text was updated successfully, but these errors were encountered:

cying111 · 2024-11-15T03:31:07Z

Hi @NikoLichi ,

Thanks for reporting this to us.

I had a look at the data, it seems that the bam file is too small to have novel transcripts that pass the default filters in bambu. If you would want to obtain expected results, we would recommend you to try with a larger bam file.

If you really need to run it on a small bam file, you might want to relax a bit on the discovery threshold, for example, set NDR =1, min.readCount = 1, and fitReadClassModel = FALSE.

Let us know if this helps.
Thank you
Warm regards,
Ying

NikoLichi · 2024-11-18T11:10:55Z

Dear @cying111,

Thanks a lot for your reply and hints. I agree that the sample test is very small... However, we decided to go with this test, but see PS below.

Continuing with your input, I ran 3 bambu tests, adding each one of the parameters you mentioned. Please see the attached figure in which these 15-22 novel transcripts are about 0.01% NIC (Novel In Catalog).

Only activating NDR=1 gave 15 novel transcripts (no novel genes).
NDR=1 and min.readCount = 1, gave 22 novel transcripts (no novel genes).
3.NDR =1, min.readCount = 1, and fitReadClassModel = FALSE There was no change from the previous step.

So this is better but still not comparable with the initial graph I sent you from FLAIR. Do you have any other suggestions for increasing the number of novelties?

All the best,
NiKo

PS. I am running another test with a larger cohort (and larger BAM files ~1Millon reads per sample in 12 samples) re-training the bambu model. I hope these bambu options can help to increase the number of novelties and make a fully comparable test.

cying111 · 2024-11-19T03:16:55Z

Hi @NikoLichi ,

Thanks for getting back with these updates.

For your question to get a higher number of novel transcripts, there are a few additional parameters you can adjust, including remove.subsetTx = FALSE, min.readFractionByGene = 0.01. If these still don't yield a sufficient number of transcripts, you could also consider relaxing min.exonOverlap.

However, please note that using these relaxed parameters may work well for smaller samples, but for larger datasets, it could lead to less specificity and potentially become infeasible. I recommend to adjust these parameters accordingly.

Thank you and looking forward to hearing back from you!
Warm regards,
Ying

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Help replicating results from another paper #453

Help replicating results from another paper #453

NikoLichi commented Nov 14, 2024 •

edited

Loading

cying111 commented Nov 15, 2024

NikoLichi commented Nov 18, 2024

cying111 commented Nov 19, 2024

Help replicating results from another paper #453

Help replicating results from another paper #453

Comments

NikoLichi commented Nov 14, 2024 • edited Loading

cying111 commented Nov 15, 2024

NikoLichi commented Nov 18, 2024

cying111 commented Nov 19, 2024

NikoLichi commented Nov 14, 2024 •

edited

Loading