Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help replicating results from another paper #453

Open
NikoLichi opened this issue Nov 14, 2024 · 3 comments
Open

Help replicating results from another paper #453

NikoLichi opened this issue Nov 14, 2024 · 3 comments

Comments

@NikoLichi
Copy link

NikoLichi commented Nov 14, 2024

Dear Bambu team,

First of all, I have to say that I really like this tool! and your guidance is very helpful!

Now, my issue: I am comparing bambu vs FLAIR using the published data in this article. https://www.nature.com/articles/s41467-024-48615-4#Fig1 . I am trying to reproduce the figure 1E using bambu and Squanti3.

A colleague ran the alignment part and FLAIR and I am running Bambu. We are not even close to replicating their results. Interestingly for Bambu, there are no novel transcripts/genes reported, not even forcing NDR, i.e., 0.3
Screenshot 2024-11-14 at 14 17 59. how is this possible?
For FLAIR
Screenshot 2024-11-14 at 14 19 23 we are getting some novel transcripts but still need to get similar results.

How is it possible that bambu is not getting any novel transcript? could I do something to enhance my code? I am adding the bam file I am using for this. The reference we are using the human gencode.v38 files.
DRR465296.aligned.corrected_all_corrected_chr21.bam.gz

My code looks like:

library(bambu)

BAMlist <- "DRR465296.aligned.corrected_all_corrected_chr21.bam"
fa.file <- "GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
gtf.file <-  "gencode.v38.annotation.gtf"
bambuAnnotations <- prepareAnnotations(gtf.file)

se = bambu(reads = BAMlist, annotations = bambuAnnotations, genome = fa.file, ncore = 14)
seGene <- transcriptToGeneExpression(se)

writeBambuOutput(se, path = "MY_PATH", prefix = "fPMBC_DefNDR_NewRef_")

Thanks for any help here!
Kind regards,
Niko

@cying111
Copy link
Collaborator

Hi @NikoLichi ,

Thanks for reporting this to us.

I had a look at the data, it seems that the bam file is too small to have novel transcripts that pass the default filters in bambu. If you would want to obtain expected results, we would recommend you to try with a larger bam file.

If you really need to run it on a small bam file, you might want to relax a bit on the discovery threshold, for example, set NDR =1, min.readCount = 1, and fitReadClassModel = FALSE.

Let us know if this helps.
Thank you
Warm regards,
Ying

@NikoLichi
Copy link
Author

Dear @cying111,

Thanks a lot for your reply and hints. I agree that the sample test is very small... However, we decided to go with this test, but see PS below.

Continuing with your input, I ran 3 bambu tests, adding each one of the parameters you mentioned. Please see the attached figure in which these 15-22 novel transcripts are about 0.01% NIC (Novel In Catalog).

  1. Only activating NDR=1 gave 15 novel transcripts (no novel genes).
  2. NDR=1 and min.readCount = 1, gave 22 novel transcripts (no novel genes).
    3.NDR =1, min.readCount = 1, and fitReadClassModel = FALSE There was no change from the previous step.

So this is better but still not comparable with the initial graph I sent you from FLAIR. Do you have any other suggestions for increasing the number of novelties?

Screenshot 2024-11-18 at 12 08 34

All the best,
NiKo

PS. I am running another test with a larger cohort (and larger BAM files ~1Millon reads per sample in 12 samples) re-training the bambu model. I hope these bambu options can help to increase the number of novelties and make a fully comparable test.

@cying111
Copy link
Collaborator

Hi @NikoLichi ,

Thanks for getting back with these updates.

For your question to get a higher number of novel transcripts, there are a few additional parameters you can adjust, including remove.subsetTx = FALSE, min.readFractionByGene = 0.01. If these still don't yield a sufficient number of transcripts, you could also consider relaxing min.exonOverlap.

However, please note that using these relaxed parameters may work well for smaller samples, but for larger datasets, it could lead to less specificity and potentially become infeasible. I recommend to adjust these parameters accordingly.

Thank you and looking forward to hearing back from you!
Warm regards,
Ying

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants