Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiplex_Major_Patch #452

Open
wants to merge 162 commits into
base: devel
Choose a base branch
from
Open

Multiplex_Major_Patch #452

wants to merge 162 commits into from

Conversation

andredsim
Copy link
Collaborator

@andredsim andredsim commented Oct 25, 2024

Work in progress. Do not use or run

In this PR transcripts that are normally filtered out including subset transcripts and those above the NDR threshold are placed into the metadata of the annotations in $subsetTranscripts and $lowConfidenceTranscripts respectively.
This means that if users ran bambu with the wrong NDR setting, and do not want to run discovery again, they can get the missing transcripts from the metadata.
To facilitate this this PR adds the external setNDR function which takes the extendedAnnotations and an NDR value and will switch novel bambu annotations between the main annotations and the low confidence annotations based on the threshold provided. If no threshold is provided, setNDR will recommend an NDR with the same method used during transcript discovery.
In order for this to work for annotations that have already been saved to a gtf file, bambu now outputs the NDR, txScore and txScore.noFit as attributes to the gtf file and these are also read in with prepareAnnotations.
Important to note that if annotations are written with an NDR threshold of <1, these low confidence transcripts will be missed.
Added setNDR as part of quant, which means that users can provide their extendedAnnotations alongside an NDR threshold when running bambu and it will automatically adjust the NDR used for quant. This means users do not need to manually filter the NDR value themselves.
NDR and other stats are now copied over to equal transcripts even if above the NDR threshold (previously only happened for those below the NDR threshold)
Minor change:
Warnings will no longer occur if there are seqlevels in the readGrgList that are not in the annotations or genome. This was done by setting seqlevels of the reads to only those in the reads. Warning was constantly occuring because all the scaffolds used in alignment were in the bam files, even if no reads from these scaffolds existed.

Todo - Unit Tests, Update bambu documentation to include setNDR

This reverts commit 9b865dfa334e22774a9e51fd829afd23b7e27181.
combinedTranscripts <- as_tibble(rbindlist(list(combinedSplicedTranscripts,
combinedUnsplicedTranscripts), fill = TRUE))
return(combinedTranscripts)
return(combinedSplicedTranscripts)
Copy link
Collaborator

@cying111 cying111 Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andredsim

where is the commented code moved to? does this mean there is no unsplicedNew transcript models?

Copy link
Collaborator Author

@andredsim andredsim Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an outdated version, just double check you have pulled the latest version of this branch, as the commented code is uncommented in it. Now the unsplicedNew transcript models only appear if the user provides min.txScore.singleExon < 1. Which is essentially the same as before as we didn't output novel single exon transcripts by default.

if(length(annotationGrangesList)>0){ #only recommend an NDR if its possible to calculate an NDR
NDR = recommendNDR(rowDataCombined, baselineFDR, NDR, defaultModels, verbose)
} else {
if(is.null(NDR)) NDR = 0.5
}
filterSet = (rowDataCombined$NDR <= NDR)
filterSet = (rowDataCombined$NDR <= NDR | rowDataCombined$readClassType == "equal:compatible")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the equal read class needed here? would they just be the annotated transcripts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants