Multiplex_Major_Patch #452

andredsim · 2024-10-25T07:11:08Z

Work in progress. Do not use or run.

In this PR, transcripts that are normally filtered out, including subset transcripts and those above the NDR threshold, are placed in the metadata of the annotations, in $subsetTranscripts and $lowConfidenceTranscripts respectively.
This means that users who ran bambu with the wrong NDR setting and do not want to rerun discovery can recover the missing transcripts from the metadata.
To facilitate this, this PR adds the exported setNDR function, which takes the extendedAnnotations and an NDR value and moves novel bambu annotations between the main annotations and the low-confidence annotations based on the threshold provided. If no threshold is provided, setNDR recommends an NDR using the same method as transcript discovery.
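To illustrate the thresholding idea only, here is a minimal base R sketch. It does not call bambu: the data frame, its column names (txName, novel, NDR) and the helper splitByNDR are hypothetical stand-ins for the extendedAnnotations and for whatever setNDR does internally.

```r
# Hypothetical stand-in for the annotations: one row per transcript.
# Column names are illustrative and are NOT bambu's actual schema.
tx <- data.frame(
  txName = c("ref1", "novelA", "novelB", "novelC"),
  novel  = c(FALSE, TRUE, TRUE, TRUE),
  NDR    = c(NA, 0.05, 0.30, 0.80),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split transcripts into a main set and a
# low-confidence set for a given NDR threshold.
splitByNDR <- function(tx, threshold) {
  # Reference transcripts always stay in the main set;
  # novel transcripts stay only if their NDR is at or below the threshold.
  keep <- !tx$novel | (!is.na(tx$NDR) & tx$NDR <= threshold)
  list(main = tx[keep, ], lowConfidence = tx[!keep, ])
}

# Re-thresholding is just re-splitting the same table, no rediscovery needed.
strict  <- splitByNDR(tx, 0.1)
relaxed <- splitByNDR(tx, 0.5)
```

With the stricter threshold only novelA survives next to the reference transcript. Relaxing it to 0.5 moves novelB back into the main set, while novelC stays in the low-confidence set.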
For this to work with annotations that have already been saved to a gtf file, bambu now writes NDR, txScore and txScore.noFit as attributes in the gtf file, and prepareAnnotations reads them back in.
Note that if annotations are written with an NDR threshold < 1, the low-confidence transcripts above that threshold will be missing from the file.
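As a rough sketch of what round-tripping these values through a gtf attribute column could look like, the base R snippet below formats and parses the standard `key "value";` attribute syntax. The helpers makeAttributes and readAttribute are hypothetical and are not bambu functions; the exact layout bambu writes may differ.

```r
# Hypothetical helper: format one transcript's scores as a GTF attribute string.
# The attribute names come from the PR description; the layout is an assumption.
makeAttributes <- function(txId, NDR, txScore, txScoreNoFit) {
  paste0('transcript_id "', txId, '"; ',
         'NDR "', NDR, '"; ',
         'txScore "', txScore, '"; ',
         'txScore.noFit "', txScoreNoFit, '";')
}

# Hypothetical helper: read one numeric attribute back.
# Returns NA when the attribute is absent or was written as "NA".
readAttribute <- function(attributes, key) {
  pattern <- paste0(key, ' "([^"]*)"')
  m <- regmatches(attributes, regexec(pattern, attributes))[[1]]
  if (length(m) < 2 || m[2] == "NA") return(NA_real_)
  as.numeric(m[2])
}

attrs <- makeAttributes("tx1", NDR = 0.25, txScore = 0.9, txScoreNoFit = NA)
ndr     <- readAttribute(attrs, "NDR")
score   <- readAttribute(attrs, "txScore")
noFit   <- readAttribute(attrs, "txScore.noFit")
missing <- readAttribute(attrs, "notPresent")
```

Missing values are written as the string NA and read back as a real NA, so a transcript without a score does not break later thresholding.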
setNDR is also applied as part of quant: users can provide their extendedAnnotations alongside an NDR threshold when running bambu, and the NDR used for quant is adjusted automatically. Users therefore do not need to filter on NDR manually.
NDR and other stats are now copied over to equal transcripts even if they are above the NDR threshold (previously this only happened for those below the threshold).
Minor change:
Warnings no longer occur when the readGrgList contains seqlevels that are not in the annotations or genome. This was done by setting the seqlevels of the reads to only those present in the reads. The warning was occurring constantly because all scaffolds used in alignment were listed in the bam files, even if no reads from those scaffolds existed.
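The seqlevels change can be pictured with a base R analogy, using a plain factor in place of the seqnames of the reads. The scaffold names and the variable names are made up for illustration. On real GRanges or GRangesList objects, GenomeInfoDb offers `seqlevels(x) <- seqlevelsInUse(x)` for the same purpose; whether the PR uses exactly that call is not shown here.

```r
# Reads aligned to chr1 and chr2 only, but the bam header listed two
# scaffolds as well, so they show up as unused levels.
readSeqnames <- factor(
  c("chr1", "chr1", "chr2"),
  levels = c("chr1", "chr2", "scaffold_1", "scaffold_2")
)
annotationSeqlevels <- c("chr1", "chr2")

# Before: the unused scaffold levels look like a mismatch with the annotations.
unknownBefore <- setdiff(levels(readSeqnames), annotationSeqlevels)

# Keep only the levels that actually occur in the reads.
readSeqnames <- droplevels(readSeqnames)

# After: nothing left to warn about.
unknownAfter <- setdiff(levels(readSeqnames), annotationSeqlevels)
```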

Todo - Unit Tests, Update bambu documentation to include setNDR


cying111 · 2024-10-28T06:19:50Z

R/bambu-extendAnnotations-utilityCombine.R

-    combinedTranscripts <- as_tibble(rbindlist(list(combinedSplicedTranscripts,
-        combinedUnsplicedTranscripts), fill = TRUE))
-    return(combinedTranscripts)
+    return(combinedSplicedTranscripts)


Hi @andredsim

Where was the commented code moved to? Does this mean there are no unsplicedNew transcript models?

andredsim · 2024-10-28

This is an outdated version. Please double-check that you have pulled the latest version of this branch, as the commented code is uncommented there. The unsplicedNew transcript models now only appear if the user provides min.txScore.singleExon < 1, which is essentially the same as before, since we didn't output novel single-exon transcripts by default.

cying111 · 2024-10-28T06:26:11Z

R/bambu-extendAnnotations-utilityExtend.R

  if(length(annotationGrangesList)>0){ #only recommend an NDR if its possible to calculate an NDR
      NDR = recommendNDR(rowDataCombined, baselineFDR, NDR, defaultModels, verbose)
  } else {
      if(is.null(NDR)) NDR = 0.5
  }
-  filterSet = (rowDataCombined$NDR <= NDR)
+  filterSet = (rowDataCombined$NDR <= NDR | rowDataCombined$readClassType == "equal:compatible")


Why is the equal read class needed here? Wouldn't they just be the annotated transcripts?
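For reference, the effect of the changed line can be reproduced on a toy table. This is only a sketch: the rows are invented, "someNovelClass" is a placeholder label rather than a real readClassType, and only the two filter expressions are taken from the diff above.

```r
# Toy stand-in for rowDataCombined: two equal read classes and two
# read classes with a placeholder novel type, each with a low and a high NDR.
rowDataCombined <- data.frame(
  readClassType = c("equal:compatible", "equal:compatible",
                    "someNovelClass", "someNovelClass"),
  NDR = c(0.05, 0.90, 0.05, 0.90),
  stringsAsFactors = FALSE
)
NDR <- 0.5

# Old filter: everything above the threshold is dropped.
oldFilter <- rowDataCombined$NDR <= NDR

# New filter: equal read classes are kept regardless of their NDR.
newFilter <- rowDataCombined$NDR <= NDR |
  rowDataCombined$readClassType == "equal:compatible"
```

The only row that changes is the equal read class with NDR 0.90, which the old filter dropped and the new one keeps. That is consistent with the PR description, where NDR and other stats are copied over to equal transcripts even above the threshold.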


andredsim added 30 commits December 1, 2022 17:28

Remove SE from being combined during extendAnno

cd47663

Copy over rowdata from equal read classes above NDR threshold

cbec873

Add lowConfidenceTranscripts to metadata and speed up relSubsetCount

c7306ed

subset transcripts are added to metadata

44f76ab

Add helper function to set NDR on extendedAnnotations

7e32425

function to set NDR threshold on extendedAnno

f5d4f17

Revert "Add helper function to set NDR on extendedAnnotations"

ba7c084

This reverts commit 9b865dfa334e22774a9e51fd829afd23b7e27181.

export setNDR

46d9167

readClass warnings kept only when rcFile is made and not when loaded in

82fe064

set seqlevels of reads to only seqnames in the reads to reduce warnings

50677f4

write NDR to gtf

d17cbf1

Read in NDR if in gtf

232c160

Allow setNDR to work on gtf files that are read in

0ae5c37

Allow NDR filtering in NDR quant only mode

3f464e0

Add txScore to mcols

3043a1d

read and write the txScore to GTF

21187ce

NA's are loaded in as NAs from GTF

0c288a7

setNDR can now recommend NDR

1741376

Add documentation to setNDR and carry arguments over

8266e69

Update test cases to include metadata and new warnings

e159b48

Assign names to transcripts earler so subset and lowConf are given names

d0b3b7d

Remove unneeded argument

afc9e03

Fix includeRef bug

735a22e

Unit tests for setNDR

d242796

Unit tests for reading and writing NDR and txScore

75458de

Update documentation

5be3a63

fix setNDR test

108e8b9

Bambu now outputs 3 gtf files by default

f061a8e

4th GTF of novel transcripts only

601301a

Include options to ignore GTF outputs

49340ba

andredsim added 3 commits October 25, 2024 14:18

Merge branch 'full_NDR_output' into Multiplex_Major_Patch

b829e03

Update release history

77cae13

Merge branch 'messyForest' into Multiplex_Major_Patch

657e2b5

andredsim requested review from cying111 and jonathangoeke October 25, 2024 07:11

andredsim and others added 7 commits October 28, 2024 10:18

Fix merge

9f88889

Combine the change to unspliced read mutating with the inclusion of sample id Fix bug during extendAnno where wrong variable was provided when determining the number of novel genes

Refactor quantData input parameter and colData when not multiplexed

e133c43

update actions versions

23e0212

Merge remote-tracking branch 'origin/Multiplex_Major_Patch' into Mult…

2b08636

…iplex_Major_Patch

Restore seperateSamples argument in writeBambuOutput lost in merge

c537224

Put gtf file extension at the end of writeBambuOutput

0024c5f

Add the flags in writeBambuOutput to control which gtfs are written

51f382e

cying111 reviewed Oct 28, 2024


andredsim added 16 commits October 28, 2024 17:05

Update release history

d10f86a

Update xgboost models and include code to redo it

38cb052

Update quant test data due to assigndist changes

c1ab8af

Update remaining test data

5515c43

Update version message

75a49f7

Add in preset modes

f2d2d83

remove commas

09dcb4d

Fix mode

c55383d

Fix reccomendNDR

99d3148

small change

Fix quantification when running bulk

a605fe7

restore returnDistTable

7a62db7

Return default degbias to on

10fd905

Restore trackreads

df6b4b3

Add in descriptions for new arguments

5d0af93

Merge branch 'filter_barcodes' into Multiplex_Major_Patch

121a3c2

Clean up cluster discovery

0b455ef
