Enable chunk-wise processing for all peaks data functions #306

Merged · 10 commits · Nov 30, 2023
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: Spectra
Title: Spectra Infrastructure for Mass Spectrometry Data
Version: 1.13.1
Version: 1.13.2
Description: The Spectra package defines an efficient infrastructure
for storing and handling mass spectrometry spectra and functionality to
subset, process, visualize and compare spectra data. It provides different
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -1,5 +1,6 @@
# Generated by roxygen2: do not edit by hand

export("processingChunkSize<-")
export(MsBackendCached)
export(MsBackendDataFrame)
export(MsBackendHdf5Peaks)
@@ -27,6 +28,8 @@ export(plotMzDelta)
export(plotSpectra)
export(plotSpectraOverlay)
export(ppm)
export(processingChunkFactor)
export(processingChunkSize)
export(processingLog)
export(reduceSpectra)
export(scalePeaks)
@@ -63,6 +66,7 @@ exportMethods(addProcessing)
exportMethods(backendBpparam)
exportMethods(backendInitialize)
exportMethods(backendMerge)
exportMethods(backendParallelFactor)
exportMethods(bin)
exportMethods(c)
exportMethods(centroided)
12 changes: 12 additions & 0 deletions NEWS.md
@@ -1,5 +1,17 @@
# Spectra 1.13

## Changes in 1.13.2

- Add support for chunk-wise (parallel) processing of `Spectra`: the new
functions `processingChunkSize`, `backendParallelFactor` and
`processingChunkFactor` set or get the definition of the chunks used for
parallel processing. All functions working on peaks data use this
mechanism, which is implemented in the internal `.peaksapply` function.
The `Spectra` object gains a new slot `"processingChunkSize"` that
defines the size of the processing chunks. This also enables processing
of very large data sets. See also [issue
#304](https://github.com/rformassspectrometry/Spectra/issues/304).
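A minimal usage sketch of the chunking interface described in this entry (the toy `Spectra` object is illustrative; the function names are the ones added in this PR):

```r
library(Spectra)

## Toy in-memory Spectra object with three spectra.
spd <- DataFrame(msLevel = c(1L, 1L, 1L), rtime = c(1.1, 1.2, 1.3))
spd$mz <- list(c(100.1, 103.3), c(100.1, 200.2), c(103.3, 300.1))
spd$intensity <- list(c(10, 20), c(30, 40), c(50, 60))
sps <- Spectra(spd)

## Enable chunk-wise processing: peaks data functions will then
## load and process the data in chunks of (at most) 2 spectra.
processingChunkSize(sps) <- 2
processingChunkSize(sps)
processingChunkFactor(sps)   # factor used to split the data into chunks
```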

## Changes in 1.13.1

- Fix issue with `bin` function (see
3 changes: 3 additions & 0 deletions R/AllGenerics.R
@@ -11,6 +11,9 @@ setGeneric("backendMerge", def = function(object, ...)
standardGeneric("backendMerge"),
valueClass = "MsBackend")
#' @rdname hidden_aliases
setGeneric("backendParallelFactor", def = function(object, ...)
standardGeneric("backendParallelFactor"))
#' @rdname hidden_aliases
setMethod("bin", "numeric", MsCoreUtils::bin)
setGeneric("combinePeaks", function(object, ...)
standardGeneric("combinePeaks"))
21 changes: 20 additions & 1 deletion R/MsBackend.R
@@ -9,6 +9,8 @@
#' @aliases supportsSetBackend
#' @aliases backendBpparam
#' @aliases backendInitialize
#' @aliases backendParallelFactor,MsBackendMzR-method
#' @aliases backendParallelFactor,MsBackendHdf5Peaks-method
#'
#' @description
#'
@@ -212,7 +214,9 @@
#' because they contain a connection to a database that cannot be
#' shared across processes) should extend this method to return only
#' `SerialParam()` and hence disable parallel processing for (most)
#' methods and functions.
#' methods and functions. See also `backendParallelFactor` for a
#' function that suggests a preferred splitting of the backend for
#' parallel processing.
#'
#' - `backendInitialize`: initialises the backend. This method is
#' supposed to be called right after creating an instance of the
@@ -233,6 +237,14 @@
#' instance. All objects to be merged have to be of the same type (e.g.
#' [MsBackendDataFrame()]).
#'
#' - `backendParallelFactor`: returns a `factor` defining the preferred
#' way to split the backend for parallel processing, used by all peaks
#' data accessor and data manipulation functions. The default
#' implementation returns a factor of length 0 (`factor()`), thus
#' suggesting no splitting. The `backendParallelFactor` method for
#' `MsBackendMzR`, in contrast, returns `factor(dataStorage(object))`,
#' suggesting to split the object by data file.
#'
#' - `dataOrigin`: gets a `character` of length equal to the number of spectra
#' in `object` with the *data origin* of each spectrum. This could e.g. be
#' the mzML file from which the data was read.
@@ -849,6 +861,13 @@ setMethod("backendMerge", "MsBackend", function(object, ...) {
stop("Not implemented for ", class(object), ".")
})

#' @exportMethod backendParallelFactor
#'
#' @rdname MsBackend
setMethod("backendParallelFactor", "MsBackend", function(object, ...) {
factor()
})
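The default method above returns a zero-length factor. A base-R sketch of what this contract means for callers (illustrative only; no Spectra dependency, and the file names are made up):

```r
## A zero-length factor means "no preferred split"; a factor along
## the storage files suggests splitting into per-file chunks.
no_split <- factor()              # what the default method returns
length(no_split)                  # 0: no splitting suggested

storage <- c("a.mzML", "a.mzML", "b.mzML")
f <- factor(storage, levels = unique(storage))
split(seq_along(storage), f)      # spectra 1-2 from a.mzML, 3 from b.mzML
```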

#' @rdname MsBackend
setMethod("export", "MsBackend", function(object, ...) {
stop(class(object), " does not support export of data; please provide a ",
4 changes: 4 additions & 0 deletions R/MsBackendHdf5Peaks.R
@@ -306,3 +306,7 @@ setMethod("backendMerge", "MsBackendHdf5Peaks", function(object, ...) {
validObject(res)
res
})

setMethod("backendParallelFactor", "MsBackendHdf5Peaks", function(object) {
factor(dataStorage(object), levels = unique(dataStorage(object)))
})
4 changes: 4 additions & 0 deletions R/MsBackendMzR.R
@@ -210,3 +210,7 @@ setMethod("export", "MsBackendMzR", function(object, x, file = tempfile(),
MoreArgs = list(format = format, copy = copy),
BPPARAM = BPPARAM)
})

setMethod("backendParallelFactor", "MsBackendMzR", function(object) {
factor(dataStorage(object), levels = unique(dataStorage(object)))
})
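Both methods above split by `dataStorage`, i.e. per file. A simplified base-R sketch of how such a factor can drive chunk-wise application (the real logic lives in the internal `.peaksapply`; `chunk_apply` here is a made-up helper for illustration):

```r
## Apply FUN per chunk (as defined by factor f) and recombine the
## results in the original order; an empty factor means no chunking.
chunk_apply <- function(x, FUN, f) {
    if (!length(f))
        return(FUN(x))
    res <- lapply(split(x, f), FUN)
    unsplit(res, f)
}

x <- 1:6
f <- factor(rep(c("a", "b"), each = 3))
chunk_apply(x, function(z) z * 10, f)   # 10 20 30 40 50 60
```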