An extension of the ProteoWizard framework enabling the support of the mzDB format

License

Apache-2.0
LICENSE
Apache-2.0
LICENSE.md
AlexandreBurel/pwiz-mzdb

 
 

pwiz-mzdb

An extension of the ProteoWizard framework enabling the support of the mzDB format.

For details about mzDB concepts (scanSlice, runSlice...) and specifications, have a look at the related repository.

Project contributors

Roadmap

Current stable version is 0.9.9.

Work in progress

New features:

  • MS-Numpress compression algorithm support
  • integration of the project with the existing msconvert tool
  • add an option to filter spectra by retention time

Improvements:

  • add FK constraints
  • replace blobs with vectors
  • replace table bounding_box_msn_rtree with table msn_layer
  • update ProteoWizard libraries?

Bug fixes:

  • check MS3 analyses
  • add missing CvTerms (present neither in the Pwiz MSData object nor in converted mzML files)
  • see issues for more information

Release 0.9.9

New features:

  • FITTED mode is fully functional for Thermo, AB Sciex and Bruker analyses
  • Safe mode added: falls back to centroid mode if the requested mode is not possible (e.g. profile requested for spectra acquired in centroid mode)
  • --cycles option in the command line to convert a subset of the input file
  • Build number is added
  • Added a "--log" option to write logs to a file and/or to the console
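
The safe mode fallback can be pictured with a small sketch. It is not the converter's code: the `Mode` enum and the `resolveMode` function are hypothetical names, and throwing an exception when safe mode is off is an assumption made only for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical types and names, for illustration only.
enum class Mode { Centroid, Profile, Fitted };

// Decide in which mode a spectrum is stored.
//   acquired:  mode of the spectrum in the raw file (centroid or profile)
//   requested: mode asked for on the command line (-c, -p or -f)
//   safeMode:  true when -s / --safe_mode was given
Mode resolveMode(Mode acquired, Mode requested, bool safeMode) {
    // A raw spectrum is either centroid or profile; fitted is an output mode only.
    assert(acquired != Mode::Fitted);

    // Centroid data cannot be turned back into profile or fitted data.
    const bool possible =
        (acquired != Mode::Centroid) || (requested == Mode::Centroid);
    if (possible) return requested;

    // Safe mode: fall back to centroid instead of failing.
    if (safeMode) return Mode::Centroid;

    // Assumption: without safe mode the request is treated as an error.
    throw std::invalid_argument("requested mode is not available");
}
```

With this sketch, `resolveMode(Mode::Centroid, Mode::Profile, true)` yields `Mode::Centroid`, while a profile spectrum can still be stored in any of the three modes.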

Improvements

  • Using QTofPeakpicker algorithm for AB Sciex data
  • Added a summary at the end of the conversion
  • The --dia option has been replaced by the -a (--acquisition) option: the user can declare the analysis as DDA or DIA, or let the converter determine it
  • Better input and output file verification (convert AB Sciex data by calling .wiff or .wiff.scan files, convert Bruker data by calling .d directory)

Bug fixes:

  • Wrong data peak count
  • Algorithm to check DDA/DIA is now working on Thermo, AB Sciex and Bruker analyses
  • mzML file support is improved
  • fixed encoding issue with low resolution spectra
  • fixed encoding issue with NO_LOSS option
  • see issues for more information

Release 0.9.8

New features:

  • AbSciex (.WIFF) files support
  • Bruker (.d) files support
  • --dia option in the command line to force DIA file creation
  • --ignore-error option to force conversion even if an error occurred (CRT translation to C++ exceptions makes the converter very slow)

Improvements

  • reduced time of spectrum table loading (table records stored at the end of the file)
  • improvements in exception catching
  • new columns 'mz_precision' and 'intensity_precision' in data-encoding table (instead of param-tree)
  • insert only used data-encoding
  • update proteowizard to the latest

Bug fixes:

  • Wrong encoding for HCD spectra (32 instead of 64 bits)
  • See fixed issues for more information

Users

Convert vendor raw files into mzDB files

Download and setup

Command line usage

Open a command line window in the directory containing raw2mzdb.exe then type:

raw2mzdb.exe -i <rawfilename> -o <outputfilename>

By default, MS1 spectra are converted in "fitted" mode; MS2 spectra are often acquired in centroid mode and cannot be converted to fitted mode. If MS2 (or higher-level) spectra were acquired in high resolution (i.e. in profile mode), you can specify which MS levels to convert in a given mode:

raw2mzdb.exe -i <rawfilename> -o <outputfilename> -f 1-2 will try to convert MS and MS/MS spectra in fitted mode.

There are two other available conversion modes:

  • "profile", the command line is then: raw2mzdb.exe -i <rawfilename> -o <outputfilename> -p 1 (means you want profile mode for MS1, other MS levels will be stored as they were stored in the raw file)
  • "centroid" : raw2mzdb.exe -i <rawfilename> -o <outputfilename> -c 1 (means you want centroid mode for MS1, other MS levels will be stored as they were stored in the raw file)
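
The values passed to -c, -p and -f are either a single MS level ("1") or a range ("1-2"). As a minimal sketch of how such a value could be expanded into a set of MS levels (the `parseMsLevels` helper is hypothetical and is not taken from the raw2mzdb sources):

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical helper: expand "1" into {1} and "1-5" into {1, 2, 3, 4, 5}.
std::set<int> parseMsLevels(const std::string& arg) {
    const std::size_t dash = arg.find('-');

    // Everything before the dash (or the whole string) is the first level.
    const int first = std::stoi(arg.substr(0, dash));

    // Without a dash the range holds a single level.
    const int last = (dash == std::string::npos)
                         ? first
                         : std::stoi(arg.substr(dash + 1));

    if (first < 1 || last < first) {
        throw std::invalid_argument("invalid MS level range: " + arg);
    }

    std::set<int> levels;
    for (int level = first; level <= last; ++level) {
        levels.insert(level);
    }
    return levels;
}
```

Under these assumptions, "-f 1-2" selects MS levels 1 and 2 for fitted mode, and "-p 1" selects MS level 1 only.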

Complete list of parameters:

usage: raw2mzDB.exe --input filename <parameters>

Options:

	-i, --input : specify the input raw file path
	-o, --output : specify the output filename (must be an absolute path)
	-c, --centroid : centroiding, e.g. -c 1 (centroid msLevel 1) or -c 1-5 (centroid msLevel 1 to msLevel 5)
	-p, --profile : same as -c, but for profile mode
	-f, --fitted : same as -c, but for fitted mode
	-T, --bbTimeWidth : bounding box width for ms1 in seconds, default: 15s
	-t, --bbTimeWidthMSn : bounding box width for ms > 1 in seconds, default: 0s
	-M, --bbMzWidth : bounding box height for ms1 in Da, default: 5Da
	-m, --bbMzWidthMSn : bounding box height for msn in Da, default: 10000Da
	-a, --acquisition : dda, dia or auto (converter will try to determine if the analysis is DIA or DDA), default: auto
	--no_loss : if present, m/z and intensities are stored as 64-bit values (larger output file)
	--cycles : only convert the selected range of cycles, e.g. 1-10 (first ten cycles) or 10- (from cycle 10 to the end); using this option will disable progress information
	-s, --safe_mode : use centroid mode if the requested mode is not available
	--log : console, file or both (log file will be put in the same directory as the output file), default: console
	-h, --help : show help
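
The two documented forms of the --cycles value, "1-10" and "10-", can be read as a closed range and an open-ended range. The sketch below shows one way to interpret them; `CycleRange` and `parseCycles` are hypothetical names, and rejecting any other form (such as a single number) is an assumption, since the help text does not say how the converter treats it.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical type: an inclusive range of cycle numbers.
struct CycleRange {
    int first;
    int last;
    bool contains(int cycle) const { return cycle >= first && cycle <= last; }
};

// Hypothetical helper: parse "1-10" (cycles 1 to 10) or "10-" (cycle 10 to the end).
CycleRange parseCycles(const std::string& arg) {
    const std::size_t dash = arg.find('-');
    if (dash == std::string::npos || dash == 0) {
        // Assumption: only the two documented forms are accepted here.
        throw std::invalid_argument("expected N-M or N-: " + arg);
    }

    const int first = std::stoi(arg.substr(0, dash));
    const std::string tail = arg.substr(dash + 1);

    // An empty upper bound means "until the last cycle".
    const int last = tail.empty() ? std::numeric_limits<int>::max()
                                  : std::stoi(tail);

    if (first < 1 || last < first) {
        throw std::invalid_argument("invalid cycle range: " + arg);
    }
    return CycleRange{first, last};
}
```

A converter loop could then skip every cycle for which `contains` returns false.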

Developers

Build from command line

Recent development has only been tested on Windows, using MSVC 2010 Ultimate. Compilation on Linux may currently require some code corrections. We plan to be cross-platform very soon.

After installing Visual Studio, check the following points:

  • The Visual Studio path is added to the system PATH environment variable: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
  • If you are using a 64-bit operating system, enable cross compilation:
    • open a command prompt: Win+R, type cmd
    • go to Microsoft Visual Studio 10.0\VC: cd C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC
    • execute vcvarsall.bat x86_amd64; you should see the message "Setting environment for using Microsoft Visual Studio 2010 x64 cross tools."

In order to build with bjam:

  • Unzip the pwiz-mzdb-lib.zip file (containing the project dependencies as statically compiled libraries) located in the project_root/pwiz_mzdb/mzdb/lib directory. (You can also download it here if it does not exist.)
  • Then run the script raw2mzDB_quickbuild.bat from the project root
  • Or else, run the following command from the project root:
    quickbuild -j8 address-model=64 pwiz_mzdb --i-agree-to-the-vendor-licenses --incremental
  • --incremental is not mandatory but it speeds up the compilation process

raw2mzdb.exe file is generated in : project_root/pwiz_mzdb/target

Project structure

See wiki

Project dependencies (headers already included in the root directory)

IDE setup

  • Visual Studio: not very well tested.
  • QtCreator: importing project with existing sources (from the menu), will provide decent code completion.

HOW TO ?

Iterate through spectra

To iterate over all spectra, simply do the following:

// build an MzDBFile
MzDBFile mzdb(filename);

mzDBReader reader(mzdb); // build an mzDBReader object
MSData msdata; // build an empty Pwiz MSData object

// the following builds a custom SpectrumList, ready for iteration
reader.readMzDB(msdata);

SpectrumListPtr sl = msdata.run.spectrumListPtr; // fetch the spectrum list
// SpectrumListPtr is a shared pointer, so members are accessed with ->
for (size_t i = 0; i < sl->size(); ++i) {
	// fetch the spectrum; the second argument selects whether data points are
	// loaded (false would fetch metadata only), but it has no effect in the
	// current implementation, which always returns mz/intensity arrays
	SpectrumPtr s = sl->spectrum(i, true);

	//...do something else
}

Warning: this is not suitable for accessing a single spectrum at random. Use the 'getSpectrum' function instead.

Iterate through run slices

Not yet implemented. You can only extract one runSlice at a time for the moment:

MzDBFile mzdb(filename);

mzDBReader reader(mzdb); //build a mzdbreader object
MSData msdata; // build empty Pwiz msdata object

reader.readMzDB(msdata);
vector<mzScan*> results; // mzScan is a simple object containing vector members 'mz' and 'intensities'
reader.extractRunSlice(mzmin, mzmax, msLevel, results);

Iterating through run slices is already implemented in the Java reader mzDBAccess.
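
To make the idea of a run slice concrete: it gathers, for every spectrum of one MS level, the peaks that fall inside an m/z window. The sketch below works on plain in-memory structures and does not use the pwiz-mzdb classes; `Peak`, `Spectrum` and `runSlice` are hypothetical names, and the half-open window [mzMin, mzMax) is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical in-memory types, not the pwiz-mzdb classes.
struct Peak { double mz; double intensity; };
struct Spectrum { int msLevel; double rt; std::vector<Peak> peaks; };

// For each spectrum of the given MS level, keep the peaks whose m/z lies
// in [mzMin, mzMax). Spectra with no peak in the window are kept, empty.
std::vector<Spectrum> runSlice(const std::vector<Spectrum>& run,
                               double mzMin, double mzMax, int msLevel) {
    assert(mzMin <= mzMax);

    std::vector<Spectrum> slice;
    for (const Spectrum& s : run) {
        if (s.msLevel != msLevel) continue;

        Spectrum part{s.msLevel, s.rt, {}};
        for (const Peak& p : s.peaks) {
            if (p.mz >= mzMin && p.mz < mzMax) part.peaks.push_back(p);
        }
        slice.push_back(part);
    }
    return slice;
}
```

The mzDB file stores this information per bounding box, so a real reader does not need to scan every spectrum as this sketch does.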

Query LC-MS DDA/DIA data using R*Tree queries

To extract a region using the R*Tree:

MzDBFile mzdb(filename);

mzDBReader reader(mzdb); //build a mzdbreader object
MSData msdata; // build empty Pwiz msdata object

reader.readMzDB(msdata);
vector<mzScan*> results; // mzScan is a simple object containing vector members 'mz' and 'intensities'

reader.extractRegion(mzmin, mzmax, rtmin, rtmax, msLevel, results);

Specifying msLevel=1 extracts the region from spectra acquired at MS level 1, which is suitable for DDA analysis. Any other value queries the MSn R*Tree, which is suitable for DIA analysis.
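
Conceptually, a region query asks which stored bounding boxes overlap a rectangle in the (m/z, retention time) plane. The sketch below shows only that overlap test, with a linear scan standing in for the R*Tree index; `Box`, `intersects` and `queryBoxes` are hypothetical names and are not part of the mzDBReader API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: a rectangle in the (m/z, retention time) plane.
struct Box { double mzMin, mzMax, rtMin, rtMax; };

// Two rectangles overlap when they overlap on both axes.
bool intersects(const Box& a, const Box& b) {
    return a.mzMin <= b.mzMax && b.mzMin <= a.mzMax &&
           a.rtMin <= b.rtMax && b.rtMin <= a.rtMax;
}

// Return the indices of the boxes that overlap the query rectangle.
// An R*Tree gives the same answer without visiting every box.
std::vector<std::size_t> queryBoxes(const std::vector<Box>& boxes,
                                    const Box& query) {
    assert(query.mzMin <= query.mzMax && query.rtMin <= query.rtMax);

    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (intersects(boxes[i], query)) hits.push_back(i);
    }
    return hits;
}
```

With boxes of 5 Da by 15 s (the MS1 defaults of raw2mzdb), a query over m/z 402 to 403 and 10 s to 20 s overlaps two boxes in time, so the peaks of both would then be filtered down to the requested region.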
