pwiz-mzdb

An extension of the ProteoWizard framework enabling the support of the mzDB format.

For details about mzDB concepts (scanSlice, runSlice...) and specifications, have a look at the related repository.

Project contributors

Roadmap

Current stable version is 0.9.9.

Work in progress

New features:

MS-Numpress compression algorithm support
Integration of the project with existing msconvert tool
add an option to filter spectra by retention time

Improvements

add FK constraints
replace blobs with vectors
replace table bounding_box_msn_rtree with table msn_layer
update proteowizard libraries ?

Bug fixes:

check MS3 analyses
~~add missing CvTerms~~ (present neither in the Pwiz MSData object nor in converted mzML files)
see issues for more information

Release 0.9.9

New features:

FITTED mode is fully functional for Thermo, AB Sciex and Bruker analyses
Safe mode added: falls back to centroid if the requested mode is not possible (e.g. converting centroid data to profile)
--cycles option in the command line to convert a subset of the input file
Build number is added
Added a --log option to write logs to a file and/or to the console

Improvements

Using QTofPeakpicker algorithm for AB Sciex data
Added a summary at the end of the conversion
--dia option has been replaced by the -a or --acquisition option; the user can state whether the analysis is DDA or DIA, or let the converter determine it
Better input and output file verification (AB Sciex data can be converted by passing either the .wiff or the .wiff.scan file, Bruker data by passing the .d directory)

Bug fixes:

Wrong data peak count
Algorithm to check DDA/DIA now works on Thermo, AB Sciex and Bruker analyses
mzML file support is improved
fixed encoding issue with low resolution spectra
fixed encoding issue with NO_LOSS option
see issues for more information

Release 0.9.8

New features:

AbSciex (.WIFF) files support
Bruker (.d) files support
--dia option in the command line to force DIA file creation
~~--ignore-error option to force conversion even if errors occurred~~ (CRT translation to C++ exceptions makes the converter very slow)

Improvements

reduced time of spectrum table loading (table records stored at the end of the file)
improvements in exception catching
new columns 'mz_precision' and 'intensity_precision' in data-encoding table (instead of param-tree)
insert only used data-encoding
update proteowizard to the latest

Bug fixes:

Wrong encoding for HCD spectra (32 instead of 64 bits)
See fixed issues for more information

Users

Convert vendor raw files into mzDB files

Download and setup

Download the zip archive
Raw2mzDB has the same requirements as ProteoWizard. If they are not already installed, install the following: .NET Framework 3.5 SP1, .NET Framework 4.0, MSVC 2008 SP1 (x86), MSVC 2012, MSVC 2013 (see http://proteowizard.sourceforge.net/user_installation_simple.shtml)

Command line usage

Open a command line window in the directory containing raw2mzdb.exe then type:

raw2mzdb.exe -i <rawfilename> -o <outputfilename>

By default, the raw file is converted in "fitted" mode for MS1 (MS2 is often acquired in centroid mode and cannot be converted in fitted mode). If MS2 or higher levels are acquired in high resolution (i.e. in profile mode), you can specify the mode in which specific MS levels should be converted:

raw2mzdb.exe -i <rawfilename> -o <outputfilename> -f 1-2 will try to convert MS and MS/MS spectra in fitted mode.
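As a rough sketch of the mode selection described above (not the converter's actual code), the function below resolves the mode for a single spectrum. It assumes that profile and fitted output both require profile input, and it includes the centroid fallback of the -s / --safe_mode option. `DataMode` and `resolveMode` are hypothetical names. What the converter really does when the requested mode is unavailable and safe mode is off is not stated in this README, so the sketch simply reports failure.

```cpp
#include <cassert>

enum class DataMode { Profile, Fitted, Centroid };

// Hypothetical stand-in for the converter's decision, not its actual code.
// acquiredAsProfile: whether the raw spectrum still holds profile data.
// Returns true and sets `resolved` when a usable mode was found.
bool resolveMode(DataMode requested, bool acquiredAsProfile, bool safeMode,
                 DataMode& resolved) {
    // Assumption: centroid output is always possible, while profile and
    // fitted output need profile input.
    if (requested == DataMode::Centroid || acquiredAsProfile) {
        resolved = requested;
        return true;
    }
    if (safeMode) {  // -s / --safe_mode: fall back to centroid
        resolved = DataMode::Centroid;
        return true;
    }
    return false;    // requested mode unavailable and no fallback allowed
}
```

Under these assumptions, requesting fitted mode for a centroided MS2 spectrum yields centroid when `safeMode` is true and a failure otherwise.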

There are two other available conversion modes:

"profile", the command line is then: raw2mzdb.exe -i <rawfilename> -o <outputfilename> -p 1 (means you want profile mode for MS1, other MS levels will be stored as they were stored in the raw file)
"centroid" : raw2mzdb.exe -i <rawfilename> -o <outputfilename> -c 1 (means you want centroid mode for MS1, other MS levels will be stored as they were stored in the raw file)

Complete list of parameters:

usage: raw2mzDB.exe --input filename <parameters>

Options:

	-i, --input : specify the input rawfile path
	-o, --output : specify the output filename (must be an absolute path)
	-c, --centroid : centroidization, e.g. -c 1 (centroidization msLevel 1) or -c 1-5 (centroidization msLevel 1 to msLevel 5)
	-p, --profile : same as -c, but for profile mode
	-f, --fitted : same as -c, but for fitted mode
	-T, --bbTimeWidth : bounding box width for ms1 in seconds, default: 15s
	-t, --bbTimeWidthMSn : bounding box width for ms > 1 in seconds, default: 0s
	-M, --bbMzWidth : bounding box height for ms1 in Da, default: 5Da 
	-m, --bbMzWidthMSn : bounding box height for msn in Da, default: 10000Da 
	-a, --acquisition : dda, dia or auto (converter will try to determine if the analysis is DIA or DDA), default: auto
	--no_loss : if present, leads to 64-bit conversion of mz and intensities (larger output file)
	--cycles : only convert the selected range of cycles, e.g. 1-10 (first ten cycles) or 10- (from cycle 10 to the end); using this option will disable progress information
	-s, --safe_mode : use centroid mode if the requested mode is not available
	--log : console, file or both (log file will be put in the same directory as the output file), default: console
	-h, --help : show help
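The range syntax used by -c, -p, -f and --cycles ("1", "1-5", "10-") can be parsed along the following lines. This is a minimal sketch, not the parser used by raw2mzdb: `IntRange` and `parseRange` are hypothetical names, and the README does not say whether every option accepts all three forms.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Inclusive range; `last` is the largest int when the upper bound is open ("10-").
struct IntRange {
    int first;
    int last;
};

// Hypothetical helper, not taken from the raw2mzdb sources.
// Accepts "N", "N-M" and "N-" (open-ended).
IntRange parseRange(const std::string& arg) {
    const std::size_t dash = arg.find('-');
    IntRange range;
    if (dash == std::string::npos) {  // "N"
        range.first = range.last = std::stoi(arg);
    } else {
        range.first = std::stoi(arg.substr(0, dash));
        const std::string upper = arg.substr(dash + 1);
        range.last = upper.empty() ? std::numeric_limits<int>::max()  // "N-"
                                   : std::stoi(upper);                // "N-M"
    }
    if (range.first < 1 || range.last < range.first)
        throw std::invalid_argument("invalid range: " + arg);
    return range;
}
```

For example, `parseRange("1-5")` covers levels 1 to 5, while `parseRange("10-")` starts at 10 and leaves the upper bound open.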

Developers

Build from command line

Recent development has only been tested on Windows using MSVC 2010 Ultimate. For the moment, compilation on Linux may require some code corrections. We plan to be cross-platform very soon.

After installing Visual Studio, check the following points:

The Visual Studio path is added to the system PATH environment variable: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
If you are using a 64-bit operating system, enable cross-compilation:
- open a command line: Win+R, type cmd
- go to Microsoft Visual Studio 10.0\VC: cd C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC
- run vcvarsall.bat x86_amd64; you should see the message "Setting environment for using Microsoft Visual Studio 2010 x64 cross tools."

In order to build with bjam:

Unzip the pwiz-mzdb-lib.zip file (containing the project dependencies as statically compiled libraries) located in the project_root/pwiz_mzdb/mzdb/lib directory. (You can also download it here if it does not exist.)
Then run the script raw2mzDB_quickbuild.bat from the project root
Or else, run the following command from the project root:
quickbuild -j8 address-model=64 pwiz_mzdb --i-agree-to-the-vendor-licenses --incremental
--incremental is not mandatory but it speeds up the compilation process

The raw2mzdb.exe file is generated in project_root/pwiz_mzdb/target

Project structure

See wiki

Project dependencies (headers already included in the root directory)

PWIZ of course
Ceres
Eigen
glog

IDE setup

Visual Studio: not very well tested.
QtCreator: importing the project with existing sources (from the menu) provides decent code completion.

HOW TO ?

Iterate through spectra

To iterate over all spectra, simply do the following:

//build a mzdbfile
MzDBFile mzdb(filename);

mzDBReader reader(mzdb); //build a mzdbreader object
MSData msdata; // build empty Pwiz msdata object

// the following will build a custom SpectrumList, ready for iteration
reader.readMzDB(msdata);

SpectrumListPtr sl = msdata.run.spectrumListPtr; // fetch spectrumList
for (size_t i=0; i < sl->size(); ++i) {
	// fetch spectrum; the second argument selects whether data points are loaded
	// (false would mean metadata only). It has no effect in the current
	// implementation, which always returns a spectrum with mz/intensity arrays
	SpectrumPtr s = sl->spectrum(i, true);
	
	//...do something else
}

Warning: this is not suitable for random access to a single spectrum. Use the 'getSpectrum' function instead.

Iterate through run slices

Not yet implemented. You can only extract one runSlice at a time for the moment:

MzDBFile mzdb(filename);

mzDBReader reader(mzdb); //build a mzdbreader object
MSData msdata; // build empty Pwiz msdata object

reader.readMzDB(msdata);
vector<mzScan*> results; // mzScan is a simple object containing vector members 'mz' and 'intensities'
reader.extractRunSlice(mzmin, mzmax, msLevel, results);

This feature is already implemented in the Java reader mzDBAccess.
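Once a run slice has been extracted, the results can be post-processed like any other container. The sketch below sums, for each scan, the intensities that fall inside an m/z window. `ScanLike` is a hypothetical stand-in for mzScan that models only the two members mentioned above ('mz' and 'intensities'); the element types and the `sumIntensities` helper are assumptions, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for mzScan: only the two documented members are modelled,
// and their element types are assumptions.
struct ScanLike {
    std::vector<double> mz;
    std::vector<double> intensities;
};

// Sums, for each scan, the intensities whose m/z lies in [mzmin, mzmax].
// Null pointers contribute a sum of 0.
std::vector<double> sumIntensities(const std::vector<ScanLike*>& scans,
                                   double mzmin, double mzmax) {
    std::vector<double> sums;
    sums.reserve(scans.size());
    for (const ScanLike* scan : scans) {
        double total = 0.0;
        if (scan != nullptr) {
            // Guard against arrays of different lengths.
            for (std::size_t i = 0;
                 i < scan->mz.size() && i < scan->intensities.size(); ++i) {
                if (scan->mz[i] >= mzmin && scan->mz[i] <= mzmax)
                    total += scan->intensities[i];
            }
        }
        sums.push_back(total);
    }
    return sums;
}
```

The returned vector has one entry per extracted scan, in the same order as the input.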

Query LC-MS DDA/DIA data using R*Tree queries

To extract a region using the R*Tree:

MzDBFile mzdb(filename);

mzDBReader reader(mzdb); //build a mzdbreader object
MSData msdata; // build empty Pwiz msdata object

reader.readMzDB(msdata);
vector<mzScan*> results; // mzScan is a simple object containing vector members 'mz' and 'intensities'

reader.extractRegion(mzmin, mzmax, rtmin, rtmax, msLevel, results);

Specifying msLevel=1 extracts the region from spectra acquired at MS level 1, which is suitable for DDA analysis. Any other value queries the MSn R*Tree, which is suitable for DIA analysis.
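Conceptually, a region query keeps the bounding boxes whose (m/z, retention time) rectangle overlaps the requested one. The following sketch illustrates that overlap test and the msLevel rule stated above. `Box`, `intersects`, `TreeKind` and `selectTree` are hypothetical names for illustration; the real lookup is done by the R*Tree index, not by this code.

```cpp
#include <cassert>

// Axis-aligned rectangle in the (m/z, retention time) plane.
struct Box {
    double mzMin, mzMax, rtMin, rtMax;
};

// True when the two rectangles share at least one point (edges included).
bool intersects(const Box& a, const Box& b) {
    return a.mzMin <= b.mzMax && b.mzMin <= a.mzMax &&
           a.rtMin <= b.rtMax && b.rtMin <= a.rtMax;
}

// Mirrors the rule stated above: msLevel 1 targets the MS1 index,
// any other level targets the MSn index.
enum class TreeKind { Ms1, Msn };

TreeKind selectTree(int msLevel) {
    return msLevel == 1 ? TreeKind::Ms1 : TreeKind::Msn;
}
```

A query box of 500 to 505 Da and 30 to 45 s would, for instance, overlap a stored box starting at 504 Da and 40 s, but not one starting at 505.1 Da.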
