-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for mzMLb #113
base: master
Are you sure you want to change the base?
Conversation
I appreciate the PR! Unfortunately relying on an external library to be installed on the system is a non-starter. Not requiring any external dependencies (e.g. having a fully portable binary) is one of the primary goals of Sage. |
I somehow missed the part about static linking - in that case, it could be possible. How big are the binaries? Is there actually any benefit to adding mzMLb support? Is anyone actually using the format? Do other tools support it? If enough people actually are using mzMLb, I will merge it in... but I am loathe to support yet another XML based format and complicate the build process unless it will actually get used by more than 1 person. |
At the moment, we don't know how many people are using mzMLb since it's a bit of a chicken vs. egg problem. If tools don't support it, then nobody would bother using it, and if nobody uses it tools won't bother supporting it. There's a forthcoming Comet release with mzMLb support in MSToolkit, ProteoWizard supports it so presumably Skyline supports it. If Sage supports it, then there will be more motivation for users who aren't just deleting their mzML files after running a search against them to convert to mzMLb instead. Any new format has to overcome the inertia of existing formats to get people to use it. If you still don't want to add the optional dependency to your codebase, that's fine too. Getting the build time dependencies working on CI can be aggravating, especially on Windows because There's a poll going around discussing a community-driven design of a new format at https://qualtricsxm5l7r46fqr.qualtrics.com/jfe/form/SV_esUoW2IDzZalb6u and any feedback about what features it needs to have would be appreciated. If you're still working on |
Even 10 MB isn't bad - just want to make sure it's not ballooning to hundreds of MB. Let me test out the fork with mzMLb support and see how it works. I think I am most likely the person that compiles sage the most - and I regularly swap between windows, macOS and linux just to make my life more difficult. I will personally probably never use mzMLb and don't particularly feel like having to install vcpkg et al just to compile sage. But this shouldn't be a gating factor for the project if other people will use it - perhaps we could gate it behind a compile-time feature flag that is only turned on during GH Actions or if users want to manually. I think this is what you've done, right? |
That's correct. The |
There's a detail I missed about the linking in |
I've updated the PR to build the |
This PR adds a new optional dependency on mzdata to read the mzMLb file format.
mzMLb support was recently added to ProteoWizard, and it provides fast random access to compressed data, and comparable sequential access speed to mzML.
Building with the mzMLb feature statically links the HDF5 library with the binary, so there is no impact on the end-user beyond the increased binary size, but it does require that the HDF5 library is available at build time. The tests were done with
libhdf5
10.3 which can be gotten via the system package manager on Linux systems and usingvcpkg
on Windows.