Compression efficiency: some mmCIF files are smaller than their BinaryCIF counterparts when not gzipped #9

papillot · 2024-04-29T12:37:38Z

This is a question rather than an issue report.
I have downloaded the BinaryCIF file for structure 5z6y from RCSB and the corresponding mmCIF file.

format	uncompressed	gzipped
mmCIF	227 kB	58 kB
BinaryCIF	269 kB	32 kB
MMTF	24 kB	17 kB

What surprises me is that the BinaryCIF file takes more space than the mmCIF file, even though most of the information is contained in the atom_site table, which should be amenable to efficient compression.
This seems to contradict the claims of the original BinaryCIF publication.

I am wondering whether the current implementation of the format uses less efficient compression techniques than those described in the publication.


dsehnal · 2024-10-14T12:09:18Z

Hi, sorry for the late reply; I only noticed this now.

It is indeed the case that, for uncompressed files, the bcif size can be larger due to the extra metadata stored for each category (archive CIFs have a lot of categories). The benefit is that the files are more amenable to compression because the format is column-based rather than row-based.
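To illustrate why a column layout helps, here is a minimal Python sketch of two column encodings, delta and run-length, of the kind BinaryCIF defines. It is a simplified illustration, not the actual encoder: the function names are hypothetical, and the real format stores runs as a flat integer array and chains further encodings (such as integer packing) on top. In a row-based mmCIF file the atom serial numbers are interleaved with every other atom_site field, so an encoding like this cannot be applied to them directly.

```python
from typing import List, Tuple


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


# A column of sequential atom serial numbers, like atom_site.id
atom_ids = list(range(1, 10001))
deltas = delta_encode(atom_ids)    # every entry is 1
runs = run_length_encode(deltas)   # a single (value, count) pair
print(len(atom_ids), "values ->", runs)
```

Ten thousand integers collapse to one pair here; real columns are less regular, but residue numbers, chain IDs and element symbols still contain long runs once they are stored column by column.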
The difference becomes much more noticeable for large structures. For example, for 3j3q the sizes are ~230 MB vs. 30 MB uncompressed and ~40 MB vs. 12 MB compressed.
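The figures quoted in this thread can be turned into ratios with a few lines of Python. This is plain arithmetic on the numbers above; reading each 3j3q pair as mmCIF first and BinaryCIF second is an assumption based on context.

```python
def ratio(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


# 5z6y, sizes in kB as (uncompressed, gzipped), from the table above
mmcif_5z6y = (227, 58)
bcif_5z6y = (269, 32)

# 3j3q, approximate sizes in MB as (uncompressed, compressed)
# Assumption: the first number of each quoted pair is mmCIF, the second BinaryCIF
mmcif_3j3q = (230, 40)
bcif_3j3q = (30, 12)

print(f"5z6y gzip gain: mmCIF {ratio(*mmcif_5z6y):.1f}x, "
      f"BinaryCIF {ratio(*bcif_5z6y):.1f}x")
print(f"3j3q mmCIF/BinaryCIF: {ratio(mmcif_3j3q[0], bcif_3j3q[0]):.1f}x uncompressed, "
      f"{ratio(mmcif_3j3q[1], bcif_3j3q[1]):.1f}x compressed")
```

For 5z6y, gzip shrinks the BinaryCIF file roughly twice as much as the mmCIF file, which is why the gzipped BinaryCIF ends up smaller even though the raw one is larger.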
MMTF stores a much smaller subset of the data than mmCIF and also uses lower coordinate precision by default. The BinaryCIF encoder can be configured to use reduced precision for coordinates as well.
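Reduced coordinate precision can be pictured as fixed-point quantization: each coordinate is multiplied by a factor, rounded, and stored as an integer. The sketch below uses a hypothetical `factor` parameter (1000 keeps three decimals, as in typical archive mmCIF coordinates; 100 keeps two) and is not the encoder's actual configuration API. Smaller factors produce smaller integers that can be packed into fewer bytes, at the cost of a rounding error of at most 0.5/factor Å.

```python
from typing import List


def fixed_point_encode(values: List[float], factor: int) -> List[int]:
    """Quantize floats to integers: round(x * factor)."""
    return [round(x * factor) for x in values]


def fixed_point_decode(encoded: List[int], factor: int) -> List[float]:
    """Recover approximate floats: n / factor."""
    return [n / factor for n in encoded]


def max_abs_error(original: List[float], decoded: List[float]) -> float:
    """Largest absolute difference between original and decoded values."""
    return max(abs(a - b) for a, b in zip(original, decoded))


# Hypothetical x coordinates in Angstrom
xs = [12.3456, -7.8912, 0.0004]

for factor in (1000, 100):
    encoded = fixed_point_encode(xs, factor)
    decoded = fixed_point_decode(encoded, factor)
    print(factor, encoded, f"max error {max_abs_error(xs, decoded):.4f}")
```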
