Nowadays, most cytometrists apply lossless compression by storing their FCS files in ZIP archives. Unfortunately, ZIP achieves only modest space savings on cytometric data, because it uses DEFLATE as its underlying lossless compression algorithm (LCA). Presumably, other modern LCAs can outperform DEFLATE, especially in terms of space savings. Twenty-one codecs (programs implementing an LCA) were evaluated on 167,131 publicly available FCS files. For floating-point data, as produced by modern instruments, the most favorable compression ratios (CRs) were achieved by ZPAQ (median 0.469), BCM (median 0.523), and LZMA (median 0.545). In comparison, the DEFLATE-based codecs achieved a median CR of only 0.728 under optimal conditions. By default, ZIP offers nine compression level (CL) settings, where lower ZIP-CLs favor time efficiency and higher ZIP-CLs favor space efficiency. Interestingly, the third ZIP-CL already resulted in a near-optimal CR in 90% of the files with floating-point data, as produced by digital cytometers. LZMA is well established, widely supported, and actively maintained (in sharp contrast to ZPAQ and BCM) and is therefore arguably the most attractive alternative to ZIP. For floating-point data, shifting from ZIP (under optimal conditions) to LZMA (at default settings) can improve the median CR by 25%. Based on our results, cytometrists can benefit from state-of-the-art compression by choosing the appropriate codec for their situation. Our results are likely to speed up the adoption of modern codecs, as CRs around 0.5 were beyond all expectations, and such space savings will benefit the field of cytometry.
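
The comparison described above can be illustrated with a minimal sketch, which is not the authors' benchmark. It assumes that CR means compressed size divided by original size (so lower is better, consistent with the reported medians), uses Python's standard `zlib` module as a stand-in for DEFLATE-based ZIP and `lzma` at its default preset for LZMA, and runs on synthetic 32-bit floating-point values rather than real FCS files. ZPAQ and BCM are omitted because they are not in the standard library. The helper names `compression_ratio`, `synthetic_events`, and `compare` are hypothetical, and the ratios printed for random synthetic data will not match the medians reported here.

```python
import lzma
import random
import struct
import zlib


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Assumed definition: compressed size / original size (lower is better)."""
    return len(compressed) / len(raw)


def synthetic_events(n_events: int = 5000, n_params: int = 8, seed: int = 0) -> bytes:
    """Build a hypothetical block of little-endian float32 values.

    This only mimics the layout of a floating-point data segment;
    it is not real cytometric data.
    """
    rng = random.Random(seed)
    values = [rng.gauss(1000.0, 250.0) for _ in range(n_events * n_params)]
    return struct.pack(f"<{len(values)}f", *values)


def compare(raw: bytes) -> dict:
    """Compress the same bytes with DEFLATE (levels 3 and 9) and LZMA (default)."""
    return {
        "deflate_level_3": compression_ratio(raw, zlib.compress(raw, 3)),
        "deflate_level_9": compression_ratio(raw, zlib.compress(raw, 9)),
        "lzma_default": compression_ratio(raw, lzma.compress(raw)),
    }


if __name__ == "__main__":
    data = synthetic_events()
    for codec, cr in compare(data).items():
        print(f"{codec}: CR = {cr:.3f}")
```

To try this on an actual file, the bytes passed to `compare` could be read from disk instead of generated; how well any codec performs will depend on the instrument and the data.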

doi.org/10.1002/cyto.a.23879, hdl.handle.net/1765/119317
VSNU Open Access deal
Cytometry. Part A
Department of Immunology

Bras, A., & van der Velden, V. (2019). Lossless Compression of Cytometric Data. Cytometry. Part A. doi:10.1002/cyto.a.23879