What’s lossless data compression?

Lossless data compression reduces file size without losing information. Common examples include zip and gzip archives, as well as image formats like GIF and PNG. Algorithms vary by file type and fall into two broad families, statistical models and mapping models. Some algorithms are open source, while others are proprietary. Archiving mixed file types can degrade individual components, and efforts are underway to create universal compression methods.

Lossless data compression is a computerized method of encoding files, or combining them into archives, so that they occupy less storage space than they otherwise would, without losing any of the information the data contains. Lossy compression, in contrast, reduces file size by approximating the data, so the recovered file is only a facsimile of the original contents. The algorithms used for lossless data compression are essentially sets of rules for encoding information in fewer bits of memory while retaining the ability to restore the data to its original form without alteration.
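As a minimal illustration of this lossless round trip, the sketch below uses Python's standard-library zlib module (an implementation of the DEFLATE algorithm); the sample data is made up for the example.

```python
import zlib

# Repetitive data compresses well under a lossless scheme.
original = b"abcabcabc" * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the restored bytes are identical to the original.
assert restored == original
print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
```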

Some common file types that use lossless data compression include zip archives, which originated on IBM (International Business Machines) PC-compatible computers, and Unix-based gzip archives. Image file formats such as Graphics Interchange Format (GIF) and Portable Network Graphics (PNG) are also compressed losslessly, and Bitmap (BMP) files, though often stored uncompressed, support an optional lossless mode. Data compression algorithms also vary based on the type of file being compressed, with specialized variants common for text, audio, and executable program files.
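For instance, the sketch below uses Python's standard-library gzip module to write and read a gzip archive; the file name example.txt.gz is hypothetical.

```python
import gzip

data = b"The quick brown fox jumps over the lazy dog.\n" * 50

# Write a gzip archive (the file name is made up for this example).
with gzip.open("example.txt.gz", "wb") as f:
    f.write(data)

# Read it back; gzip is lossless, so the contents match exactly.
with gzip.open("example.txt.gz", "rb") as f:
    restored = f.read()

assert restored == data
```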

The two main categories of algorithms for lossless data compression are those built on a statistical model of the input data and those that map sequences of the input to shorter bit strings. Commonly used statistical techniques include the Burrows-Wheeler transform (BWT), the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977, and prediction by partial matching (PPM). Frequently employed mapping algorithms include Huffman coding and arithmetic coding.
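To make the mapping idea concrete, here is a minimal sketch of Huffman coding in Python: it builds a prefix code from symbol frequencies so that common symbols get shorter bit strings. It is illustrative only, not a production encoder.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table mapping each symbol to a bit string."""
    freq = Counter(text)
    tiebreak = count()  # avoids comparing code tables when frequencies tie
    # Each heap entry: (frequency, tiebreaker, {symbol: code_so_far})
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        # Prefix '0' onto one subtree's codes and '1' onto the other's.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[ch] for ch in "abracadabra")
print(codes)  # e.g. {'a': '0', 'b': '110', ...}: frequent symbols get short codes
print(len(encoded), "bits instead of", 8 * len("abracadabra"))
```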

Some of the algorithms are open source tools while others are proprietary and patented, although the patents on some have expired. These licensing restrictions can result in compression methods being applied to file formats they suit poorly. Because some data compression methods are incompatible with one another, archiving mixed file types can often degrade a component of a file. For example, an image file with compressed text may show degraded text readability when restored. Scanners and software that use grammar induction can extract meaning from text stored alongside image files by applying so-called latent semantic analysis (LSA).

Another mapping approach to lossless data compression is the use of universal codes. Universal codes are more flexible than Huffman coding because they do not require advance knowledge of the maximum integer value to be encoded; however, Huffman coding and arithmetic coding generally achieve better compression rates. Efforts are also underway to develop universal data compression methods, algorithms that perform well across a variety of data sources.
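One well-known universal code is the Elias gamma code. The sketch below is a minimal Python illustration of how it encodes any positive integer without knowing an upper bound in advance: the count of leading zeros announces how many binary digits follow.

```python
def elias_gamma_encode(n: int) -> str:
    """Encode a positive integer as an Elias gamma bit string."""
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]  # binary representation, without the '0b' prefix
    # N-1 zeros announce the length, then the N binary digits follow.
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits: str) -> int:
    """Decode a single Elias gamma codeword back to its integer."""
    zeros = len(bits) - len(bits.lstrip("0"))  # count the leading zeros
    return int(bits[zeros:zeros + zeros + 1], 2)

# No maximum value needs to be fixed ahead of time.
for n in (1, 2, 9, 100):
    code = elias_gamma_encode(n)
    assert elias_gamma_decode(code) == n
    print(n, "->", code)
```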
