This tells us that for every 100 bits used by the Huffman coding scheme, more than 99 bits represent information, and less than 1 bit is redundant. In contrast, for every 100 bits, the fixed-length coding scheme spends only 84.67 bits on information; the remaining 15.33 bits are redundant.
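As a sketch of how such efficiency figures arise, the code below computes coding efficiency as entropy divided by the average code length. The symbol probabilities and the hand-assigned Huffman code lengths here are hypothetical, chosen only to illustrate the contrast, and do not reproduce the exact numbers above:

```python
from math import log2, ceil

# Hypothetical symbol probabilities (illustrative only).
probs = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

# Huffman code lengths for this distribution (tree built by hand):
# a -> 0, b -> 10, c -> 110, d -> 111
huffman_lengths = {"a": 1, "b": 2, "c": 3, "d": 3}

entropy = -sum(p * log2(p) for p in probs.values())          # information per symbol
avg_huffman = sum(probs[s] * huffman_lengths[s] for s in probs)
fixed_length = ceil(log2(len(probs)))                        # 2 bits for 4 symbols

print(f"Huffman efficiency:      {entropy / avg_huffman:.4f}")
print(f"Fixed-length efficiency: {entropy / fixed_length:.4f}")
```

For this distribution, the Huffman scheme comes out above 99% efficient, while the fixed-length scheme sits well below that, mirroring the comparison above.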
Impact of Data Characteristics on Entropy Coding#
If there are k symbols in the system and k is not an exact power of 2, we need ⌈log2 k⌉ bits for fixed-length coding. For example, if k=24, then ⌈log2 k⌉=5. With 5 bits, 32 codes are possible, of which we only use 24, so 8 codes remain unused.

Now suppose k is an exact power of 2, so no code goes unused: should we still use Huffman coding? The answer is yes, as long as some symbols are more frequent than others. But what if k is an exact power of 2 and all the symbols are equally probable? In that case, the Huffman tree is a perfect binary tree: every non-leaf node has exactly two children, and all the leaf nodes sit at the same level. Because every leaf is at the same distance from the root, every symbol receives a code of the same length. In this scenario, Huffman coding is the same as fixed-length coding.
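This special case is easy to check empirically. The following sketch (a simple heap-based Huffman construction, not from the original text) tracks only code lengths rather than full codewords, and shows that 8 equally probable symbols all end up with 3-bit codes, exactly matching fixed-length coding:

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman tree built from freqs."""
    tiebreak = count()  # keeps heap comparisons away from the symbol lists
    heap = [(f, next(tiebreak), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        # Every merge pushes all merged symbols one level deeper in the tree.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths

# 8 equally probable symbols: k = 2^3, so fixed-length coding needs 3 bits.
equal = {chr(ord("a") + i): 1 for i in range(8)}
print(huffman_code_lengths(equal))  # every symbol gets a 3-bit code
```

Changing any of the frequencies to be unequal immediately produces codes of different lengths, which is where Huffman coding starts to pay off.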
Importance of Entropy Coding in Data Compression#
In most data, symbols occur with different frequencies, which favors entropy coding over the fixed-length coding scheme. Entropy coding represents data more compactly than a fixed-length scheme, making it a useful lossless compression technique. All the popular image and video compression standards, such as JPEG, MPEG, and H.26x, use entropy coding as the last step before generating the compressed output. Entropy coding is considered a fundamental data compression tool when the probability distribution of the source symbols is known.
The variable-length codes used by any entropy coding scheme are prefix codes. This means that no code is a prefix of another code, a property that allows decoding without any ambiguity.
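To see why the prefix property matters, consider decoding a bitstream left to right. Because no codeword is a prefix of another, the moment the accumulated bits match a codeword, that match is the only possible one, so no separators between codewords are needed. The code table below is hypothetical, used only for illustration:

```python
# Hypothetical prefix code: no codeword is a prefix of another.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
decode_table = {bits: sym for sym, bits in code.items()}

def decode(bitstream):
    """Decode left to right; a prefix code needs no separators."""
    out, buf = [], ""
    for bit in bitstream:
        buf += bit
        if buf in decode_table:      # a full codeword has been read
            out.append(decode_table[buf])
            buf = ""
    if buf:
        raise ValueError("truncated bitstream")
    return "".join(out)

encoded = "".join(code[s] for s in "abcd")
print(decode(encoded))  # prints "abcd"
```

If the code were not prefix-free (say, both "0" and "01" were codewords), the decoder could not tell where one symbol ends and the next begins without extra markers.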