
Entropy Coding: Everything You Need to Know

9 min read
Jan 22, 2024


What Is Entropy Coding?#

In this blog, we’ll take a detailed look at entropy coding, an important technique to compress or efficiently represent data without any loss. To understand the term entropy, we first need to understand a few concepts in this context.

When we’re dealing with data, there is an associated source and a set of possible symbols that the source can generate. For example, think of a system that generates alphanumeric messages or strings like email messages. Another example is a system that generates numbers in the range 0 to 255, like image or video data. A similar system is shown in the figure on the right that can generate values in the range 2 to 12.

Let’s represent the set of symbols associated with a system $S$ by $X_S$. The system $S$ generates some symbols more frequently than others, so each symbol has an associated probability. A symbol generated more frequently has a higher probability, is easier to predict, and carries a low amount of information. Conversely, a symbol generated rarely has a small probability, is harder to predict, and carries a high amount of information. The entropy of a symbol refers to the unpredictability of that symbol, or the amount of information it contains. This means that the amount of information, or the entropy, of a symbol is inversely proportional to the probability of occurrence of that symbol. Let the probability of occurrence of the symbol $x$ be $p(x)$; then the amount of information of this symbol, $I(x)$, can be defined as follows:

$$I(x) = \frac{1}{p(x)}$$

We normally use the logarithm to adjust the scale. The base of the logarithm is taken to be two so that the information is represented in terms of bits. Therefore:

$$I(x) = \log_2 \frac{1}{p(x)} = -\log_2 p(x)$$

The entropy of a system refers to the expected unpredictability, or the expected amount of information, of a symbol generated by the system. It is normally calculated in bits and represents the average number of bits required to store a symbol of the system. The entropy of the system $S$ with symbol set $X_S$ is represented by $H(S)$. It can be computed as the probability-weighted sum of the information of all the symbols in $X_S$ as follows:

$$H(S) = \sum_{x \in X_S} p(x)\, I(x) = -\sum_{x \in X_S} p(x) \log_2 p(x)$$
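The two definitions above translate almost directly into code. The following is a minimal sketch, not a reference implementation; the function names `self_information` and `entropy` are chosen here for illustration, and the coin probabilities are made-up inputs.

```python
import math

def self_information(p: float) -> float:
    """Information content, in bits, of a symbol with probability p."""
    return -math.log2(p)

def entropy(probabilities) -> float:
    """Expected information per symbol, in bits.

    Symbols with zero probability never occur, so they are skipped.
    """
    return sum(p * self_information(p) for p in probabilities if p > 0)

print(self_information(0.5))   # 1.0: one outcome of a fair coin carries 1 bit
print(entropy([0.5, 0.5]))     # 1.0: a fair coin has 1 bit of entropy
print(entropy([0.9, 0.1]))     # a biased coin is more predictable, so less than 1 bit
```

The biased coin illustrates the point made in the text: the more predictable the source, the lower its entropy.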

If every symbol of the system is encoded using codes of a fixed length, we call it a fixed-length coding scheme. On the contrary, if the code lengths of different symbols are different, such a coding scheme is called a variable-length coding scheme. Entropy coding is a generic term that refers to any variable-length coding technique that uses shorter codes for more frequent symbols and longer codes for less frequent symbols. No lossless coding scheme that encodes symbols independently can use fewer bits per symbol, on average, than the system’s entropy. Entropy, in that sense, is the lower bound on the average bits per symbol. Let’s look at an example to further understand these concepts.

Example#

Let’s consider a system $S$ that generates symbols from the symbol set $X_S = \{a, b, c, d, e, f\}$, with the following probability distribution:

Probability Distribution

| Symbol | Probability |
|--------|-------------|
| a      | 0.130       |
| b      | 0.185       |
| c      | 0.165       |
| d      | 0.150       |
| e      | 0.250       |
| f      | 0.120       |

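The entropy of this system is computed as follows:

$$H(S) = -\sum_{x \in X_S} p(x) \log_2 p(x) = -(0.130 \log_2 0.130 + 0.185 \log_2 0.185 + 0.165 \log_2 0.165 + 0.150 \log_2 0.150 + 0.250 \log_2 0.250 + 0.120 \log_2 0.120) \approx 2.54 \text{ bits}$$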

This means that, on average, $2.54$ bits per symbol are required to store information generated by this system. In other words, this system has, on average, $2.54$ bits of information per symbol, which we call the entropy of this system. Any coding scheme will require, on average, at least this many bits per symbol.
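To check the figure of 2.54 bits, the sum can be evaluated term by term. This is a small sketch using the probabilities from the table above; the variable names `probs`, `contributions`, and `H` are illustrative.

```python
import math

# Probability distribution of the example system
probs = {"a": 0.130, "b": 0.185, "c": 0.165, "d": 0.150, "e": 0.250, "f": 0.120}

# Each symbol contributes p(x) * I(x) = -p(x) * log2 p(x) bits to the entropy
contributions = {s: -p * math.log2(p) for s, p in probs.items()}
H = sum(contributions.values())

for s, c in contributions.items():
    print(f"{s}: p = {probs[s]:.3f}, contributes {c:.4f} bits")
print(f"H(S) = {H:.2f} bits per symbol")
```

The most probable symbol, e, contributes exactly 0.5 bits because its probability is 0.25, which gives it an information content of 2 bits.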

The Huffman Coding Scheme#

The Huffman coding scheme is a basic, widely used entropy coding scheme. Another extensively used entropy coding scheme is the arithmetic coding scheme. The most famous variants of the arithmetic coding scheme are binary arithmetic coding (BAC) and context-adaptive binary arithmetic coding (CABAC).

Huffman coding works by applying the following two steps repeatedly to form a binary tree:

1- Sorting the symbols by their probabilities

2- Merging two symbols with the smallest probabilities to make one (merged working) symbol with a probability equal to the sum of probabilities of the merged symbols

These two steps are repeated until we get the sum of all the probabilities, that is, $1.0$. The merged symbol with probability $1.0$ is the root node of the binary tree, and the original symbols make the leaf nodes of the tree.
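The merge procedure above can be sketched in a few lines. In this sketch a priority queue (Python's `heapq`) stands in for the explicit re-sorting step, which is a common substitution rather than the exact procedure drawn in the article. The function name `huffman_code_lengths` is hypothetical, and the sketch tracks only the depth of each symbol in the tree, not the tree itself.

```python
import heapq
from itertools import count

def huffman_code_lengths(probs):
    """Return {symbol: code length} by repeatedly merging the two least probable nodes."""
    tiebreak = count()  # unique counter so heap entries stay comparable when probabilities tie
    # Each heap entry is (probability, tiebreak, symbols under this node)
    heap = [(p, next(tiebreak), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # smallest probability
        p2, _, syms2 = heapq.heappop(heap)  # second smallest probability
        for s in syms1 + syms2:
            lengths[s] += 1  # every symbol under the merged node moves one level deeper
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths

probs = {"a": 0.130, "b": 0.185, "c": 0.165, "d": 0.150, "e": 0.250, "f": 0.120}
print(huffman_code_lengths(probs))
```

When two nodes have equal probability, the order in which they are merged is arbitrary, so the shape of the tree can differ between implementations while the average code length stays the same.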

Using the same system $S$ from the example above, let’s see how Huffman coding works:

The construction of the Huffman tree

In the figure above, the first column shows the symbols and their probabilities. In the second column, the symbols are sorted by their probabilities. In the third column, the two symbols with the smallest probabilities are merged, and the numbers are sorted again. The arrows relate the new position with the old position of the numbers. In the subsequent columns, the merge and sort steps are repeated until we get $1.0$, shown in the last column. This process has successfully made a binary tree. The binary tree is shown in the following figure after detangling the edges. The internal nodes that represent the merged symbols are shown in green, while leaf nodes with the original symbols are shown in blue.

A labeled Huffman tree

To get the prefix codes for each symbol, we observe the unique path of each leaf node from the root. No such path can be a prefix of another path. There are two outgoing arrows from every green node. As a matter of convention, we label the arrow going to the left child with a $1$ and the arrow to the right child with a $0$. It doesn’t matter if we label the arrow going to the left child with a $0$ or a $1$, but it should be different from the label of the arrow going to the right child. By using the stated convention to label the edges of the tree, the paths from the root to each symbol are as follows:

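Huffman Codes

| Symbol | Code (from the root, Left = 1, Right = 0) | Length of the code | Probability of the symbol |
|--------|-------------------------------------------|--------------------|---------------------------|
| a      | 011                                       | 3                  | 0.130                     |
| b      | 00                                        | 2                  | 0.185                     |
| c      | 111                                       | 3                  | 0.165                     |
| d      | 110                                       | 3                  | 0.150                     |
| e      | 10                                        | 2                  | 0.250                     |
| f      | 010                                       | 3                  | 0.120                     |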
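With the code table in hand, encoding is a lookup per symbol, and decoding reads bits until the accumulated string matches a code. The sketch below uses the codes from the table; the names `CODES`, `encode`, and `decode` and the sample message are illustrative choices, and bits are kept as a text string of 0s and 1s for readability rather than packed into bytes.

```python
# Codes taken from the Huffman code table above
CODES = {"a": "011", "b": "00", "c": "111", "d": "110", "e": "10", "f": "010"}

def encode(message, codes=CODES):
    """Concatenate the code of every symbol in the message."""
    return "".join(codes[s] for s in message)

def decode(bits, codes=CODES):
    """Read bits left to right, emitting a symbol whenever a full code is seen."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # no code is a prefix of another, so the first match is the only one
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)

bits = encode("bead")
print(bits)          # 0010011110
print(decode(bits))  # bead
```

The four-symbol message takes 10 bits here, compared with 12 bits under a three-bit fixed-length code.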

Performance and Efficiency Considerations#

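In an encoding mechanism, we assign a code to each symbol and write it to a file. Similarly, a decoding mechanism reads codes from the encoded file and determines the corresponding symbol. Let’s consider the example above. If we assign fixed-length codes, they will be three bits long because $\lceil\log_2 6\rceil = 3$, and the average bits per symbol will also be the same. However, in the case of the Huffman codes, the average bits per symbol is the probability-weighted sum of the code lengths, where $\ell(x)$ denotes the length of the code for symbol $x$:

$$\sum_{x \in X_S} p(x)\,\ell(x) = 2\,(0.185 + 0.250) + 3\,(0.130 + 0.165 + 0.150 + 0.120) = 2.565 \text{ bits per symbol}$$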

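We can measure the performance of the Huffman codes compared to fixed-length coding in two ways. Let’s look at both one by one. The first is the size of the Huffman output relative to the fixed-length output:

$$\frac{2.565}{3} \times 100 = 85.5\%$$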

This tells us that if fixed-length coding takes $100$ bits to store some information, the Huffman coding scheme will take only $85.5$ bits to store the same information. This means the Huffman coding scheme will save us $14.5$ bits for every $100$ bits spent by the fixed-length coding scheme. Let’s look at another way to explain the same point, the size of the fixed-length output relative to the Huffman output:

$$\frac{3}{2.565} \times 100 \approx 116.96\%$$

This tells us that if Huffman coding takes $100$ bits to store some information, fixed-length coding will take $116.96$ bits to store the same information. Fixed-length coding will take an extra $16.96$ bits for every $100$ bits spent by the Huffman coding scheme.

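As the entropy of the system gives the lower bound for representing information, we can compute the efficiency of the Huffman coding scheme by comparing it with the entropy of the system. The efficiency of fixed-length coding can also be computed similarly. Let’s calculate it using the same example:

$$\text{Efficiency}_{\text{Huffman}} = \frac{H(S)}{\text{average bits per symbol}} \times 100 = \frac{2.54}{2.565} \times 100 \approx 99.03\%$$

$$\text{Efficiency}_{\text{fixed}} = \frac{2.54}{3} \times 100 \approx 84.67\%$$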

This tells us that for every $100$ bits used by the Huffman coding scheme, more than $99$ bits represent information, and less than one bit is redundant. In contrast, for every $100$ bits, the fixed-length coding scheme spends only $84.67$ bits to represent information, and the rest of the bits are redundant.
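The figures in this section can be reproduced with a few lines of arithmetic. This is a sketch under one stated assumption: the entropy is rounded to two decimals (2.54) before the efficiencies are computed, mirroring the numbers used in the text; with the unrounded entropy the last digits shift slightly. The variable names are illustrative.

```python
import math

probs = {"a": 0.130, "b": 0.185, "c": 0.165, "d": 0.150, "e": 0.250, "f": 0.120}
lengths = {"a": 3, "b": 2, "c": 3, "d": 3, "e": 2, "f": 3}  # from the Huffman code table

# Entropy rounded to two decimals to mirror the article's arithmetic (assumption of this sketch)
H = round(-sum(p * math.log2(p) for p in probs.values()), 2)

fixed_bits = math.ceil(math.log2(len(probs)))             # bits per symbol with fixed-length codes
huffman_bits = sum(probs[s] * lengths[s] for s in probs)  # expected Huffman code length

print(f"Huffman average:      {huffman_bits:.3f} bits per symbol")
print(f"Huffman vs. fixed:    {100 * huffman_bits / fixed_bits:.2f}%")
print(f"Fixed vs. Huffman:    {100 * fixed_bits / huffman_bits:.2f}%")
print(f"Huffman efficiency:   {100 * H / huffman_bits:.2f}%")
print(f"Fixed-len efficiency: {100 * H / fixed_bits:.2f}%")
```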

Impact of Data Characteristics on Entropy Coding#

If there are $k$ symbols in the system, we need $\lceil\log_2 k \rceil$ bits for fixed-length coding. If $k$ is not an exact power of $2$, some of the codes go unused. For example, if we take $k=24$, then $\lceil\log_2 k \rceil = 5$. With $5$ bits there are $32$ possible codes, of which we only use $24$. This means that $8$ codes will remain unused. Now suppose that $k$ is an exact power of $2$ and no code is unused: Should we still use Huffman coding? The answer is yes, as long as some symbols are more frequent than others in the system. But will using Huffman coding always be beneficial? Assume that $k$ is an exact power of $2$ and that all the symbols are equally probable. In that case, the Huffman tree will be a perfect binary tree, which means that all the leaf nodes will be at the same level, and every non-leaf node will have exactly two children. As every leaf node will be at the same distance from the root node, the code for every symbol will be the same length. For this scenario, using Huffman coding is the same as using fixed-length coding.
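Both observations can be checked numerically. The sketch below first counts the unused codes for 24 symbols, then shows that for 8 equally probable symbols the entropy already equals the fixed-length code size, so no variable-length code can do better. The helper name `fixed_length_bits` is hypothetical.

```python
import math

def fixed_length_bits(k: int) -> int:
    """Bits per symbol needed by a fixed-length code for k symbols."""
    return math.ceil(math.log2(k))

# k is not a power of 2: some codes go unused
k = 24
bits = fixed_length_bits(k)
print(bits, 2 ** bits - k)  # 5 8

# k is a power of 2 and all symbols are equally probable
k = 8
uniform_entropy = -sum((1 / k) * math.log2(1 / k) for _ in range(k))
print(uniform_entropy, fixed_length_bits(k))  # 3.0 3
```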

Importance of Entropy Coding in Data Compression#

In most data, symbols occur with different frequencies, which favors entropy coding over the fixed-length coding scheme. Entropy coding represents data more compactly than the fixed-length coding scheme, which makes it a useful lossless compression technique. Widely used image and video compression standards, like JPEG, MPEG, and H.26x, use entropy coding as a last step before generating the compressed output. Entropy coding is considered a fundamental data compression tool whenever the probability distribution of the source symbols is known.

The variable-length codes assigned to individual symbols by schemes such as Huffman coding are prefix codes. This means that no code is a prefix of another code, a property important for decoding without any ambiguity.
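The prefix property is easy to test mechanically. The following sketch relies on the fact that, once the code words are sorted as strings, a word that is a prefix of any other word is also a prefix of its immediate neighbour. The function name `is_prefix_free` is illustrative.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = sorted(codes)
    # After sorting, any prefix relationship shows up between adjacent words
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

huffman_codes = ["011", "00", "111", "110", "10", "010"]  # codes from the example above
print(is_prefix_free(huffman_codes))     # True
print(is_prefix_free(["0", "01", "11"]))  # False: "0" is a prefix of "01"
```

With the second set of codes, a decoder that reads a 0 cannot tell whether it has seen a complete symbol or the start of 01, which is exactly the ambiguity the prefix property rules out.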


Frequently Asked Questions

What is an example of entropy coding?

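Entropy coding works with any group of symbols, provided the probability of each symbol is available. Take, for example, an integer-valued scalar x, which can assume values of -1, 0, and +1, with corresponding probabilities of 0.25, 0.5, and 0.25, respectively. The entropy of this source is 1.5 bits per symbol. A prefix code that uses the code word 0 for the value 0, 10 for the value -1, and 11 for the value +1 averages exactly 1.5 bits per symbol, compared with 2 bits per symbol for fixed-length coding.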


Written By:
Laeeq Aslam