
Entropy Coding: Everything You Need to Know

9 min read
Jan 22, 2024


What Is Entropy Coding?#

In this blog, we’ll take a detailed look at entropy coding, an important technique to compress or efficiently represent data without any loss. To understand the term entropy, we first need to understand a few concepts in this context.

When we’re dealing with data, there is an associated source and a set of possible symbols that the source can generate. For example, think of a system that generates alphanumeric messages or strings, like email messages. Another example is a system that generates numbers in the range 0–255, like image or video data. A similar system is shown in the figure on the right that can generate values in the range 2–12.

Let X_S represent the set of symbols associated with a system S. The system S generates some symbols more frequently than others, so there is a probability associated with each symbol. A symbol generated more frequently by the system has a higher probability, is easier to predict, and carries a small amount of information. On the contrary, a symbol generated rarely has a very small probability, is harder to predict, and carries a large amount of information. The entropy of a symbol refers to the unpredictability of that symbol, or the amount of information it contains. This means that the amount of information of a symbol is inversely proportional to the probability of occurrence of that symbol. Let the probability of occurrence of the symbol x be p(x); then the amount of information of this symbol, I(x), can be defined as follows:

I(x) ∝ 1/p(x)

We normally use the logarithm to adjust the scale. The base of the logarithm is taken to be two so that the information is represented in terms of bits. Therefore:

I(x) = log₂(1/p(x)) = −log₂ p(x)

The entropy of a system refers to the expected unpredictability, or the expected amount of information, of a symbol present in the system. It is normally measured in bits and represents the average number of bits required to store a symbol of the system. The entropy of the system S with symbol set X_S is represented by H(S). It can be computed by taking the probability-weighted sum of the information of all the symbols in X_S as follows:

H(S) = Σ p(x) · I(x) = −Σ p(x) · log₂ p(x), where the sum runs over all symbols x in X_S

If every symbol of the system is encoded using codes of a fixed length, we call it a fixed-length coding scheme. On the contrary, if the code lengths of different symbols differ, such a coding scheme is called a variable-length coding scheme. Entropy coding is a generic term that refers to any variable-length coding technique that uses shorter codes for more frequent symbols and longer codes for less frequent symbols. No matter which coding scheme is used, the average number of bits per symbol is at least the system’s entropy. Entropy, in that sense, is the lower bound on the average bits per symbol achievable by any coding scheme. Let’s look at an example to further understand these concepts.

Example#

Let’s consider a system S that generates symbols from the symbol set X_S = {a, b, c, d, e, f} with the following probability distribution:

Probability Distribution

Symbol    Probability
a         0.130
b         0.185
c         0.165
d         0.150
e         0.250
f         0.120

The entropy of this system is computed as follows:

H(S) = −(0.130 log₂ 0.130 + 0.185 log₂ 0.185 + 0.165 log₂ 0.165 + 0.150 log₂ 0.150 + 0.250 log₂ 0.250 + 0.120 log₂ 0.120) ≈ 2.54 bits

This means that, on average, 2.54 bits per symbol are required to store information generated by this system. In other words, this system has, on average, 2.54 bits of information per symbol, which we call the entropy of this system. Any coding scheme will require, on average, at least this many bits per symbol.
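
If you want to verify this number, here is a minimal Python sketch (added for illustration, not part of the original article) that computes the entropy of the example distribution:

```python
import math

# Probability distribution of the example system S
probabilities = {"a": 0.130, "b": 0.185, "c": 0.165, "d": 0.150, "e": 0.250, "f": 0.120}

# H(S) = -sum over all symbols of p(x) * log2(p(x))
entropy = -sum(p * math.log2(p) for p in probabilities.values())
print(f"Entropy: {entropy:.2f} bits per symbol")  # prints: Entropy: 2.54 bits per symbol
```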

The Huffman Coding Scheme#

The Huffman coding scheme is a basic, widely used entropy coding scheme. Another extensively used entropy coding scheme is arithmetic coding. The most famous variants of arithmetic coding are binary arithmetic coding (BAC) and context-adaptive binary arithmetic coding (CABAC).

Huffman coding works by applying the following two steps repeatedly to form a binary tree:

1- Sorting the symbols by their probabilities

2- Merging the two symbols with the smallest probabilities into a single merged symbol whose probability equals the sum of the probabilities of the merged symbols

These two steps are repeated until we get the sum of all the probabilities, that is, 1.0. The merged symbol with probability 1.0 is the root node of the binary tree, and the original symbols make up the leaf nodes of the tree.
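
The same procedure can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article; it uses a priority queue to repeat the sort-and-merge steps and then walks the resulting tree to read off the codes:

```python
import heapq

def build_huffman_tree(prob_table):
    """Repeatedly merge the two least probable nodes until only the root (probability 1.0) remains."""
    # Each heap entry is (probability, tie-breaker, symbol, left child, right child)
    heap = [(p, i, s, None, None) for i, (s, p) in enumerate(prob_table.items())]
    heapq.heapify(heap)  # keeps entries ordered by probability
    counter = len(heap)
    while len(heap) > 1:
        right = heapq.heappop(heap)  # smallest probability
        left = heapq.heappop(heap)   # second smallest probability
        merged = (left[0] + right[0], counter, None, left, right)
        counter += 1
        heapq.heappush(heap, merged)
    return heap[0]

def assign_codes(node, prefix="", codes=None):
    """Label left edges with 1 and right edges with 0 (the convention used later in this article)."""
    if codes is None:
        codes = {}
    _prob, _order, symbol, left, right = node
    if symbol is not None:  # leaf node: record the accumulated path as the code
        codes[symbol] = prefix
        return codes
    assign_codes(left, prefix + "1", codes)
    assign_codes(right, prefix + "0", codes)
    return codes

probs = {"a": 0.130, "b": 0.185, "c": 0.165, "d": 0.150, "e": 0.250, "f": 0.120}
print(assign_codes(build_huffman_tree(probs)))
```

The code lengths this sketch produces match the Huffman codes derived below, although the exact 0/1 patterns can differ depending on how ties and left/right placement are resolved.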

Using the same system S discussed in the example above, let’s see how Huffman coding works:

The construction of the Huffman tree

In the figure above, the first column shows the symbols and their probabilities. In the second column, the symbols are sorted by their probabilities. In the third column, the two symbols with the least probabilities are merged, and the probabilities are sorted again. The arrows relate each probability’s new position to its old position. In the subsequent columns, the merge and sort steps are repeated until we get 1.0, shown in the last column. This process produces a binary tree, which is shown in the following figure after untangling the edges. The internal nodes that represent the merged symbols are shown in green, while the leaf nodes with the original symbols are shown in blue.

A labeled Huffman tree

To get the prefix codes for each symbol, we trace the unique path from the root to each leaf node. No such path can be a prefix of another path. There are two outgoing edges from every green node. As a matter of convention, we label the edge going to the left child with a 1 and the edge going to the right child with a 0. It doesn’t matter whether the left edge is labeled with a 0 or a 1, as long as it differs from the label of the right edge. Using this convention to label the edges of the tree, the paths from the root to each symbol are as follows:

Huffman Codes

Symbol    Code (from the root, left = 1, right = 0)    Length of the code    Probability of the symbol
a         011                                          3                     0.130
b         00                                           2                     0.185
c         111                                          3                     0.165
d         110                                          3                     0.150
e         10                                           2                     0.250
f         010                                          3                     0.120

Performance and Efficiency Considerations#

In an encoding mechanism, we assign a code to each symbol and write it to a file. Similarly, a decoding mechanism reads codes from the encoded file and determines the corresponding symbols. Let’s consider the example above. If we assign fixed-length codes, they will be three bits long, and the average bits per symbol will also be three. However, in the case of the Huffman codes, the average bits per symbol will be the following:

Average bits per symbol = 0.130 × 3 + 0.185 × 2 + 0.165 × 3 + 0.150 × 3 + 0.250 × 2 + 0.120 × 3 = 2.565 bits

We can measure the performance of the Huffman codes compared to fixed-length coding in two ways. Let’s look at both one by one:

2.565 / 3 ≈ 0.855 = 85.5%

This tells us that if fixed-length coding takes 100 bits to store some information, the Huffman coding scheme will take only 85.5 bits to store the same information. This means the Huffman coding scheme will save us 14.5 bits for every 100 bits spent by the fixed-length coding scheme. Let’s look at another way to explain the same point.

3 / 2.565 ≈ 1.1696 = 116.96%

This tells us that if Huffman coding takes 100 bits to store some information, fixed-length coding will take 116.96 bits to store the same information. Fixed-length coding will take an extra 16.96 bits for every 100 bits spent by the Huffman coding scheme.

As the entropy of the system gives the lower bound on the bits needed to represent information, we can compute the efficiency of the Huffman coding scheme by comparing its average bits per symbol with the entropy of the system. The efficiency of fixed-length coding can be computed similarly. Let’s calculate both using the same example:

Efficiency of Huffman coding = 2.54 / 2.565 ≈ 99%
Efficiency of fixed-length coding = 2.54 / 3 ≈ 84.67%

This tells us that for every 100 bits used by the Huffman coding scheme, more than 99 bits are used to represent information, and less than one bit is redundant. In contrast, for every 100 bits, the fixed-length coding scheme spends only 84.67 bits representing information, and the rest of the bits are redundant.
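
As a quick cross-check, here is a short Python sketch (added for illustration, not from the original article) that reproduces the numbers above from the probabilities and code lengths in the Huffman codes table:

```python
import math

probs   = {"a": 0.130, "b": 0.185, "c": 0.165, "d": 0.150, "e": 0.250, "f": 0.120}
lengths = {"a": 3, "b": 2, "c": 3, "d": 3, "e": 2, "f": 3}  # Huffman code lengths from the table

entropy     = -sum(p * math.log2(p) for p in probs.values())  # ~2.54 bits per symbol
avg_huffman = sum(probs[s] * lengths[s] for s in probs)       # 2.565 bits per symbol
avg_fixed   = math.ceil(math.log2(len(probs)))                # 3 bits per symbol

print(f"Huffman vs. fixed-length: {avg_huffman / avg_fixed:.1%}")   # ~85.5%
print(f"Fixed-length vs. Huffman: {avg_fixed / avg_huffman:.2%}")   # ~116.96%
print(f"Huffman efficiency: {entropy / avg_huffman:.1%}")           # ~99.0%
print(f"Fixed-length efficiency: {entropy / avg_fixed:.1%}")        # ~84.7%
```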

Impact of Data Characteristics on Entropy Coding#

If there are k symbols in the system and k is not an exact power of 2, we need ⌈log₂ k⌉ bits for fixed-length coding. For example, if we take k = 24, then ⌈log₂ k⌉ = 5. By using 5 bits, 32 codes are possible, of which we only use 24. This means that 8 codes will remain unused. Now, let’s consider that k is an exact power of 2 and no code is unused: Should we still use Huffman coding? The answer is yes, as long as some symbols are more frequent than others. But will using Huffman coding be beneficial if that is not the case? Assume that k is an exact power of 2 and that all the symbols are equally probable. In that case, the Huffman tree will be a perfect binary tree, which means that all the leaf nodes will be at the same level and every non-leaf node will have exactly two children. As every leaf node will be at the same distance from the root node, the code for every symbol will have the same length. For this scenario, using Huffman coding is the same as using fixed-length coding.
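
A quick check of this boundary case (an illustrative sketch, not from the article, with k = 8 chosen as an assumed example value): when k is an exact power of 2 and all symbols are equally probable, the entropy already equals the fixed-length code size, so no variable-length scheme can do better:

```python
import math

k = 8       # number of symbols, an exact power of 2 (assumed example value)
p = 1 / k   # all symbols equally probable

entropy = -sum(p * math.log2(p) for _ in range(k))  # = log2(k) = 3.0 bits per symbol
fixed_length = math.ceil(math.log2(k))              # = 3 bits per symbol

print(entropy, fixed_length)  # 3.0 3 -- Huffman cannot beat fixed-length coding here
```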

Importance of Entropy Coding in Data Compression#

In most data, the symbols occur with different frequencies, which favors entropy coding over the fixed-length coding scheme. Entropy coding provides a way to represent data more compactly than the fixed-length coding scheme, which makes it a useful lossless compression technique. All the famous image and video compression standards, like JPEG, MPEG, and H.26x, use entropy coding as a last step before generating the compressed output. Entropy coding is considered a fundamental data compression tool when the probability distribution of the source symbols is known.

The variable-length codes used by any entropy coding scheme are prefix codes. This means that no code is a prefix of another code, a property that is important for decoding without any ambiguity.
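
To see why the prefix property matters, here is a small illustrative sketch (not from the original article) that uses the example codes derived above. Because no code is a prefix of another, the decoder can emit a symbol the moment a complete code is matched, with no separators needed between codes:

```python
# Huffman codes for the example system (taken from the table above)
codes = {"a": "011", "b": "00", "c": "111", "d": "110", "e": "10", "f": "010"}
decode_table = {code: symbol for symbol, code in codes.items()}

def encode(message):
    return "".join(codes[symbol] for symbol in message)

def decode(bits):
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in decode_table:  # prefix property: a complete match is never part of a longer code
            symbols.append(decode_table[current])
            current = ""
    return "".join(symbols)

encoded = encode("badfec")
print(encoded)          # 0001111001010111 -- 16 bits, no delimiters between codes
print(decode(encoded))  # badfec
```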

To understand the basic concepts of computer systems and data compression, you may want to explore the following interactive course:

Information Representation in Computer Systems

Computer systems do not understand human instructions, nor can they perceive real-life data in its raw form. Therefore, computer systems require a way to store and represent information in an accessible way. They use software and hardware components combined to help retrieve information, store it, manipulate it, and convert it back to a human-accessible format. If you want to learn how computer systems perform complex tasks such as storing and manipulating textual data, videos, music, images of your cat, and much more, this course is meant for you.


To dive deeper and have experience implementing ideas regarding data compression, you may like to visit the following hands-on projects:

Image Compression Through Subsampling and Interpolation

Learn to perform lossy compression of a colored image by subsampling its color space without a noticeable visual quality difference.


Image Compression Using DCT with Graceful Quality Degradation

In this project, we’ll learn to perform lossy compression of a grayscale image using the Discrete Cosine Transform (DCT) with a graceful compromise on visual quality.


Frequently Asked Questions

What is an example of entropy coding?

Entropy coding operates on a conceptual level, which enables it to work with any group of symbols, provided the probability information for each symbol is available. Take, for example, an integer-valued scalar x, which can assume the values −1, 0, and +1 with corresponding probabilities of 0.25, 0.5, and 0.25, respectively.
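
To carry this example a step further (a worked calculation added for illustration): the entropy of x is −(0.25 log₂ 0.25 + 0.5 log₂ 0.5 + 0.25 log₂ 0.25) = 1.5 bits. A Huffman code would assign the one-bit code 0 to the most probable value (0) and the two-bit codes 10 and 11 to −1 and +1, for an average of 0.5 × 1 + 0.25 × 2 + 0.25 × 2 = 1.5 bits per value, matching the entropy exactly, whereas a fixed-length code would need 2 bits per value.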


Written By:
Laeeq Aslam
 