Huffman Codes
Learn about the Huffman codes problem and its solution using greedy algorithms.
Binary codes and prefix-free codes
A binary code assigns a string of 0s and 1s to each character in the alphabet. A binary code is prefix-free if no code is a prefix of any other. (Confusingly, prefix-free codes are also commonly called prefix codes.) 7-bit ASCII and Unicode’s UTF-8 are both prefix-free binary codes. Morse code is a binary code with symbols • and —, but it is not prefix-free because the code for E ( • ) is a prefix of the codes for I ( • • ), S ( • • • ), and H ( • • • • ).
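The prefix-free property can be checked mechanically. The following is a minimal Python sketch, not part of the original lesson; the function name `is_prefix_free` and the sample code tables are illustrative assumptions. It treats two identical codewords as a violation as well.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of (or equal to) another."""
    words = sorted(codewords)
    # In sorted order, a codeword that is a prefix of some other word is
    # immediately followed by a word that starts with it, so comparing
    # neighbors is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


# Hypothetical sample tables: four Morse letters and a 2-bit fixed-width code.
morse = {"E": ".", "I": "..", "S": "...", "H": "...."}
fixed_width = {"A": "00", "B": "01", "C": "10", "D": "11"}

print(is_prefix_free(morse.values()))        # False
print(is_prefix_free(fixed_width.values()))  # True
```

Any fixed-width code passes this check, which is why 7-bit ASCII is prefix-free.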
Binary trees and codewords
Any prefix-free binary code can be visualized as a binary tree with the encoded characters stored at the leaves. The codeword for any symbol is given by the path from the root to the corresponding leaf, writing 0 for each step to a left child and 1 for each step to a right child. Thus, the length of any symbol's codeword is the depth of the corresponding leaf in the code tree. Although they are superficially similar, binary code trees are not binary search trees; the order of symbols on the leaves is irrelevant.
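To make the tree view concrete, here is a small Python sketch. The representation is an assumption chosen for brevity: a leaf is a one-character string and an internal node is a `(left, right)` tuple. The helper names `codewords` and `decode` are hypothetical.

```python
def codewords(tree, prefix=""):
    """Map each leaf symbol to its root-to-leaf path (0 = left, 1 = right)."""
    if isinstance(tree, str):          # a leaf holds one symbol
        return {tree: prefix}
    left, right = tree                 # an internal node is a (left, right) pair
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table


def decode(tree, bits):
    """Walk the tree bit by bit, emitting a symbol whenever a leaf is reached."""
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree                # restart at the root for the next symbol
    return "".join(out)


tree = ("a", (("b", "c"), "d"))        # hypothetical example tree
table = codewords(tree)
print(table)                           # {'a': '0', 'b': '100', 'c': '101', 'd': '11'}

encoded = "".join(table[ch] for ch in "abad")
print(decode(tree, encoded))           # abad
```

Decoding needs no separators between codewords: because every symbol sits at a leaf, the walk can only finish a codeword in one place.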
Optimizing prefix-free binary codes
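Suppose we want to encode a message written in an n-character alphabet so that the encoded message is as short as possible. Specifically, given an array of frequency counts f[1 .. n], we want to compute a prefix-free binary code that minimizes the total encoded length of the message, where depth(i) is the depth of the leaf storing the ith character (equivalently, the length of its codeword):
$$\sum_{i=1}^{n} f[i] \cdot \mathrm{depth}(i)$$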
This is exactly the same cost function we considered for optimizing binary search trees, but the optimization problem is different because code trees are not required to keep the keys in any particular order.
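As a quick illustration of this cost, the sketch below computes the total encoded length as the sum, over all symbols, of frequency times codeword length. The frequency table and both code tables are made-up examples, and `encoded_length` is a hypothetical helper name.

```python
def encoded_length(freq, code):
    """Total bits needed: each symbol's count times its codeword length."""
    return sum(freq[sym] * len(code[sym]) for sym in freq)


# Hypothetical message statistics and two prefix-free codes for them.
freq = {"a": 5, "b": 2, "c": 1, "d": 2}
fixed = {"a": "00", "b": "01", "c": "10", "d": "11"}
variable = {"a": "0", "b": "100", "c": "101", "d": "11"}

print(encoded_length(freq, fixed))     # 20
print(encoded_length(freq, variable))  # 18
```

Giving the most frequent symbol the shortest codeword saves two bits here, even though two rarer symbols get longer codewords.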
In 1951, as a Ph.D. student at MIT, David Huffman developed the following greedy algorithm to produce such an optimal code:
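Huffman's greedy rule is usually stated as: merge the two least frequent symbols into a single new symbol whose frequency is their sum, then recurse on the smaller alphabet. The Python sketch below is one common way to realize that rule with a min-heap; it is an illustration under stated assumptions rather than the lesson's own code. It assumes symbols are strings (tuples are reserved for internal nodes), and the name `huffman_code` and the tie-breaking counter are choices made here.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Return {symbol: codeword} by repeatedly merging the two rarest subtrees."""
    if not freq:
        return {}
    tiebreak = count()  # unique counter so heap entries never compare subtrees
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: give the lone symbol a 1-bit code
        return {heap[0][2]: "0"}

    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least frequent subtree
        f2, _, right = heapq.heappop(heap)   # second least frequent subtree
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            code[node] = prefix

    walk(heap[0][2], "")
    return code


freq = {"a": 5, "b": 2, "c": 1, "d": 2}      # hypothetical frequency counts
code = huffman_code(freq)
print(sum(freq[s] * len(code[s]) for s in freq))  # 18
```

With a binary heap, each of the n - 1 merges costs O(log n), so the whole construction runs in O(n log n) time. Different tie-breaking choices can produce different code trees with the same total encoded length.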