Prefix Free Encoding
Suppose we wish to encode a set of letters in binary such that we can unambiguously decode a binary stream into . One possibility is to use a prefix free encoding. An encoding is prefix free if the code word for any is not the prefix of the code word for any other .
Suppose we are given the set together with their frequencies . We wish to find a prefix free code such that total length of the code when used with these frequencies is minimized . Such a code is called a Huffman encoding of the set .
Suppose we assign a code word to every and let . Then we wish to minimize .
Observation: A prefix code maps to a full binary tree. One can see this by considering the tree where the left child of a node corresponds to a and the right child corresponds to a . The corresponds to a leaf in this tree and the code-word for corresponds to the labels on the unique path from the root to , while is the depth of in this tree. We will call such a tree a prefix tree. Note that the cost of is given by .
Greedy Algorithm for Finding a Huffman Encoding
Consider the following greedy algorithm for computing a Huffman encoding:
- Sort the letters according to their frequency so that .
- Pick and to have a common parent with frequency .
- Iterate till only one element is left
Lemma: There exists an optimal tree that combines and and they have maximum depth in this tree.
Proof: Let be an optimal prefix tree. Since is a full tree it has at least two leaves of maximum depth. Let and be two leaves. If is not one of them then swap with (if ). Similarly swap with if is not in . Note that the cost of this tree can only be lower than since are the least frequent items.
Lemma: Let be an optimal tree for with frequencies and let be an optimal tree for with frequencies . Then is a subtree of .
Proof: Let be the tree where we delete the leaves and and assign to the parent of and in . Note that . We claim that must be an optimal tree for . Suppose not, and let be an optimal tree for . Then add two leaves and to the node corresponding to in to get the tree . Note that . This contradicts the optimality of for .