Answer to problem 2: When the frequency of occurrence of the symbols O, +, < and > respectively is 0.5, 0.25, 0.125 and 0.125 there is a better solution, for example

0 for O
10 for +
110 for <
111 for >

If you had any assignment that allocates 1,2,3 and 3 bits to these symbols respectively (in such a way that the first so-many bits of any code do not also form the whole code for another symbol), then you have found one of the various forms of this, the best solution.

Why is it better than the 2-bits-per-character assignment from Problem 1? Well, consider a typical piece of text (of N characters) in this language. Such texts will contain, on average,

N/2 O's, encoded as N/2 * 1 bit = 0.5 N bits
N/4 +'s, encoded as N/4 * 2 bit = 0.5 N bits
N/8 <'s, encoded as N/8 * 3 bit = 0.375 N bits
N/8 >'s, encoded as N/8 * 3 bit = 0.375 N bits

Sum: on average 1.75 N bits for a typical N-character text.

Contrast this with the 2 N bits needed under the old scheme.

Note: we can actually make a stronger statement than just about the average of all N-character texts. The number of bits needed for individual texts forms a distribution of which the width goes only as N, in other words for texts of a substantial size almost all texts will be very close to this average, and (proportionally) closer as N goes up.


back to the article