In previous post, I wrote quick post about Huffman Coding. Apparently, Arithmetic Coding is replaces Huffman. It’s optional in MPEG and it’s part VP9 specs.

wiki describes it as

Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression.

The encoding works by sending a value in range of probability [0,1](as fixed-point). Basically, It is decision tree on steroids. I really like the diagram on the wiki.

Example image

For this work, we need to define Probability for each symbol. Then process each symbol reducing the number as you go. The interesting part is sending more probable symbol means less digits which means less bandwidth needed. If we keep sending less probable symbols, The fraction gets bigger and more bits to encode the symbol stream. But who knows! I could be totally worng here.

The following example is a simple implementation of arithmetic coding. I found it in course notes.

def encode(sym_id, low, high, freq):
    range_ = high - low
    high   =  low + range_ * freq[sym_id+1]
    low    =  low + range_ * freq[sym_id]
    return (range_, low, high)


STR = "eaii!"
freq = [("a", 0.0, 0.2),
        ("e", 0.2, 0.5),
        ("i", 0.5, 0.6),
        ("o", 0.6, 0.8),
        ("u", 0.8, 0.9),
        ("!", 0.9, 1.0)]

low = 0
high = 1

freq1 = [0] + [x[2] for x in freq]

print(freq1)

print("===================")
print("char range low high")
print("===================")
for i in STR:
    index = [x[0] for x in freq].index(i)
    (r, low, high) = encode(index, low, high, freq1)
    print((i, r, low, high))

And output of the above code is

[0, 0.2, 0.5, 0.6, 0.8, 0.9, 1.0]
===================
char range low high
===================
('e', 1, 0.2, 0.5)
('a', 0.3, 0.2, 0.26)
('i', 0.06, 0.23, 0.23600000000000002)
('i', 0.006000000000000005, 0.233, 0.2336)
('!', 0.0005999999999999894, 0.23354, 0.2336)