# Information theory

A mathematical theory of communication, by Claude Shannon (1948)

### Uncertainty and information

Shannon's thoughts on communication...

• A message consists of some sequence of symbols generated by a 'source',
• transmitted through a 'channel' to a receiver.
• Is there a *most compact* way to encode a message?
• What if there's noise in the channel? What can be done so that the message still arrives, despite noise?

### Uncertainty, Information, Probability

Which statement would be more *surprising*?

1. Evan will eat turkey tomorrow.
2. Evan will eat turkey on December 3.

What are the relative probabilities of the two events?

Which statement carries more/less *information*?

1. I flipped a coin and it turned up heads!
2. I rolled a dice, and a 3 came up!
3. I rolled out of bed today, and it was snowing!
4. I rolled out of bed today, and the temperature outside was above absolute zero!

A source spits out characters...

01110010011000000101101...

CURTYWWG BNMMPTYUXXIICNWKEFVOVK SOFFDDDJKDSSCJCJFGMEWUX

LITTLE RED RIDING RAN THROU....

How much uncertainty ($H$) is there about the next character in the sequence??

Summarizing Carter, 20.1-20.3:

The uncertainty of a set of possible outcomes with probabilities $\{p_1, p_2, p_3,....\}$ is $$H=-K\sum_i p_i \log_2 p_i$$ where $K$ is a positive constant.

This is also the amount of information contained in a report of which outcome actually occurred.

With $K=1$ the units of information are bits (=Binary digITS).

In what follows, use the change of base formula... $$\log_2 x = \ln x / \ln 2.$$

Trying to make this plausible, calculate $H$ for a few possible-to-calculate situations:

1. Certain events: Flip a two-headed coin, and you find a typical sequence to be...

T,T,T,T,T,T, [99 years with no change] , T,T,T,T.... What is the next symbol in the sequence? What is $H$?
2. One decimal digit: How many bits in a decimal digit?

3. Fair coin tosses:

H,T,T,H,T,H,H,H,T,T,H...
4. Eight-sided dice:

1,3,1,4,8,7,6,8,2,...
5. Loaded dice: with sides 1,2,3,2,4,2,8,2, should have *less* uncertainty than a fair dice, right??
6. 26 letters plus space?
7. Coins again: What is $H(x)$ for a coin with probability $x$ of H, and probability $1-x$ of T? [Graph]