Compression: Information Theory
Greg Plaxton
Theory in Programming Practice, Fall 2005
Department of Computer Science, University of Texas at Austin
Coding Theory
- Encoder
  – Input: a message over some finite alphabet such as {0, 1} or {a, . . . , z}
  – Output: encoded message
- Decoder
  – Input: an encoded message produced by the encoder
  – Output: (a good approximation to) the associated input message
- Motivation?
Theory in Programming Practice, Plaxton, Fall 2005
Some Applications of Coding Theory
- Compression
– Goal: Produce a short encoding of the input message
- Error detection/correction
– Goal: Produce a fault-tolerant encoding of the input message
- Cryptography
– Goal: Produce an encoding of the input message that can only be decoded by the intended recipient(s) of the message
- It is desirable for the encoding and decoding algorithms to be efficient in terms of time and space
  – Various tradeoffs are appropriate for different applications
Compression
- Lossless: decoder recovers the original input message
- Lossy: decoder recovers an approximation to the original input message
- The application dictates how much, if any, loss we can tolerate
  – Text compression is usually required to be lossless
  – Image/video compression is often lossy
- We will focus on techniques for lossless compression
Text Compression
- Practical question: I’m running out of disk space; how much can I compress my files?
- A (naive?) idea:
  – Any file can be compressed to the empty string: just write a decoder that outputs the file when given the empty string as input!
  – A problem with this approach is that we need to store the decoder, and the naive implementation of the decoder (which simply stores the original file in some static data structure within the decoder program) is at least as large as the original file
  – Can this idea be salvaged?
Kolmogorov Complexity
- In some cases, a large file can be generated by a very small program running on the empty string; e.g., a file containing a list of the first trillion prime numbers
- Your files can be compressed down to the size of the smallest program that (when given the empty string as input) produces them as output
  – How do I figure out this shortest program?
  – Won’t it be time-consuming to write/debug/maintain?
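The prime-list example above can be made concrete: a program of a few lines acts as a "decoder" whose output can be made as large as we like. This sketch uses simple trial division (far too slow for a trillion primes, but the point is the program's size, not its speed):

```python
def first_primes(n):
    """Return the first n prime numbers by trial division against
    the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime iff no earlier prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

print(first_primes(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The program text stays a fixed few hundred bytes no matter how large `n` is, while its output grows without bound: exactly the situation in which the Kolmogorov "compression" of a file is tiny.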
Information Theory
- May be viewed as providing a practical way to (approximately) carry out the strategy suggested by Kolmogorov complexity
- Consider a file that you would like to compress
  – Assume that this file can be viewed, to a reasonable degree of approximation, as being drawn from a particular probability distribution (e.g., we will see that this is true of English text)
  – Perhaps many other people have files drawn from this distribution, or from distributions in a similar class
  – If so, a good encoder/decoder pair for that class of distributions may already exist; with luck, it will already be installed on your system
Example: English Text
- In what sense can we view English text as being (approximately) drawn from a probability distribution?
- English text is one of the example applications discussed in Shannon’s 1948 paper “A Mathematical Theory of Communication”
  – On page 7 we find a sequence of successively more accurate probabilistic models of English text
  – Claude Shannon (1916–2001) is known as the “father of information theory”
Entropy in Thermodynamics
- In thermodynamics, entropy is a measure of energy dispersal
  – The more “spread out” the energy of a system is, the higher the entropy
  – A system in which the energy is concentrated at a single point has zero entropy
  – A system in which the energy is uniformly distributed has reached its maximum possible entropy
- Second law of thermodynamics: The entropy of an isolated system can only increase
  – Bad news: The entropy of the universe can only increase as matter and energy degrade to an ultimate state of inert uniformity
  – Good news: This process is likely to take a while
Entropy in Information Theory (Shannon)
- A measure of the uncertainty associated with a probability distribution
  – The more “spread out” the distribution is, the higher the entropy
  – A probability distribution in which all of the probability is concentrated on a single outcome has zero entropy
  – For any given set of possible outcomes, the probability distribution with the maximum entropy is the uniform distribution
- Consider a distribution over a set of n outcomes in which the ith outcome has associated probability p_i; Shannon defined the entropy of this distribution as

  Σ_i p_i log(1/p_i) = −Σ_i p_i log p_i
- The logarithm above is normally assumed to be taken base 2, in which
case the units of entropy are bits (binary digits)
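The definition above translates directly into a few lines of code. This sketch computes the base-2 entropy of a distribution given as a list of probabilities, and checks the extreme cases from the bullets above (a point mass and the uniform distribution):

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution given as an
    iterable of probabilities summing to 1. Zero-probability outcomes
    are skipped, since p log(1/p) tends to 0 as p tends to 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0, 0.0]))  # point mass: 0 bits
print(entropy([0.25] * 4))       # uniform on 4 outcomes: 2.0 bits
print(entropy([1 / 27] * 27))    # uniform on 27 symbols: log 27 ≈ 4.75 bits
```

The last line reproduces the zero-order figure for 27-symbol English text (26 letters plus space) that appears later in these slides.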
Entropy of an I.I.D. Source
- Consider a message in which each successive symbol is independently
drawn from the same probability distribution over n symbols, where the probability of drawing the ith symbol is pi
- The entropy of such a source is −Σ_i p_i log p_i bits per symbol
- Example: Shannon’s first-order model of English text yields an entropy of 4.07 bits per symbol
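An i.i.d. source is easy to simulate, which gives an empirical check on the formula: draw a long message, measure the symbol frequencies, and plug them into −Σ_i p_i log p_i. The three-symbol distribution here is a made-up toy, not one of Shannon’s models:

```python
import math
import random
from collections import Counter

# Toy i.i.d. source (assumed for illustration): three symbols with
# probabilities 1/2, 1/4, 1/4, whose entropy is exactly 1.5 bits/symbol.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
true_h = -sum(p * math.log2(p) for p in probs.values())

random.seed(0)
message = random.choices(list(probs), weights=list(probs.values()), k=100_000)

# Plug-in estimate from observed symbol frequencies.
counts = Counter(message)
est_h = -sum((c / len(message)) * math.log2(c / len(message))
             for c in counts.values())

print(round(true_h, 3), round(est_h, 3))  # estimate should be close to 1.5
```

With 100,000 draws the empirical frequencies are close to the true probabilities, so the plug-in estimate lands within a few thousandths of a bit of the true entropy.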
Discrete Markov Process
- A more general notion of a source
- Includes as special cases the kth order processes discussed earlier in
connection with Shannon’s modeling of English text
- Closely related to the concept of finite state machines to be discussed
later in this course
Entropy of a Discrete Markov Process
- Under certain (relatively mild) technical assumptions, for any k > 0 and any X in A^k, where A denotes the set of symbols, the fraction of all sequences of length k in the output that are equal to X converges to a particular number p(X)
- We may then define H_k as

  H_k = (1/k) Σ_{X∈A^k} p(X) log(1/p(X))
- Theorem (Shannon): If a given discrete Markov process satisfies the technical assumptions alluded to above, then its entropy is equal to lim_{k→∞} H_k bits per symbol
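Replacing the limiting probabilities p(X) with observed k-gram frequencies gives a simple empirical estimate of H_k. This sketch (a plug-in estimator over a toy string, not a full treatment of the convergence conditions) shows how larger k exposes structure that the first-order statistics miss:

```python
import math
from collections import Counter

def h_k(text, k):
    """Plug-in estimate of H_k = (1/k) * sum over k-grams X of
    p(X) log2(1/p(X)), using overlapping k-gram frequencies in
    `text` as the probabilities p(X)."""
    grams = [text[i:i + k] for i in range(len(text) - k + 1)]
    n = len(grams)
    counts = Counter(grams)
    return -sum((c / n) * math.log2(c / n) for c in counts.values()) / k

sample = "abababababababab"
print(h_k(sample, 1))  # 'a' and 'b' equally frequent: 1.0 bit/symbol
print(h_k(sample, 2))  # 2-grams reveal the alternation: well below 1 bit
```

For genuinely structured sources (such as English text), H_1 > H_2 > H_3 > … in just this way, which is what the successively smaller figures on the next slide reflect.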
Example: English Text
- Zero-order approximation: log 27 ≈ 4.75 bits per symbol
- First-order approximation: 4.07 bits per symbol
- Second-order approximation: 3.36 bits per symbol
- Third-order approximation: 2.77 bits per symbol
- Approximation based on experiments involving humans: 0.6 to 1.3 bits
per symbol
Entropy as a Measure of Compressibility
- Fundamental Theorem for a Noiseless Channel (Shannon): Let a source have entropy H (bits per symbol) and a channel have capacity C (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate C/H − ε symbols per second, where ε is arbitrarily small. It is not possible to transmit at an average rate greater than C/H.
- What does this imply regarding how much we can hope to compress a given file containing n symbols, where n is large?
  – Suppose the file content is similar in structure to the output of a source with entropy H
  – Then we cannot hope to encode the file using fewer than about nH bits
  – Furthermore this bound can be achieved to within an arbitrarily small factor
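The nH lower bound can be seen in action with an off-the-shelf compressor. This sketch (using a made-up two-symbol source and Python’s zlib, which is a general-purpose compressor rather than one tuned to this source) compares the compressed size against the bound:

```python
import math
import random
import zlib

# Assumed toy source: i.i.d. symbols 'a' with probability 0.9, 'b' with 0.1.
p = 0.9
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ≈ 0.469 bits/symbol

random.seed(1)
n = 100_000
data = bytes(random.choices(b"ab", weights=[p, 1 - p], k=n))

compressed = zlib.compress(data, level=9)

# The theorem says no lossless encoding of a typical output of this source
# can use much fewer than n*h bits; zlib should land above that floor but
# well below the n bytes of the raw file.
print(f"entropy bound: {n * h / 8:.0f} bytes, zlib: {len(compressed)} bytes")
```

zlib does not reach the bound here (it is byte-oriented and knows nothing about the source), but the gap between raw size, compressed size, and nH/8 makes the theorem’s two directions concrete: real compressors can get well under n bytes, and none can get under about nH bits.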