Compression: Information Theory (Greg Plaxton)


SLIDE 1

Compression: Information Theory

Greg Plaxton
Theory in Programming Practice, Fall 2005
Department of Computer Science, University of Texas at Austin

SLIDE 2

Coding Theory

  • Encoder

– Input: a message over some finite alphabet such as {0, 1} or {a, . . . , z}
– Output: the encoded message

  • Decoder

– Input: some encoded message produced by the encoder
– Output: (a good approximation to) the associated input message

  • Motivation?

Theory in Programming Practice, Plaxton, Fall 2005

SLIDE 3

Some Applications of Coding Theory

  • Compression

– Goal: Produce a short encoding of the input message

  • Error detection/correction

– Goal: Produce a fault-tolerant encoding of the input message

  • Cryptography

– Goal: Produce an encoding of the input message that can only be decoded by the intended recipient(s) of the message

  • It is desirable for the encoding and decoding algorithms to be efficient in terms of time and space

– Various tradeoffs are appropriate for different applications


SLIDE 4

Compression

  • Lossless: decoder recovers the original input message
  • Lossy: decoder recovers an approximation to the original input message
  • The application dictates how much, if any, loss we can tolerate

– Text compression is usually required to be lossless
– Image/video compression is often lossy

  • We will focus on techniques for lossless compression


SLIDE 5

Text Compression

  • Practical question: I’m running out of disk space; how much can I compress my files?

  • A (naive?) idea:

– Any file can be compressed to the empty string: just write a decoder that outputs the file when given the empty string as input!
– A problem with this approach is that we need to store the decoder, and the naive implementation of the decoder (which simply stores the original file in some static data structure within the decoder program) is at least as large as the original file
– Can this idea be salvaged?


SLIDE 6

Kolmogorov Complexity

  • In some cases, a large file can be generated by a very small program running on the empty string; e.g., a file containing a list of the first trillion prime numbers

  • Your files can be compressed down to the size of the smallest program that (when given the empty string as input) produces them as output

– How do I figure out this shortest program?
– Won’t it be time-consuming to write/debug/maintain?
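The first bullet can be made concrete with a sketch: a program of a few hundred bytes whose output is far larger than its own source (the first 10,000 primes here, rather than a trillion, for practicality).

```python
def first_primes(n):
    """Return the first n primes by trial division up to sqrt(candidate)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        is_prime = True
        for p in primes:
            if p * p > candidate:
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
        candidate += 1
    return primes

# This whole program is a few hundred bytes of source, yet its output
# (tens of kilobytes for n = 10,000) regenerates a much larger "file":
# the program acts as a very small decoder for that file.
output = "\n".join(map(str, first_primes(10000)))
print(len(output), "characters of output from a tiny program")
```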


SLIDE 7

Information Theory

  • May be viewed as providing a practical way to (approximately) carry out the strategy suggested by Kolmogorov complexity
  • Consider a file that you would like to compress

– Assume that this file can be viewed, to a reasonable degree of approximation, as being drawn from a particular probability distribution (e.g., we will see that this is true of English text)
– Perhaps many other people have files drawn from this distribution, or from distributions in a similar class
– If so, a good encoder/decoder pair for that class of distributions may already exist; with luck, it will already be installed on your system


SLIDE 8

Example: English Text

  • In what sense can we view English text as being (approximately) drawn from a probability distribution?

  • English text is one of the example applications discussed in Shannon’s 1948 paper “A Mathematical Theory of Communication”

– On page 7 we find a sequence of successively more accurate probabilistic models of English text
– Claude Shannon (1916–2001) is known as the “father of information theory”


SLIDE 9

Entropy in Thermodynamics

  • In thermodynamics, entropy is a measure of energy dispersal

– The more “spread out” the energy of a system is, the higher the entropy
– A system in which the energy is concentrated at a single point has zero entropy
– A system in which the energy is uniformly distributed has reached its maximum possible entropy

  • Second law of thermodynamics: The entropy of an isolated system can only increase

– Bad news: The entropy of the universe can only increase as matter and energy degrade to an ultimate state of inert uniformity
– Good news: This process is likely to take a while


SLIDE 10

Entropy in Information Theory (Shannon)

  • A measure of the uncertainty associated with a probability distribution

– The more “spread out” the distribution is, the higher the entropy
– A probability distribution in which all of the probability is concentrated on a single outcome has zero entropy
– For any given set of possible outcomes, the probability distribution with the maximum entropy is the uniform distribution

  • Consider a distribution over a set of n outcomes in which the ith outcome has associated probability p_i; Shannon defined the entropy of this distribution as

Σ_i p_i log(1/p_i) = −Σ_i p_i log p_i

  • The logarithm above is normally assumed to be taken base 2, in which case the units of entropy are bits (binary digits)
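As a check on the definition, here is a minimal entropy function (the function name and example distributions are illustrative choices, not from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution.

    probs: a sequence of probabilities summing to 1. Terms with p = 0
    contribute 0 (the limit of p * log(1/p) as p approaches 0 is 0).
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A point mass has zero entropy; the uniform distribution on n outcomes
# attains the maximum possible entropy, log2(n) bits.
print(entropy([1.0]))        # 0.0
print(entropy([0.25] * 4))   # 2.0
```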


SLIDE 11

Entropy of an I.I.D. Source

  • Consider a message in which each successive symbol is independently drawn from the same probability distribution over n symbols, where the probability of drawing the ith symbol is p_i

  • The entropy of such a source is −Σ_i p_i log p_i bits per symbol

  • Example: Shannon’s first-order model of English text yields an entropy of 4.07 bits per symbol
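A first-order estimate like Shannon’s can be sketched by taking empirical symbol frequencies as the p_i (the sample string below is illustrative; Shannon’s 4.07 figure comes from letter frequencies over a large English corpus, so a short sample will give a different number):

```python
import math
from collections import Counter

def first_order_entropy(text):
    """Estimate bits per symbol from single-symbol frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

sample = "the quick brown fox jumps over the lazy dog"
print(round(first_order_entropy(sample), 2), "bits per symbol")
```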


SLIDE 12

Discrete Markov Process

  • A more general notion of a source
  • Includes as special cases the kth order processes discussed earlier in connection with Shannon’s modeling of English text
  • Closely related to the concept of finite state machines to be discussed later in this course


SLIDE 13

Entropy of a Discrete Markov Process

  • Under certain (relatively mild) technical assumptions, for any k > 0 and any X in A^k, where A denotes the set of symbols, the fraction of all sequences of length k in the output that are equal to X converges to a particular number p(X)

  • We may then define H_k as

H_k = (1/k) Σ_{X ∈ A^k} p(X) log(1/p(X))

  • Theorem (Shannon): If a given discrete Markov process satisfies the technical assumptions alluded to above, then its entropy is equal to lim_{k→∞} H_k bits per symbol
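An empirical sketch of H_k: estimate it from k-gram frequencies of a sample sequence (the function and sample are illustrative; with a finite sample, the empirical k-gram frequencies only approximate the limiting p(X)):

```python
import math
from collections import Counter

def h_k(text, k):
    """Estimate H_k = (1/k) * sum over k-grams X of p(X) * log2(1/p(X)),
    taking p(X) to be the empirical frequency of k-gram X in text."""
    grams = [text[i:i + k] for i in range(len(text) - k + 1)]
    counts = Counter(grams)
    total = len(grams)
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return h / k

sample = "abababababababab"
# For this perfectly alternating string, single symbols look uniform
# (1 bit/symbol), but longer k-grams expose the structure and the
# per-symbol estimate drops toward 0 as k grows.
print(h_k(sample, 1), h_k(sample, 4))
```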


SLIDE 14

Example: English Text

  • Zero-order approximation: log 27 ≈ 4.75 bits per symbol
  • First-order approximation: 4.07 bits per symbol
  • Second-order approximation: 3.36 bits per symbol
  • Third-order approximation: 2.77 bits per symbol
  • Approximation based on experiments involving humans: 0.6 to 1.3 bits per symbol
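The zero-order figure is just the entropy of the uniform distribution over the 26 letters plus the space character, which can be confirmed directly:

```python
import math

# Zero-order model: 27 equally likely symbols (a-z plus space),
# so the entropy is log2(27) bits per symbol.
print(round(math.log2(27), 2))  # 4.75
```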


SLIDE 15

Entropy as a Measure of Compressibility

  • Fundamental Theorem for a Noiseless Channel (Shannon): Let a source have entropy H (bits per symbol) and a channel have capacity C (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate C/H − ε symbols per second, where ε is arbitrarily small. It is not possible to transmit at an average rate greater than C/H.

  • What does this imply regarding how much we can hope to compress a given file containing n symbols, where n is large?

– Suppose the file content is similar in structure to the output of a source with entropy H
– Then we cannot hope to encode the file using fewer than about nH bits
– Furthermore this bound can be achieved to within an arbitrarily small factor
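The nH bound can be made concrete: for an i.i.d. source over two symbols, compare n·H with what a general-purpose compressor achieves (zlib here is an illustrative stand-in, not an entropy-optimal coder, so it lands somewhat above the bound but still far below the raw size):

```python
import math
import random
import zlib

# i.i.d. source: symbol 'a' with probability 0.9, 'b' with 0.1.
p = 0.9
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ~0.469 bits/symbol

random.seed(0)
n = 100_000
msg = "".join("a" if random.random() < p else "b" for _ in range(n))

bound_bytes = n * h / 8            # entropy lower bound, in bytes
compressed = len(zlib.compress(msg.encode(), 9))
print(f"n = {n} symbols, entropy bound ~{bound_bytes:.0f} bytes, "
      f"zlib output {compressed} bytes")
```

As the theorem predicts, no lossless coder can do meaningfully better than about nH bits on output typical of this source, while a good coder gets close to it.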
