

SLIDE 1

CSE 312
Foundations of Computing II

Lecture 16: Information Theory and Data Compression

Stefano Tessaro
tessaro@cs.washington.edu

SLIDE 2

Announcements

  • Office hours: I am available 1-3pm.
  • Please make sure to read the instructions for the midterm.
  • Practice midterm solutions posted in the afternoon.


SLIDE 3

Today: How much can we compress data?

This is a central topic in information theory, a discipline based on probability that has been extremely useful across electrical engineering, computer science, statistics, physics, …

Claude Shannon, “A Mathematical Theory of Communication”, 1948
http://www.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

How much information is really contained in data?

SLIDE 4

Encoding Scheme

enc: 𝒳 → {0,1}*        dec: {0,1}* → 𝒳

x  →  enc  →  y = enc(x)  →  dec  →  x

  • Decodability. For all values x ∈ 𝒳: dec(enc(x)) = x.
  • Goal: the encoding should “compress”.

[We will formalize this using the language of probability theory]
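As a concrete toy illustration (not from the slides): an encoding scheme can be represented as a pair of lookup tables, here with codewords borrowed from the prefix-free example on the next slides.

```python
# Toy encoding scheme over the alphabet used on the next slides.
# enc and dec are inverse lookup tables, so dec(enc(x)) = x for every x.
enc = {"hello": "0", "world": "10", "cse312": "11"}
dec = {codeword: x for x, codeword in enc.items()}

# Decodability check: decoding undoes encoding on every value of the alphabet.
assert all(dec[enc[x]] == x for x in enc)
print(enc["cse312"], "->", dec["11"])  # 11 -> cse312
```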

SLIDE 5

Encoding – Example

Say we need to encode a word from the set 𝒳 = {hello, world, cse312}.

Three candidate encodings enc:

  enc: hello → 0,  world → 1,   cse312 → 11
  enc: hello → 0,  world → 10,  cse312 → 11
  enc: world → 11, cse312 → 100000000

SLIDE 6

Better Visualization – Trees

An encoding can be drawn as a binary tree: the codeword of a value is the path from the root to its node.

[Tree for the first encoding: hello → 0, world → 1, cse312 → 11]
[Tree for the second encoding: hello → 0, world → 10, cse312 → 11]

SLIDE 7

Focus – Prefix-free codes

A code is prefix-free if no encoding is a prefix of another one.

[Tree for the first encoding: hello → 0, world → 1, cse312 → 11]
Not prefix-free! 1 is a prefix of 11.

[Tree for the second encoding: hello → 0, world → 10, cse312 → 11]
Prefix-free!! I.e., every encoding is a leaf.
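A minimal sketch (not lecture code) of how one might test the prefix-free property for a code given as a Python dict, using the two encodings above:

```python
def is_prefix_free(code):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        i != j and w2.startswith(w1)
        for i, w1 in enumerate(words)
        for j, w2 in enumerate(words)
    )

enc1 = {"hello": "0", "world": "1",  "cse312": "11"}
enc2 = {"hello": "0", "world": "10", "cse312": "11"}

print(is_prefix_free(enc1))  # False: "1" is a prefix of "11"
print(is_prefix_free(enc2))  # True: every codeword is a leaf of the code tree
```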

SLIDE 8

Random Variables – Arbitrary Values

We will consider random variables X: Ω → 𝒳 taking values in a (finite) set 𝒳. [We refer to these as a “random variable over the alphabet 𝒳.”]

Example: 𝒳 = {hello, world, cse312} with

  Pr[X = hello] = 1/2    Pr[X = world] = 1/4    Pr[X = cse312] = 1/4

SLIDE 9

The Data Compression Problem

enc: 𝒳 → {0,1}*        dec: {0,1}* → 𝒳

X  →  enc  →  Y = enc(X)  →  dec  →  X

Data = a random variable X over the alphabet 𝒳. Two goals:

  1. Decodability. For all values x ∈ 𝒳: dec(enc(x)) = x.
  2. Minimal length. The length |Y| of Y should be as small as possible.
     More formally: minimize 𝔼[|Y|].

SLIDE 10

Expected Length – Example

𝒳 = {a, b, c} with Pr[X = a] = 1/2, Pr[X = b] = 1/4, Pr[X = c] = 1/4

Code: a → 0, b → 10, c → 11, so

  Pr[Y = 0] = 1/2    Pr[Y = 10] = 1/4    Pr[Y = 11] = 1/4

𝔼[|Y|] = (1/2)·1 + (1/4)·2 + (1/4)·2 = 3/2

SLIDE 11

Expected Length – Example

𝒳 = {a, b, c} with Pr[X = a] = 1/2, Pr[X = b] = 1/4, Pr[X = c] = 1/4

Code: b → 0, a → 10, c → 11, so

  Pr[Y = 0] = 1/4    Pr[Y = 10] = 1/2    Pr[Y = 11] = 1/4

𝔼[|Y|] = (1/4)·1 + (1/2)·2 + (1/4)·2 = 7/4
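The two expected lengths can be reproduced with a short sketch (the symbol names a, b, c and the helper expected_length are illustrative, not from the lecture):

```python
from fractions import Fraction as F

def expected_length(pmf, code):
    """E[|enc(X)|] = sum over x of Pr[X = x] * len(enc(x))."""
    return sum(p * len(code[x]) for x, p in pmf.items())

pmf = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)}
code1 = {"a": "0", "b": "10", "c": "11"}   # the code from Slide 10
code2 = {"b": "0", "a": "10", "c": "11"}   # the code from Slide 11

print(expected_length(pmf, code1))  # 3/2
print(expected_length(pmf, code2))  # 7/4
```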

SLIDE 12

What is the shortest encoding?

  • Problem. Given a random variable X, find an optimal (enc, dec), i.e., one for which 𝔼[|enc(X)|] is as small as possible.

Next: There is an inherent limit on how short the encoding can be (in expectation).

SLIDE 13

Random Variables – Arbitrary Values

Assume you are given a random variable X with the following PMF:

  x        a      b     c     d
  p_X(x)   15/16  1/32  1/64  1/64

You learn X = a; surprised? You learn X = d; surprised?

  • Definition. The surprise of outcome x is S(x) = log₂(1/p_X(x)).

S(a) = log₂(16/15) ≈ 0.09        S(d) = log₂(64) = 6
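A one-line sketch of the surprise function applied to this PMF (the helper name is an assumption, not from the slides):

```python
import math

def surprise(p):
    """Surprise of an outcome with probability p: log2(1/p)."""
    return math.log2(1 / p)

pmf = {"a": 15/16, "b": 1/32, "c": 1/64, "d": 1/64}
print(surprise(pmf["a"]))  # log2(16/15) ≈ 0.093
print(surprise(pmf["d"]))  # log2(64) = 6.0
```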

SLIDE 14

Entropy = Expected Surprise

  • Definition. The entropy of a discrete RV X over alphabet 𝒳 is

      ℍ(X) = 𝔼[S(X)] = Σ_{x ∈ 𝒳} p_X(x) · log₂(1/p_X(x))

Intuitively: captures how surprising the outcome of the random variable is.
Weird convention: 0 · log₂(1/0) = 0.

SLIDE 15

Entropy = Expected Surprise

Definition. The entropy of a discrete RV X over alphabet 𝒳 is

      ℍ(X) = 𝔼[S(X)] = Σ_{x ∈ 𝒳} p_X(x) · log₂(1/p_X(x))

  x        a      b     c     d
  p_X(x)   15/16  1/32  1/64  1/64

ℍ(X) = (15/16)·log₂(16/15) + (1/32)·5 + (1/64)·6 + (1/64)·6 = (15/16)·log₂(16/15) + 11/32 ≈ 0.431…

SLIDE 16

Entropy = Expected Surprise

  • Definition. The entropy of a discrete RV X over alphabet 𝒳 is

      ℍ(X) = 𝔼[S(X)] = Σ_{x ∈ 𝒳} p_X(x) · log₂(1/p_X(x))

  x        a   b   c   d
  p_X(x)   1   0   0   0

ℍ(X) = 1·0 + 3·(0·log₂(1/0)) = 0

  x        a    b    c    d
  p_X(x)   1/4  1/4  1/4  1/4

ℍ(X) = 4 · (1/4)·log₂(4) = 2
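The three entropy values above can be reproduced with a short sketch (the entropy helper is illustrative, not lecture code); skipping zero-probability outcomes implements the convention 0·log₂(1/0) = 0.

```python
import math

def entropy(pmf):
    """H(X) = sum of p(x) * log2(1/p(x)); outcomes with p(x) = 0 contribute 0."""
    return sum(p * math.log2(1 / p) for p in pmf.values() if p > 0)

print(entropy({"a": 15/16, "b": 1/32, "c": 1/64, "d": 1/64}))  # ≈ 0.431 (Slide 15)
print(entropy({"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}))       # 0.0: one value with prob 1
print(entropy({x: 1/4 for x in "abcd"}))                       # 2.0 = log2(4): uniform
```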

SLIDE 17

Entropy = Expected Surprise

Definition. The entropy of a discrete RV X over alphabet 𝒳 is

      ℍ(X) = 𝔼[S(X)] = Σ_{x ∈ 𝒳} p_X(x) · log₂(1/p_X(x))

  • Proposition. 0 ≤ ℍ(X) ≤ log₂|𝒳|

The lower bound is attained when X takes one value with probability 1, the upper bound by the uniform distribution.
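A quick numerical sanity check of the proposition on a few random PMFs (a sketch, reusing the same entropy helper as above):

```python
import math
import random

def entropy(pmf):
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

random.seed(312)
n = 4
for _ in range(5):
    weights = [random.random() for _ in range(n)]
    pmf = [w / sum(weights) for w in weights]     # a random distribution over n outcomes
    h = entropy(pmf)
    assert 0 <= h <= math.log2(n) + 1e-9          # 0 <= H(X) <= log2 |alphabet|
    print(round(h, 3), "<=", math.log2(n))
```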

SLIDE 18

Shannon’s Source Coding Theorem

  • Theorem. (Source Coding Theorem) Let (enc, dec) be an optimal prefix-free encoding scheme for a RV X. Then

      ℍ(X) ≤ 𝔼[|enc(X)|] ≤ ℍ(X) + 1

  • We cannot compress beyond the entropy.
  • Corollary: “uniform” data cannot be compressed.
  • We can get within one bit of it.
  • Example of an optimal code: the Huffman code (CSE 143?) — see the sketch below.
  • The result can be extended to uniquely decodable codes (e.g., suffix-free codes).
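The slide points to Huffman codes as an example of an optimal prefix-free code. Below is a minimal sketch of the standard greedy construction using a binary heap (not the lecture’s code); the PMF is the one from the example on the next slide.

```python
import heapq
from fractions import Fraction as F

def huffman_code(pmf):
    """Build an optimal prefix-free code for a PMF given as {symbol: probability}."""
    # Heap entries: (probability of subtree, tiebreaker, {symbol: codeword within subtree}).
    heap = [(p, i, {x: ""}) for i, (x, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # merge the two least likely subtrees
        p1, _, code1 = heapq.heappop(heap)
        merged = {x: "0" + w for x, w in code0.items()}
        merged.update({x: "1" + w for x, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

pmf = {"a": F(15, 16), "b": F(1, 32), "c": F(1, 64), "d": F(1, 64)}
code = huffman_code(pmf)
print(code)  # codeword lengths 1, 2, 3, 3 (exact bits depend on tie-breaking)
print(sum(p * len(code[x]) for x, p in pmf.items()))  # expected length 35/32 = 70/64
```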

SLIDE 19

Example

  x        a      b     c     d
  p_X(x)   15/16  1/32  1/64  1/64

[Code tree: a at depth 1, b at depth 2, c and d at depth 3]

𝔼[|enc(X)|] = (15/16)·1 + (1/32)·2 + 2·(1/64)·3 = 15/16 + 10/64 = 70/64 ≤ ℍ(X) + 1
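A quick numeric check of this example (illustrative, not from the slides):

```python
import math
from fractions import Fraction as F

pmf = {"a": F(15, 16), "b": F(1, 32), "c": F(1, 64), "d": F(1, 64)}
depths = {"a": 1, "b": 2, "c": 3, "d": 3}          # leaf depths in the code tree above

expected_len = sum(p * depths[x] for x, p in pmf.items())
H = sum(p * math.log2(1 / float(p)) for p in pmf.values())

print(expected_len)                  # 35/32 = 70/64
print(H, H + 1)                      # ≈ 0.431 and ≈ 1.431
print(H <= expected_len <= H + 1)    # True: within one bit of the entropy
```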

SLIDE 20

Data Compression in the Real World

Main issue: we do not know the distribution of X.

  • Universal compression: Lempel/Ziv/Welch (LZW) — a rough sketch of the idea follows below.
    – See http://web.mit.edu/6.02/www/f2011/handouts/3.pdf
    – Used in GIF, UNIX compress.
    – General idea: assume the data is a sequence of symbols generated by a random process to be “estimated”.
  • A whole area of computer science is dedicated to the topic.
  • This is lossless compression, very different from the “lossy compression” used for images, videos, audio, etc.
    – Assumes humans can be “fooled” with some loss of data.
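To illustrate the “estimate the process from the data” idea, here is a minimal sketch of LZW compression (a simplified illustration, not the lecture’s or the handout’s code): the dictionary of phrases is built up while the input is read, so frequently repeated substrings get short codes.

```python
def lzw_compress(text):
    """Minimal LZW compressor: returns a list of dictionary indices."""
    dictionary = {chr(i): i for i in range(256)}   # start with all single characters
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the current phrase
        else:
            output.append(dictionary[current])     # emit the longest known phrase
            dictionary[candidate] = next_code      # learn the new phrase
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_compress("abababab"))  # [97, 98, 256, 258, 98]: repeated phrases reuse learned codes
```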