Chapter 6: Compression and Encryption CS105: Great Insights in - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Chapter 6:

Compression and Encryption

CS105: Great Insights in Computer Science

slide-2
SLIDE 2

Thermostat

This program turns on the heat whenever it gets too cold.

slide-3
SLIDE 3

Gettysburg Address

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.

slide-4
SLIDE 4

Character Counts

For simplicity, let’s turn the uppercase letters into lowercase letters. That leaves us with the following counts (<s> denotes a space; <b> appears to denote a line break):

282 <s>     4 <b>    22 ,      15 -      10 .       0 ?     102 a      14 b
 31 c      58 d     165 e      27 f      28 g      80 h      68 i       0 j
  3 k      42 l      13 m      77 n      93 o      15 p       1 q      79 r
 44 s     126 t      21 u      24 v      28 w       0 x      10 y       0 z

slide-5
SLIDE 5

Attempt #1: ASCII

  • The standard format for representing characters uses 8 bits per character.
  • The Gettysburg Address is 1482 characters long, so a total of 1482 x 8 = 11856 bits is needed using this representation.
  • 8 bits per character
  • 11856 total bits
  • 100% the size of ASCII representation.

slide-7
SLIDE 7

Attempt #2: Compact

  • Note that, at least in its lowercase form, there are only 32 different characters needed.
  • Therefore, each can be assigned a 5-bit code (there are 32 different 5-bit patterns).
  • 5 bits per character
  • 1482 x 5 = 7410 total bits
  • 62.5% the size of ASCII representation.
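The arithmetic behind the fixed-length attempts can be checked with a minimal sketch. The function name `fixed_length_bits` and the constants are illustrative assumptions, not part of the slides; the numbers 1482, 32 and 8 come from the text.

```python
def fixed_length_bits(num_symbols: int) -> int:
    """Smallest number of bits that gives every symbol its own pattern."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2.
    return max(1, (num_symbols - 1).bit_length())


TEXT_LENGTH = 1482  # characters in the lowercase passage (from the slides)
ASCII_BITS = 8      # bits per character in the ASCII baseline

bits_per_char = fixed_length_bits(32)
total_bits = TEXT_LENGTH * bits_per_char
ratio = total_bits / (TEXT_LENGTH * ASCII_BITS)

print(bits_per_char, total_bits, f"{ratio:.1%}")  # 5 7410 62.5%
```

A 33rd character would push every code to 6 bits, which is why the lowercase simplification matters here.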

slide-9
SLIDE 9

5-bit Patterns

00000 <s>   00001 <b>   00010 ,     00011 -
00100 .     00101 ?     00110 a     00111 b
01000 c     01001 d     01010 e     01011 f
01100 g     01101 h     01110 i     01111 j
10000 k     10001 l     10010 m     10011 n
10100 o     10101 p     10110 q     10111 r
11000 s     11001 t     11010 u     11011 v
11100 w     11101 x     11110 y     11111 z

slide-10
SLIDE 10

Attempt #3: Vary Length

  • Some characters are much more common than others.
  • Give the 4 most common characters a 3-bit code, and the remaining 28 a 6-bit code.
  • How many bits do we need now?
slide-11
SLIDE 11

Variable Length Patterns

000    <s>   001    e     010    t     011    a
100000 o     100001 h     100010 r     100011 n
100100 i     100101 d     100110 s     100111 l
101000 c     101001 w     101010 g     101011 f
101100 v     101101 ,     101110 u     101111 -
110000 p     110001 b     110010 m     110011 .
110100 y     110101 <b>   110110 k     110111 q
111000 ?     111001 j     111010 x     111011 z

slide-12
SLIDE 12

Decodability

Note that the code was chosen so that the first bit of each character tells you whether the code is short (0) or long (1). This choice ensures that a message can actually be decoded:

100001100100000010100001001100010001110011

h i <s> t h e r e .

That is 42 bits, not the 45 that nine 5-bit codes would take. But it is harder to work with.
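The decoding rule can be sketched as below. The code table is rebuilt from the order of characters on the Variable Length Patterns slide; the names `SHORT`, `LONG`, `CODE` and `decode` are illustrative assumptions.

```python
# Characters in the order they appear on the "Variable Length Patterns" slide.
SHORT = ["<s>", "e", "t", "a"]  # 3-bit codes 000 .. 011
LONG = ["o", "h", "r", "n", "i", "d", "s", "l", "c", "w", "g", "f", "v", ",",
        "u", "-", "p", "b", "m", ".", "y", "<b>", "k", "q", "?", "j", "x", "z"]  # 6-bit codes 100000 .. 111011

CODE = {ch: format(i, "03b") for i, ch in enumerate(SHORT)}
CODE.update({ch: format(0b100000 + i, "06b") for i, ch in enumerate(LONG)})
DECODE = {bits: ch for ch, bits in CODE.items()}


def decode(bits):
    """Split a bit string into characters using the first bit of each code."""
    out = []
    pos = 0
    while pos < len(bits):
        width = 3 if bits[pos] == "0" else 6  # 0 = short code, 1 = long code
        out.append(DECODE[bits[pos:pos + width]])
        pos += width
    return out


message = "100001100100000010100001001100010001110011"
print(decode(message))  # ['h', 'i', '<s>', 't', 'h', 'e', 'r', 'e', '.']
```

The decoder never has to backtrack: the first bit of each code fixes how many bits to consume.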

slide-13
SLIDE 13

What Gives?

  • We had assigned all 32 characters 5-bit codes.
  • Now we’ve got 4 that have 3-bit codes and 28 that have 6-bit codes. So, more than half of the characters have actually gotten longer.
  • How can that change help?
  • Need to factor in how many of each character there are.

slide-14
SLIDE 14

Adding Up the Bits

  • How many bits to write down just the letter “y”? Well, there are 10 “y”s and each takes 6 bits. So, 60 bits. (It was 50, before.)
  • How about “t”? There are 126 and each takes 3 bits. That’s 378 (was 630).
  • So, how do we total them all up?
  • Let c be a character, freq(c) the number of times it appears, and len(c) its encoding length.
  • Total bits = Σ_c freq(c) x len(c), summed over every character c
slide-16
SLIDE 16

Summing It Up

  • 282x3 + 165x3 + 126x3 + 102x3 + 93x6 + 80x6 + 79x6 + ... + 0x6 + 0x6 = 6867

282 <s>   165 e     126 t     102 a      93 o      80 h      79 r      77 n
 68 i      58 d      44 s      42 l      31 c      28 w      28 g      27 f
 24 v      22 ,      21 u      15 -      15 p      14 b      13 m      10 .
 10 y       4 <b>     3 k       1 q       0 ?       0 j       0 x       0 z
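The total can be reproduced directly from the frequency-times-length sum. This is a minimal sketch: the counts are the ones listed on the slides, while the names `COUNTS`, `SHORT` and `total_bits` are illustrative assumptions.

```python
# Character counts from the slides (<s> and <b> are the slides' own placeholders).
COUNTS = {
    "<s>": 282, "e": 165, "t": 126, "a": 102, "o": 93, "h": 80, "r": 79, "n": 77,
    "i": 68, "d": 58, "s": 44, "l": 42, "c": 31, "w": 28, "g": 28, "f": 27,
    "v": 24, ",": 22, "u": 21, "-": 15, "p": 15, "b": 14, "m": 13, ".": 10,
    "y": 10, "<b>": 4, "k": 3, "q": 1, "?": 0, "j": 0, "x": 0, "z": 0,
}
SHORT = {"<s>", "e", "t", "a"}  # the four characters given 3-bit codes


def total_bits(freq, length_of):
    """Sum of freq(c) x len(c) over every character c."""
    return sum(count * length_of(ch) for ch, count in freq.items())


total = total_bits(COUNTS, lambda ch: 3 if ch in SHORT else 6)
num_chars = sum(COUNTS.values())
print(total, round(total / num_chars, 1), f"{total / (8 * num_chars):.1%}")
# 6867 4.6 57.9%
```

Passing the code length as a function keeps the same sum usable for any other code assignment.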

slide-17
SLIDE 17

Attempt #3: Summary

  • Total for this example:
  • 4.6 bits per character (1482 characters)
  • 6867 total bits
  • 57.9% the size of ASCII representation.

Reminder: We started with 11856 total bits

slide-18
SLIDE 18

Attempt #4: Sorted

0 <s> 10 e 110 t 1110 a 11110 o ...

Total for this example:

  • 7.1 bits per character
  • 10467 total bits
  • 88.3% the size of ASCII representation.
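To see why the sorted code does so poorly, here is a minimal sketch that assumes the pattern shown continues: the character of rank r (0 for the most common) gets r ones followed by a zero. The names `COUNTS_SORTED` and `sorted_code` are illustrative assumptions.

```python
# Counts from the slides, most common character first.
COUNTS_SORTED = [282, 165, 126, 102, 93, 80, 79, 77, 68, 58, 44, 42, 31, 28, 28, 27,
                 24, 22, 21, 15, 15, 14, 13, 10, 10, 4, 3, 1, 0, 0, 0, 0]


def sorted_code(rank):
    """Code for the character of the given rank: 0, 10, 110, 1110, ..."""
    return "1" * rank + "0"


total = sum(count * len(sorted_code(rank))
            for rank, count in enumerate(COUNTS_SORTED))
num_chars = sum(COUNTS_SORTED)
print(total, round(total / num_chars, 1), f"{total / (8 * num_chars):.1%}")
# 10467 7.1 88.3%
```

Code lengths grow by one bit per rank, so the many mid-frequency characters each cost far more than the 5 or 6 bits they needed before.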
slide-19
SLIDE 19

Attempt #5: Your Turn

  • Make sure it is decodable!

282 <s>   165 e     126 t     102 a      93 o      80 h      79 r      77 n
 68 i      58 d      44 s      42 l      31 c      28 w      28 g      27 f
 24 v      22 ,      21 u      15 -      15 p      14 b      13 m      10 .
 10 y       4 <b>     3 k       1 q       0 ?       0 j       0 x       0 z

slide-20
SLIDE 20

Can We Do Better?

  • Shannon invented information theory, which talks about bits and randomness and encodings.
  • Fano and Shannon worked together on finding minimal size codes. They found a good heuristic, but didn’t solve it.
  • Fano assigned the problem to his class.
  • Huffman solved it, not knowing his prof. had unsuccessfully struggled with it.

slide-21
SLIDE 21

Tree (Prefix) Code

  • First, notice that a code can be drawn as a tree.
  • Left = “0”, right = “1”. So, e = “001”, w = “101001”.
  • Tree structure ensures code is decodable: Bits tell you unambiguously which character.

[Figure: the variable-length code drawn as a binary tree, with some branches labeled 1. The leaves <s>, e, t and a sit at depth 3; the leaves o, h, r, n, i, d, s, l, c, w, g, f, v, “,”, u, -, p, b, m, “.”, y, <b>, k, q, ?, j, x and z sit at depth 6.]

slide-23
SLIDE 23

Huffman Coding

  • Make each character a subtree (“block”) with count equal to its frequency.
  • Take two blocks with smallest counts and “merge” them into left and right branches. The count for the new block is the sum of the counts of the blocks it is made out of.
  • Repeat until all blocks have been merged into one big block (single tree).
  • Read the code off the branches in the tree.
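The merging steps above can be sketched with a priority queue. This is a minimal sketch rather than the course's own implementation: the function name `huffman_code_lengths` is an assumption, it tracks only code lengths (not the bit patterns), and it assumes characters with a count of zero are left out of the tree.

```python
import heapq
from itertools import count

# Character counts from the slides.
COUNTS = {
    "<s>": 282, "e": 165, "t": 126, "a": 102, "o": 93, "h": 80, "r": 79, "n": 77,
    "i": 68, "d": 58, "s": 44, "l": 42, "c": 31, "w": 28, "g": 28, "f": 27,
    "v": 24, ",": 22, "u": 21, "-": 15, "p": 15, "b": 14, "m": 13, ".": 10,
    "y": 10, "<b>": 4, "k": 3, "q": 1, "?": 0, "j": 0, "x": 0, "z": 0,
}


def huffman_code_lengths(freq):
    """Return {symbol: code length} by repeatedly merging the two smallest blocks."""
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    # Each heap entry is (block count, tiebreak, symbols inside the block).
    heap = [(n, next(tiebreak), [sym]) for sym, n in freq.items() if n > 0]
    heapq.heapify(heap)
    lengths = {syms[0]: 0 for _, _, syms in heap}
    while len(heap) > 1:
        n1, _, syms1 = heapq.heappop(heap)
        n2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # everything under the new block sits one level deeper
        heapq.heappush(heap, (n1 + n2, next(tiebreak), syms1 + syms2))
    return lengths


lengths = huffman_code_lengths(COUNTS)
total = sum(COUNTS[sym] * bits for sym, bits in lengths.items())
print(total, f"{total / (8 * 1482):.1%}")  # 6135 51.7%
```

Ties between equal counts can be broken either way; individual code lengths may then differ, but the total number of bits does not.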
slide-24
SLIDE 24

Partial Example

Starting blocks (the characters with the smallest nonzero counts):

22 ,     21 u     15 p     15 -     14 b     13 m     10 .     10 y     4 <b>     3 k     1 q

Merges, taking the two smallest counts each time:

  • k (3) and q (1) merge into a block with count 4
  • <b> (4) and the block with count 4 merge into a block with count 8
  • y (10) and the block with count 8 merge into a block with count 18
  • m (13) and “.” (10) merge into a block with count 23
  • p (15) and b (14) merge into a block with count 29
  • - (15) and the block with count 18 merge into a block with count 33

slide-31
SLIDE 31

Completed Code Tree

[Figure: the completed Huffman tree. The leaves are the 28 characters that occur in the passage, each labeled with its count. The merged blocks have counts 4, 8, 18, 23, 29, 33, 43, 47, 55, 57, 64, 85, 91, 112, 122, 145, 159, 176, 195, 234, 271, 324, 371, 505, 606 and 876, and the root has count 1482.]

slide-32
SLIDE 32

Created Code

11         <s>   100        e     0001       t     0100       a
0101       o     1010       h     1011       r     00000      n
00001      i     00101      d     01101      s     01111      l
001001     c     001101     w     001110     g     001111     f
011000     v     011100     ,     011101     u     0010001    -
0011000    p     0011001    b     0110010    m     0110011    .
00100000   y     001000011  <b>   0010000100 k     0010000101 q

slide-33
SLIDE 33

Huffman: Summary

Total for this example:

  • 4.1 bits per character
  • 6135 total bits
  • 51.7% the size of ASCII representation.
  • Minimal for a character-by-character code for this passage. (No other character-by-character code leads to more compression.)

slide-34
SLIDE 34

Huffman Example

mississippi

Character counts: m: 1 i: 4 p: 2 s: 4

Blocks in order of count: m 1, p 2, i 4, s 4

  • m (1) and p (2) merge into a block with count 3
  • the block with count 3 and i (4) merge into a block with count 7
  • the block with count 7 and s (4) merge into a block with count 11
  • each right branch is labeled 1 (and each left branch 0)

What are the codes for each character?

m: 000 i: 01 p: 001 s: 1

How many bits does it take to write “mississippi”?

1 x 3 + 4 x 2 + 2 x 3 + 4 x 1 = 21 bits
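The example can be replayed end to end with a minimal sketch that builds the tree, reads codes off the branches (left = 0, right = 1), and round-trips the word. The function names are illustrative assumptions. The exact bit patterns depend on how ties and left/right placement are resolved, so they may differ from the slide's, but the total length does not.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a Huffman tree for text and return {character: bit string}."""
    if not text:
        return {}
    freq = Counter(text)
    # Heap entries: (count, insertion order, tree). A tree is either a single
    # character or a (left, right) pair of trees.
    heap = [(n, i, sym) for i, (sym, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    order = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, order, (left, right)))
        order += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")  # left branch
            walk(tree[1], prefix + "1")  # right branch
        else:
            codes[tree] = prefix or "0"  # a one-character text still needs one bit

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits left to right; a prefix code matches at most one character at a time."""
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


codes = huffman_codes("mississippi")
bits = encode("mississippi", codes)
print(len(bits))  # 21
```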

slide-52
SLIDE 52

Other Codes

  • error detecting: Know if something has been modified (bit flip).
  • error correcting: Know which bit has been modified. Can you think of a familiar example?
  • multicharacter: Encode sequences (like “the”) with their own codes. Can get much closer to minimum possible code length: “Shannon’s entropy”.
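As a rough illustration of the entropy idea, the sketch below computes the entropy of the single-character counts from the slides. It assumes the standard formula H = -Σ p log2 p with p = freq(c) / total; the names are illustrative. This per-character figure is a lower bound for character-by-character codes only; multicharacter codes can go below it because they exploit patterns across characters.

```python
import math

# Counts of the 28 characters that occur in the passage (from the slides).
COUNTS = [282, 165, 126, 102, 93, 80, 79, 77, 68, 58, 44, 42, 31, 28, 28, 27,
          24, 22, 21, 15, 15, 14, 13, 10, 10, 4, 3, 1]


def entropy_bits_per_char(counts):
    """Shannon entropy of the empirical character distribution, in bits."""
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n > 0)


h = entropy_bits_per_char(COUNTS)
huffman_average = 6135 / 1482  # average code length reported for the Huffman code
print(f"entropy {h:.3f} bits/char, Huffman {huffman_average:.3f} bits/char")
```

The Huffman average always lands within one bit of this entropy, which is the sense in which a character-by-character code cannot do much better.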

slide-53
SLIDE 53

Engineering as a Profession

Wikipedia: An engineer is a professional practitioner of engineering, concerned with applying scientific knowledge, mathematics and ingenuity to develop solutions for technical problems

en.wikipedia.org/wiki/Engineer

slide-54
SLIDE 54

Engineering as a Profession

Problem: We need to transport 1 million lbs of goods per day across 100 miles of desert, for the next 5 years. How should we do it?

              Initial Cost   Daily Cost
Helicopter    100 M          500 K
Truck/Road    20 B           100 K
Train         30 B           50 K
Rocket        2 M            10 M
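One way to start comparing the options is to total each one over the project lifetime. This is only a sketch under stated assumptions: K, M and B are read as thousand, million and billion of the same unspecified currency, 5 years is taken as 5 x 365 days, and nothing beyond the two cost columns (risk, capacity, feasibility) is modeled. The names `OPTIONS` and `total_cost` are illustrative.

```python
DAYS = 5 * 365  # assumption: ignore leap days

# (initial cost, daily cost), read from the table with K/M/B as 1e3/1e6/1e9.
OPTIONS = {
    "Helicopter": (100_000_000, 500_000),
    "Truck/Road": (20_000_000_000, 100_000),
    "Train": (30_000_000_000, 50_000),
    "Rocket": (2_000_000, 10_000_000),
}


def total_cost(initial, daily, days=DAYS):
    """Initial cost plus the daily cost paid every day of the project."""
    return initial + daily * days


totals = {name: total_cost(*costs) for name, costs in OPTIONS.items()}
for name, cost in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name:<12}{cost:>16,}")
cheapest = min(totals, key=totals.get)
```

Under these assumptions the low initial cost matters more than the daily cost, because five years is too short for the road or railway to pay back its construction.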

slide-55
SLIDE 55

Govt. Perspective

“Today, only 14 percent of all undergraduate students enroll in what we call the STEM subjects – science, technology, engineering and math. We can do better than that. We must do better than that. If we’re going to make sure the good jobs of tomorrow stay in America … we need to make sure all our companies have a steady stream of skilled workers to draw from.” US President Obama

http://www.washingtonpost.com/blogs/44/post/obama-to-make-campaign-style-stops-in-nc-and-florida/2011/06/13/AGdu1mSH_blog.html

slide-56
SLIDE 56

S&P 500 CEOs

http://content.spencerstuart.com/sswebsite/pdf/lib/2005_CEO_Study_JS.pdf

slide-57
SLIDE 57

Undergrad Majors by Salary

slide-58
SLIDE 58

Deterrents

  • My GPA will be low
  • All the jobs are being outsourced
  • There are no special programs for people like me
  • It’s too hard (but I want to be a manager)
  • I want to engage in something more noble or community oriented