Objectives: Review Huffman Codes, Introducing Divide and Conquer Algorithms


Objectives

  • Review Huffman Codes
  • Introducing Divide and Conquer Algorithms

March 6, 2019 CSCI211 - Sprenkle 1

Towards Huffman Codes

  • What problem are we trying to solve?
  • Binary tree rules:

Ø Each leaf node is a letter
Ø Follow path to the letter

  • Going left: 0
  • Going right: 1


Given the mapping, how do you build the binary tree for this mapping?


Recursively Generate Tree

  • All letters are in root node
  • For all letters in node

Ø If encoding begins with 0, letter belongs in left subtree
Ø Otherwise (encoding begins with 1), letter belongs in right subtree
Ø If last bit of encoding, make the letter a leaf node of that subtree
Ø Shift encoding one bit
Ø Process left and right children
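The recursive procedure above can be sketched in Python. This is only an illustration under assumptions not in the slides: the function name `build_tree` is hypothetical, a leaf is represented by the letter itself, an internal node by a `(left, right)` tuple, and the input is assumed to be a valid prefix code.

```python
def build_tree(codes):
    """Recursively build a prefix-code tree from {letter: bit string}.

    Assumed representation (not from the slides): a leaf is the letter
    itself, an internal node is a (left, right) tuple, None is a missing child.
    """
    # Last bit already consumed: the single remaining letter is a leaf.
    if len(codes) == 1:
        (letter, bits), = codes.items()
        if bits == "":
            return letter
    # Encoding begins with 0 -> left subtree; begins with 1 -> right subtree.
    # Slicing off the first bit is the "shift encoding one bit" step.
    left = {x: bits[1:] for x, bits in codes.items() if bits[0] == "0"}
    right = {x: bits[1:] for x, bits in codes.items() if bits[0] == "1"}
    # Process left and right children.
    return (build_tree(left) if left else None,
            build_tree(right) if right else None)


tree = build_tree({"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"})
print(tree)
```

The mapping used here is the one derived later in the lecture; any other prefix code works the same way.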


Tree Properties

  • What is the length of a letter’s encoding?

Ø Length of path from root to leaf → its depth

  • Define our optimal goal in tree terms

Ø Minimize the average bits per letter: $\mathrm{ABL} = \sum_{x \in S} f_x \, |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}(x)$
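As a small illustration of the tree form of ABL, the sketch below computes each leaf's depth and then the weighted sum. The names `depths` and `abl` and the nested-tuple tree representation are assumptions made for this example, and the three-letter frequencies are illustrative values, not data from the slides.

```python
def depths(tree, depth=0):
    """Map each leaf letter to its depth; internal nodes are (left, right) tuples."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    result = {}
    for child in tree:
        result.update(depths(child, depth + 1))
    return result


def abl(tree, freq):
    """Average bits per letter: sum over letters of frequency times depth."""
    return sum(freq[x] * d for x, d in depths(tree).items())


# Illustrative values only: 'a' sits at depth 1, 'b' and 'c' at depth 2.
tree = ("a", ("b", "c"))
freq = {"a": 0.5, "b": 0.25, "c": 0.25}
print(depths(tree), abl(tree, freq))
```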


Tree Properties

  • What do we want our tree to look like for the optimal solution?

Ø How many leaves?
Ø How many internal nodes?

  • Think about parent nodes vs. child nodes

Ø When uniform frequencies?
Ø Nonuniform frequencies?

Tree Properties

  • Claim. The binary tree T corresponding to the optimal prefix code is full, i.e., each internal node has two children.
  • Proof. Assume, for contradiction, that T has an internal node u with only one child v, where v is the root of a subtree.

Ø Without loss of generality, assume v is the left child
Ø Replace u with v → every leaf in v's subtree decreases in depth → ABL decreases → original wasn’t optimal
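The exchange step in the proof (replace u with its only child v) can be illustrated with a short sketch. The helper names `make_full` and `depth_of` are hypothetical, and the tree representation (nested `(left, right)` tuples with `None` for a missing child) is an assumption made for this example.

```python
def make_full(tree):
    """Splice out every internal node that has only one child.

    Leaves are letters; internal nodes are (left, right) tuples,
    with None standing in for a missing child.
    """
    if not isinstance(tree, tuple):
        return tree  # a leaf stays as it is
    left, right = (None if c is None else make_full(c) for c in tree)
    if left is None:
        return right  # replace u with its only child v
    if right is None:
        return left
    return (left, right)


def depth_of(tree, letter, depth=0):
    """Depth of `letter` in `tree`, or None if it is absent."""
    if tree == letter:
        return depth
    if isinstance(tree, tuple):
        for child in tree:
            found = depth_of(child, letter, depth + 1)
            if found is not None:
                return found
    return None


t = (("a", None), "b")          # the node above 'a' has only a left child
print(depth_of(t, "a"), depth_of(make_full(t), "a"))
```

Every leaf below a spliced-out node moves one level closer to the root, which is why a tree with a one-child internal node cannot be optimal.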

Toward a Solution…

  • Two problems to solve:

Ø Creating the prefix code tree
Ø Labeling the prefix code tree with alphabet/frequencies


Simplifying: Know Optimal Prefix Code

  • Process: assume knowledge of optimal solution to gain insight into finding solution
  • Assume we knew the tree structure of the optimal prefix code. How would you label the leaf nodes?

[Figure label: Increasing frequency]

Combining Our Conclusions

  • The binary tree corresponding to the optimal prefix code is full, i.e., each internal node has two children
  • We want to label the leaf nodes of the binary tree corresponding to the optimal prefix code such that nodes with greatest depth have least frequency

What does this mean the bottom of our tree should look like?

[Figure: two sibling leaves at the bottom of the tree, labeled f_{n-1} and f_n, the 2 letters with least frequency; they could be flipped]

How Can We Use This?

  • Two letters with least frequency are going to be siblings in some optimal tree

Ø Tie them together
Ø Their parent is a “meta-letter”

  • Frequency of the meta-letter is the sum f_n + f_{n-1}

[Figure: meta-letter node with frequency f_n + f_{n-1} whose children are f_{n-1} and f_n, the 2 letters with least frequency; the children could be flipped]


Constructing an Optimal Prefix Code

Huffman’s Algorithm:

To construct a prefix code for an alphabet S with given frequencies:
    if S has two letters:
        Encode one letter as 0 and the other letter as 1
    else:
        Let y* and z* be the two lowest-frequency letters
        # Reduce: replace lowest-frequency letters with a meta-letter
        Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter w of frequency f_{y*} + f_{z*}
        Recursively construct a prefix code γ’ for S’ with tree T’
        # Build up
        Define a prefix code for S as follows:
            Start with T’
            Take the leaf labeled w and add two children below it labeled y* and z*
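A minimal Python sketch of the recursive formulation follows. It is an illustration rather than a reference implementation: the function name `huffman`, the use of a dict for the alphabet, and the choice of a tuple as the meta-letter w are all assumptions made here. Which child gets 0 and which gets 1 is arbitrary, so the bit patterns may differ from those on the slides while the code lengths agree.

```python
def huffman(freq):
    """Return {letter: code} for a dict {letter: frequency} with at least 2 letters."""
    if len(freq) == 2:
        x, y = freq
        return {x: "0", y: "1"}
    # y* and z*: the two lowest-frequency letters.
    y_star, z_star = sorted(freq, key=freq.get)[:2]
    # Reduce: form S' by deleting y* and z* and adding a meta-letter w.
    # A tuple is used for w so it cannot collide with an existing letter name.
    w = (y_star, z_star)
    reduced = {x: f for x, f in freq.items() if x not in (y_star, z_star)}
    reduced[w] = freq[y_star] + freq[z_star]
    codes = huffman(reduced)
    # Build up: the leaf labeled w gets two children, y* and z*.
    prefix = codes.pop(w)
    codes[y_star] = prefix + "0"
    codes[z_star] = prefix + "1"
    return codes


freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
codes = huffman(freq)
print({x: len(bits) for x, bits in sorted(codes.items())})
```

Sorting the whole alphabet at every level keeps the sketch short but costs more than the priority-queue version; it is meant to mirror the recursive description, not to be efficient.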

Constructing an Optimal Prefix Code: Alternative Description

  • 1. Create a leaf node for each symbol, labeled by its frequency, and add it to a priority queue
  • 2. While there is more than one node in the queue:

a) Remove the two nodes of lowest frequency
b) Create a new internal node with these two nodes as children and with frequency equal to the sum of the two nodes' frequencies
c) Add the new node to the queue

  • 3. The remaining node is the tree’s root node
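The queue-based description maps closely onto Python's standard `heapq` module. The sketch below is one possible rendering under assumptions: the function name `huffman_tree` is hypothetical, trees are nested `(left, right)` tuples with letters as leaves, and a counter is added to each queue entry purely as a tie-breaker so that two entries with equal frequency never force a comparison between trees.

```python
import heapq
from itertools import count


def huffman_tree(freq):
    """Build a Huffman tree from {letter: frequency} using a priority queue."""
    tiebreak = count()  # unique sequence numbers, used only to break ties
    # Step 1: one leaf per symbol, keyed by frequency.
    queue = [(f, next(tiebreak), letter) for letter, f in freq.items()]
    heapq.heapify(queue)
    # Step 2: repeatedly merge the two lowest-frequency nodes.
    while len(queue) > 1:
        f1, _, t1 = heapq.heappop(queue)
        f2, _, t2 = heapq.heappop(queue)
        heapq.heappush(queue, (f1 + f2, next(tiebreak), (t1, t2)))
    # Step 3: the remaining node is the root.
    return queue[0][2]


freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
print(huffman_tree(freq))
```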


Creating the Optimal Prefix Code

fa = .32, fb = .25, fc = .20, fd = .18, fe = .05

  • Merge the two lowest frequencies, d and e: de = .23 (remaining: fa = .32, fb = .25, fc = .20, fde = .23)
  • Merge the two lowest frequencies, c and de: cde = .43 (remaining: fa = .32, fb = .25, fcde = .43)
  • Merge the two lowest frequencies, a and b: ab = .57 (remaining: fab = .57, fcde = .43)
  • Merge ab and cde: abcde = 1

[Figure: final tree; root abcde = 1 with left child ab = .57 (leaves a, b) and right child cde = .43 (leaf c and subtree de = .23 with leaves d, e); right edges labeled 1]

What are the resulting encodings? What is the ABL?

a: 00, b: 01, c: 10, d: 110, e: 111

ABL = .32*2 + .25*2 + .20*2 + .18*3 + .05*3 = .64 + .50 + .40 + .54 + .15 = 2.23

I chose to build the tree this way. What if I had switched the order of the children?
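One way to explore the question is to flip every left/right choice and recompute the ABL. The sketch below uses the frequencies and encodings from the example; the helper names `abl` and `flip` are assumptions made for illustration. Swapping children changes the bit patterns but not any letter's depth, so the ABL is unchanged.

```python
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
codes = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}


def abl(codes, freq):
    """Average bits per letter: frequency times encoding length, summed."""
    return sum(freq[x] * len(codes[x]) for x in codes)


def flip(codes):
    """Swap left and right children everywhere: 0 <-> 1 in every encoding."""
    swap = str.maketrans("01", "10")
    return {x: bits.translate(swap) for x, bits in codes.items()}


print(round(abl(codes, freq), 2))
print(flip(codes), round(abl(flip(codes), freq), 2))
```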


Implementation

  • What data structures do we need?

Ø Binary tree for the prefix codes
Ø Priority queue for choosing the node with lowest frequency

  • Where are the costs?


Running Time

  • Costs

Ø Inserting a node into, or extracting a node from, the PQ: O(log n) each
Ø Number of insertions and extractions: O(n)
Ø Total: O(n log n)


Analysis of Algorithm’s Optimality

  • 2 page proof in book


Real-life Compression

  • Text can be compressed well because of known frequencies
  • Algorithms can be optimized to languages

Ø More than just “z doesn’t happen very often”

  • “z doesn’t happen after q”

DIVIDE AND CONQUER ALGORITHMS


Divide-and-Conquer

  • Divide-and-conquer process

Ø Break up problem into several parts
Ø Solve each part recursively
Ø Combine solutions to sub-problems into overall solution

  • Most common usage:

Ø Break up problem of size n into two equal parts of size ½n
Ø Solve two parts recursively
Ø Combine two solutions into overall solution

Divide et impera. Veni, vidi, vici.

  • Julius Caesar

Discussion

  • What is a well-known divide and conquer algorithm?

Ø Merge Sort


Merge Sort

  • How does Merge Sort work?
  • When do we stop?

Ø Divide list into two lists, until only 2 elements remain
Ø Sort elements
Ø Combine sorted lists (how?)
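One common answer to the "how?" is a two-pointer merge: repeatedly compare the front elements of the two sorted lists and take the smaller one. The sketch below is an illustration; the function name `merge` matches the pseudocode used later in the lecture, but this particular body is an assumption, not the course's code.

```python
def merge(a, b):
    """Combine two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # One comparison per step; each step moves exactly one element.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of the lists still has elements; they are already sorted.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


print(merge([1, 4, 9], [2, 3, 10]))
```

Because every comparison places one element, merging lists with n elements in total takes fewer than n comparisons, which is the O(n) combine cost used in the analysis.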


RECURRENCE RELATIONS

Analyzing Merge Sort

  • Def. T(n) = number of comparisons to mergesort an input of size n
  • Want to say a bit more about what T(n) is

Ø Break it down more…

What can we say about the running time with respect to the different parts of the template below?

General Template

  • Break up problem of size n into two equal parts of size ½n: O(1)
  • Solve two parts recursively: T(n/2) + T(n/2)
  • Combine two solutions into overall solution: O(n)

What is the base case? Its running time?

Merge Sort’s Recurrence Relation

MergeSort(L[1…n]):
    # Base cases
    if len(L) == 1:
        return L
    if len(L) == 2:
        compare the two entries in L, swap if necessary
        return L
    A = MergeSort(L[1…n/2])        # T(n/2)
    B = MergeSort(L[n/2+1…n])      # T(n/2)
    M = Merge(A, B)                # O(n)
    return M

T(n) = 2T(n/2) + O(n)
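To get a feel for the recurrence, it can be evaluated directly for small inputs. The sketch below makes simplifying assumptions that are not in the slides: n is a power of two, the O(n) merge term is taken to be exactly n comparisons, and the base cases cost 0 comparisons (one element) and 1 comparison (two elements). The function name `comparisons` is hypothetical.

```python
def comparisons(n):
    """Evaluate T(n) = 2 T(n/2) + n for n a power of two.

    Assumed base cases: T(1) = 0 and T(2) = 1.
    """
    if n == 1:
        return 0
    if n == 2:
        return 1
    return 2 * comparisons(n // 2) + n


# Compare T(n) with n * log2(n) for n = 2, 4, 8, 16, 32.
for k in range(1, 6):
    n = 2 ** k
    print(n, comparisons(n), n * k)
```

Under these assumptions the values stay below n log2 n, which suggests the O(n log n) growth that solving the recurrence would confirm.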


Looking Ahead

  • Problem Set 6 due Friday
