heaps and Huffman codes

slide-1
SLIDE 1

heaps and Huffman codes

slide-2
SLIDE 2

priority queues: motivation

dynamically changing list of events with dates

want to find next event quickly

list of running programs, some more important (e.g. what user will notice being slow)

choose most important to run first
want to find most important quickly

list of connections, some interactive (video call), some not (download)

want quick way to choose which one to service

data structure: priority queue

slide-3
SLIDE 3

priority queue ADT

insert(priority, item)
findMin(): return item with lowest (first) priority
deleteMin(): remove item with lowest (first) priority
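As a rough illustration of this ADT, the C++ standard library's std::priority_queue can act as a min-priority queue when it is given std::greater as its comparison. The Event alias, the EventQueue alias, and the free function deleteMin are hypothetical names chosen for this sketch; they are not part of the lecture's heap code.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item type: (priority, name); a smaller priority means "sooner".
using Event = std::pair<int, std::string>;

// std::priority_queue is a max-queue by default; std::greater turns it
// into a min-queue, so top() plays the role of findMin().
using EventQueue =
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>>;

// deleteMin(): return and remove the item with the lowest priority.
Event deleteMin(EventQueue& q) {
    assert(!q.empty());
    Event first = q.top();  // findMin()
    q.pop();
    return first;
}
```

Here push() corresponds to insert(priority, item), and pairs compare by priority first, so the name only breaks ties.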

slide-4
SLIDE 4

priority queue implementations

structure              insert            findMin    deleteMin
unsorted vector        Θ(1) (amortized)  Θ(n)       Θ(n)
unsorted linked list   Θ(1)              Θ(n)       Θ(n)
sorted vector          Θ(n)              Θ(1)       Θ(1)
sorted linked list     Θ(n)              Θ(1)       Θ(1)
balanced tree          Θ(log n)          Θ(log n)   Θ(log n)
binary heap            Θ(log n)          Θ(1)       Θ(log n)
Fibonacci heap         amortized Θ(1)    Θ(1)       amortized Θ(log n)
strict Fibonacci heap  Θ(1)              Θ(1)       Θ(log n)

slide-7
SLIDE 7

additional, optional operations

not necessary to have a priority queue, but useful…

decreaseKey: change value of key given index/pointer
remove: remove value with given index/pointer

structure              decreaseKey     remove
unsorted vector        Θ(1)            Θ(n)
unsorted linked list   Θ(1)            Θ(n)
sorted vector          Θ(n)            Θ(n)
sorted linked list     Θ(n)            Θ(1)
balanced tree          Θ(log n)        Θ(log n)
binary heap            Θ(log n)        Θ(log n)
Fibonacci heap         amortized Θ(1)  amortized Θ(log n)
strict Fibonacci heap  Θ(1)            Θ(log n)

slide-10
SLIDE 10

aside: min vs. max

can also have ADT with findMax/etc. instead of findMin/etc.
same complexities, etc. (use different comparisons)
terms for heaps: “min-heap” (findMin version) or “max-heap” (findMax version)

slide-11
SLIDE 11

binary heaps

binary heap is a binary tree
but this binary tree is not a binary search tree
structure: almost a perfect tree
ordering: parent ≤ child (everywhere in tree)

slide-12
SLIDE 12

perfect binary trees

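[figure: perfect binary tree with root A, children B and C, and leaves D, E (under B) and F, G (under C)]

a binary tree is perfect or complete if

all leaves have same depth
all nodes have zero children (leaf) or two children

exactly the trees that achieve 2^h − 1 nodes (counting h in levels: 3 levels give 7 nodes, as in the figure)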

slide-13
SLIDE 13

almost perfect/complete binary trees

[figure: binary tree with nodes A through J filling the levels left to right; the five rightmost bottom-level slots are empty]

heaps are almost complete trees

only missing bottom-rightmost slots

slide-15
SLIDE 15

almost complete formally

single node tree is almost complete

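otherwise: almost complete if either

left child is complete with height h and right child almost complete with height h; OR
left child is almost complete with height h and right child is complete with height h − 1

[figure: two example almost complete trees: A with children B (children D, E) and C (left child F); and A with children B (children D, E) and C (no children)]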

slide-16
SLIDE 16

trees as arrays

[figure: almost complete tree with nodes A through J filling the levels left to right; the five rightmost bottom-level slots are empty]

index  1  2  3  4  5  6  7  8  9  10  11 to 16
node   A  B  C  D  E  F  G  H  I  J   (unused)

    string theTree[17] = {"", "A", "B", ....}
    parentIndex = index / 2
    leftChild = index * 2
    rightChild = index * 2 + 1
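The index rules above can be wrapped in small helpers. This is a minimal sketch assuming the same 1-based layout (index 0 unused, root at index 1); hasLeftChild and hasRightChild are hypothetical helper names added here to show how "does this child exist?" reduces to a bounds check against the node count n.

```cpp
#include <cassert>
#include <cstddef>

// 1-based indexing as on the slide: index 0 is unused, index 1 is the root.
std::size_t parentIndex(std::size_t index) {
    assert(index > 1);  // the root has no parent
    return index / 2;
}
std::size_t leftChild(std::size_t index) { return index * 2; }
std::size_t rightChild(std::size_t index) { return index * 2 + 1; }

// A child slot exists only if its index is within the n stored nodes.
bool hasLeftChild(std::size_t index, std::size_t n) {
    return leftChild(index) <= n;
}
bool hasRightChild(std::size_t index, std::size_t n) {
    return rightChild(index) <= n;
}
```

With the ten nodes A through J, node E (index 5) has a left child J (index 10) but no right child, because index 11 is past the last node.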

slide-19
SLIDE 19

why arrays

single array: less storage/memory allocation
represent tree as single vector

slide-20
SLIDE 20

the heap property

heap property: parent ≤ any of its children

[figure: example min-heap with root 10]

slide-21
SLIDE 21

a non-heap

heap property: parent ≤ any of its children

[figure: example tree with the values 10, 20, 30, 15, 80 that violates the heap property]
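The heap property can be checked mechanically on the array layout. The sketch below assumes the 1-based layout used in these slides (slot 0 unused); isMinHeap is a hypothetical helper name, not part of the supplied heap code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checks the min-heap property on a 1-based array layout:
// heap[0] is unused and the nodes live in heap[1..n].
bool isMinHeap(const std::vector<int>& heap) {
    assert(!heap.empty());  // slot 0 must be present, even if unused
    const std::size_t n = heap.size() - 1;
    // Every non-root node must be >= its parent at index i / 2.
    for (std::size_t i = 2; i <= n; ++i) {
        if (heap[i] < heap[i / 2]) return false;
    }
    return true;
}
```

For example, if the values 10, 20, 30, 15, 80 are stored in that level order (an assumed arrangement), the check fails because 15 sits below the larger parent 20.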

slide-22
SLIDE 22

heap code

linked off slides page of repo

    class binary_heap {
        ...
    private:
        // heap[1] is root
        // leftChildIndex = index * 2
        // rightChildIndex = index * 2 + 1
        // parentIndex = index / 2
        vector<int> heap;
        int heap_size;
    };

slide-23
SLIDE 23

heap insert

add new node as leaf node
while new node < parent node: swap with parent

[figure: insert(25) into a min-heap with root 10: 25 starts in the next free leaf slot below 60, swaps with 60, then swaps with 30, and stops as a child of the root 10]

slide-28
SLIDE 28

insert(int)

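    void binary_heap::insert(int x) {
        // heap[0] is unused, so after the push the new leaf is at index heap_size
        heap.push_back(x);
        ++heap_size;
        // percolateUp takes the index of the new leaf, not its value
        percolateUp(heap_size);
    }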

slide-29
SLIDE 29

percolateUp(int)

    void binary_heap::percolateUp(int index) {
        int newValue = heap[index];
        // while not at root and
        // less than parent...
        while (index > 1 && newValue < heap[index / 2]) {
            // move parent down
            heap[index] = heap[index / 2];
            // advance up the tree
            index /= 2;
        }
        heap[index] = newValue;
    }

slide-30
SLIDE 30

insert runtime

worst case: about log_2 N nodes changed

slide-31
SLIDE 31

insert average case?

average case is better, assuming random keys:

intuition: leaves hold the largest half of the values (on average)
…so usually don’t need to move up
…and if we do, parents of leaves have 25th to 50th percentile of values
…so needing to move up two steps is even less likely

about 2 steps moved up on average

slide-32
SLIDE 32

heap deleteMin

replace root with last leaf node
while node greater than children: swap with smallest child

[figure: deleteMin() on a min-heap with root 10: the last leaf, 85, replaces the root, then swaps downward with its smallest child (first 30) until the heap property holds again]

why the smallest child: 30 above 85 and 80 is a heap; 80 above 30 and 85 is not a heap

slide-38
SLIDE 38

deleteMin code

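    int binary_heap::deleteMin() {
        if (heap_size == 0)
            throw ...;
        int result = heap[1];
        // move the last leaf to the root, then shrink the heap
        heap[1] = heap[heap_size--];
        heap.pop_back();
        // nothing to fix up if that was the last element
        if (heap_size > 0)
            percolateDown(1);
        return result;
    }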

slide-39
SLIDE 39

percolateDown code

    void binary_heap::percolateDown(int index) {
        int value = heap[index];
        // while left child exists
        while (index * 2 <= heap_size) {
            int left = index * 2, right = index * 2 + 1;
            // set child to smallest child that exists
            int child = left;
            if (right <= heap_size && heap[right] < heap[left])
                child = right;
            // if less than smallest, done
            if (value < heap[child])
                break;
            // otherwise:
            heap[index] = heap[child];  // move child up
            index = child;              // and traverse down
        }
        heap[index] = value;
    }

slide-40
SLIDE 40

deleteMin runtime

worst case Θ(log N): move nodes from root to leaf

slide-41
SLIDE 41

other heap operations?

decreaseKey/increaseKey

change value, then percolateUp/Down
slow (Θ(N)) if you have to find the value
fast (Θ(log N)) if you already know where value is (one method: keep track of its index)
faster decreaseKey (amortized Θ(1)/Θ(1)) in Fibonacci/strict Fibonacci heaps

remove

decreaseKey (to below the current minimum), then deleteMin
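A minimal sketch of decreaseKey for the case where the index is already known. It works on a bare 1-based vector rather than the lecture's binary_heap class, and the free-function signature is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch on a 1-based min-heap array (heap[0] unused), assuming the caller
// already knows the index of the value to change.
void decreaseKey(std::vector<int>& heap, std::size_t index, int newValue) {
    assert(index >= 1 && index < heap.size());
    assert(newValue <= heap[index]);  // only decreases are handled here
    // Percolate up: slide larger parents down until newValue fits.
    while (index > 1 && newValue < heap[index / 2]) {
        heap[index] = heap[index / 2];
        index /= 2;
    }
    heap[index] = newValue;
}
```

An increaseKey would be the mirror image: write the larger value, then percolate down instead of up.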

slide-42
SLIDE 42

core heap operations

insert: Θ(log N) worst case, better on “average”
deleteMin: Θ(log N)
findMin: Θ(1)

slide-43
SLIDE 43

heap sort

    template <typename T>
    void heapSort(vector<T>& values) {
        binary_heap<T> heap;
        // insert everything, then pull values back out in increasing order
        for (T x : values)
            heap.insert(x);
        values.clear();
        while (!heap.empty()) {
            values.push_back(heap.deleteMin());
        }
    }

Θ(N log N) sort
can be done in place with more careful implementation

(use values as the max-heap’s array, place sorted elements starting at end)

mostly not as fast in practice as comparable unstable sorts
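The in-place variant mentioned above might look like the following sketch. It assumes a 0-based max-heap stored directly in the vector being sorted, which differs from the 1-based min-heap layout used elsewhere in these slides; siftDown and heapSortInPlace are names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property below `index`, looking only at the first
// `size` elements. 0-based layout: children of i are 2i + 1 and 2i + 2.
void siftDown(std::vector<int>& values, std::size_t index, std::size_t size) {
    assert(size <= values.size());
    while (2 * index + 1 < size) {
        std::size_t child = 2 * index + 1;
        // pick the larger of the two children, if the right one exists
        if (child + 1 < size && values[child + 1] > values[child]) ++child;
        if (values[index] >= values[child]) break;
        std::swap(values[index], values[child]);
        index = child;
    }
}

void heapSortInPlace(std::vector<int>& values) {
    const std::size_t n = values.size();
    // Build a max-heap bottom-up inside the vector itself.
    for (std::size_t i = n / 2; i-- > 0;) siftDown(values, i, n);
    // Repeatedly move the current maximum to the end of the unsorted prefix.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(values[0], values[end - 1]);
        siftDown(values, 0, end - 1);
    }
}
```

A max-heap is used so that each extracted maximum can be parked in the slot that the shrinking heap just vacated at the end of the array.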

slide-44
SLIDE 44

compression

50KB webpage as 5KB download (a lot faster!)
100MB of machine code as 50MB download?
movie of 24 1MB pictures/second into 10MB/minute file?
…

slide-45
SLIDE 45

lossy compression

for audio, pictures, video, lossy compression is common
intuition: you won’t notice if we make the pixel 0.25% darker

…and it had “noise” from camera sensor, etc. anyways

idea: model human perception
write down most important parts of audio/image/etc.

important = noticed by humans

slide-46
SLIDE 46

lossless compression

lossless compression: reproduce original file
rely on patterns
example: text file has many more ‘e’s than ‘!’s

…so choose shorter encoding for ‘e’ than ‘!’

example: computer-drawn images have lots of white space

…so have a way to represent “a big white rectangle” (instead of specifying each pixel)

slide-47
SLIDE 47

typical compression results

ratio = original size : final size
note: usually a compression ratio/speed tradeoff (not shown)

lossless:

for English text or source code: about 4:1
for CD-quality audio: about 2:1
for photographs: about 2:1
for computer-drawn diagrams: about 5:1 to 20:1

lossy: (making a guess at what is “close enough” in quality)

for CD-quality audio: about 4:1
for standard definition TV video+audio: about 40:1

slide-48
SLIDE 48

a prefix code

letter  code
a       0
b       100
c       101
d       11

prefix code: no code is prefix of another (no ambiguity)
shorter codes for more frequent values (hopefully)

b   a a a c   d  a
100 0 0 0 101 11 0
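Encoding with such a table is a lookup per symbol. The sketch below keeps bits as '0'/'1' characters for readability; encode and isPrefixFree are hypothetical helper names, and the table is passed in as a std::map, which is an assumption of this example rather than the lab's format.

```cpp
#include <cassert>
#include <map>
#include <string>

// Encode text with a prefix code given as a symbol -> bit-string table.
std::string encode(const std::string& text,
                   const std::map<char, std::string>& code) {
    std::string bits;
    for (char symbol : text) {
        auto entry = code.find(symbol);
        assert(entry != code.end());  // every symbol needs a code
        bits += entry->second;
    }
    return bits;
}

// True if no code word is a prefix of a different symbol's code word.
bool isPrefixFree(const std::map<char, std::string>& code) {
    for (const auto& a : code)
        for (const auto& b : code)
            if (a.first != b.first &&
                b.second.compare(0, a.second.size(), a.second) == 0)
                return false;
    return true;
}
```

With the table a=0, b=100, c=101, d=11, the text "baaacda" encodes to 12 bits instead of the 14 bits a fixed 2-bit code would need.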

slide-51
SLIDE 51

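prefix codes as trees

letter  code
a       0
b       100
c       101
d       11

[figure: the code drawn as a binary tree: from the root, edge 0 leads to leaf a and edge 1 to an internal node; from there, edge 1 leads to leaf d and edge 0 to the parent of b (edge 0) and c (edge 1)]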

slide-53
SLIDE 53

prefix code cost

letter  code  frequency
a       0     5/12
b       100   1/6
c       101   1/12
d       11    1/3

$$\text{cost} = \sum_i p_i r_i = \frac{5}{12}\cdot 1 + \frac{1}{6}\cdot 3 + \frac{1}{12}\cdot 3 + \frac{1}{3}\cdot 2 = \frac{11}{6} \text{ (bits per symbol)}$$

$p_i$: probability symbol $i$ occurs
$r_i$: length of code for $i$

versus a=00, b=01, c=10, d=11: cost = 2 (bits per symbol)

how to find minimum cost prefix code (given frequencies)?
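The cost formula can be evaluated directly from symbol counts. This sketch returns the result as an exact fraction to sidestep floating-point rounding; the Fraction struct and averageBits function are hypothetical names, and the counts (5, 2, 1, 4 out of 12) are the slide's frequencies scaled to integers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Average bits per symbol as an unreduced fraction:
// (sum of count * codeLength) / (sum of count).
struct Fraction {
    long numerator;
    long denominator;
};

Fraction averageBits(const std::vector<long>& counts,
                     const std::vector<std::string>& codes) {
    assert(counts.size() == codes.size());
    long totalBits = 0;
    long totalCount = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        totalBits += counts[i] * static_cast<long>(codes[i].size());
        totalCount += counts[i];
    }
    assert(totalCount > 0);
    return {totalBits, totalCount};
}
```

For the table above this gives 22/12, which reduces to 11/6; the fixed-length code gives 24/12 = 2.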

slide-54
SLIDE 54

high-level compression steps

read file, find symbol frequencies
choose best prefix code (called Huffman code) based on frequencies

best = assuming each code maps to one symbol

write prefix code to output
read file, convert to prefix code, write to output

[figure: input file → output containing the chosen prefix code, followed by the input file encoded using that prefix code]

slide-56
SLIDE 56

finding the best prefix code

build prefix code tree from bottom up
intuition 1: least frequent thing at bottom → use it first

use case for a priority queue

intuition 2: combine less frequent symbols into more frequent group

work with partial prefix trees

slide-57
SLIDE 57

running example and frequencies

if it is to be, it is up to me

symbol  frequency    symbol      frequency
b       1            p           1
e       2            s           2
f       1            t           4
i       5            u           1
m       1            , (comma)   1
o       2            ␣ (space)   9

slide-58
SLIDE 58

building the Huffman tree (1)

b 1   f 1   m 1   p 1   u 1   , 1   e 2   o 2   s 2   t 4   i 5   ␣ 9

list of partial prefix trees
labelled with total frequency of contained symbols
goal: combine these into one prefix tree

(below, parentheses group the symbols of one partial tree, followed by its total frequency)

m 1   p 1   u 1   , 1   (b f) 2   e 2   o 2   s 2   t 4   i 5   ␣ 9

combine two least frequent into partial prefix tree
new frequency = sum of old frequencies

u 1   , 1   (m p) 2   (b f) 2   e 2   o 2   s 2   t 4   i 5   ␣ 9

slide-61
SLIDE 61

building the Huffman tree (2)

u 1   , 1   (m p) 2   (b f) 2   e 2   o 2   s 2   t 4   i 5   ␣ 9

(u ,) 2   (m p) 2   (b f) 2   e 2   o 2   s 2   t 4   i 5   ␣ 9

(b f) 2   e 2   o 2   s 2   ((u ,) (m p)) 4   t 4   i 5   ␣ 9

slide-62
SLIDE 62

building the Huffman tree: alternatives

(u ,) 2   (m p) 2   (b f) 2   e 2   o 2   s 2   t 4   i 5   ␣ 9

[figure: the same list after a different choice of trees to combine]

multiple choices of what to combine
proof not shown: produce same quality prefix tree

slide-63
SLIDE 63

building the Huffman tree (3)

(b f) 2   e 2   o 2   s 2   ((u ,) (m p)) 4   t 4   i 5   ␣ 9

((b f) e) 4   (o s) 4   ((u ,) (m p)) 4   t 4   i 5   ␣ 9

slide-64
SLIDE 64

building the Huffman tree (4)

((b f) e) 4   (o s) 4   ((u ,) (m p)) 4   t 4   i 5   ␣ 9

i 5   (((b f) e) (o s)) 8   (((u ,) (m p)) t) 8   ␣ 9

slide-65
SLIDE 65

building the Huffman tree (5)

i 5   (((b f) e) (o s)) 8   (((u ,) (m p)) t) 8   ␣ 9

(((u ,) (m p)) t) 8   ␣ 9   ((((b f) e) (o s)) i) 13

slide-66
SLIDE 66

building the Huffman tree (6)

(((u ,) (m p)) t) 8   ␣ 9   ((((b f) e) (o s)) i) 13

(␣ (((u ,) (m p)) t)) 17   ((((b f) e) (o s)) i) 13

slide-67
SLIDE 67

building the Huffman tree (7)

((␣ (((u ,) (m p)) t)) ((((b f) e) (o s)) i)) 30

slide-68
SLIDE 68

the final Huffman tree

[figure: the finished tree, with left edges labelled 0 and right edges labelled 1]

letter  code
␣       00
u       01000
,       01001
m       01010
p       01011
t       011
b       10000
f       10001
e       1001
o       1010
s       1011
i       11

slide-69
SLIDE 69

tree-building pseudocode

    class PrefixTree {
        ...
        PrefixTree(char c, int frequency);
        PrefixTree(PrefixTree rightSide, PrefixTree leftSide);
        PrefixTree(const PrefixTree &other);
        ...
    };
    ...
    PriorityQueue<PrefixTree> queue;
    for (char c, frequency f in inputFile) {
        queue.insert(PrefixTree(c, f));
    }
    while (queue.size() > 1) {
        PrefixTree first = queue.deleteMin();
        PrefixTree second = queue.deleteMin();
        queue.insert(PrefixTree(first, second));
    }
    return queue.deleteMin();
    ...
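One way the pseudocode could be made concrete is sketched below. It swaps in std::priority_queue for the lecture's PriorityQueue and stores nodes in a vector addressed by index instead of using a PrefixTree class; Node and huffmanCodes are hypothetical names, and the lab's own heap would take the place of the standard queue.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Node pool: leaves hold a symbol; internal nodes hold two child indices.
struct Node {
    long frequency;
    char symbol;  // meaningful only for leaves
    int left;     // -1 for leaves
    int right;    // -1 for leaves
};

// Builds a Huffman tree for the given frequencies and returns a
// symbol -> code table ('0' = left edge, '1' = right edge).
std::map<char, std::string> huffmanCodes(const std::map<char, long>& freq) {
    assert(freq.size() >= 2);  // a single symbol would get an empty code
    std::vector<Node> nodes;
    // Min-queue of (frequency, node index) pairs.
    using Entry = std::pair<long, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue;
    for (const auto& item : freq) {
        nodes.push_back({item.second, item.first, -1, -1});
        queue.push({item.second, static_cast<int>(nodes.size()) - 1});
    }
    // Repeatedly merge the two least frequent partial trees.
    while (queue.size() > 1) {
        Entry first = queue.top();
        queue.pop();
        Entry second = queue.top();
        queue.pop();
        long combined = first.first + second.first;
        nodes.push_back({combined, '\0', first.second, second.second});
        queue.push({combined, static_cast<int>(nodes.size()) - 1});
    }
    // Walk the finished tree, recording the path to each leaf.
    std::map<char, std::string> codes;
    std::vector<std::pair<int, std::string>> stack;
    stack.push_back({queue.top().second, ""});
    while (!stack.empty()) {
        std::pair<int, std::string> current = stack.back();
        stack.pop_back();
        const Node& node = nodes[current.first];
        if (node.left < 0) {
            codes[node.symbol] = current.second;
        } else {
            stack.push_back({node.left, current.second + "0"});
            stack.push_back({node.right, current.second + "1"});
        }
    }
    return codes;
}
```

Ties between equal frequencies may be broken differently than on the slides, so individual codes can differ, but the total encoded length stays the same; for the running example's frequencies it comes to 94 bits.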

slide-70
SLIDE 70

storing the prefix code

file format for the lab:

space 00
u 01000
, 01001
m 01010
p 01011
t 011
b 10000
f 10001
e 1001
o 1010
s 1011
i 11

slide-71
SLIDE 71

real format?

does this save space?
probably if input file is big enough…
but real compression formats use a more compact encoding

not having you do in lab to ease debugging/etc.

slide-72
SLIDE 72

what about the data?

in lab: the text 01111110011110…

obviously wastes a lot of space…

real compression: sequence of bytes, 8 bits per byte

extra work to extract bit-by-bit, match with prefix code
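To make the "8 bits per byte" point concrete, here is a minimal sketch of packing a '0'/'1' text string into bytes and reading single bits back. The most-significant-bit-first order and the zero padding of the last byte are assumptions of this example, not a format defined by the lab; packBits and bitAt are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack a string of '0'/'1' characters into bytes, most significant bit first.
// The final byte is padded with 0 bits; a real format would also need to
// record how many bits are valid (left out of this sketch).
std::vector<std::uint8_t> packBits(const std::string& bits) {
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        assert(bits[i] == '0' || bits[i] == '1');
        if (bits[i] == '1')
            bytes[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
    }
    return bytes;
}

// Read bit number `i` (0 = first bit written) back out of the packed bytes.
bool bitAt(const std::vector<std::uint8_t>& bytes, std::size_t i) {
    assert(i / 8 < bytes.size());
    return ((bytes[i / 8] >> (7 - i % 8)) & 1u) != 0;
}
```

Thirteen characters of '0'/'1' text shrink to two bytes this way, which is the eightfold saving the slide alludes to.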

slide-73
SLIDE 73

last time

an application for trees and heaps: Huffman coding
goal: lossless compression
divide document into symbols (e.g. characters)
choose variable length encodings for symbols

prefix code: no code is a prefix of another
represented by a tree

Huffman coding: produce optimal cost prefix codes:

priority queue (ordered by frequency) of partial prefix code trees
build from bottom up, least frequent first

slide-74
SLIDE 74

decoding

load the code into a prefix code tree
then, read bits, traversing tree until leaf

pseudocode:

    while (there are more bits) {
        PrefixTreeNode *current = root;
        while (current is not a leaf) {
            if (next bit is 0)
                current = current->left;
            else
                current = current->right;
        }
        output(current->symbol);
    }
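The decoding loop could be written in C++ roughly as follows. This sketch reads bits from a string of '0'/'1' characters and builds the tree from a symbol-to-code table; the PrefixTreeNode layout, buildTree, and decode are assumptions made for illustration, not the lab's required interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct PrefixTreeNode {
    char symbol = '\0';  // meaningful only at leaves
    std::unique_ptr<PrefixTreeNode> left, right;
    bool isLeaf() const { return !left && !right; }
};

// Build the tree from a symbol -> code table: '0' goes left, '1' goes right.
std::unique_ptr<PrefixTreeNode> buildTree(const std::map<char, std::string>& code) {
    auto root = std::make_unique<PrefixTreeNode>();
    for (const auto& entry : code) {
        PrefixTreeNode* current = root.get();
        for (char bit : entry.second) {
            auto& next = (bit == '0') ? current->left : current->right;
            if (!next) next = std::make_unique<PrefixTreeNode>();
            current = next.get();
        }
        current->symbol = entry.first;
    }
    return root;
}

// Decode by walking from the root to a leaf once per output symbol.
std::string decode(const std::string& bits, const PrefixTreeNode& root) {
    assert(!root.isLeaf());  // an empty tree cannot consume any bits
    std::string text;
    std::size_t position = 0;
    while (position < bits.size()) {
        const PrefixTreeNode* current = &root;
        while (!current->isLeaf()) {
            assert(position < bits.size());  // input ended mid-symbol
            current = (bits[position++] == '0') ? current->left.get()
                                                : current->right.get();
            assert(current != nullptr);  // bit sequence not in the code
        }
        text += current->symbol;
    }
    return text;
}
```

Because no code is a prefix of another, reaching a leaf always identifies exactly one symbol, so no separators are needed between code words.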

slide-75
SLIDE 75

example

letter  code
a       0
b       100
c       101
d       11

[figure: the prefix code tree for this table, traversed one bit at a time]

11 100 0 101 0 0 11 = d b a c a a d

slide-84
SLIDE 84

lab preview

pre-lab: compression
in-lab: decompression
post-lab report

slide-85
SLIDE 85

pre-lab

write a program to…

calculate letter frequencies of input
use binary heap to build Huffman tree
output encoding mapping (format specified in lab)
output encoded message

slide-86
SLIDE 86

pre-lab tools

heap code supplied in slides
file I/O code provided (fileio.cpp)

or see getWordInTable.cpp from lab 6
or see http://www.cplusplus.com/doc/tutorial/files/
or see ifstream documentation

slide-87
SLIDE 87

a note on ASCII

the American standard character codes

7-bit characters (extra bit left over in bytes)
ASCII or superset used to represent English text

128 characters (95 printable, 33 non-printable)
Wikipedia article has table/details

slide-88
SLIDE 88

ASCII codes

for lab: only worry about “printable” ASCII characters

byte values 0x20 to 0x7e

special case: 0x20 = ‘space’
no other whitespace characters used

(output character in table as itself…)
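Counting frequencies over just the printable range might be sketched like this. The functions countFrequencies and frequencyOf are hypothetical names, and reading from a std::string rather than a file is a simplification made for this example.

```cpp
#include <array>
#include <cassert>
#include <string>

// Count occurrences of each printable ASCII character (0x20 through 0x7e).
// Anything outside that range, such as a newline, is skipped.
std::array<long, 128> countFrequencies(const std::string& text) {
    std::array<long, 128> counts{};  // value-initialized to all zeros
    for (unsigned char c : text) {
        if (c >= 0x20 && c <= 0x7e) ++counts[c];
    }
    return counts;
}

// Look up the count for one printable character.
long frequencyOf(const std::array<long, 128>& counts, char c) {
    assert(c >= 0x20 && c <= 0x7e);
    return counts[static_cast<unsigned char>(c)];
}
```

Indexing by the character's byte value keeps the table small (128 slots) and makes the later "insert every symbol with a nonzero count into the heap" step a simple loop.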

slide-89
SLIDE 89

heap example

linked off slides page as binary_heap.h, binary_heap.cpp
you may use for lab

slide-90
SLIDE 90

heap declaration: public

    class binary_heap {
    public:
        binary_heap();
        binary_heap(vector<int> vec);
        ~binary_heap();
        void insert(int x);
        int findMin();
        int deleteMin();
        unsigned int size();
        void makeEmpty();
        bool isEmpty();
        void print();
        ...
    };

slide-91
SLIDE 91

heap declaration: private

    class binary_heap {
        ...
    private:
        vector<int> heap;
        unsigned int heap_size;
        void percolateUp(int hole);
        void percolateDown(int hole);
    };

slide-92
SLIDE 92

vector heap

vector<int> heap: vector representing binary tree, using rules shown before

heap[0] is unused
heap[1] is root
heap[i * 2] is left child of node i
heap[i * 2 + 1] is right child of node i

int heap_size is its size

(even though heap.size() - 1 could have been used instead…)

slide-93
SLIDE 93

binary_heap::binary_heap(vec)

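constructor to initialize from unsorted vector
equivalent to repeated insertion…
recall: in-place heap sort, similar to what’s happening here…

    binary_heap::binary_heap(vector<int> vec) : heap_size(vec.size()) {
        heap = vec;
        if (heap.empty()) {
            heap.push_back(0);  // just the unused slot 0
            return;
        }
        // move the first element to the end to free up slot 0
        heap.push_back(heap[0]);
        heap[0] = 0;
        // percolate down every node that has children, last parent first
        for (int i = heap_size / 2; i > 0; i--)
            percolateDown(i);
    }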

slide-95
SLIDE 95

findMin/size/etc.

    int binary_heap::findMin() {
        if (heap_size == 0)
            throw "findMin() called on empty heap";
        return heap[1];
    }

    unsigned int binary_heap::size() {
        return heap_size;
    }

    bool binary_heap::isEmpty() {
        return heap_size == 0;
    }

    void binary_heap::makeEmpty() {
        heap_size = 0;
        heap.resize(1);  // keep only the unused slot 0
    }

slide-96
SLIDE 96

print

    void binary_heap::print() {
        // the unused slot 0 is shown in parentheses
        cout << "(" << heap[0] << ") ";
        for (int i = 1; i <= heap_size; i++) {
            cout << heap[i] << " ";
            // next line from http://tinyurl.com/mf9tbgm
            // true when i + 1 is a power of two, i.e. i ends a tree level
            bool isPow2 = (((i+1) & ~(i))==(i+1))? i+1 : 0;
            if (isPow2)
                cout << endl << "\t";
        }
        cout << endl;
    }