From Sorting to Heaps to Compression



  1. From Sorting to Heaps to Compression
     ● Data Compression
       ➤ video on demand/set top box
       ➤ jpeg in browsers
       ➤ gzip, pkzip, compress, zip, ... for files (stacker?)
     ● Lossy compression, Lossless compression
     ● Huffman coding
       ➤ possible to implement, reasonably good
       ➤ uses lots of things we've studied: trees, priority queues, vectors, ...
       ➤ leads to more advanced techniques: Lempel-Ziv
     19.1 Duke CPS 100

  2. Priority Queues
     ● As an abstract data type (ADT) supports
       ➤ add/insert: put an element into the priority queue
       ➤ getMin: find the minimal (priority) element
       ➤ deleteMin: delete the minimal element
       ➤ (possible to have maximal queue too)
     ● Implement with different structures:
       ➤ sorted linked list, vector, binary search tree, heap

                       insert    getMin    deleteMin
       linked-list
       vector
       search tree
       balanced tree
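The three ADT operations on this slide can be sketched on top of the standard library. This is only an illustration, not the course's PQueue class: the name MinPQ and its member names are assumptions, and std::priority_queue (a binary heap underneath) is used as the backing structure.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical min-priority-queue wrapper; MinPQ is not a name from the slides.
template <typename T>
class MinPQ {
public:
    // Put an element into the queue: O(log n).
    void insert(const T& item) { heap_.push(item); }

    // Look at the minimal element without removing it: O(1).
    // Precondition: the queue is not empty.
    const T& getMin() const { return heap_.top(); }

    // Remove the minimal element: O(log n).
    // Precondition: the queue is not empty.
    void deleteMin() { heap_.pop(); }

    bool isEmpty() const { return heap_.empty(); }
    std::size_t size() const { return heap_.size(); }

private:
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<T, std::vector<T>, std::greater<T>> heap_;
};
```

Swapping std::greater for std::less would give the "maximal queue" variant mentioned above.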

  3. Heap: a data structure for priority queues
     ● modeled on binary trees, but implemented with array/vector
       ➤ supports Insert and DeleteMin in O(log n) worst-case time
       ➤ supports FindMin in O(1) time and Insert in O(1) average-case time
     ● Consider the following sorting method, complexity?

         void HeapSort(Vector<string> & a, int numElts)
         {
             PQueue<string> pq;
             // put every element into the priority queue
             for (int k = 0; k < numElts; k++)
                 pq.insert(a[k]);
             // remove the minimum repeatedly; deleteMin stores it in a[k]
             for (int k = 0; k < numElts; k++)
                 pq.deleteMin(a[k]);
         }

     ● we'll return to heap implementation to see how the performance guarantees are realized
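A minimal sketch of how a heap can live in a vector, assuming the usual layout in which the children of index i sit at 2i+1 and 2i+2. ArrayHeap and heapSort are hypothetical names chosen for this example; the course's Vector and PQueue classes are replaced here by std::vector and this stand-in.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Array-backed min-heap: parent of index i is (i - 1) / 2.
template <typename T>
class ArrayHeap {
public:
    bool isEmpty() const { return data_.empty(); }

    // Precondition: the heap is not empty.
    const T& getMin() const { return data_.front(); }

    // Append at the end, then swap upward: at most one swap per tree level.
    void insert(const T& item) {
        data_.push_back(item);
        std::size_t i = data_.size() - 1;
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (!(data_[i] < data_[parent])) break;
            std::swap(data_[i], data_[parent]);
            i = parent;
        }
    }

    // Copy the minimum into `out`, move the last element to the root,
    // then swap downward. Precondition: the heap is not empty.
    void deleteMin(T& out) {
        out = data_.front();
        data_.front() = data_.back();
        data_.pop_back();
        const std::size_t n = data_.size();
        std::size_t i = 0;
        while (true) {
            std::size_t left = 2 * i + 1, right = left + 1, smallest = i;
            if (left < n && data_[left] < data_[smallest]) smallest = left;
            if (right < n && data_[right] < data_[smallest]) smallest = right;
            if (smallest == i) break;
            std::swap(data_[i], data_[smallest]);
            i = smallest;
        }
    }

private:
    std::vector<T> data_;
};

// Same shape as the slide's HeapSort: n inserts followed by n deleteMins.
void heapSort(std::vector<std::string>& a) {
    ArrayHeap<std::string> pq;
    for (const std::string& s : a) pq.insert(s);
    for (std::string& s : a) pq.deleteMin(s);
}
```

With each insert and deleteMin bounded by the tree height, the two loops together do O(n log n) work in the worst case.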

  4. Towards Compression
     ● Each ASCII character is represented by 8 bits, one byte
       ➤ bit is a binary digit, byte is a binary term
       ➤ compress text: use fewer bits for frequent characters (does this come free?)
     ● 256 character values, 2^8 = 256, how many bits for 7 characters? for 38 characters? for 125 characters?

     go go gophers: 8 different characters

       char   ASCII   binary    3 bits
       g      103     1100111   000
       o      111     1101111   001
       p      112     1110000   010
       h      104     1101000   011
       e      101     1100101   100
       r      114     1110010   101
       s      115     1110011   110
       sp.     32     0100000   111

     ASCII: 13 x 8 = 104 bits
     3 bit code: 13 x 3 = 39 bits
     compressed: ???
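The fixed-width arithmetic on this slide can be checked with a short sketch. The function names bitsNeeded and fixedWidthBits are made up for this example; the idea is simply to find the smallest b with 2^b at least the number of distinct characters.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Smallest number of bits b such that 2^b >= count.
// Assumes count >= 1; a single symbol yields 0 bits under this definition.
int bitsNeeded(std::size_t count) {
    int bits = 0;
    std::size_t capacity = 1;
    while (capacity < count) {
        capacity *= 2;
        ++bits;
    }
    return bits;
}

// Total bits to store `text` when every distinct character
// gets a code of the same (minimal) width.
std::size_t fixedWidthBits(const std::string& text) {
    std::set<char> distinct(text.begin(), text.end());
    return text.size() * bitsNeeded(distinct.size());
}
```

Under these assumptions 7 characters need 3 bits, 38 need 6, 125 need 7, and "go go gophers" (13 characters, 8 distinct) needs 13 x 3 = 39 bits.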

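  5. Huffman coding: go go gophers

       char   ASCII   binary    3 bits   Huffman
       g      103     1100111   000      10
       o      111     1101111   001
       p      112     1110000   010
       h      104     1101000   011
       e      101     1100101   100
       r      114     1110010   101
       s      115     1110011   110
       sp.     32     0100000   111

     ● choose the two nodes with the fewest occurrences
     ● combine nodes, add their occurrence counts
     ● repeat
     ● How many bits?

     [Figure: Huffman tree construction. Initial counts: g 3, o 3, p 1, h 1, e 1, r 1, s 1, * 2 (* is the space). Next stage: p, e, h, r paired into two subtrees of weight 2, and s joined with * for weight 3. Later stage: a subtree of weight 6 over g and o, and a subtree of weight 4 over the two weight-2 subtrees.]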
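The three steps above (choose the two smallest, combine, repeat) map directly onto a priority queue. The following is a sketch under stated assumptions, not the course's implementation: HuffNode, assignCodes, huffmanCodes and encodedBits are hypothetical names, nodes are kept in a flat vector, and ties between equal counts are broken arbitrarily, so individual codes may differ from the slide's tree even though the total bit count does not.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Tree node stored in a flat vector; left/right are indices, -1 marks a leaf.
struct HuffNode {
    std::size_t weight;
    char symbol;  // meaningful only for leaves
    int left;
    int right;
};

// Walk the tree, appending '0' for a left edge and '1' for a right edge.
void assignCodes(const std::vector<HuffNode>& nodes, int index,
                 const std::string& prefix,
                 std::map<char, std::string>& codes) {
    const HuffNode& node = nodes[index];
    if (node.left < 0) {
        // A one-symbol text would get an empty code; give it "0" instead.
        codes[node.symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(nodes, node.left, prefix + '0', codes);
    assignCodes(nodes, node.right, prefix + '1', codes);
}

std::map<char, std::string> huffmanCodes(const std::string& text) {
    std::map<char, std::size_t> freq;
    for (char c : text) ++freq[c];

    std::vector<HuffNode> nodes;
    using Entry = std::pair<std::size_t, int>;  // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    for (const auto& kv : freq) {
        nodes.push_back({kv.second, kv.first, -1, -1});
        pq.push({kv.second, static_cast<int>(nodes.size()) - 1});
    }

    std::map<char, std::string> codes;
    if (nodes.empty()) return codes;

    // Choose the two fewest, combine them, add their counts, repeat.
    while (pq.size() > 1) {
        Entry a = pq.top(); pq.pop();
        Entry b = pq.top(); pq.pop();
        nodes.push_back({a.first + b.first, '\0', a.second, b.second});
        pq.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }
    assignCodes(nodes, pq.top().second, "", codes);
    return codes;
}

// Total number of bits when `text` is written with its own Huffman codes.
std::size_t encodedBits(const std::string& text) {
    std::map<char, std::string> codes = huffmanCodes(text);
    std::size_t bits = 0;
    for (char c : text) bits += codes[c].size();
    return bits;
}
```

For "go go gophers" the merged weights are 2, 2, 3, 4, 6, 7 and 13, which sum to 37, so the sketch reports 37 bits against 39 for the 3-bit code and 104 for ASCII.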

  6. Properties of Huffman code
     ● Prefix property: no code is a prefix of another code
     ● optimal per-character compression
     ● Where do frequencies come from?
     ● decode: need tree
       1000111101001110100000110101111011110001

     [Figure: Huffman tree with leaves a, t, r, s, e, *]
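The prefix property is what makes decoding unambiguous: reading bits left to right, the first match is always the right one. The sketch below is an assumption-laden illustration that uses a code table rather than walking the tree as the slide suggests; isPrefixFree and decodeBits are hypothetical names, and the sample codes in the usage note are invented, not taken from the slide's tree.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// True when no code is a prefix of (or equal to) another symbol's code.
bool isPrefixFree(const std::map<char, std::string>& codes) {
    for (const auto& a : codes) {
        for (const auto& b : codes) {
            if (a.first != b.first &&
                b.second.compare(0, a.second.size(), a.second) == 0) {
                return false;
            }
        }
    }
    return true;
}

// Decode a string of '0'/'1' characters using a prefix-free code table.
// Bits are accumulated until they match a code, then the symbol is emitted.
std::string decodeBits(const std::string& bits,
                       const std::map<char, std::string>& codes) {
    std::map<std::string, char> inverse;
    for (const auto& kv : codes) inverse[kv.second] = kv.first;

    std::string out;
    std::string pending;
    for (char bit : bits) {
        pending += bit;
        auto hit = inverse.find(pending);
        if (hit != inverse.end()) {
            out += hit->second;
            pending.clear();
        }
    }
    if (!pending.empty()) {
        throw std::invalid_argument("trailing bits do not form a code");
    }
    return out;
}
```

With the made-up table a = 0, b = 10, c = 11, the bits 010110 split as 0, 10, 11, 0 and decode to "abca". A table such as a = 0, b = 01 fails isPrefixFree, because after reading 0 the decoder cannot tell whether an a has ended or a b has begun.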
