Source Encoding and Compression, Jukka Teuhola, University of Turku




SLIDE 1

SEAC-1 J.Teuhola 2014 1

Source Encoding and Compression

Jukka Teuhola
University of Turku
Dept. of Information Technology

Spring 2014

SLIDE 2

General

Self-study course, starting lecture: 14.1.2014

Extent: 5 sp (3 cu)

Level: Advanced

Preliminary knowledge: Data structures and algorithms I, basics of probability calculus

Material: Lecture notes and PowerPoint slides available via the course homepage. No textbook is needed.

Homework: 10 small exercise tasks will be given. Solutions must be submitted to the lecturer before taking the examination. Minimum: 5 solutions acceptably solved.

Examinations: Three attempts; March, April, May 2014

SLIDE 3

Optional literature

  • T. C. Bell, J. G. Cleary, I. H. Witten: Text Compression, 1990.
  • R. W. Hamming: Coding and Information Theory, 2nd ed., Prentice-Hall, 1986.
  • K. Sayood: Introduction to Data Compression, 3rd ed., Morgan Kaufmann, 2006.
  • K. Sayood: Lossless Compression Handbook, Academic Press, 2003.
  • I. H. Witten, A. Moffat, T. C. Bell: Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, 1999.
  • Miscellaneous articles

SLIDE 4

Contents

  1. Basic concepts
  2. Coding-theoretic foundations
  3. Information-theoretic foundations
  4. Basic source coding methods
  5. Predictive models for text compression
  6. Dictionary models for text compression
  7. Compression of digital images

SLIDE 5

1. Basic concepts

  • Data compression:
      Minimize the size of the information representation.
      Reduce the redundancy of the original representation.

  • Purposes:
      Save storage space.
      Reduce transmission time.

  • Basic approaches:
      Lossless compression: decompression into exactly the original form (typical for text).
      Lossy compression: decompression into approximately the original form (typical for signals and images).
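
The difference between the two basic approaches can be illustrated with a minimal Python sketch. The lossless half uses the standard-library zlib module. The lossy half is a toy 4-bit quantizer chosen only for illustration; the sample values and the quantizer are assumptions, not part of the course material.

```python
import zlib

text = b"abracadabra " * 20

# Lossless: decompression restores exactly the original bytes.
packed = zlib.compress(text)
restored = zlib.decompress(packed)
lossless_ok = restored == text

# Lossy (toy example): keep only the 4 high bits of each 8-bit sample.
samples = [12, 200, 37, 255, 128]            # hypothetical signal values
quantized = [s >> 4 for s in samples]        # 4 bits per sample instead of 8
approx = [(q << 4) + 8 for q in quantized]   # reconstruct at the bin midpoint
max_error = max(abs(a - s) for a, s in zip(approx, samples))

print(lossless_ok, len(text), len(packed))
print(samples, approx, max_error)
```

The lossless round trip returns the input unchanged, whereas the quantizer halves the storage per sample at the cost of a bounded reconstruction error.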

SLIDE 6

Basic concepts (cont.)

Fields of coding theory:

  • Source coding: the purpose is to minimize the size.
  • Channel coding: detection and correction of transmission errors.
  • Also cryptography: encryption of private/secret information.

[Figure: Source → Source encoding → Channel encoding → Communication channel (errors) → Channel decoding → Source decoding → Sink, with a Model box on both the encoding and the decoding side.]

SLIDE 7

Basic concepts (cont.)

Phases of data compression:

  • Modelling of the source
  • Source encoding (also called entropy coding), using the model

Other viewpoints:

  • Speed of compression / decompression
  • Size of the model

Classification by lengths of coding units:

  • Fixed-to-fixed coding
  • Variable-to-fixed coding
  • Fixed-to-variable coding
  • Variable-to-variable coding
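
Two of the classes above can be contrasted with a small Python sketch. Both code tables are hypothetical and were picked by hand for the three-letter alphabet {a, b, c}; they only show what "fixed" and "variable" refer to on the source side and on the code side.

```python
# Fixed-to-variable: every single source symbol gets a codeword,
# and the codewords have different lengths (a prefix code).
PREFIX_CODE = {"a": "0", "b": "10", "c": "11"}

def fixed_to_variable(message):
    return "".join(PREFIX_CODE[ch] for ch in message)

# Variable-to-fixed: source phrases of different lengths are each
# replaced by a codeword of the same length (2 bits here).
PHRASES = {"aa": "00", "a": "01", "b": "10", "c": "11"}

def variable_to_fixed(message):
    out = []
    i = 0
    while i < len(message):
        # Greedy parsing: try the longest phrase first.
        for length in (2, 1):
            phrase = message[i:i + length]
            if phrase in PHRASES:
                out.append(PHRASES[phrase])
                i += len(phrase)
                break
        else:
            raise ValueError(f"no phrase matches at position {i}")
    return "".join(out)

print(fixed_to_variable("aaabc"))
print(variable_to_fixed("aaabc"))
```

In the first function the unit on the source side is always one symbol; in the second the unit on the code side is always two bits.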

SLIDE 8

Examples of models

1. Character distribution

Char  Prob
A     0.10
B     0.05
C     0.08
D     0.06
E     0.15
…     …

2. Successor distribution

Char  Succ  Prob
A     A     0.01
A     B     0.20
A     C     0.10
A     D     0.25
…     …     …
B     A     0.15
B     B     0.02
B     C     0.01
B     D     0.01
…     …     …

3. Dictionary

Word    Prob
ALL     0.02
ALWAYS  0.01
ARE     0.05
AS      0.03
AT      0.02
BASIC   0.01
BEGIN   0.01
…       …
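
As a rough illustration of how such tables could be obtained, the following Python sketch estimates all three kinds of model from a sample text by relative frequencies. The function names and the sample strings are hypothetical; the slides do not say how the probabilities shown were derived.

```python
from collections import Counter, defaultdict

def character_distribution(text):
    """Probability of each character, ignoring context."""
    counts = Counter(text)
    return {ch: n / len(text) for ch, n in counts.items()}

def successor_distribution(text):
    """Probability of each successor character, given the preceding character."""
    pair_counts = defaultdict(Counter)
    for ch, succ in zip(text, text[1:]):
        pair_counts[ch][succ] += 1
    return {
        ch: {succ: n / sum(succs.values()) for succ, n in succs.items()}
        for ch, succs in pair_counts.items()
    }

def word_distribution(text):
    """Probability of each whitespace-separated word (a dictionary model)."""
    words = text.split()
    counts = Counter(words)
    return {w: n / len(words) for w, n in counts.items()}

print(character_distribution("ABABAC"))
print(successor_distribution("ABABAC"))
print(word_distribution("AS ALL ARE AS"))
```

The successor model needs one table row per character pair, so it is larger than the character model but can capture dependencies between adjacent characters.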

SLIDE 9

Basic concepts (cont.)

Main classes of text compression methods:

  • Dictionary methods
  • Statistical methods

Classification based on availability of the source:

  • Off-line methods
  • On-line methods

Classification based on the status of the model:

  • Static methods
  • Semiadaptive methods
  • Adaptive methods

Measurement of compression efficiency:

  • Compression ratio: source size / compressed size
  • Bits per source symbol (character, pixel, etc.)
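
The two efficiency measures can be computed as in the following Python sketch. The function names and the example sizes are hypothetical, and sizes are assumed to be given in bytes.

```python
def compression_ratio(source_size, compressed_size):
    """Source size divided by compressed size (same unit for both)."""
    return source_size / compressed_size

def bits_per_symbol(compressed_bytes, symbol_count):
    """Average number of compressed bits spent on each source symbol."""
    return 8 * compressed_bytes / symbol_count

# Hypothetical example: 1000 characters stored in 1000 bytes,
# compressed to 250 bytes.
ratio = compression_ratio(1000, 250)
bps = bits_per_symbol(250, 1000)
print(ratio, bps)
```

A higher compression ratio and a lower number of bits per symbol both indicate better compression; the second measure does not depend on how the source symbols were originally stored.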

SLIDE 10

Illustration of a static method

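[Figure: a model is derived once from background knowledge of the source data types, and the same model is used by both the encoder and the decoder. The encoder reads the source message and sends the encoded data; the decoder writes the decoded message.]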

SLIDE 11

Illustration of a semiadaptive method

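[Figure: Source message → Encoder → Decoder → Decoded message, with a Model on the encoder side and a Model on the decoder side. Step labels in the figure: 1. Build, 2. Read, 3. Send, 4. Send, 5. Use, 6. Write, plus one unnumbered "Use".]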
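
A semiadaptive scheme can be sketched in Python as two passes over the source: the first builds a model for this particular message, the second encodes with it, and the model is transmitted together with the code. The model used here (just the sorted alphabet, with fixed-width indices as codewords) is a deliberately simple assumption for illustration, not the coding method of the course.

```python
def build_model(message):
    """Pass 1: derive the model (here only the alphabet) from the message."""
    return sorted(set(message))

def code_width(model):
    """Number of bits needed to index any symbol of the model."""
    return max(1, (len(model) - 1).bit_length())

def encode(message, model):
    """Pass 2: replace each symbol by its fixed-width index in the model."""
    width = code_width(model)
    index = {ch: i for i, ch in enumerate(model)}
    return "".join(format(index[ch], f"0{width}b") for ch in message)

def decode(bits, model):
    """The decoder uses the received model to map indices back to symbols."""
    width = code_width(model)
    return "".join(
        model[int(bits[i:i + width], 2)] for i in range(0, len(bits), width)
    )

message = "abracadabra"
model = build_model(message)     # built from the whole source message
bits = encode(message, model)    # both the model and the bits are sent
decoded = decode(bits, model)
print(model, len(bits), decoded == message)
```

Because the model is tailored to the message, only 3 bits per character are needed here instead of 8, but the cost of sending the model has to be added to the compressed size.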

SLIDE 12

Illustration of an adaptive method

Models are updated dynamically, based on the already processed part of the source, known to both encoder and decoder.

[Figure: Source message → Encoder → Decoder → Decoded message. The encoder reads the source message, uses its model, and sends the encoded data; the decoder uses its own model and writes the decoded message. Both models start from the same fixed initial model and are dynamically updated from the processed part.]
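
The synchronization between encoder and decoder can be sketched in Python as follows. Both sides start from the same fixed initial model (a count of 1 for every symbol) and update it after each symbol, so no model needs to be sent. The output here is the rank of each symbol in the current frequency ordering; this ranking scheme and the class name are assumptions chosen for brevity, and a real coder would feed the model into an entropy coder instead.

```python
class AdaptiveModel:
    def __init__(self, alphabet):
        # Fixed initial model, identical for encoder and decoder.
        self.counts = {ch: 1 for ch in alphabet}

    def ranking(self):
        # Most frequent symbols first; ties broken alphabetically so that
        # both sides always derive the same order.
        return sorted(self.counts, key=lambda ch: (-self.counts[ch], ch))

    def update(self, ch):
        self.counts[ch] += 1

def encode(message, alphabet):
    model = AdaptiveModel(alphabet)
    ranks = []
    for ch in message:
        ranks.append(model.ranking().index(ch))  # use the current model
        model.update(ch)                         # then update it
    return ranks

def decode(ranks, alphabet):
    model = AdaptiveModel(alphabet)
    out = []
    for rank in ranks:
        ch = model.ranking()[rank]               # same model state as encoder
        out.append(ch)
        model.update(ch)                         # same update as encoder
    return "".join(out)

ranks = encode("abracadabra", "abcdr")
print(ranks, decode(ranks, "abcdr"))
```

The decoder can mirror every update because each update depends only on symbols it has already decoded.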