SLIDE 1

15-853: Algorithms in the Real World

  • Fountain codes and Raptor codes
  • Start with compression
SLIDE 2

The random erasure model

We will continue looking at recovering from erasures.
Q: Why is erasure recovery so useful in real-world applications?
Hint: the Internet. Packets sent over the Internet often get lost (or delayed), and packets carry sequence numbers, so the receiver knows exactly which ones are missing!

SLIDE 3

Recap: Fountain Codes

  • Randomized construction
  • Targeting “erasures”
  • A slightly different view on codes: new metrics
    1. Reception overhead: how many symbols beyond k are needed to decode
    2. Probability of failure to decode
  • Overcoming the following drawbacks of RS codes:
    1. High encoding and decoding complexity
    2. Need to fix “n” beforehand

SLIDE 4

Recap: Ideal properties of Fountain Codes

  • 1. Source can generate any number of coded symbols
  • 2. Receiver can decode the message symbols from any subset, with small reception overhead and with high probability
  • 3. Linear-time encoding and decoding complexity

“Digital Fountain”

SLIDE 5

Recap: LT Codes

  • First practical construction for Fountain Codes
  • Graphical construction
  • Encoding algorithm
    • Goal: generate coded symbols from message symbols
    • Steps:
      1. Pick a degree d randomly from a “degree distribution”
      2. Pick d distinct message symbols
      3. Coded symbol = XOR of these d message symbols

SLIDE 6

Recap: LT Codes Encoding

1. Pick a degree d randomly from a “degree distribution”
2. Pick d distinct message symbols
3. Coded symbol = XOR of these d message symbols
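A minimal Python sketch of these three steps (illustrative only; the names `lt_encode_symbol` and `sample_degree` are mine, and the degree-distribution sampler is left abstract):

```python
import random
from functools import reduce

def lt_encode_symbol(message, sample_degree, rng=random):
    """Generate one LT coded symbol from a list of message symbols (small ints here).

    sample_degree(k) should return a degree d >= 1 drawn from the chosen degree
    distribution (e.g., Ideal or Robust Soliton). The chosen neighbor indices are
    returned alongside the XOR value; a real system would convey them implicitly
    via a shared pseudo-random seed.
    """
    k = len(message)
    d = min(sample_degree(k), k)                # step 1: pick a degree d
    neighbors = rng.sample(range(k), d)         # step 2: pick d distinct message symbols
    value = reduce(lambda a, b: a ^ b, (message[i] for i in neighbors))  # step 3: XOR them
    return neighbors, value
```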


[Figure: bipartite graph with message symbols on the left and coded symbols on the right]

SLIDE 7

Recap: LT Codes Decoding

Goal: decode the message symbols from the received symbols.
Algorithm: repeat the following steps until failure or successful completion.

1. Among the received symbols, find a coded symbol of degree 1
2. Decode the corresponding message symbol
3. XOR the decoded message symbol into all other received symbols connected to it
4. Remove the decoded message symbol and all its edges from the graph
5. Repeat if there are unrecovered message symbols
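A minimal peeling-decoder sketch of these five steps in Python (illustrative only; `received` is a list of (neighbors, value) pairs, matching the hypothetical `lt_encode_symbol` output sketched above):

```python
def lt_decode(k, received):
    """Peeling decoder for LT codes.

    received: list of (neighbors, value) pairs, where neighbors is an iterable of
    message-symbol indices and value is the XOR of those message symbols.
    Returns the list of decoded message symbols, with None where decoding got stuck.
    """
    message = [None] * k
    symbols = [[set(nbrs), val] for nbrs, val in received]
    progress = True
    while progress and any(m is None for m in message):
        progress = False
        for nbrs, val in symbols:
            if len(nbrs) == 1:                     # step 1: find a degree-1 coded symbol
                i = next(iter(nbrs))
                message[i] = val                   # step 2: decode message symbol i
                for other in symbols:              # steps 3-4: XOR it out and drop its edges
                    if i in other[0]:
                        other[0].remove(i)
                        other[1] ^= val
                progress = True                    # step 5: repeat while progress is made
    return message
```

Decoding succeeds if the peeling process never stalls without a degree-1 symbol available; otherwise some entries remain None.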

SLIDE 8

LT Codes: Decoding


[Figure: worked example of the peeling decoder on a bipartite graph of message symbols (with their values) and received symbols]

SLIDE 9

Recap: Encoding and Decoding Complexity

Think: number of XORs = number of edges in the graph. The number of edges is determined by the degree distribution.

SLIDE 10

Recap: Degree distribution

Denoted by P_D(d) for d = 1, 2, …, k.
Simplest degree distribution, the “one-by-one” distribution: pick only one source symbol for each encoding symbol.
Expected reception overhead? Coupon collector problem! Huge overhead: k = 1000 => 10x overhead!!


Reception overhead: about k ln k received symbols are needed to decode k message symbols.
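For reference, the standard coupon-collector calculation behind this number (not shown on the slide): the expected number of draws to see all k coupons is

E[draws] = Σ_{i=1}^{k} k/(k - i + 1) = k · H_k ≈ k ln k

and getting all k with high probability requires somewhat more than this expectation.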

SLIDE 11

Degree distribution

Q: How to fix this issue? Need higher-degree edges.
Ideal Soliton distribution
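The slide does not spell out the distribution; the standard Ideal Soliton definition (a known fact, not from the slide text) is:

ρ(1) = 1/k,   ρ(d) = 1/(d(d - 1)) for d = 2, …, k

Its mean degree is about ln k, so each coded symbol costs roughly ln k XORs on average.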

SLIDE 12

Peek into the analysis

Analysis proceeds as follows:
  • Index stages by the number of message symbols known so far.
  • At each stage, one message symbol is processed and removed from its neighboring coded symbols.
  • Any coded symbol that subsequently has only one of the remaining message symbols as a neighbor is said to “release” that message symbol.
  • Overall release probability r(m): the probability that a coded symbol releases a message symbol at stage m.

SLIDE 13

Peek into the analysis

Claim: the Ideal Soliton distribution has a uniform release probability, i.e., r(m) = 1/k for all m = 1, 2, …, k.
Proof: uses an interesting variant of balls and bins (we will cover it later in the course).
Q: If we start with k received symbols, what is the expected number of symbols released at stage m? One.
Q: Is this good enough?


  • No, since the actual number can deviate from the expected number (one bad stage with zero releases stalls the decoder)
SLIDE 14

Peek into the analysis

Q: How to fix this issue? Need to boost lower-degree nodes.
Robust Soliton distribution: the normalized sum of the Ideal Soliton distribution and τ(d), where τ(d) boosts the lower degree values.
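For reference, the standard Robust Soliton definition from Luby's LT-codes paper (not spelled out on the slide), with a tuning constant c > 0 and target failure probability δ:

R = c · ln(k/δ) · √k
τ(d) = R/(d·k)         for d = 1, …, ⌈k/R⌉ - 1
τ(d) = R · ln(R/δ)/k   for d = ⌈k/R⌉
τ(d) = 0               for d > ⌈k/R⌉

μ(d) = (ρ(d) + τ(d)) / β,   where β = Σ_d (ρ(d) + τ(d))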

SLIDE 15

Peek into the analysis

Theorem: under the Robust Soliton degree distribution, the decoder fails to recover all the message symbols with probability at most δ from any set of coded symbols of size k + O(√k · ln²(k/δ)).
Moreover, the number of operations used on average for encoding each coded symbol is O(ln(k/δ)), and the number of operations used on average for decoding is O(k · ln(k/δ)).

SLIDE 16

Peek into the analysis

So even Robust Soliton does not achieve the goal of linear encoding/decoding complexity…
The ln(k/δ) term arises for the same reason we had ln(k) in the coupon collector problem. Let's revisit that.
Q: Why do we need so many draws in the coupon collector problem when we want to collect ALL coupons?
The last few coupons require a lot of draws, since the probability of seeing a new (distinct) coupon keeps decreasing.

SLIDE 17

Peek into the analysis

Q: Is there a way to overcome this ln(k/δ) hurdle?
No way out if we want to decode ALL message symbols…
Simple: don't aim to decode all message symbols!
Wait a minute… what?
Q: What do we do for the message symbols that are not decoded?
Encode the message symbols using an easy-to-decode classical code and then perform LT encoding! This is the “pre-code”.

SLIDE 18

Raptor codes

Encode the message symbols using an easy-to-decode classical code (the “pre-code”) and then perform LT encoding.
Raptor codes = pre-code + LT encoding
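A purely conceptual sketch of this pipeline (not Shokrollahi's actual construction: the toy `precode` below just appends XOR parities over random subsets as a stand-in for a real pre-code, and `lt_encode_symbol` refers to the hypothetical encoder sketched earlier):

```python
import random

def precode(message, n_parity, rng=random):
    """Toy systematic pre-code: keep the k message symbols and append n_parity
    XOR parities over random subsets (a stand-in for the LDPC-style pre-codes
    that real Raptor codes use)."""
    k = len(message)
    intermediate = list(message)
    for _ in range(n_parity):
        subset = rng.sample(range(k), max(1, k // 2))   # random half of the symbols
        parity = 0
        for i in subset:
            parity ^= message[i]
        intermediate.append(parity)
    return intermediate

def raptor_encode(message, sample_degree, n_parity=8, n_coded=None):
    """Raptor encoding = pre-code once, then LT-encode the intermediate symbols."""
    intermediate = precode(message, n_parity)
    n_coded = n_coded if n_coded is not None else 2 * len(message)
    # lt_encode_symbol is the hypothetical LT encoder from the earlier sketch.
    return [lt_encode_symbol(intermediate, sample_degree) for _ in range(n_coded)]
```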

SLIDE 19

Raptor codes

Theorem: Raptor codes can generate an infinite stream of coded symbols such that, for any 𝜗 > 0:

  • 1. Any subset of size k(1 + 𝜗) is sufficient to recover the original k symbols with high probability
  • 2. Number of operations needed for each coded symbol: O(log(1/𝜗))
  • 3. Number of operations needed for decoding the message symbols: O(k · log(1/𝜗))

Linear encoding and decoding complexity (for fixed 𝜗)! Included in wireless and multimedia communication standards, e.g., as RaptorQ.

SLIDE 20

DATA COMPRESSION

We move on to the next module.

SLIDE 21


Compression in the Real World

Generic file compression
– Files: gzip (LZ77), bzip2 (Burrows-Wheeler), BOA (PPM)
– Archivers: ARC (LZW), PKZip (LZW+)
– File systems: NTFS

Communication
– Fax: ITU-T Group 3 (run-length + Huffman)
– Modems: V.42bis protocol (LZW), MNP5 (run-length + Huffman)
– Virtual connections

SLIDE 22

Compression in the Real World

Multimedia
– Images: gif (LZW), jbig (context), jpeg-ls (residual), jpeg (transform + RL + arithmetic)
– Video: Blu-ray, HDTV (mpeg-4), DVD (mpeg-2)
– Audio: iTunes, iPhone, PlayStation 3 (AAC)

Other structures
– Indexes: Google, Lycos
– Meshes (for graphics): edgebreaker
– Graphs
– Databases

SLIDE 23

Encoding/Decoding

We will use “message” in a generic sense to mean the data to be compressed.

[Figure: Input Message → Encoder → Compressed Message → Decoder → Output Message]

The encoder and decoder need to agree on a common compressed format.

SLIDE 24

Lossless vs. Lossy

Lossless: input message = output message
Lossy: input message ≈ output message

Lossy does not necessarily mean loss of quality. In fact, the output could be “better” than the input:
– Drop random noise in images (dust on the lens)
– Drop background in music
– Fix spelling errors in text; put it into better form

SLIDE 25

How much can we compress?

Q: Can we (losslessly) compress any kind of message? No! For lossless compression, assuming all input messages are valid, if even one string is compressed, some other must expand (pigeonhole: there are 2^n messages of n bits but only 2^n - 1 shorter ones).

Q: So what do we need in order to be able to compress?
We can compress only if some messages are more likely than others. That is, there needs to be a bias in the probability distribution.

SLIDE 26

Model vs. Coder

To compress we need a bias on the probability of messages. The model determines this bias.

[Figure: Messages → Model → Probs. → Coder → Bits; the Model plus the Coder make up the Encoder]

Example models:
– Simple: character counts, repeated strings
– Complex: models of a human face

SLIDE 27

Quality of Compression

For lossless? Runtime vs. compression ratio vs. generality.
For lossy? A loss metric (in addition to the above).
For reference, several standard corpora are used to compare algorithms:

  • 1. The Calgary Corpus
  • 2. The Archive Comparison Test and the Large Text Compression Benchmark, which maintain a comparison of a broad set of compression algorithms

SLIDE 28

INFORMATION THEORY BASICS


SLIDE 29

Information Theory

  • Quantifies and investigates “information”
  • Fundamental limits on representation and transmission of information
    – What’s the minimum number of bits needed to represent data?
    – What’s the minimum number of bits needed to communicate data?
    – What’s the minimum number of bits needed to secure data?


SLIDE 30

Information Theory

Claude E. Shannon
– Landmark 1948 paper: a mathematical framework
– Proposed and solved key questions
– Gave birth to information theory

SLIDE 31

Information Theory

In the context of compression: an interface between modeling and coding.
Entropy: a measure of information content.
Suppose a message can take n values from S = {s1, …, sn} with a probability distribution p(s). One of the n values will be chosen. “How much choice” is involved? Or: “How much information is needed to convey the value chosen?”

SLIDE 32

Entropy

Q: Should it depend on the values {s1, …, sn}? (e.g., American names vs. European names) No.
Q: Should it depend on p(s)? Yes.
If p(s1) = 1 and the rest are all 0? No choice. Entropy = 0.
The more the bias, the lower the entropy.


SLIDE 33

Entropy

For a set of messages S with probabilities p(s), s ∈ S, the self information of s is

i(s) = log(1/p(s)) = -log p(s)

measured in bits if the log is base 2. Entropy is the weighted average of the self information:

H(S) = Σ_{s ∈ S} p(s) · log(1/p(s))

SLIDE 34

Entropy

Shannon (in the 1948 paper) lists key properties that an entropy function should satisfy and shows that log is the only function satisfying them. Intuition for the log function:

  • When p(s) is low, the information content should be high
  • Suppose two independent messages are being picked; then the entropies should add up <board>
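Presumably the <board> step showed additivity of self information under independence; the standard one-line derivation is:

i(s1, s2) = log(1/(p(s1) · p(s2))) = log(1/p(s1)) + log(1/p(s2)) = i(s1) + i(s2)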


SLIDE 35

Entropy Example

A binary random variable (i.e., taking two values) with probabilities p and 1 - p. Its entropy is denoted H2(p): <board>
Entropy is highest when the two values are equiprobable (true for n > 2 as well).
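The <board> formula here is presumably the standard binary entropy function:

H2(p) = p · log(1/p) + (1 - p) · log(1/(1 - p))

which is 1 bit at p = 1/2 and 0 at p = 0 or p = 1.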


SLIDE 36

Entropy Example

p(S) = {.25, .25, .25, .125, .125}
H(S) = 3 · .25 · log 4 + 2 · .125 · log 8 = 2.25

p(S) = {.5, .125, .125, .125, .125}
H(S) = .5 · log 2 + 4 · .125 · log 8 = 2

p(S) = {.75, .0625, .0625, .0625, .0625}
H(S) = .75 · log(4/3) + 4 · .0625 · log 16 ≈ 1.3
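A short Python check of these numbers (illustrative arithmetic only, not from the slides):

```python
from math import log2

def entropy(probs):
    """H(S) = sum over s of p(s) * log2(1 / p(s)), ignoring zero-probability values."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([.25, .25, .25, .125, .125]))        # 2.25
print(entropy([.5, .125, .125, .125, .125]))       # 2.0
print(entropy([.75, .0625, .0625, .0625, .0625]))  # ~1.31
```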

SLIDE 37

Conditional Entropy

Conditional entropy: information content based on a context.

The conditional probability p(s|c) is the probability of s in a context c. The conditional self information is

i(s|c) = log(1/p(s|c)) = -log p(s|c)

SLIDE 38

Conditional Entropy

The conditional entropy is the weighted average of the conditional self information

H(S|C) = Σ_{c ∈ C} p(c) · Σ_{s ∈ S} p(s|c) · log(1/p(s|c))
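A small Python illustration of this formula on a made-up joint distribution (the numbers are hypothetical, not from the slides):

```python
from math import log2

def conditional_entropy(p_c, p_s_given_c):
    """H(S|C) = sum_c p(c) * sum_s p(s|c) * log2(1 / p(s|c))."""
    return sum(
        p_c[c] * sum(p * log2(1 / p) for p in p_s_given_c[c].values() if p > 0)
        for c in p_c
    )

# Hypothetical two-context example: in context 'a' the symbols are nearly
# deterministic, in context 'b' they are uniform, so H(S|C) sits in between.
p_c = {"a": 0.5, "b": 0.5}
p_s_given_c = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.5, "y": 0.5}}
print(conditional_entropy(p_c, p_s_given_c))  # ~0.73 bits
```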

SLIDE 39

Types of “sources”

  • Sources generate the messages (to be compressed)
  • Sources can be modelled in multiple ways
  • Independent and identically distributed (i.i.d.) source
    – Prob. of each message is independent of the previous messages
  • Markov source
    – The message sequence follows a Markov model (specifically a Discrete-Time Markov Chain, aka DTMC)


SLIDE 40

Example of a Markov Chain

[Figure: two-state Markov chain with states w and b and transition probabilities p(w|w), p(b|w), p(w|b), p(b|b); the slide's values are .9, .1, .2, .8]

SLIDE 41

Entropy of the English Language

How can we measure the information per character? (numbers below are in bits per character)

– ASCII code = 7
– Entropy = 4.5 (based on character probabilities)
– Huffman codes (average) = 4.7
– Unix Compress = 3.5
– Gzip = 2.6
– Bzip = 1.9
– Entropy = 1.3 (for the “text compression test”)

So the entropy of the English language must be less than 1.3 bits per character.

SLIDE 42

Shannon’s experiment

Shannon asked humans to predict the next character given all of the previous text. He used these predictions as conditional probabilities to estimate the entropy of the English language. The table below gives the distribution of the number of guesses required for the right answer. From the experiment he estimated H(English) = 0.6 to 1.3 bits per character.

# of guesses:   1     2     3     4     5     > 5
Probability:    .79   .08   .03   .02   .02   .05
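A rough back-of-envelope check in Python: the entropy of the guess-count distribution itself (a simplification of Shannon's actual bounding argument, added here only for illustration):

```python
from math import log2

# Probability that the human needed 1, 2, 3, 4, 5, or more than 5 guesses.
guess_probs = [0.79, 0.08, 0.03, 0.02, 0.02, 0.05]

estimate = sum(p * log2(1 / p) for p in guess_probs)
print(round(estimate, 2))  # ~1.15 bits/char, within the 0.6-1.3 range Shannon reported
```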