

slide-1
SLIDE 1

Coding Theory: From the Past to the Present

Vitaly Skachek

Institute of Computer Science University of Tartu

Some images are courtesy of Wikipedia/Wikimedia Commons

slide-2
SLIDE 2

Communications Model

[Figure: Source → Encoder → Channel → Decoder → Destination. Message x (k bits, e.g. 0101) is encoded into codeword c (n bits, e.g. 0101100); the channel corrupts it to y (0111000); the decoder outputs the estimate ĉ (0101100). The rate is R = k/n.]

slide-3
SLIDE 3

Communications Channels

[Figure: Binary Symmetric Channel — each bit is received correctly with probability 1 − p and flipped with probability p. Binary Erasure Channel — each bit is received correctly with probability 1 − p and erased to “?” with probability p.]

slide-4
SLIDE 4

Shannon’s Channel Coding Theorems

  • A code is a mapping from the set of all vectors of length k to a set of vectors of length n (over alphabet Σ)
  • Given a channel S, there is a quantity C(S) called the channel capacity

Claude Shannon (1916-2001)

slide-5
SLIDE 5

Shannon’s Channel Coding Theorems

For any rate R < C(S), there exists an infinite sequence of block codes Cj of growing lengths nj with kj/nj ≥ R, and there exists a coding scheme for those codes such that the decoding error probability approaches 0 as j → ∞.

slide-6
SLIDE 6

Shannon’s Channel Coding Theorems

For any rate R < C(S), there exists an infinite sequence of block codes Cj of growing lengths nj with kj/nj ≥ R, and there exists a coding scheme for those codes such that the decoding error probability approaches 0 as j → ∞.

Conversely, let R > C(S). For any infinite sequence of block codes Cj of growing lengths nj with kj/nj ≥ R, and for any coding scheme for those codes, the decoding error probability is bounded away from 0 as j → ∞.

slide-9
SLIDE 9

Communications Channels

[Figure: Binary Symmetric Channel and Binary Erasure Channel, as above.]

Binary Symmetric Channel: C(S) = 1 − h2(p)
Binary Erasure Channel: C(S) = 1 − p

where h2(x) = −x log2 x − (1 − x) log2 (1 − x) is the binary entropy function.
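The two capacity formulas can be evaluated directly. A minimal Python sketch (binary logarithms; the parameter values are my own illustration, not from the slides):

```python
import math

def h2(x):
    # Binary entropy function: h2(x) = -x*log2(x) - (1-x)*log2(1-x)
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def bsc_capacity(p):
    # Binary Symmetric Channel: C = 1 - h2(p)
    return 1 - h2(p)

def bec_capacity(p):
    # Binary Erasure Channel: C = 1 - p
    return 1 - p

print(bsc_capacity(0.11))  # ≈ 0.5: a BSC with 11% crossover loses about half its bits
print(bec_capacity(0.5))   # 0.5
```

Note that a BSC with crossover probability 0.5 has capacity 0, matching the intuition that a coin-flip channel carries no information.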

slide-10
SLIDE 10

Communications Model

[Figure: Source → Encoder → Channel → Decoder → Destination. Message x (k bits, e.g. 0101) is encoded into codeword c (n bits, e.g. 0101100); the channel corrupts it to y (0111000); the decoder outputs the estimate ĉ (0101100). The rate is R = k/n.]

slide-11
SLIDE 11

Parameters in Consideration

  • Target: optimize the code rate R = k/n.

Other parameters in consideration:

  • Speed of convergence Pr(err) → 0 as n → ∞. Low error probability for short lengths is needed!
  • Time complexity of the encoding and decoding algorithms. Structured codes are needed!
slide-12
SLIDE 12

Distance

  • The Hamming distance between x = (x1, x2, …, xn) and y = (y1, y2, …, yn), denoted d(x, y), is the number of positions i such that xi ≠ yi.
  • The minimum distance of a code C is

d = min{ d(x, y) : x, y ∈ C, x ≠ y }
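Both definitions translate directly into code. A small Python sketch (the example code is my own, the length-3 even-weight code):

```python
from itertools import combinations

def hamming(x, y):
    # d(x, y): number of positions where the two words differ
    return sum(a != b for a, b in zip(x, y))

def min_distance(code):
    # Minimum of d(x, y) over all pairs of distinct codewords
    return min(hamming(x, y) for x, y in combinations(code, 2))

# The length-3 even-weight code: all words with an even number of ones
code = ["000", "011", "101", "110"]
print(min_distance(code))  # 2
```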

slide-13
SLIDE 13

Linear Codes

  • A code C over a field F is a linear [n, k, d] code if there exists a matrix H with n columns and rank n − k such that H · cᵀ = 0ᵀ ⟺ c ∈ C.
  • The matrix H is called a parity-check matrix.
  • The value k is called the dimension of the code C.
  • The ratio R = k/n is called the rate of the code C.
  • The words of C are exactly the linear combinations of the rows of a generating k × n matrix G.
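The parity-check condition can be made concrete. A minimal Python sketch using the well-known [7, 4, 3] Hamming code as the example (my choice, not from the slides):

```python
# Parity-check matrix of the [7, 4, 3] Hamming code:
# n = 7 columns, rank n - k = 3.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def in_code(c):
    # c is a codeword iff every parity check holds: H . c^T = 0 (mod 2)
    return all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)

print(in_code([0, 0, 0, 0, 0, 0, 0]))  # True: the all-zero word
print(in_code([1, 1, 1, 0, 0, 0, 0]))  # True: the first three columns of H sum to zero
print(in_code([1, 1, 0, 0, 0, 0, 0]))  # False: a parity check fails
```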

slide-14
SLIDE 14

Sphere-packing idea

slide-15
SLIDE 15

Sphere-packing idea

slide-16
SLIDE 16

Sphere-packing idea

[Figure: spheres of radius ⌊(d − 1)/2⌋ around the codewords do not overlap.]

slide-17
SLIDE 17

Sphere-packing idea

Decoding

slide-18
SLIDE 18

Reed-Solomon Codes

  • Let α1, α2, …, αn ∈ F be n distinct elements.
  • The generator matrix:

G = | 1         1         …  1         |
    | α1        α2        …  αn        |
    | α1²       α2²       …  αn²       |
    | ⋮         ⋮         ⋱  ⋮         |
    | α1^(k−1)  α2^(k−1)  …  αn^(k−1)  |

  • Attains the Singleton bound with equality: d = n − k + 1 (MDS codes)
  • Optimal trade-off between the parameters
slide-19
SLIDE 19

Reed-Solomon Codes (cont.)

  • Encoding:

(x0, x1, …, x(k−1)) · | 1         1         …  1         |
                      | α1        α2        …  αn        |
                      | α1²       α2²       …  αn²       |
                      | ⋮         ⋮         ⋱  ⋮         |
                      | α1^(k−1)  α2^(k−1)  …  αn^(k−1)  |
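As a toy illustration (parameters are mine, not from the slides): multiplying the message by the Vandermonde generator matrix over the small prime field GF(7), with n = 6 evaluation points and k = 3, so d = n − k + 1 = 4:

```python
# Reed-Solomon encoding over GF(7) (illustrative parameters)
P = 7
alphas = [1, 2, 3, 4, 5, 6]   # n = 6 distinct elements of GF(7)
k = 3

def rs_encode(msg):
    # Message times the Vandermonde generator matrix:
    # codeword entry for point a is sum_i msg[i] * a^i  (mod P)
    assert len(msg) == k
    return [sum(m * pow(a, i, P) for i, m in enumerate(msg)) % P
            for a in alphas]

print(rs_encode([1, 2, 3]))  # [6, 3, 6, 1, 2, 2]
```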

slide-20
SLIDE 20

Polynomial Interpolation Viewpoint

  • The input vector [x0 x1 … x(k−1)] is associated with the polynomial p(X) = x(k−1)X^(k−1) + x(k−2)X^(k−2) + … + x1X + x0
  • Encoding is evaluation: (p(α1), p(α2), …, p(αn))
  • Decoding is interpolation by a polynomial of degree ≤ k − 1
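Continuing the GF(7) toy example (parameters mine): with no errors, any k evaluations determine the polynomial, so Lagrange interpolation recovers it. A minimal sketch:

```python
P = 7

def interpolate_at(points, x0):
    # Lagrange interpolation over GF(P): evaluate the unique polynomial of
    # degree <= len(points)-1 passing through `points` at x0.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        # den^{-1} mod P via Fermat's little theorem (P is prime)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

# p(X) = 3X^2 + 2X + 1 evaluated at 1, 2, 4 gives 6, 3, 1;
# k = 3 points suffice, and p(0) returns the constant term:
print(interpolate_at([(1, 6), (2, 3), (4, 1)], 0))  # 1
```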

slide-21
SLIDE 21

Reed-Solomon Codes are Used in:

  • Wired and wireless

communications

  • Satellite communications
  • Hard drives and

compact disks

  • Flash memory devices
slide-22
SLIDE 22

Application of Reed-Solomon Codes

  • Shamir’s Secret-Sharing Scheme ’79
  • n users
  • 1 key (a number in F)
  • Any coalition of < t users has no information about the key
  • Any coalition of ≥ t users can recover the key

Adi Shamir

slide-23
SLIDE 23

Shamir’s Secret Sharing Scheme

slide-26
SLIDE 26

Shamir’s Secret Sharing Scheme (cont.)

  • Select a1, a2, …, a(t−1) uniformly at random. Let a0 be the secret key. Construct the polynomial p(X) = a(t−1)X^(t−1) + a(t−2)X^(t−2) + … + a1X + a0
  • Give (αi, p(αi)) to user i
  • A large coalition (≥ t users) has enough points to reconstruct the polynomial
  • A small coalition (< t users) has no information about the polynomial
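The scheme above is small enough to sketch end to end. A Python illustration (field, threshold, and variable names are my own choices):

```python
import random

P = 2**31 - 1  # a prime large enough to hold the key

def make_shares(secret, t, n):
    # Degree-(t-1) polynomial: constant term = secret, other coefficients random
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # Share for user i is the point (i, p(i))
    return [(i, sum(c * pow(i, e, P) for e, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def recover(shares):
    # Lagrange interpolation at X = 0 returns the constant term (the secret)
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(12345, t=3, n=5)
print(recover(shares[:3]))  # any 3 of the 5 shares recover 12345
```

Any two shares, by contrast, are consistent with every possible secret, which is exactly the "no information" guarantee.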

slide-27
SLIDE 27

List-decoding of Reed-Solomon Codes

slide-30
SLIDE 30

List-decoding of Reed-Solomon Codes

  • Sudan ’97, Guruswami–Sudan ’99, Parvaresh–Vardy ’05, Guruswami–Rudra ’06

Madhu Sudan Venkatesan Guruswami

slide-31
SLIDE 31

List Decoding of RS Codes

Voyager 1 – the first man-made object to leave the Solar System. Launched in 1977.

slide-32
SLIDE 32

Turbo Codes

Berrou, Glavieux and Thitimajshima (Telecom Bretagne) ’93

  • Non-algebraic codes!
  • “Killer” of algebraic coding theory

slide-33
SLIDE 33

Low-Density Parity-Check Codes

  • Gallager ’62
  • Urbanke, Richardson and Shokrollahi ’01
  • Parity-check matrix H is sparse
  • Performance extremely close to channel capacity
  • Decoding complexity linear in n

[Figure: a sparse 3 × 5 parity-check matrix H and its Tanner graph, with variable nodes 1–5 connected to check nodes 1–3.]

slide-34
SLIDE 34

Low-Density Parity-Check Codes

  • Belief-propagation decoding algorithm (message-passing algorithm): nodes exchange messages of the form (Pr(0), Pr(1))

slide-35
SLIDE 35

Low-Density Parity-Check Codes

Pr(0) = 0.2, Pr(1) = 0.8 Pr(0) = 0.4, Pr(1) = 0.6

slide-36
SLIDE 36

Low-Density Parity-Check Codes

Pr(0) = 0.2, Pr(1) = 0.8 Pr(0) = 0.4, Pr(1) = 0.6 Pr(0) = 0.56, Pr(1) = 0.44
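The numbers on the slide follow from the standard check-node update: the outgoing bit must make the parity even, and Pr(a ⊕ b = 0) = Pr(0)·Pr(0) + Pr(1)·Pr(1). A minimal sketch (function name is mine):

```python
def check_node_message(p0_list):
    # Fold incoming Pr(0) values pairwise:
    # probability the XOR of independent bits is 0 is p*q + (1-p)*(1-q)
    p = p0_list[0]
    for q in p0_list[1:]:
        p = p * q + (1 - p) * (1 - q)
    return p

# The slide's example: incoming Pr(0) = 0.2 and Pr(0) = 0.4
print(check_node_message([0.2, 0.4]))  # ≈ 0.56
```

Indeed 0.2 · 0.4 + 0.8 · 0.6 = 0.08 + 0.48 = 0.56, matching the slide.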

slide-37
SLIDE 37

Reed-Solomon Codes are Used in:

  • Wired and wireless communications
  • Satellite communications
  • Hard drives and compact disks
  • Flash memory devices
slide-38
SLIDE 38

LDPC Codes are Used in:

  • Wired and wireless communications
  • Satellite communications
  • Hard drives and compact disks
  • Flash memory devices
slide-39
SLIDE 39

Emerging Applications of Coding Theory
slide-40
SLIDE 40

Flash memories

  • Easy to add electric charge, hard to remove
  • The charge “leaks” with time
  • Neighboring cells influence each other

[Figure: flash memory cell]

slide-41
SLIDE 41

Flash memories

  • Rank modulation
  • The information is represented using relative levels of charge, invariant to leakage
  • Coding over permutations

Jiang, Mateescu, Schwartz, Bruck ‘2006

slide-42
SLIDE 42

Flash memories

slide-43
SLIDE 43

Networking

  • Raptor Codes
  • A. Shokrollahi ‘2004
  • Used in DVB-H standard for IP datacast for handheld devices
slide-46
SLIDE 46

Networking

  • Raptor Codes
  • Possible solution: ARQs (retransmissions) – slow!
  • Alternative: large error-correcting code
slide-48
SLIDE 48

Network coding

  • Butterfly network

[Figure: butterfly network with two source bits x and y.]

Ahlswede, Cai, Li and Yeung, 2000

slide-49
SLIDE 49

Network coding

  • Butterfly network

[Figure: routing only — the bottleneck edge forwards y, so one destination receives y twice and never learns x.]

slide-50
SLIDE 50

Network coding

  • Butterfly network

[Figure: routing only — the bottleneck edge forwards x, so the other destination never learns y.]

slide-51
SLIDE 51

Network coding

  • Butterfly network

[Figure: with network coding, the bottleneck edge carries x + y; each destination combines x + y with the bit it receives directly and recovers both x and y.]

slide-52
SLIDE 52

Network coding

  • The number of bits deliverable to each destination equals the min-cut between the source and that destination
  • Avalanche P2P network (Microsoft, 2005)
  • Experiments for use in mobile communications
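Over the binary field, x + y is the bitwise XOR, which makes the butterfly argument a two-line computation. A tiny sketch (message values are mine):

```python
# Two 4-bit messages; over GF(2), addition is bitwise XOR.
x, y = 0b0101, 0b0011
coded = x ^ y                 # the packet sent over the bottleneck edge

# Each destination also receives one message directly and cancels it out:
assert coded ^ x == y         # destination that sees x directly recovers y
assert coded ^ y == x         # destination that sees y directly recovers x
print("both destinations decode x and y")
```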

slide-53
SLIDE 53

Gossip Algorithms

  • n users in the network
  • k of them possess a rumor (packet of data) – each rumor is different
  • Each user “calls” another user at random and sends him a rumor
  • Purpose: to distribute all rumors to all users
  • Using coding: send a random linear combination of all rumors in your possession – facilitates convergence of the algorithm

Deb, Medard and Choute 2006
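A random linear combination over GF(2) is just the XOR of a random subset of the rumors you hold. A minimal sketch (rumor values are my own toy data):

```python
import random

def random_combination(rumors):
    # Pick random GF(2) coefficients and XOR together the selected rumors
    coeffs = [random.randint(0, 1) for _ in rumors]
    packet = 0
    for c, r in zip(coeffs, rumors):
        if c:
            packet ^= r
    return coeffs, packet

rumors = [0b100, 0b010, 0b001]       # three rumors held by one user
coeffs, packet = random_combination(rumors)
# The receiver stores (coeffs, packet); once its collected coefficient
# vectors reach rank k, all k rumors follow by Gaussian elimination.
print(coeffs, bin(packet))
```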

slide-54
SLIDE 54

Gossip Algorithms

  • Rumor spreading problem
slide-56
SLIDE 56

Distributed Storage

  • Huge amounts of data are stored by big data companies (Google, Amazon, Facebook, Dropbox)

[Figures: Facebook data center in Oregon; server room at a Wikipedia data center.]

slide-57
SLIDE 57

Distributed data storage

[Figure: three servers storing blocks x, y, and x + y.]

Dimakis, Godfrey, Wu, Wainwright, Ramchandran ‘2008

slide-61
SLIDE 61

Distributed data storage

  • Classical error-correcting codes can be employed
  • Local correction is needed: repair should use only a few other servers
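The x, y, x + y layout from the earlier figure shows the idea at its smallest: any single lost block is rebuilt from the other two with one XOR. A tiny sketch (block values are mine):

```python
# Three servers hold x, y, and their XOR parity x + y.
x, y = 0b1100, 0b1010
parity = x ^ y

# Repairing any single failed server needs only the two survivors:
assert parity ^ y == x        # the server holding x failed
assert parity ^ x == y        # the server holding y failed
assert x ^ y == parity        # the parity server failed
print("single-server failure repaired locally")
```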

slide-62
SLIDE 62

DNA Analysis

slide-63
SLIDE 63

String Reconstruction Problem

  • Four amino acids: A, F, G, C
  • The composition of each protein can be deduced from its weight
  • Each protein-sequence bond is cut independently with the same probability

Acharya, Das, Milenkovic, Orlitsky, and Pan ’2011

AFGCCGA

AFGC CCG CGA GCCA

slide-64
SLIDE 64

String Reconstruction Problem

  • Binary alphabet {0,1}

Acharya, Das, Milenkovic, Orlitsky, and Pan '2011

0010011

0010 100 011 001