SLIDE 1

Coding and Applications in Sensor Networks

SLIDE 2

Why coding?

  • Information compression
  • Robustness to errors (error correction

codes)

  • Two categories:

– Source coding
– Channel coding

SLIDE 3

Source coding

  • Compression.
  • What is the minimum number of bits to represent

certain information? What is a measure of information?

  • Entropy, Information theory.
SLIDE 4

Channel coding

  • Achieve fault tolerance.
  • Transmit information through a noisy channel.
  • Storage on a disk. Certain bits may be flipped.
  • Goal: recover the original information.
  • How? duplicate information.
SLIDE 5

Source coding and Channel coding

  • Source coding and channel coding can be separately optimized without hurting performance.

[Diagram: 01100011 → Source Coding → 0110 → Channel Coding → 01100 → Noisy Channel → 11100 → Decode → 0110 → Decompress → 01100011]

SLIDE 6

Coding in sensor networks

  • Compression

– Sensors generate too much data.
– Nearby sensor readings are correlated.

  • Fault tolerance

– Communication failures: messages corrupted by a noisy channel; interference.
– Node failures: fault-tolerant storage.
– Adversaries injecting false information.

SLIDE 7

Channels

  • The media through which information is passed

from a sender to a receiver.

  • Binary symmetric channel: each symbol is flipped

with probability p.

  • Erasure channel: each symbol is replaced by a “?”

with probability p.

  • We first focus on binary symmetric channel.
SLIDE 8

Encoding and decoding

  • Encoding:
  • Input: a string of length k, “data”.
  • Output: a string of length n>k, “codeword”.
  • Decoding:
  • Input: some string of length n (might be corrupted).
  • Output: the original data of length k.
SLIDE 9

Error detection and correction

  • Error detection: detect whether a string is a valid

codeword.

  • Error correction: correct it to a valid codeword.
  • Maximum likelihood decoding: find the codeword that is “closest” in Hamming distance, i.e., the one with the minimum number of bit flips.

  • How to find it?
  • For small size code, store a codebook. Do table

lookup.

  • NP-hard in general.
SLIDE 10

Scheme 1: repetition

  • Simplest coding scheme one can come up with.
  • Input data: 0110010
  • Repeat each bit 11 times.
  • Now we have
  • 00000000000 11111111111 11111111111 00000000000 00000000000 11111111111 00000000000
  • Decoding: do majority vote within each group of 11.
  • Detection: when the 11 copies don’t all agree with each other.
  • Correction: up to 5 bit flips per group.
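A minimal sketch of this scheme (the function names are ours, not from the slides):

```python
def rep_encode(bits, r=11):
    # Repeat each bit r times.
    return [b for b in bits for _ in range(r)]

def rep_decode(code, r=11):
    # Majority vote within each block of r copies.
    return [1 if sum(code[i:i + r]) > r // 2 else 0
            for i in range(0, len(code), r)]

data = [0, 1, 1, 0, 0, 1, 0]
code = rep_encode(data)
# Flip 5 bits inside the first block of 11: still decodable by majority.
for i in range(5):
    code[i] ^= 1
assert rep_decode(code) == data
```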
SLIDE 11

Scheme 2: Parity-check

  • Add one bit to do parity check.
  • Sum up the number of “1”s in the string. If it is even,

then set the parity check bit to 0; otherwise set the parity check bit to 1.

  • E.g., 001011010, 111011111.
  • The sum of 1s in each codeword is even.
  • 1-bit parity check can detect 1-bit error. If one bit is

flipped, then the sum of 1s is odd.

  • But it cannot detect 2-bit errors, nor correct a 1-bit error.
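The parity-check scheme in a few lines (names are ours):

```python
def add_parity(bits):
    # Parity bit = sum of 1s mod 2, so every codeword has an even number of 1s.
    return bits + [sum(bits) % 2]

def check_parity(codeword):
    # Valid iff the total number of 1s is even.
    return sum(codeword) % 2 == 0

cw = add_parity([0, 0, 1, 0, 1, 1, 0, 1])   # four 1s -> parity bit 0
assert check_parity(cw)
cw[3] ^= 1                                   # one flip is detected...
assert not check_parity(cw)
cw[5] ^= 1                                   # ...but a second flip hides it
assert check_parity(cw)
```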

SLIDE 12

More on parity-check

  • Encode a piece of data into codeword.
  • Not every string is a codeword.
  • After a 1-bit parity check, only strings with an even number of 1s are valid codewords.

  • Thus we can detect errors.
  • Minimum Hamming distance between any two

codewords is 2.

  • Suppose we make the min Hamming distance

larger, then we can detect more errors and also correct errors.

SLIDE 13

Scheme 3: Hamming code

  • Intuition: generalize the parity bit and organize

them in a nice way so that we can detect and correct more errors.

  • Bound: if the minimum Hamming distance between two codewords is k, then we can detect at most k-1 bit errors and correct at most ⌊(k-1)/2⌋ bit errors.

  • Hamming code (7,4): adds three additional check

bits to every four data bits of the message to correct any single-bit error, and detect all two-bit errors.

SLIDE 14

Hamming code (7, 4)

  • Coding: multiply the data with the encoding matrix.
  • Decoding: multiply the codeword with the decoding

matrix.
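The matrices themselves were on the slide images; as a stand-in, here is a minimal Python sketch of Hamming (7,4) with one standard choice of check equations (the slides’ exact matrices may differ):

```python
# Rows of the parity-check matrix H; column j is the syndrome of a flip at bit j.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def encode(d):
    # Systematic (7,4): the four data bits first, then three parity bits.
    p = [d[0] ^ d[1] ^ d[3], d[0] ^ d[2] ^ d[3], d[1] ^ d[2] ^ d[3]]
    return d + p

def decode(c):
    # Nonzero syndrome matches the column of H at the flipped position.
    s = [sum(h[j] * c[j] for j in range(7)) % 2 for h in H]
    if any(s):
        j = [tuple(h[j] for h in H) for j in range(7)].index(tuple(s))
        c = c[:]
        c[j] ^= 1
    return c[:4]

data = [1, 0, 1, 1]
cw = encode(data)
cw[1] ^= 1                 # single-bit error
assert decode(cw) == data  # corrected
```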

SLIDE 15

An example: encoding

  • Input data:
  • Codeword:

Original data is preserved. Systematic code: the first k bits are the data.

SLIDE 16

An example: decoding

  • Decode:
  • Now suppose there is an error at the ith bit.
  • We received
  • Now decode:
  • This picks out the ith column of the decoding matrix!
SLIDE 17

An example: decoding

  • Suppose
  • Decode:
  • Data more than 4 bits? Break it into chunks and

encode each chunk.

Second bit is wrong!

SLIDE 18

Linear code

  • Most common category.
  • Succinct specification, efficient encoding and error-detecting algorithms – simply matrix multiplication.

  • Code space: a linear space with dimension k.
  • By linear algebra, we find a set of basis vectors g1, …, gk.
  • Code space: the set of all linear combinations of g1, …, gk.
  • Generator matrix G: the matrix whose rows are the basis vectors g1, …, gk.
SLIDE 19

Linear code

  • Null space of dimension n-k: the space orthogonal to the code space.
  • Parity check matrix H: its rows form a basis of this null space.
  • Error detection: check that Hx = 0.
  • Hamming code is a linear code on the alphabet {0,1}. It corrects 1-bit errors and detects 2-bit errors.

SLIDE 20

Linear code

  • A linear code is called systematic if the first k bits are the data.
  • Generator matrix G = [ I(k×k) | P(k×(n-k)) ].
  • If n = 2k and P is invertible, then the code is called invertible.
  • A message m maps to the codeword (m, Pm); the Pm part is the parity bits.
  • Parity bits can be used to recover m.
  • Detect more errors? Bursty errors?

SLIDE 21

Reed Solomon codes

  • Most commonly used code, in CDs/DVDs.
  • Handles bursty errors.
  • Use a large alphabet and algebra.
  • Take an alphabet of size q > n and n distinct elements α1, α2, …, αn.
  • Input message of length k: (m0, m1, …, m(k-1)).
  • Define the polynomial C(x) = m0 + m1·x + … + m(k-1)·x^(k-1).
  • The codeword is (C(α1), C(α2), …, C(αn)).
SLIDE 22

Reed Solomon codes

  • Rephrase the encoding scheme.
  • Unknowns (variables): the message of length k
  • What we know: some equations on the unknowns.
  • Each coded symbol gives a linear equation on the k unknowns: a linear system.

  • How many equations do we need to solve it?
  • We only need k coded symbols to solve all the unknowns.

SLIDE 23

Reed Solomon codes

  • Write the linear system by matrix form:
  • This is the Vandermonde matrix. So it’s invertible.
  • This code can tolerate n-k erasures.
  • Any k symbols can recover the original message.

  [ 1  α1  α1^2  …  α1^(k-1) ] [ m0     ]   [ C(α1) ]
  [ 1  α2  α2^2  …  α2^(k-1) ] [ m1     ] = [ C(α2) ]
  [ ⋮                        ] [ ⋮      ]   [ ⋮     ]
  [ 1  αk  αk^2  …  αk^(k-1) ] [ m(k-1) ]   [ C(αk) ]

SLIDE 24

Plan

  • Network coding
  • Coding in wireless communication
  • Coding in storage systems
SLIDE 25

Part I: Network Coding

SLIDE 26

Existing network

  • Independent data streams sharing the same network resources:

– Packets over the Internet
– Signals in a phone network
– An analogy: cars sharing a highway.

  • Information flows are separated.
  • What if we mix them?
SLIDE 27

Why do we want to mix information flows?

  • The core notion of network coding is to

allow and encourage mixing of data at intermediate network nodes.

  • R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung,

"Network Information Flow", IEEE Transactions on Information Theory, IT-46, pp. 1204-1216, 2000.

SLIDE 28

Network coding increases throughput

  • Butterfly network
  • Multi-cast: throughput increases from 1 to

2.

[Butterfly network figure: the bottleneck node forwards a⊕b]

SLIDE 29

Network coding saves energy & delay in wireless networks

  • A wants to send packet a to C.
  • C wants to send packet b to A.
  • B performs coding: it broadcasts a⊕b, serving both directions with one transmission.

SLIDE 30

Linear coding is enough

  • Linear code: basically take linear combinations of input packets.

– Not concatenation!
– 3a+5b has the same length as a and b.
– + is XOR when working over the field GF(2).

  • Even better: random linear coding is enough.

– Choose coding coefficients randomly.

SLIDE 31

Encode

  • Original packets: M1, M2, …, Mn.
  • An incoming packet is a linear combination of the original packets:
  • X = g1·M1 + g2·M2 + … + gn·Mn.
  • g = (g1, g2, …, gn) is the encoding vector.
  • Encoding can be done recursively.
SLIDE 32

An example

  • At each node: do linear encoding of the

incoming packets.

  • Y=h1X1+h2X2+h3X3
  • The encoding vector is attached to the packet.

SLIDE 33

Decode

  • To recover the original packets M1, M2, …, Mn:
  • Receive m (scrambled) packets.
  • How to recover the n unknowns?

– First, we need m ≥ n.
– The good thing is, m = n is sufficient.

  • Received packets: Y1, Y2, …, Yn.
SLIDE 34

Coding scheme

  • To decode, we have the linear system:
  • Yi = ai1·M1 + ai2·M2 + … + ain·Mn.
  • As long as the coefficient vectors are linearly independent, we can solve the linear system.

  • Theorem: (1) There is a deterministic

encoding algorithm; (2) Random linear coding is good, with high probability.
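A minimal runnable sketch of the theorem’s random variant (coefficients drawn from the prime field GF(257) for readability; deployed systems usually use GF(2^8), and all names here are ours):

```python
import random

P = 257  # prime field for coefficients (GF(2^8) is more common in practice)

def nc_encode(packets, rng):
    # One coded packet: a random linear combination of the originals,
    # with its encoding vector g attached.
    g = [rng.randrange(P) for _ in packets]
    body = [sum(gi * pkt[j] for gi, pkt in zip(g, packets)) % P
            for j in range(len(packets[0]))]
    return g, body

def nc_decode(coded, n):
    # Gauss-Jordan elimination mod P on rows [encoding vector | payload];
    # raises StopIteration if the received vectors do not span all n packets.
    rows = [g + body for g, body in coded]
    for col in range(n):
        piv = next(r for r in range(col, len(rows)) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], P - 2, P)          # inverse via Fermat
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    return [row[n:] for row in rows[:n]]

rng = random.Random(0)
msgs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]          # n = 3 original packets
coded = [nc_encode(msgs, rng) for _ in range(5)]  # receive m = 5 coded packets
decoded = nc_decode(coded, 3)                     # m >= n suffices w.h.p.
```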

SLIDE 35

Practical considerations (1)

  • Decoding: receiver keeps a decoding

matrix recording the packets it received so far.

  • When a new packet comes in, its coding vector is inserted at the bottom of the matrix, and Gaussian elimination is performed.

  • When the matrix is solvable, we are done.
SLIDE 36

Practical considerations (2)

  • Control the decoding effort (size of the

matrix).

  • Group packets into generations. Only packets in the same generation are coded together.

  • Delay: in practice typically not much higher.
SLIDE 37

Implication of network coding

  • Successful reception of information

– does not depend on the received packets’ content,
– but rather on receiving a sufficient number of independent packets.

SLIDE 38

What next?

  • Benefits of network coding
  • Applications of network coding
SLIDE 39

Throughput (capacity) with coding

  • Multi-cast problem: one source, N receivers in a

directed network. Each edge has a maximum capacity.

  • Without coding, finding the maximum-throughput routing is NP-hard.

  • Proof: reduction from Steiner tree packing.
  • Nodes share network resources.
SLIDE 40

Throughput (capacity) with coding

  • With coding each destination can ignore

the other destinations.

  • Receiving data rate = min-cut.
  • As if the user is using the entire network

by itself.

  • Offers throughput benefits for unicast as well.
SLIDE 41

Example: butterfly network

  • 2 uni-cast flows. Data rate = 2.
SLIDE 42

Example: butterfly network

  • S1→R2, S2→R1. Data rate = 1.
SLIDE 43

Summary on throughput gain

  • In directed graph, the throughput gain by

network coding can be arbitrarily large.

  • In undirected graph, the throughput gain is at

most 2.

  • Without coding, finding the max-throughput routing is NP-hard.

  • With coding, the max-throughput solution is computable by linear programming.

– Just decide on the rate of each edge.
– Ignore the content.

SLIDE 44

Robustness and stability

  • Each encoded packet is “equally important”.
  • Opportunistic routing.
  • A, C may go to sleep randomly without telling B. B sends a⊕b, so whoever wakes up can get new information.

SLIDE 45

Example I: application in gossip algorithm

  • Assume n nodes each holding a packet, all of

them want all packets.

  • Gossip algorithm: in each round each node

picks randomly another node and exchange 1 message.

  • Question: what is the number of rounds (in

expectation)?

  • Answer: O(n log n).
  • Why? The coupon collector problem.
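The O(n log n) answer comes from the coupon collector bound E[draws] = n·H_n ≈ n ln n, which a small simulation confirms (the parameters are our choices):

```python
import random

def coupon_rounds(n, rng):
    # Draw coupon types uniformly at random until all n types have been seen.
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n = 200
H = sum(1 / i for i in range(1, n + 1))      # harmonic number H_n ~ ln n
rng = random.Random(42)
avg = sum(coupon_rounds(n, rng) for _ in range(200)) / 200
# Expected draws = n * H_n; the sample mean should land close to it.
assert abs(avg - n * H) / (n * H) < 0.25
```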
SLIDE 46

Gossip with coding

  • Aka. algebraic gossip.
  • Each node encodes with random linear

combination of incoming (received) messages.

  • O(n) rounds are enough with high probability.
  • This is optimal.
SLIDE 47

Example II: packet erasure networks

  • Want 2 things: low delay and high rate.
  • But, packets get dropped in the middle.
  • Two approaches in the literature:

– Repeat: low delay, low rate.
– Error correction: max rate, high delay.

  • With coding, we can achieve OPT rate & delay.
SLIDE 48

Example II: packet erasure networks

  • Model packet loss

– Congestion, buffer overflow, fading in wireless channels.

1. End-to-end acknowledgement & Retransmission (TCP style)

– Retransmissions use up resources.
– Multicast: possibly only a subset of nodes need retransmission.

2. Erasure error correction codes

  • First code the original k data items into n pieces.
  • Recover the original data from any k pieces.
SLIDE 49

3-node example

  • Erasure channel: with probability e(AB),

e(BC) packet disappears on the link.

  • A retransmits, B simply forwards.

– Throughput: # (re)transmissions for a packet to reach C = 1/[(1-e(AB))(1-e(BC))].
– Why? A packet reaches C only if it survives both links in the same attempt, which happens with probability (1-e(AB))(1-e(BC)); the number of attempts is geometric, so its expectation is the reciprocal.

SLIDE 50

3-node example

  • Erasure coding on each link separately.

– Node A encodes k data pieces into k/(1-e(AB)) pieces.
– Roughly, B gets k data pieces and reconstructs.
– Do the same on link BC.
– Extra delay for reconstruction at B.
– # transmissions for one packet to reach C is 1/(1-e(AB)) + 1/(1-e(BC)).

SLIDE 51

3-node example: why coding helps

  • Node B does not bother to decode or reconstruct; instead, it sends random linear combinations of what it received.

– No delay for decoding in the middle.
– Node A encodes k data pieces into k/(1-e(AB)) pieces.
– Roughly, B gets k data pieces.
– B again boosts up to k/(1-e(BC)) pieces.
– C is able to reconstruct.

SLIDE 52

Applications of network coding: P2P

  • Avalanche (http://research.microsoft.com/~pablo/avalanche.aspx)
  • BitTorrent-style P2P sharing with network coding.

– Big file gets chopped into small pieces.
– Randomly coded.
– Participants share their coded pieces.

  • Why use coding?

– Topology of the P2P users is hard to know.
– Optimal packet scheduling for large files is difficult.
– Robustness to user join/leave.
– Easy to incorporate incentive mechanisms (prevent free-riding).

SLIDE 53

Wireless networks

  • Wireless links are broadcast in nature.
  • Bidirectional traffic for a path:

– First alternate between the two directions.
– Then use coding + wireless broadcast.
– Double the capacity.

SLIDE 54

A perfect match: Wireless networks + network coding

  • Wireless channels are lousy. Make use of overhearing for opportunistic routing.

– Residential wireless mesh networks
– Many-to-many broadcast (or gossip algorithms with broadcast).

  • Network coding is good with

– Large networks
– No global topology information
– Unreliable links/nodes.

SLIDE 55

Sensor networks + network coding

  • Radio calibration

– Tuning them to the same channel is energy costly.
– Channel assignment to maximize throughput is highly non-trivial.

  • Untuned (non-calibrated) radios

– Two devices may not be able to communicate.
– With a group of them, the chance that there exist two with the same channel is high. (Birthday paradox)
– But we don’t know which pair can communicate.
– Send coded packets blindly.
– Even multi-hop works (without discovering the path).

SLIDE 56

Birthday paradox

  • In a room of n people, what is Prob{no two people have the same birthday}?
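The probability is a product of “still distinct” factors, easy to compute directly (a standard calculation, not from the slides):

```python
def prob_all_distinct(n, days=365):
    # Probability that no two of n people share a birthday:
    # (days/days) * ((days-1)/days) * ... * ((days-n+1)/days).
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p

# The crossover is famously at 23 people: with 23 in the room,
# some pair shares a birthday with probability above 1/2.
assert prob_all_distinct(23) < 0.5 < prob_all_distinct(22)
```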

SLIDE 57

Birthday paradox

Compare: Prob{some 2 people have the same birthday} vs. Prob{there is one person with the same birthday as you}.

SLIDE 58

Network Tomography

  • Network diagnosis of loss rate of links.
  • In a multi-cast tree, the receivers that miss the

same packet can derive the failed link.

  • With coding, one can get more detailed information about the failed links from the pattern of the received codes.

– Active diagnosis
– Passive network monitoring.

SLIDE 59

Security

  • Information gets mixed (scrambled).
  • Protection from eavesdroppers.

– It is difficult to interpret, and short-term overhearing does not work.

  • Packet modification is harder too.

– Need to fake data that makes sense.
– Challenge: no idea about the original data packets.

  • Jamming

– Less of a problem. Jamming a few packets does not affect a large set of data packets much.

SLIDE 60

Summary

  • Network coding is good for

– Scalability
– Limited topological information
– Highly dynamic networks

  • Key insights

– Treat the packets equally.
– No need to read the content, just do counting.
– Anything helps.
– Don’t need to know what is where.

SLIDE 61

References

  • A 6-page network coding introduction.
  • C. Fragouli, J.-Y. Le Boudec, and J. Widmer, “Network Coding: An Instant Primer”.

  • Network coding webpage:

http://tesla.csl.uiuc.edu/~koetter/NWC/

  • A book: Network coding theory.
SLIDE 62

Coding in storage

SLIDE 63

Use coding for fault tolerance

  • If a sensor dies, we lose its data.
  • For fault tolerance, we have to duplicate data so that we can recover it from other sensors.
  • Straightforward solution: duplicate it at other places.
  • Storage size goes up!
  • Use coding to keep the storage size the same.
  • What we pay: decoding cost.
SLIDE 64

Problem setup

  • Setup: we have k data nodes, and n>k storage

nodes (data nodes may also be storage nodes).

  • Each data node generates one piece of data.
  • Each storage node only stores one piece of (coded)

data.

  • We want to recover data by using any k storage

nodes.

  • Sounds familiar? Reed Solomon code.
  • But it is centralized -- we need all the k inputs to

generate the coded information.

SLIDE 65

Distributed random linear code

  • Each node sends its data

to m=O(lnk) random storage nodes.

  • A storage node may

receive multiple pieces of data c1, c2, … ck, but it stores a random combination of them. E.g., a1c1+a2c2+…+akck, where a’s are random coefficients.

SLIDE 66

Coding and decoding

  • Storage size is kept almost the same as before.
  • The random coefficients can be generated by a pseudo-random generator. Even if we store the coefficients, the overhead is small.

  • Claim: we can recover the original k pieces of data from any k

storage nodes.

  • Think of the original data as unknowns (variables).
  • Each storage node gives a linear equation on the unknowns

a1c1+a2c2+…+akck = s.

  • Now we take k storage nodes and look at the linear system.
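Putting the whole scheme together as a toy sketch (the parameters k = 8, n = 12, m = 3⌈ln k⌉, the field GF(257), and all function names are our choices):

```python
import math
import random

P = 257
rng = random.Random(7)

def disperse(data, n):
    # Each of the k data nodes ships its value to m = O(ln k) random storage
    # nodes; each storage node keeps one random linear combination (mod P)
    # of what it received, alongside its coefficient vector.
    k = len(data)
    m = 3 * math.ceil(math.log(k))
    coeffs = [[0] * k for _ in range(n)]
    for src in range(k):
        for dst in rng.sample(range(n), m):
            coeffs[dst][src] = rng.randrange(1, P)
    values = [sum(c * d for c, d in zip(row, data)) % P for row in coeffs]
    return coeffs, values

def solve(rows, rhs, k):
    # Gauss-Jordan mod P; raises StopIteration if the k x k system is singular.
    aug = [row[:] + [v] for row, v in zip(rows, rhs)]
    for col in range(k):
        piv = next(r for r in range(col, k) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [x * inv % P for x in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % P for x, y in zip(aug[r], aug[col])]
    return [aug[r][k] for r in range(k)]

data = [rng.randrange(P) for _ in range(8)]   # k = 8 data values
coeffs, values = disperse(data, n=12)         # n = 12 storage nodes
for _ in range(10):                           # query any k nodes; resample if singular
    picked = rng.sample(range(12), 8)
    try:
        recovered = solve([coeffs[i] for i in picked],
                          [values[i] for i in picked], 8)
        break
    except StopIteration:
        pass
assert recovered == data
```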
SLIDE 67

Coding and decoding

  • Take k storage nodes at random.
  • The full dispersal matrix is n by k (rows: storage nodes; columns: data nodes); each column has m non-zeros, placed randomly.
  • Picking k storage nodes selects a k by k submatrix, and (submatrix) × (data) = the stored coded info.
  • Need to argue that this submatrix has full rank, i.e., is invertible.

SLIDE 68

Main theorem

  • A bipartite graph G=(X, Y), |X|=k, |Y|=k.
  • X: the data nodes; Y: the k storage nodes.
  • Edmonds’ theorem: the matrix has full rank if the bipartite graph has a perfect matching.

  • Now, we only need to show that the bipartite graph

G has a perfect matching with high probability.

SLIDE 69

Main theorem

  • Upper bound: if each data node picks O(ln k) storage nodes randomly, the bipartite graph G has a perfect matching with high probability.

  • Lower bound: Ω(ln k) is necessary.
  • Proof:

– Every storage node has to receive at least one piece of data.
– Otherwise, the matrix has a zero row!
– Throwing data randomly must cover all the storage nodes.
– Coupon collector problem: each throw lands on a random storage node. To cover all n different nodes, with high probability one has to make Ω(n ln n) throws in total.

SLIDE 70

Perimeter storage

  • Potential users outside the network have easy access to

perimeter nodes; Gateway nodes are positioned on the perimeter.

SLIDE 71

Pros and Cons

  • No extra infrastructure, only a point-to-point routing

scheme is needed.

  • Robust to errors – just take k good copies.
  • Fault tolerance – sensors die? Fine…
  • No centralized processing, no routing table or

global knowledge of any sort.

  • Very resilient to packet loss due to the random

nature of the scheme.

  • Achieves certain data privacy. If the coding scheme

(the random coefficients) is kept from the adversary, the adversary only sees random data.

SLIDE 72

Pros and Cons

  • Information is coded, in other words, scrambled.
  • Have to decode all k pieces, even if only 1 piece of data is desired.

  • Doesn’t exploit locality: usually we don’t go to arbitrary k storage nodes, we go to the closest k nodes.