A Digital Fountain Approach to Reliable Distribution of Bulk Data
John Byers, ICSI
Michael Luby, ICSI
Michael Mitzenmacher, Compaq SRC
Ashu Rege, ICSI
Application: Software Distribution
- New release of widely used software.
- Hundreds of thousands of clients or more.
- Bulk data: tens or hundreds of MB
- Heterogeneous clients:
– Modem users: hours
– Well-connected users: minutes
Primary Objectives
- Scale to vast numbers of clients
– No ARQs or NACKs
– Minimize use of network bandwidth
- Minimize overhead at receivers:
– Computation time
– Useless packets
- Compatibility
– Networks: Internet, satellite, wireless
– Scheduling policies, e.g. congestion control
Impediments
- Packet loss
– Wired networks: congestion
– Satellite networks, mobile receivers
- Receiver heterogeneity
– Packet loss rates
– End-to-end throughput
- Receiver access patterns
– Asynchronous arrivals and departures
– Overlapping access intervals
Digital Fountain
[Figure: a Source of k packets is encoded instantaneously into an Encoding Stream and transmitted; the k-packet Message is decoded instantaneously from the k packets Received.]
Can recover file from any set of k encoding packets.
Digital Fountain Solution
[Figure: User 1 and User 2 receiving the File from a single Transmission; time axis from 0 hours to 5 hours.]
Is FEC Inherently Bad?
- Faulty Reasoning
– FEC adds redundancy
– Redundancy increases congestion and losses
– More losses necessitate more transmissions
– FEC consumes more overall bandwidth
- But…
– Each and every packet can be useful to all clients
– Each client consumes minimum bandwidth possible
– FEC consumes less overall bandwidth by compressing bandwidth across clients
DF Solution Features
- Users can initiate the download at their discretion.
- Users can continue the download seamlessly after a temporary interruption.
- Tolerates moderate packet loss.
- Low server load - simple protocol.
- Scales well.
- Low network load.
Approximating a Digital Fountain
[Figure: Source (k packets) → Encoding Stream → Received ((1 + c) k packets) → Message (k packets), with nonzero Encoding Time and Decoding Time.]
Approximating a DF: Performance Measures
- Time Overhead:
– Time to decode (or encode) as a function of k.
- Decoding Inefficiency:
– (packets needed to decode) / k
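The ratio above can be written as a small helper. This is a minimal sketch: the function name `decoding_inefficiency` and the sample packet counts are illustrative assumptions, not part of the talk. The overhead c in the (1 + c) k figure label is then the inefficiency minus 1.

```python
def decoding_inefficiency(packets_needed: int, k: int) -> float:
    """Packets a receiver had to collect, divided by the k source packets."""
    if k <= 0:
        raise ValueError("k must be positive")
    if packets_needed < k:
        raise ValueError("cannot decode from fewer than k packets")
    return packets_needed / k


# An ideal digital fountain decodes from exactly k packets.
ideal = decoding_inefficiency(16000, 16000)

# Hypothetical receiver that needed 880 extra packets for k = 16,000.
approx = decoding_inefficiency(16880, 16000)
overhead_c = approx - 1  # the "c" in (1 + c) k

print(ideal, approx, overhead_c)
```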
Work on Erasure Codes
- Standard Reed-Solomon Codes
– Dense systems of linear equations
– Poor time overhead (quadratic in k)
– Optimal decoding inefficiency of 1
- Tornado Codes [LMSSS ‘97]
– Sparse systems of equations
– Fast encoding and decoding (linear in k)
– Suboptimal decoding inefficiency
Tornado Z: Encoding Structure
- Stretch factor = 2
- k = 16,000 nodes
[Figure: k source data nodes connected to redundancy nodes through successive irregular bipartite graphs.]
Encoding/Decoding Process
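[Figure: bipartite graph with source packets a, b, c, d, e, f, g, h on one side and redundant packets on the other, each the XOR of its neighbors:]
– a ⊕ b ⊕ f
– a ⊕ b ⊕ c ⊕ d ⊕ g
– c ⊕ e ⊕ g ⊕ h
– b ⊕ d ⊕ e ⊕ f ⊕ g ⊕ h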
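The XOR process in the figure can be sketched in a few lines. This is an illustration under assumptions, not the Tornado Z implementation: it uses a single graph layer, the wiring in `GRAPH` is one reading of the four XOR combinations legible in the figure (a ⊕ b ⊕ f and so on, with a to h mapped to indices 0 to 7), and the names `encode` and `peel_decode` are hypothetical. Decoding repeatedly looks for a redundant packet with exactly one unknown neighbor and solves for it.

```python
from functools import reduce
from operator import xor

# Assumed wiring: each entry lists the source indices XORed into one
# redundant packet (a=0, b=1, ..., h=7).
GRAPH = [[0, 1, 5], [0, 1, 2, 3, 6], [2, 4, 6, 7], [1, 3, 4, 5, 6, 7]]


def encode(source, graph):
    """Each redundant packet is the XOR of the source packets it is wired to."""
    return [reduce(xor, (source[i] for i in nbrs), 0) for nbrs in graph]


def peel_decode(k, graph, got_source, got_checks):
    """Recover missing source packets by repeated substitution.

    got_source / got_checks map packet index -> value for packets that arrived.
    Returns the full source list, or None if decoding stalls.
    """
    known = dict(got_source)
    progress = True
    while len(known) < k and progress:
        progress = False
        for c, value in got_checks.items():
            missing = [i for i in graph[c] if i not in known]
            if len(missing) == 1:
                # XOR out the known neighbors; the remainder is the lost packet.
                for i in graph[c]:
                    if i in known:
                        value ^= known[i]
                known[missing[0]] = value
                progress = True
    return [known[i] for i in range(k)] if len(known) == k else None


source = list(b"fountain")            # eight one-byte "packets"
checks = encode(source, GRAPH)

# Lose packets a and c; all redundant packets arrive.
survivors = {i: v for i, v in enumerate(source) if i not in (0, 2)}
recovered = peel_decode(8, GRAPH, survivors, dict(enumerate(checks)))

# Lose a and b and receive only the first two redundant packets:
# both still have two unknown neighbors, so decoding stalls.
stalled = peel_decode(
    8, GRAPH,
    {i: v for i, v in enumerate(source) if i not in (0, 1)},
    {0: checks[0], 1: checks[1]},
)
```

Each step touches only the edges of one redundant packet, which is why sparse graphs give the fast decoding claimed for Tornado codes; the price is that some loss patterns stall and need extra packets.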
Timing Comparison
Encoding time, 1K packets
Size      Reed-Solomon   Tornado Z
250 K     4.6 sec.       0.11 sec.
500 K     19 sec.        0.18 sec.
1 MB      93 sec.        0.29 sec.
2 MB      442 sec.       0.57 sec.
4 MB      30 min.        1.01 sec.
8 MB      2 hrs.         1.99 sec.
16 MB     8 hrs.         3.93 sec.

Decoding time, 1K packets
Size      Reed-Solomon   Tornado Z
250 K     2.06 sec.      0.18 sec.
500 K     8.4 sec.       0.24 sec.
1 MB      40.5 sec.      0.31 sec.
2 MB      199 sec.       0.44 sec.
4 MB      13 min.        0.74 sec.
8 MB      1 hr.          1.28 sec.
16 MB     4 hrs.         2.27 sec.

Tornado Z: Average inefficiency = 1.055
Both codes: Stretch factor = 2
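One way to read the timing comparison is to look at how each encoding time grows when the file size doubles: a quadratic-time code should grow by about 4x per doubling, a linear-time code by about 2x. The sketch below uses only the encoding times from the slide, converted to seconds; the larger Reed-Solomon entries are rounded to minutes and hours in the source, so the ratios are approximate. The helper name `doubling_ratios` is an assumption for illustration.

```python
# Encoding times from the timing comparison, in seconds.
SIZES = ["250 K", "500 K", "1 MB", "2 MB", "4 MB", "8 MB", "16 MB"]
RS_ENCODE = [4.6, 19, 93, 442, 30 * 60, 2 * 3600, 8 * 3600]
TORNADO_ENCODE = [0.11, 0.18, 0.29, 0.57, 1.01, 1.99, 3.93]


def doubling_ratios(times):
    """Growth factor between each pair of consecutive (doubled) file sizes."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


rs_ratios = doubling_ratios(RS_ENCODE)
tornado_ratios = doubling_ratios(TORNADO_ENCODE)

for size, rs, tz in zip(SIZES[1:], rs_ratios, tornado_ratios):
    print(f"{size:>6}: Reed-Solomon x{rs:.2f}, Tornado Z x{tz:.2f}")

# Ratio of the two encoding times at the largest file size.
speedup_16mb = RS_ENCODE[-1] / TORNADO_ENCODE[-1]
```

The Reed-Solomon ratios stay at 4 or above while the Tornado Z ratios stay below 2, consistent with the quadratic and linear time overheads listed earlier.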
Cyclic Interleaving
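[Figure: the File is split into Blocks, the Blocks are encoded into Encoded Blocks and combined into an Interleaved Encoding, shown alongside the Tornado Encoding; the Transmission cycles through Encoding Copy 1, Encoding Copy 2.]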
Cyclic Interleaving: Drawbacks
- The Coupon Collector’s Problem
– Waiting for packets from the last blocks
– More blocks: faster decoding, larger inefficiency
[Figure: B blocks, with label T.]
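The coupon-collector effect can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the talk: the sender cycles round-robin over the blocks, each packet is lost independently with probability `loss`, any `block_size` received packets suffice to decode a block, and the receiver never sees a duplicate packet. The names `interleaved_inefficiency`, `num_blocks`, and `block_size` are hypothetical. Packets that arrive for an already complete block are counted as received but useless, which is what pushes the inefficiency above 1.

```python
import random


def interleaved_inefficiency(num_blocks, block_size, loss, rng):
    """Received packets divided by source packets for one simulated receiver."""
    need = [block_size] * num_blocks   # packets still needed per block
    incomplete = num_blocks
    received = 0
    block = 0
    while incomplete:
        if rng.random() >= loss:       # this packet survived the channel
            received += 1
            if need[block] > 0:
                need[block] -= 1
                if need[block] == 0:
                    incomplete -= 1
            # else: the block is already decodable, so the packet is useless
        block = (block + 1) % num_blocks
    return received / (num_blocks * block_size)


# Same 1000-packet file split three ways, loss probability 0.1.
rng = random.Random(0)
for num_blocks, block_size in [(1, 1000), (20, 50), (50, 20)]:
    runs = [interleaved_inefficiency(num_blocks, block_size, 0.1, rng)
            for _ in range(200)]
    print(num_blocks, "blocks: mean inefficiency", sum(runs) / len(runs))
```

With a single block every received packet is useful, so the inefficiency is exactly 1; with many small blocks the receiver keeps collecting packets for finished blocks while it waits on the last few.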
Scalability over File Size
Decoding Inefficiency, 500 Receivers, p = 0.1
[Figure: Decoding Inefficiency (axis from 1 to 1.6) versus File Size, KB (axis from 100 to 10000). Series: Interleaved, T = 20 (Max. and Avg.); Interleaved, T = 50 (Max. and Avg.); Tornado Z (Max. and Avg.).]
Scalability over Receivers
Decoding Inefficiency on a 1MB File, p = 0.1
[Figure: Decoding Inefficiency (axis from 1 to 1.8) versus Receivers (axis from 1 to 10000). Series: Interleaved, T = 20; Interleaved, T = 50; Tornado Z.]
Digital Fountain Prototype
- Built on top of IP Multicast.
- Tolerating heterogeneity:
– Layered multicast
– Congestion control [VRC ‘98]
- Experimental results over MBONE.
Research Directions
- Other applications for digital fountains
– Dispersity routing
– Accessing data from multiple mirror sites in parallel
- Improving the codes
- Implementation and deployment