Wide-area Dissemination under Strict Timeliness, Reliability, and Cost Constraints



slide-1
SLIDE 1

Distributed Systems and Networks Lab

www.dsn.jhu.edu

Wide-area Dissemination under Strict Timeliness, Reliability, and Cost Constraints

Amy Babay, Emily Wagner, Yasamin Nazari, Michael Dinitz, and Yair Amir

slide-2
SLIDE 2

Problem: Combining Timeliness and Reliability over the Internet

  • Internet natively supports end-to-end reliable (e.g. TCP) or best-effort timely (e.g. UDP) communication
  • Our goal: support applications with extremely demanding combinations of timeliness and reliability requirements in a cost-effective manner
  • Applications have emerged over the past few years that require both timeliness guarantees and high reliability

– e.g. VoIP, broadcast-quality live TV transport

March 30, 2017 Algorithms in the Field PI Meeting

slide-3
SLIDE 3

State-of-the-art: Combining Timeliness and Reliability over the Internet

200ms one-way latency requirement, 99.999% reliability guarantee

40ms one-way propagation delay across North America

slide-4
SLIDE 4

New Challenges: Combining Timeliness and Reliability


130ms round-trip latency requirement

slide-5
SLIDE 5

New Challenges: Combining Timeliness and Reliability

130ms round-trip latency requirement

80ms round-trip propagation delay across North America
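The two sets of figures quoted on these slides leave very different amounts of slack once propagation delay is subtracted. A minimal sketch of that arithmetic, using only the numbers above (the function name `slack_ms` is illustrative, not from the talk):

```python
def slack_ms(latency_requirement_ms, propagation_delay_ms):
    """Time left for buffering and recovery once propagation delay is paid."""
    return latency_requirement_ms - propagation_delay_ms

# State of the art: 200ms one-way requirement, 40ms one-way propagation.
one_way_slack = slack_ms(200, 40)

# New challenge: 130ms round-trip requirement, 80ms round-trip propagation.
round_trip_slack = slack_ms(130, 80)

print(one_way_slack, round_trip_slack)  # prints "160 50"
```

Under these numbers the older setting leaves 160ms one-way for recovery, while the new one leaves 50ms across the whole round trip, which is why there is so little room for retransmission-based recovery.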

slide-6
SLIDE 6

State-of-the-art: Combining Timeliness and Reliability over the Internet

  • Overlay networks enable specialized routing and recovery protocols

[Figure: overlay network topology with nodes SJC, LAX, DEN, DFW, CHI, ATL, WAS, SVG, NYC, JHU and attached clients]

slide-7
SLIDE 7

Addressing New Challenges: Dissemination Graph Approach

  • Stringent latency requirements give less flexibility for buffering and recovery
  • Core idea: Send packets redundantly over a subgraph of the network (a dissemination graph) to maximize the probability that at least one copy arrives on time
  • How do we select the subgraph (subset of overlay links) on which to send each packet?

slide-8
SLIDE 8

Initial Approaches to Selecting a Dissemination Graph

  • Overlay Flooding: send on all overlay links

– Optimal in timeliness and reliability but expensive

[Figure: overlay flooding on the overlay topology (DEN, DFW, ATL, WAS, LON, FRA, LAX, JHU, HKG, CHI, NYC, SJC): 64 (directed) edges]

slide-9
SLIDE 9

Initial Approaches to Selecting a Dissemination Graph

  • Time-Constrained Flooding: flood only on edges that can reach the destination within the latency constraint

[Figure: time-constrained flooding dissemination graph on the overlay topology]
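One natural reading of the time-constrained flooding rule is: keep a directed edge (u, v) only if the fastest route from the source to u, plus the edge itself, plus the fastest route from v to the destination fits within the deadline. The sketch below implements that reading. The function names, the edge-map format, and the latencies in the example are assumptions for illustration, not values from the talk.

```python
import heapq

def dijkstra(adj, start):
    """Shortest-path latency from start to every reachable node."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def time_constrained_flooding(edges, source, dest, deadline):
    """edges maps (u, v) -> latency in ms; returns the set of edges to flood on."""
    fwd, rev = {}, {}
    for (u, v), lat in edges.items():
        fwd.setdefault(u, []).append((v, lat))
        rev.setdefault(v, []).append((u, lat))
    from_source = dijkstra(fwd, source)  # latency source -> node
    to_dest = dijkstra(rev, dest)        # latency node -> dest (reversed graph)
    inf = float("inf")
    return {
        (u, v)
        for (u, v), lat in edges.items()
        if from_source.get(u, inf) + lat + to_dest.get(v, inf) <= deadline
    }

# Hypothetical latencies (ms) on a few overlay links.
example_edges = {
    ("ATL", "DFW"): 10, ("DFW", "LAX"): 15,
    ("ATL", "CHI"): 10, ("CHI", "DEN"): 12, ("DEN", "LAX"): 12,
    ("ATL", "NYC"): 12, ("NYC", "LAX"): 60,
}
flood_graph = time_constrained_flooding(example_edges, "ATL", "LAX", 40)
```

In this made-up example the two routes via DFW and via CHI/DEN meet the 40ms deadline, so their five edges are kept, while both edges of the NYC detour (72ms end to end) are dropped.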

slide-10
SLIDE 10

Initial Approaches to Selecting a Dissemination Graph

  • Disjoint Paths: send on several paths that do not share any nodes (or edges)

– Good trade-off between cost and timeliness/reliability
– Uniformly invests resources across the network

[Figure: disjoint paths dissemination graph on the overlay topology]

slide-11
SLIDE 11

Selecting an Optimal Dissemination Graph

Can we use knowledge of the network characteristics to do better?

Invest more resources in more problematic regions:

[Figure: dissemination graph on the overlay topology]

slide-12
SLIDE 12

Problem Definition: Selecting an Optimal Dissemination Graph

  • We want to find the best trade-off between cost and reliability (subject to timeliness)

– Cost: # of times a packet is sent (= # of edges used)
– Reliability: probability that a packet reaches its destination within its application-specific latency constraint (e.g. 65ms)

  • Client perspective: maximize reliability achieved for a fixed budget
  • Service provider perspective: minimize cost of providing an agreed-upon level of reliability (SLA)
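To make the two quantities concrete, the sketch below computes cost and reliability for a tiny dissemination graph by enumerating every combination of edge failures. It assumes edges fail independently with a known probability, which is a simplification, and the latencies and loss rates are hypothetical. Enumeration is exponential in the number of edges, so this only works for toy graphs.

```python
from itertools import product

def shortest_latency(latency, source, dest):
    """Bellman-Ford style relaxation; latency maps (u, v) -> delay in ms."""
    dist = {source: 0.0}
    for _ in range(len(latency) + 1):
        changed = False
        for (u, v), lat in latency.items():
            if u in dist and dist[u] + lat < dist.get(v, float("inf")):
                dist[v] = dist[u] + lat
                changed = True
        if not changed:
            break
    return dist.get(dest, float("inf"))

def cost(graph):
    """Cost = number of edges used, i.e. number of times a packet is sent."""
    return len(graph)

def exact_reliability(graph, source, dest, deadline):
    """graph maps (u, v) -> (latency_ms, failure_probability)."""
    items = list(graph.items())
    total = 0.0
    for alive in product([True, False], repeat=len(items)):
        prob = 1.0
        up = {}
        for keep, (edge, (lat, p_fail)) in zip(alive, items):
            prob *= (1.0 - p_fail) if keep else p_fail
            if keep:
                up[edge] = lat
        # Count this scenario if some surviving path meets the deadline.
        if shortest_latency(up, source, dest) <= deadline:
            total += prob
    return total

# Two node-disjoint paths with hypothetical latencies and 1% loss per edge.
two_paths = {
    ("ATL", "DFW"): (20.0, 0.01), ("DFW", "LAX"): (30.0, 0.01),
    ("ATL", "DEN"): (25.0, 0.01), ("DEN", "LAX"): (28.0, 0.01),
}
reliability = exact_reliability(two_paths, "ATL", "LAX", 65.0)
```

With these assumed numbers the graph costs 4 edges and its reliability is 1 - (1 - 0.99^2)^2, about 0.9996. The client and service provider perspectives are the two ways of trading these quantities against each other.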

slide-13
SLIDE 13

Selecting an Optimal Dissemination Graph

  • Solving the proposed problems is NP-hard

– Without the latency constraint, computing reliability is the two-terminal reliability problem (which is #P-complete)
– Computing optimal dissemination graphs in terms of cost and reliability is also NP-hard

  • We expand on this later in the talk

slide-14
SLIDE 14

Data-Informed Dissemination Graphs

  • Goal: Learn about the types of problems that occur in the field and tailor dissemination graphs to address common problem types
  • Collected data on a commercial overlay topology (www.ltnglobal.com) over 4 months
  • Analyzed how different dissemination-graph-based routing approaches (time-constrained flooding, single path, two disjoint paths) would perform (Playback Network Simulator)

slide-15
SLIDE 15

Data-Informed Dissemination Graphs

  • Key findings:
  • Two disjoint paths provide relatively high reliability overall

– Good building block for most cases

  • Almost all problems not addressed by two disjoint paths involve either:

– A problem at the source
– A problem at the destination
– A problem at both the source and the destination

slide-16
SLIDE 16

Dissemination Graphs with Targeted Redundancy

  • Our approach:
  • Pre-compute four graphs per flow (more on this later):

– Two disjoint paths (static)
– Source-problem graph
– Destination-problem graph
– Robust source-destination problem graph

  • Use two disjoint paths graph in the normal case
  • If a problem is detected at the source and/or destination of a flow, switch to the appropriate pre-computed dissemination graph
  • Converts optimization problem to classification problem
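The switching rule above amounts to a small classifier over two observations per flow. A minimal sketch, assuming that some detector (not described on the slide) already reports whether a problem is present at the source and at the destination. The enum and function names are illustrative.

```python
from enum import Enum

class GraphKind(Enum):
    TWO_DISJOINT_PATHS = "two-disjoint-paths"
    SOURCE_PROBLEM = "source-problem"
    DESTINATION_PROBLEM = "destination-problem"
    ROBUST_SOURCE_DESTINATION = "robust-source-destination-problem"

def select_graph(source_problem: bool, destination_problem: bool) -> GraphKind:
    """Pick which of the four pre-computed graphs a flow should use."""
    if source_problem and destination_problem:
        return GraphKind.ROBUST_SOURCE_DESTINATION
    if source_problem:
        return GraphKind.SOURCE_PROBLEM
    if destination_problem:
        return GraphKind.DESTINATION_PROBLEM
    return GraphKind.TWO_DISJOINT_PATHS  # normal case
```

Because all four graphs are pre-computed per flow, the per-packet decision is a lookup rather than an optimization, which is the sense in which the approach converts the optimization problem into a classification problem.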

slide-17
SLIDE 17

Dissemination Graphs with Targeted Redundancy: Case Study

  • Case study: Atlanta -> Los Angeles

[Figure: overlay topology map] Two node-disjoint paths dissemination graph (4 edges)

slide-18
SLIDE 18

Dissemination Graphs with Targeted Redundancy: Case Study

  • Case study: Atlanta -> Los Angeles

[Figure: overlay topology map] Destination-problem dissemination graph (8 edges)

slide-19
SLIDE 19

Dissemination Graphs with Targeted Redundancy: Case Study

  • Case study: Atlanta -> Los Angeles

[Figure: overlay topology map] Source-problem dissemination graph (10 edges)

slide-20
SLIDE 20

Dissemination Graphs with Targeted Redundancy: Case Study

  • Case study: Atlanta -> Los Angeles

[Figure: overlay topology map] Robust source-destination-problem dissemination graph (12 edges)

slide-21
SLIDE 21

Dissemination Graphs with Targeted Redundancy: Case Study

  • Case study: Atlanta -> Los Angeles; August 15, 2016

Packets received and dropped over a 110-second interval using (adaptive) two disjoint paths (3982 lost/late packets, 20 packets with latency over 120ms not shown)

slide-22
SLIDE 22

Dissemination Graphs with Targeted Redundancy: Case Study

  • Case study: Atlanta -> Los Angeles; August 15, 2016

Packets received and dropped over a 110-second interval using our dissemination-graph-based approach to add targeted redundancy at the destination (299 lost/late packets)

slide-23
SLIDE 23

Dissemination Graphs with Targeted Redundancy: Results

  • 4 weeks of data collected over 4 months
  • Packets sent on each link in the overlay topology every 10ms
  • Analyzed 16 transcontinental flows
  • All combinations of 4 cities on the East Coast of the US (NYC, JHU, WAS, ATL) and 2 cities on the West Coast of the US (SJC, LAX)
  • 1 packet/ms simulated sending rate

slide-24
SLIDE 24

Dissemination Graphs with Targeted Redundancy: Results

| Routing Approach | Availability (%) | Unavailability (seconds per flow per week) | Reliability (%) | Reliability (packets lost/late per million) |
|---|---|---|---|---|
| Time-Constrained Flooding | 99.995887% | 24.88 | 99.999854% | 1.46 |
| Dissemination Graphs with Targeted Redundancy | 99.995886% | 24.88 | 99.999848% | 1.52 |
| Dynamic Two Disjoint Paths | 99.995866% | 25.00 | 99.998913% | 10.87 |
| Static Two Disjoint Paths | 99.995521% | 27.09 | 99.998453% | 15.47 |
| Redundant Single Path | 99.995764% | 25.62 | 99.998535% | 14.65 |
| Single Path | 99.994206% | 35.04 | 99.997605% | 23.95 |
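The two derived columns in this table appear to follow directly from the percentages: unavailability is the unavailable fraction of a week in seconds, and the per-million figure is the lost/late fraction scaled by one million. A short sketch of those conversions, with illustrative function names, which reproduces the reported values for the rows checked here:

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800

def unavailability_seconds_per_week(availability_percent):
    """Seconds per week during which a flow is unavailable."""
    return (1 - availability_percent / 100) * SECONDS_PER_WEEK

def lost_or_late_per_million(reliability_percent):
    """Packets lost or late per million packets sent."""
    return (1 - reliability_percent / 100) * 1_000_000

# Time-Constrained Flooding row.
tcf_unavailable = round(unavailability_seconds_per_week(99.995887), 2)
tcf_lost = round(lost_or_late_per_million(99.999854), 2)

# Single Path row.
sp_unavailable = round(unavailability_seconds_per_week(99.994206), 2)
sp_lost = round(lost_or_late_per_million(99.997605), 2)
```

Reading the table in these units makes the differences easier to see: going from a single path to targeted redundancy cuts lost or late packets from roughly 24 per million to under 2 per million.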

slide-25
SLIDE 25

Results: % of Performance Gap Covered (between TCF and Single Path)

| Routing Approach | Week 1 2016-07-19 | Week 2 2016-08-08 | Week 3 2016-09-01 | Week 4 2016-10-13 | Overall | Scaled Cost |
|---|---|---|---|---|---|---|
| Time-Constrained Flooding | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 15.75 |
| Dissem. Graphs with Targeted Redundancy | 99.05% | 99.73% | 98.53% | 99.94% | 99.81% | 2.098 |
| Dynamic Two Disjoint Paths | 73.63% | 67.73% | 94.75% | 69.69% | 69.65% | 2.059 |
| Static Two Disjoint Paths | 37.89% | 43.18% | -175.13% | 51.63% | 44.58% | 2.059 |
| Redundant Single Path | 67.06% | 47.72% | 43.12% | 58.00% | 54.59% | 2.000 |
| Single Path | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 1.000 |

slide-26
SLIDE 26

Applications: Remote Manipulation

Video demonstration: www.dsn.jhu.edu/~babay/Robot_video.mp4

slide-27
SLIDE 27

Applications: Remote Robotic Ultrasound

  • Collaboration with JHU/TUM CAMP lab (https://camp.lcsr.jhu.edu/)

slide-28
SLIDE 28

Part II: Theory

  • Computing optimal dissemination graphs:

– Formalization of problem
– Hardness
– Limited Progress

  • Targeted redundancy:

– Problem at source or destination
– Problem at both
– Which graphs to compute?

slide-29
SLIDE 29

Optimizing Dissemination Graphs

  • Input:

– G = (V, E), s,t ∈ V
– p : E → [0,1]
– d : E → R+; c : E → R+
– L ∈ R; B ∈ R

  • G_p: subgraph where each e fails w.p. p(e)
  • Find subgraph H with minimum # edges s.t. Pr[s,t at distance at most L in H_p] ≥ B

slide-30
SLIDE 30

Optimizing Dissemination Graphs

  • Bad news: computing Pr[s,t connected in G_p] is #P-hard [Valiant]

– So can’t even tell if purported solution is feasible

Reliability

  • But how hard is it really?

– If reliability not incredibly close to 0, Monte Carlo sampling + Chernoff bound give (1+𝜁)-approx
– For us, need reliability to be very large: maybe can still approximate optimal dissemination graph?
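The Monte Carlo idea can be sketched as follows: repeatedly sample which edges survive, check whether s and t are within distance L in the surviving subgraph, and report the fraction of successes. The sample-count helper uses the standard multiplicative Chernoff bound, Pr[|X - μn| ≥ εμn] ≤ 2·exp(-ε²μn/3) for 0 < ε < 1. Independent edge failures, the function names, and the example graph are assumptions for illustration.

```python
import math
import random

def shortest_latency(latency, source, dest):
    """Bellman-Ford style relaxation; latency maps (u, v) -> delay."""
    dist = {source: 0.0}
    for _ in range(len(latency) + 1):
        changed = False
        for (u, v), lat in latency.items():
            if u in dist and dist[u] + lat < dist.get(v, float("inf")):
                dist[v] = dist[u] + lat
                changed = True
        if not changed:
            break
    return dist.get(dest, float("inf"))

def monte_carlo_reliability(graph, source, dest, deadline, samples=20000, seed=0):
    """graph maps (u, v) -> (delay, failure_probability); returns the hit fraction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Each edge survives independently with probability 1 - p(e).
        up = {e: d for e, (d, p_fail) in graph.items() if rng.random() >= p_fail}
        if shortest_latency(up, source, dest) <= deadline:
            hits += 1
    return hits / samples

def chernoff_samples(eps, delta, mu_lower):
    """Samples sufficient for relative error eps with probability 1 - delta,
    given a lower bound mu_lower on the true reliability."""
    return math.ceil(3 * math.log(2 / delta) / (eps ** 2 * mu_lower))

# Hypothetical graph: two disjoint two-hop paths, 10% failure per edge.
graph = {
    ("s", "a"): (10.0, 0.1), ("a", "t"): (10.0, 0.1),
    ("s", "b"): (10.0, 0.1), ("b", "t"): (10.0, 0.1),
}
estimate = monte_carlo_reliability(graph, "s", "t", 25.0)
```

For this example the exact value is 1 - (1 - 0.9²)² = 0.9639, and the estimate should land close to it. The sample bound grows as the lower bound on reliability shrinks, which matches the slide's caveat that the approach works when reliability is not incredibly close to 0.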

slide-31
SLIDE 31

Ideas & Results

  • Want practical, fast algorithms, so try greedy, local search, etc.

– Counterexamples to everything

  • Try 2: write (exponential-size) LP

– Impractical, fine in theory
– Can only approximately separate
– How to round??

slide-32
SLIDE 32

Ideas & Results

  • Sample Average Approximation (SAA): sample scenarios, optimize just for sampled scenarios

– Bad news: with arbitrary samples, the problem is Label Cover-hard!

  • If all samples are trees:

– Use Minimum p-Union approximation [D-Chlamtac-Makarychev ‘17] (generalization of Densest k-Subgraph) ⇒ O(n^{1/2})-approx
– Still as hard as Densest k-Subgraph!
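To illustrate what "optimize just for sampled scenarios" means, the sketch below takes a fixed list of scenarios (each a set of surviving edges) and searches for the smallest edge set whose s-t distance is within the deadline in at least a target fraction of them. It uses exhaustive search over edge subsets, so it is exponential and only suitable for toy inputs. It is not the Minimum p-Union approximation cited above, and all names and the example are assumptions.

```python
from itertools import combinations

def shortest_latency(latency, source, dest):
    """Bellman-Ford style relaxation; latency maps (u, v) -> delay."""
    dist = {source: 0.0}
    for _ in range(len(latency) + 1):
        changed = False
        for (u, v), lat in latency.items():
            if u in dist and dist[u] + lat < dist.get(v, float("inf")):
                dist[v] = dist[u] + lat
                changed = True
        if not changed:
            break
    return dist.get(dest, float("inf"))

def saa_min_edges(delays, scenarios, source, dest, deadline, target):
    """Smallest edge subset that meets the deadline in >= target of the scenarios.

    delays maps (u, v) -> delay; each scenario is a set of surviving edges.
    """
    all_edges = list(delays)
    for size in range(len(all_edges) + 1):
        for chosen in combinations(all_edges, size):
            ok = 0
            for alive in scenarios:
                up = {e: delays[e] for e in chosen if e in alive}
                if shortest_latency(up, source, dest) <= deadline:
                    ok += 1
            if ok >= target * len(scenarios):
                return set(chosen)
    return None  # no subset reaches the target fraction

# Hypothetical overlay: two fast two-hop paths plus a slow direct link.
delays = {
    ("A", "B"): 10, ("B", "D"): 10,
    ("A", "C"): 10, ("C", "D"): 10,
    ("A", "D"): 50,
}
everything = set(delays)
scenarios = [
    everything,
    everything - {("A", "B")},
    everything - {("C", "D")},
    everything,
]
strict = saa_min_edges(delays, scenarios, "A", "D", 30, 1.0)
relaxed = saa_min_edges(delays, scenarios, "A", "D", 30, 0.75)
```

In this example, meeting the deadline in every sampled scenario needs both two-hop paths (4 edges), while tolerating one failed scenario out of four lets a single two-hop path (2 edges) suffice.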

slide-33
SLIDE 33

Targeted Redundancy

  • Wanted to precompute dissemination graphs for problem at source, at sink, and at both
  • What should these graphs be? Can we find the best?

– Ongoing work...

slide-34
SLIDE 34

Source or Sink Problem

  • Send to all neighbors of source
  • Cheapest tree where all neighbors have short enough path to sink
  • Shallow-Light Steiner Tree
  • Known approximations, bicriteria approximations
  • Currently: brute-force optimal solution

[Figure: tree connecting the source's neighbors to the sink, each path of length ≤ L]

slide-35
SLIDE 35

Source and Sink Problem

  • Graph should be:

– All neighbors S of source
– All neighbors T of sink
– Path of length at most L between all s ∈ S, t ∈ T

[Figure: source neighbors S connected to sink neighbors T by paths of length ≤ L]
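The three requirements can be checked mechanically for any candidate graph. The sketch below is a feasibility check only, not a method for finding the cheapest such graph. It assumes the candidate is given as a subset of the overlay's directed edges, that S and T are read off the overlay's links at the source and sink, and that the connecting paths are measured without passing through the source or sink themselves. Names and the example overlay are hypothetical.

```python
def shortest_latency(latency, source, dest):
    """Bellman-Ford style relaxation; latency maps (u, v) -> length."""
    dist = {source: 0.0}
    for _ in range(len(latency) + 1):
        changed = False
        for (u, v), lat in latency.items():
            if u in dist and dist[u] + lat < dist.get(v, float("inf")):
                dist[v] = dist[u] + lat
                changed = True
        if not changed:
            break
    return dist.get(dest, float("inf"))

def is_source_sink_graph(overlay, chosen, source, sink, max_len):
    """overlay maps (u, v) -> length; chosen is a subset of overlay's edges."""
    S = {v for (u, v) in overlay if u == source and v != sink}
    T = {u for (u, v) in overlay if v == sink and u != source}
    # Requirements 1 and 2: reach every neighbor of the source and of the sink.
    required = {(source, s) for s in S} | {(t, sink) for t in T}
    if not required <= set(chosen):
        return False
    # Requirement 3: every s in S reaches every t in T within max_len.
    inner = {e: overlay[e] for e in chosen if source not in e and sink not in e}
    return all(shortest_latency(inner, s, t) <= max_len for s in S for t in T)

overlay = {
    ("SRC", "A"): 10, ("SRC", "B"): 10,
    ("A", "X"): 10, ("B", "X"): 10, ("A", "C"): 10,
    ("X", "C"): 10, ("X", "D"): 10,
    ("C", "DST"): 10, ("D", "DST"): 10,
}
full = set(overlay)
```

Here `is_source_sink_graph(overlay, full, "SRC", "DST", 20)` holds, while dropping the edge ("B", "X") breaks it because B can no longer reach either neighbor of the sink.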

slide-36
SLIDE 36

Bipartite Shallow-Light Steiner Network

  • Label Cover-hard (unlike shallow-light tree)
  • O(n^{3/5})-approx using pairwise spanner approx [Chlamtac-D-Kortsarz-Laekhanukit ‘17]
  • (polylog, polylog)-bicriteria

slide-37
SLIDE 37

Next Steps: Theory

  • Spinoffs of optimal dissemination graphs

– Stochastic vaccination problems, with Aravind Srinivasan (UMD) & Anil Vullikanti (Va Tech)

  • Bipartite Shallow-Light Network

– Better approximations?
– Exact algorithm, exponential in # terminals?
– Generally: effect of demand graph on shallow-light / spanner problems?

slide-38
SLIDE 38

Next Steps: Practice

  • Deploying the full system and validating the simulation

– Implementing dissemination-graph-based routing in the Spines Overlay Messaging Framework (www.spines.org)
– Collecting data in parallel with the system deployment and comparing experimental and simulation results

  • Integrating and experimenting with applications (e.g. remote ultrasound)

slide-39
SLIDE 39

Thanks!


www.dsn.jhu.edu/funding/aitf/