One Tunnel is (Often) Enough Simon Peter, Umar Javed, Qiao Zhang, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

One Tunnel is (Often) Enough

Simon Peter, Umar Javed, Qiao Zhang, Doug Woos, Thomas Anderson, Arvind Krishnamurthy University of Washington

Financial support: NSF, Cisco, and Google

slide-2
SLIDE 2

Internet Routing Has Issues

Outages and poor performance due to:

− Pathological routing policies
− Route convergence delays
− Misconfigured routers
− Prefix hijacking
− Malicious route injection (route table overload)
− Distributed denial of service

Good technical solutions in most/all cases, but glacial progress towards adoption.

slide-3
SLIDE 3

Why?

The fault lies not in our stars, but in ourselves. – Cassius

slide-4
SLIDE 4

Why?

The wheels of justice grind slow but grind fine. – Sun Tzu

slide-5
SLIDE 5

Why?

We don’t care, we don’t have to, we’re the phone company. – Lily Tomlin

slide-6
SLIDE 6

Local Problem => Global Outage

[Figure: network diagram. Nodes: AT&T, Sprint, Comcast, Amazon, FlakyISP, PowerData, ReliableISP. Labels: "99.999% available!", blackholes, IP prefix hijacks, DDoS attacks.]

slide-7
SLIDE 7

Local Problem => Global Outage

[Figure: network diagram with ARROW markers. Nodes: AT&T, Sprint, Comcast, Amazon, PowerData, ReliableISP. Labels: "99.999% available!", blackholes, IP prefix hijacks, DDoS attacks.]

Can we turn local reliability into global reliability?

slide-8
SLIDE 8

Assumptions/Observations

Shorter paths are more reliable than longer paths
Simple packet processing is feasible at high-speed border routers

− 10 Gb per core on commodity hardware

AS graph is relatively small and stable
See paper for quantitative justifications.

slide-9
SLIDE 9

ARROW

ARROW: Advertised Reliable Routing Over Waypoints

− ISPs offer a QoS tunnel across their network to remote customers
− Paid service akin to AWS or Google Cloud

ARROW runs on a small ISP we control
Evaluation: ARROW effective even if only a single tier-1 ISP adopts

slide-10
SLIDE 10

ARROW Example

1. Consult atlas of ISPs offering ARROW services
2. Construct tunnel through ARROW ISP, to output target address

[Figure: example topology. Nodes: PowerData, Amazon, FlakyISP, ARROW ISP. Also shown: Internet atlas.]

slide-11
SLIDE 11

Use Cases

Enterprises

− More reliable access to cloud services
− QoS between physically remote locations
− Home health monitoring

Business-facing ISP or cellular telecom

− Market share driven by perceived data network performance, reliability
− Well-developed market for premium service
− 70% of data traffic exits telecom network

slide-12
SLIDE 12

ARROW Mechanisms

How does endpoint/proxy know what tunnels are available?

− Atlas published by ISPs offering ARROW service: latency/bw/cost to which prefixes

− User/app-specific path selection
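
The atlas lookup and per-user selection can be sketched roughly as below. This is only an illustration: the record fields, the example ISP names and prefixes, and the "lowest latency within a cost/bandwidth budget" policy are all assumptions, since the slides say only that the atlas lists latency/bw/cost per prefix.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AtlasEntry:
    # Hypothetical record: one tunnel offer published by an ARROW ISP.
    isp: str
    prefix: str            # destination prefix reachable through the tunnel
    latency_ms: float
    bandwidth_mbps: float
    cost: float            # price in arbitrary units


def candidate_tunnels(atlas, dst):
    """Return the atlas entries whose prefix covers the destination address."""
    addr = ipaddress.ip_address(dst)
    return [e for e in atlas if addr in ipaddress.ip_network(e.prefix)]


def select_tunnel(atlas, dst, min_bandwidth_mbps=0.0, max_cost=float("inf")):
    """Pick the lowest-latency offer that meets the caller's constraints.

    Returns None when no published tunnel reaches the destination.
    """
    usable = [
        e for e in candidate_tunnels(atlas, dst)
        if e.bandwidth_mbps >= min_bandwidth_mbps and e.cost <= max_cost
    ]
    return min(usable, key=lambda e: e.latency_ms, default=None)


# Made-up atlas using documentation-only address ranges.
ATLAS = [
    AtlasEntry("isp-a", "203.0.113.0/24", 40.0, 100.0, 5.0),
    AtlasEntry("isp-b", "203.0.113.0/24", 25.0, 50.0, 9.0),
    AtlasEntry("isp-c", "198.51.100.0/24", 10.0, 1000.0, 1.0),
]
```

A different application could pass a different policy (for example, a cost cap) and end up on a different tunnel to the same prefix, which is the point of user/app-specific selection.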

How are packets encapsulated?

[Figure: packet encapsulation layout. Section labels: IP envelope, ARROW, IP header, Transport. Field labels: Src Addr, Hop Addr, Hop Auth, Prot, Src Addr, Dst Addr.]

slide-13
SLIDE 13

ARROW Mechanisms

How are packets authenticated?

− Packet authenticator provided by ISP at setup
− Authenticator can be hashed with checksum of packet to prevent snoop-stealing

What ISP data plane operations are needed?

− Check authenticator
− Check packet is within rate limit envelope
− Handle fault isolation probe, if any
− Re-write destination address
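
The four data plane steps above might look like the following in outline. Everything concrete here is an assumption made for illustration: the slides do not specify the hash, the checksum, or the rate limiter, so this sketch uses HMAC-SHA256 over a CRC32 of the payload and a token bucket, and the `Packet` fields are hypothetical.

```python
import hashlib
import hmac
import zlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    outer_dst: str          # hop address: ingress of the ARROW ISP
    inner_dst: str          # final destination carried inside the tunnel
    payload: bytes
    tag: bytes              # per-packet value derived from the authenticator
    is_probe: bool = False


def packet_tag(authenticator: bytes, payload: bytes) -> bytes:
    """Bind the authenticator to one packet via its checksum (assumed scheme)."""
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return hmac.new(authenticator, checksum, hashlib.sha256).digest()


class TokenBucket:
    """Assumed rate-limit envelope: `rate` bytes/s with a `burst`-byte allowance."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, size, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


def process(pkt, authenticator, bucket, now):
    """Apply the four checks in order; return (action, packet_or_None)."""
    expected = packet_tag(authenticator, pkt.payload)
    if not hmac.compare_digest(pkt.tag, expected):
        return "drop:auth", None
    if not bucket.allow(len(pkt.payload), now):
        return "drop:rate", None
    # Re-write the destination so the packet leaves the tunnel toward its target.
    rewritten = replace(pkt, outer_dst=pkt.inner_dst)
    action = "forward+ack" if pkt.is_probe else "forward"
    return action, rewritten
```

Checking the authenticator before the rate limit means unauthenticated traffic cannot drain a customer's allowance. Because the tag depends on the payload checksum, a tag copied from one packet does not validate a different payload.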

slide-14
SLIDE 14

Failover

Failure of a router internal to an ISP

− ARROW is a stateful service
− Local detection/recovery using Zookeeper

Failure of a border router

− End system/proxy detection/recovery
− Use backup route through another PoP

Failure of an entire ISP

− End system/proxy detection/recovery
− Use backup route through other ISPs
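
For the two cases handled at the end system or proxy, one possible shape of the logic is an ordered list of routes with a switch after repeated missed replies. This is a minimal sketch, not the authors' mechanism: the class name, the route labels, and the threshold of three consecutive misses are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FailoverProxy:
    # Routes in preference order, e.g. primary PoP, another PoP, another ISP.
    routes: list
    loss_threshold: int = 3   # consecutive missed replies before switching (assumed)
    active: int = 0           # index of the route currently in use
    misses: int = 0           # consecutive missed replies on the active route

    def current(self):
        return self.routes[self.active]

    def report(self, reply_received: bool):
        """Record one keepalive outcome and return the route to use next."""
        if reply_received:
            self.misses = 0
            return self.current()
        self.misses += 1
        has_backup = self.active + 1 < len(self.routes)
        if self.misses >= self.loss_threshold and has_backup:
            # Move to the next backup and start counting afresh.
            self.active += 1
            self.misses = 0
        return self.current()
```

Listing a second PoP of the same ISP before a different ISP covers a border-router failure first and a whole-ISP failure second.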

slide-15
SLIDE 15

Failure Isolation

How does endpoint/proxy locate who is at fault for service disruption?

− Send probe packet to locate the failure
− Each hop:

  • Responds to the previous hop
  • Forwards the probe packet to next hop

See UW TR for efficient Byzantine-resilient solution
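
The hop-by-hop probe can be illustrated with a toy simulation. It assumes every hop reports honestly, so it is not the Byzantine-resilient scheme mentioned above; the function name, the `link_ok` callback, and the hop names are hypothetical.

```python
def locate_failure(path, link_ok):
    """Walk a probe along `path` and return the first broken link.

    `link_ok(a, b)` says whether hop `b` received the probe from hop `a`
    and replied to it. The first pair with no reply is the suspect link;
    None means the probe reached the end of the path.
    """
    for prev_hop, next_hop in zip(path, path[1:]):
        if not link_ok(prev_hop, next_hop):
            return (prev_hop, next_hop)
    return None


# Made-up path with one failed link between two ISPs.
PATH = ["src", "isp-a", "isp-b", "dst"]
FAILED = {("isp-a", "isp-b")}


def example_link_ok(a, b):
    return (a, b) not in FAILED
```

Here `locate_failure(PATH, example_link_ok)` points at the link between `isp-a` and `isp-b`, which tells the endpoint which provider to route around.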

slide-16
SLIDE 16

Implementation

[Figure: testbed topology. Labels: Dest ISP; UW, WISC, GATech, Princeton; VICCI (three instances); ARROW ISP; BGP-mux (three instances).]

slide-17
SLIDE 17

Overhead

What is the data plane overhead of ARROW?

              RTT (us)   Throughput (Gbps)
UDP/TCP       96         9.4
Serval        81         9.5
ARROW 1 hop   132        9.5
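
To read the overhead off these numbers, a short calculation helps. The figures are the ones in the table; the dictionary and helper names are just for this example.

```python
# (RTT in microseconds, throughput in Gbps), copied from the table.
MEASUREMENTS = {
    "UDP/TCP": (96, 9.4),
    "Serval": (81, 9.5),
    "ARROW 1 hop": (132, 9.5),
}


def added_rtt_us(baseline, system="ARROW 1 hop"):
    """Extra round-trip time of `system` over `baseline`, in microseconds."""
    return MEASUREMENTS[system][0] - MEASUREMENTS[baseline][0]


def relative_rtt_increase(baseline, system="ARROW 1 hop"):
    """The same difference as a fraction of the baseline RTT."""
    return added_rtt_us(baseline, system) / MEASUREMENTS[baseline][0]
```

One ARROW hop adds 36 us over UDP/TCP (37.5% of that baseline) and 51 us over Serval, while the measured throughput stays at 9.5 Gbps.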

slide-18
SLIDE 18

Failover Latency

slide-19
SLIDE 19

BGP Outage

[Figure: BGP outage experiment. UW is the destination and Emulab the source. Labels: original BGP path, outage, new BGP path, ARROW path; GATech, WISC, USC; 700 ms, 90 s.]

slide-20
SLIDE 20

Link Failure Disconnections (Simulated: 1 ARROW Tier-1)

slide-21
SLIDE 21

Prefix Hijacking (Simulated)

slide-22
SLIDE 22

Summary

ARROW: Advertised Reliable Routing Over Waypoints

− ISPs offer a paid QoS tunnel across their network to remote customers
− ARROW runs on a small ISP we control
− Also on Google Cloud Platform

One tunnel (through a tier-1) is often enough