SLIDE 1

Maygh: Building a CDN from client web browsers

Liang Zhang Fangfei Zhou Alan Mislove Ravi Sundaram

Northeastern University

EuroSys ’13, Prague

SLIDE 2

EuroSys’13 Liang Zhang

Content exchange and the Web

The Web is a popular mechanism for content distribution

  • News sites, content sharing, movies

The Web is fundamentally client-server

  • I.e., the Web site operator serves every client

Popular Web sites receive millions of hits per day

  • Need to handle a large number of requests

How do large, popular Web sites distribute content?

SLIDE 3

Distributing web content

Options for content distribution:

  • 1. Serve on your own

Purchase machines, network bandwidth

  • 2. Pay content distribution networks (CDNs)

Akamai, Limelight, Clearway, ...

  • 3. Rent cloud services

Amazon EC2, Azure, App Engine, ...

In all cases, significant monetary burden on web site operator

SLIDE 4

How do operators pay?

Operators typically use two models to support site:

  • 1. User subscriptions (e.g., Netflix, New York Times, Rdio)

Limited user base

  • 2. Advertising (e.g., YouTube, Yahoo, Google*)

Resort to data-mining user data, privacy implications

Few choices limit set of sites that can exist

  • Free web sites have to accept advertising

Can we give web site operators another option?

SLIDE 5

Idea: Clients help distribute content

Typical properties of popular web sites:

  • Many users
  • Same content viewed by many users
  • Content is largely static

Insight: Recruit web clients to help serve content

Technically challenging

  • Significant user churn
  • Web has client-server architecture

But, we are not the first to explore this idea...

SLIDE 6

Alternate Approaches

  • 1. Browser plugins

FireCoral, SwarmPlugin

  • 2. Client-side software

Akamai’s NetSession, PPLive

Both require installation of additional software

  • Typically with few incentives
  • E.g., Adblock Plus, most popular plug-in: 4.2% installations

Can we build a system that does not require additional software?

SLIDE 7

This talk: Maygh

Goal: Build content distribution system for the Web

  • Allow web browsers to assist in content distribution to other users

Requirements:

  • Works with today’s web sites, browsers
  • No client-side changes

Maygh

  • Serves as a cache for static web content
  • Takes advantage of recent HTML5 browser features
  • Significantly reduces bandwidth required from the operator

Result: On-demand CDN built from web browsers

SLIDE 8

Outline

  • 1. Motivation
  • 2. Maygh design
  • 3. Security and privacy implications
  • 4. Evaluation

SLIDE 9

Maygh design overview

Maygh: Drop-in content distribution system

  • Serves as a distributed cache
  • Assume content always available from origin

Maygh serves static content

  • E.g., image, CSS, JavaScript
  • Content must be named by content-hash

Key challenge: Browsers not designed to communicate directly

  • Browsers distinct from Web servers
  • Use new techniques to allow browser to serve content

SLIDE 10

Protocol: RTMFP or WebRTC

Two peer-to-peer protocols for Web browsers

  • Designed for direct audio/video chats
  • Both support NAT traversal via STUN

Adobe Flash RTMFP

  • Supported in Flash player 10.0 since 2008
  • Available in 99% of browsers

WebRTC

  • W3C standard, actively under development
  • Currently in Firefox and Chrome

SLIDE 11

Maygh overview

[Figure: clients Alice and Bob and the Maygh Coordinator]

SLIDE 15

Maygh Coordinator

Introduce a middlebox: Maygh Coordinator

  • Run by website operators

Serves two purposes:

  • 1. Serves as a directory for content

Keeps track of content in users’ browsers

Content-hash -> {set of online clients}

  • 2. Allows browsers to establish direct connections

Supports NAT traversal using STUN with RTMFP/WebRTC

Techniques to allow multiple coordinators in paper

  • Can scale to support high churn, 1000s of requests/second
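The directory role (content-hash -> {set of online clients}) can be illustrated with a small in-memory structure. This is only a sketch under stated assumptions: the class name `ContentDirectory`, its methods, and the client IDs are hypothetical and are not taken from the Maygh code base.

```javascript
// Hypothetical in-memory directory: content-hash -> set of online client IDs.
class ContentDirectory {
  constructor() {
    this.holders = new Map(); // content-hash -> Set of client IDs
    this.held = new Map();    // client ID -> Set of content-hashes (for fast cleanup)
  }

  // A client reports that it now stores the object named by `hash`.
  add(clientId, hash) {
    if (!this.holders.has(hash)) this.holders.set(hash, new Set());
    if (!this.held.has(clientId)) this.held.set(clientId, new Set());
    this.holders.get(hash).add(clientId);
    this.held.get(clientId).add(hash);
  }

  // A client went offline (churn): forget everything it held.
  disconnect(clientId) {
    for (const hash of this.held.get(clientId) ?? []) {
      const clients = this.holders.get(hash);
      clients.delete(clientId);
      if (clients.size === 0) this.holders.delete(hash);
    }
    this.held.delete(clientId);
  }

  // Online clients that can serve `hash`, excluding the requester itself.
  lookup(hash, requesterId) {
    return [...(this.holders.get(hash) ?? [])].filter((id) => id !== requesterId);
  }
}

const dir = new ContentDirectory();
dir.add("alice", "hash-1");
dir.add("bob", "hash-1");
console.log(dir.lookup("hash-1", "alice")); // [ 'bob' ]
dir.disconnect("bob");
console.log(dir.lookup("hash-1", "alice")); // []
```

Keeping a reverse index per client makes a disconnect cost proportional to what that client held, which matters when churn is high.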

SLIDE 16

Client-side changes

Implement Maygh client-side library in JavaScript

  • Add it to the site’s pages

Browsers use RTMFP/WebRTC to communicate with coordinator

  • Allows bi-directional communication
  • Online client is always connected to coordinator

Use LocalStorage to store browsed content

  • Persistent cache, up to 5MB/site
  • Easily programmatically accessed

Insert downloaded objects in LocalStorage

  • Treat like LRU cache
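Treating a size-limited store like an LRU cache could look roughly like the following. It is a minimal sketch, not the Maygh library: `MemoryStorage` is a stand-in that mimics the `getItem`/`setItem`/`removeItem` methods of `window.localStorage` so the code also runs outside a browser, `LruObjectCache` is a hypothetical name, and sizes are counted in string characters as an approximation of the per-site quota.

```javascript
// Stand-in for window.localStorage so the sketch also runs outside a browser.
class MemoryStorage {
  constructor() { this.items = new Map(); }
  getItem(key) { return this.items.has(key) ? this.items.get(key) : null; }
  setItem(key, value) { this.items.set(key, String(value)); }
  removeItem(key) { this.items.delete(key); }
}

class LruObjectCache {
  // `budget` is counted in string characters (an assumption, see above).
  constructor(storage, budget) {
    this.storage = storage;
    this.budget = budget;
    this.sizes = new Map(); // hash -> size, ordered least to most recently used
    this.used = 0;
  }

  get(hash) {
    if (!this.sizes.has(hash)) return null;
    const size = this.sizes.get(hash);
    this.sizes.delete(hash); // re-insert to mark as most recently used
    this.sizes.set(hash, size);
    return this.storage.getItem(hash);
  }

  // Stores `data` and returns the hashes evicted to make room.
  put(hash, data) {
    const evicted = [];
    if (data.length > this.budget) return evicted; // too large to cache at all
    if (this.sizes.has(hash)) this.drop(hash);
    while (this.used + data.length > this.budget) {
      const oldest = this.sizes.keys().next().value;
      this.drop(oldest);
      evicted.push(oldest);
    }
    this.storage.setItem(hash, data);
    this.sizes.set(hash, data.length);
    this.used += data.length;
    return evicted;
  }

  drop(hash) {
    this.used -= this.sizes.get(hash);
    this.sizes.delete(hash);
    this.storage.removeItem(hash);
  }
}

const cache = new LruObjectCache(new MemoryStorage(), 10);
cache.put("a", "12345");
cache.put("b", "12345");
cache.get("a");                     // "a" becomes most recently used
console.log(cache.put("c", "123")); // [ 'b' ]
```

Returning the evicted hashes is a design choice of this sketch: a client would presumably need to tell the coordinator which objects it can no longer serve.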

SLIDE 17

How does an operator use Maygh?

Web site operators need to do three things:

  • 1. Run coordinator(s)
  • 2. Include Maygh JavaScript

<script src="maygh.js"></script>

  • 3. Change mechanism for loading content

<img id="pic-id" src="http://www.foo.com/..."/>

replaced with

<img id="pic-id"/>
<script> maygh.load("pic-hash", "pic-id"); </script>
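Because Maygh acts as a cache and content is assumed to be always available from the origin, a load call plausibly tries peers first and falls back to the site. The sketch below illustrates that control flow only; it is not the actual `maygh.load` implementation, and `loadObject` and all four callbacks (`findPeers`, `fetchFromPeer`, `fetchFromOrigin`, `isValid`) are hypothetical names introduced for illustration.

```javascript
// Hypothetical stand-ins supplied by the caller:
//   findPeers(hash)          -> Promise of peer IDs suggested by the coordinator
//   fetchFromPeer(id, hash)  -> Promise of the object, fetched browser-to-browser
//   fetchFromOrigin(hash)    -> Promise of the object, fetched from the web site
//   isValid(hash, data)      -> true when data matches its content-hash
async function loadObject(hash, { findPeers, fetchFromPeer, fetchFromOrigin, isValid }) {
  let peers = [];
  try {
    peers = await findPeers(hash);
  } catch {
    // Coordinator unreachable: fall through to the origin.
  }
  for (const peer of peers) {
    try {
      const data = await fetchFromPeer(peer, hash);
      if (isValid(hash, data)) return { data, source: peer };
    } catch {
      // Peer left or the transfer failed (churn): try the next one.
    }
  }
  // No peer could serve a valid copy: the origin always has the content.
  return { data: await fetchFromOrigin(hash), source: "origin" };
}
```

In a page, the returned data would then be attached to the element named in the call (for example the `pic-id` image) and inserted into the local cache.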

SLIDE 18

Outline

  • 1. Motivation
  • 2. Maygh design
  • 3. Security and privacy implications
  • 4. Evaluation

SLIDE 19

Security

Can users serve forged content?

  • Can detect forged content using content-hash

Can users violate the Maygh protocol?

  • E.g., claim to have content, DoS attacks
  • Use techniques similar to those in use today: block accounts, IP addresses, or subnets; existing defenses against DDoS

Fairness

  • Operator controls coordinator, choice of uploading peer
  • Maygh tracks content users upload/download
  • E.g., ensure no user contributes more resources than they use
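Detecting forged content with a content-hash amounts to re-hashing whatever a peer sends and comparing the result with the name that was requested. A minimal sketch using Node's built-in `crypto` module follows; the choice of SHA-256 and the function names `contentHash` and `isAuthentic` are assumptions made for illustration, since the slides do not name the hash function.

```javascript
const { createHash } = require("crypto");

// Name an object by the hex digest of its bytes. SHA-256 is an assumption here.
function contentHash(data) {
  return createHash("sha256").update(data).digest("hex");
}

// Accept data from a peer only if it hashes to the name that was requested.
function isAuthentic(expectedHash, data) {
  return contentHash(data) === expectedHash;
}

const original = Buffer.from("body { color: black; }");
const name = contentHash(original);
console.log(isAuthentic(name, original));                  // true
console.log(isAuthentic(name, Buffer.from("forged css"))); // false
```

A client that receives a mismatching object can discard it and fetch the content from another peer or from the origin.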

SLIDE 20

Privacy

Can users view content they are not allowed to?

  • Content secured by its hash
  • Naming content implies access
  • Similar semantics to Flickr, other sites today

Can users figure out what others have browsed?

  • Clients receive information about views
  • Can use cover traffic, pre-fetch requests
  • Or, allow user to disable Maygh for certain content

Privacy implications similar to other Hybrid-CDN models

  • NFL’s p2p streaming, FireCoral, PPLive

SLIDE 21

Outline

  • 1. Motivation
  • 2. Maygh design
  • 3. Security and privacy implications
  • 4. Evaluation

SLIDE 22

Evaluation overview

Implemented Maygh using RTMFP

  • Full browser support today, easy to get user base
  • Also built proof-of-concept WebRTC client

Includes both Maygh coordinator and client-side library

  • Client: 657 lines of JavaScript, 214 lines of ActionScript
  • Coordinator: 2,944 lines of JavaScript

Code open-source, available at

http://github.com/leoliangzhang/maygh

SLIDE 23

How much additional latency?

Flash RTMFP and WebRTC proof-of-concept implementations

  • Fetch 50 KB objects from other peer
  • Show First/Subsequent object loading time

Overall, latency is sufficient for many Web sites

  • Can also be hidden using pre-fetching techniques

Table: First / subsequent object loading time, served from Maygh (two result blocks, in the order they appear on the slide)

                          Accessed from
Served from       LAN (Boston)    Cable (Boston)    DSL (New Orl.)
LAN (Boston)      229 / 87 ms     618 / 307 ms      1314 / 707 ms
Cable (Boston)    771 / 283 ms    702 / 314 ms      1600 / 837 ms

LAN (Boston)      72 / 16 ms      364 / 120 ms      544 / 354 ms
Cable (Boston)    284 / 57 ms     577 / 107 ms      765 / 379 ms

SLIDE 25

How much bandwidth can Maygh save?

Deploying Maygh to a large website is challenging

  • Instead, perform simulation

Use 1-week anonymized Akamai access logs from Etsy

  • Top-50 US web site, online marketplace
  • 205M requests, 5.7M IPs
  • 2.77TB total network traffic
  • 85% of Etsy’s bandwidth is static images

Simulation setup

  • Clients stay on a page for 10 to 30 seconds
  • Ensure fairness: clients never upload more than they downloaded, or more than 10 MB
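The fairness rule in the simulation setup (a client never uploads more than it downloaded, or more than 10 MB) can be expressed as a per-client ledger. This is a sketch under stated assumptions: `FairnessLedger` and its methods are hypothetical, 10 MB is taken to mean 10 × 1024 × 1024 bytes, and all downloads are counted regardless of where they came from.

```javascript
// "10 MB" from the slide; the exact byte count is an assumption.
const UPLOAD_CAP_BYTES = 10 * 1024 * 1024;

class FairnessLedger {
  constructor() {
    this.clients = new Map(); // client ID -> { uploaded, downloaded } in bytes
  }

  entry(id) {
    if (!this.clients.has(id)) this.clients.set(id, { uploaded: 0, downloaded: 0 });
    return this.clients.get(id);
  }

  recordDownload(id, bytes) { this.entry(id).downloaded += bytes; }
  recordUpload(id, bytes) { this.entry(id).uploaded += bytes; }

  // True when serving `bytes` keeps the client within both limits.
  canUpload(id, bytes) {
    const { uploaded, downloaded } = this.entry(id);
    return uploaded + bytes <= Math.min(downloaded, UPLOAD_CAP_BYTES);
  }
}

const ledger = new FairnessLedger();
ledger.recordDownload("alice", 120000);
console.log(ledger.canUpload("alice", 50000)); // true
ledger.recordUpload("alice", 100000);
console.log(ledger.canUpload("alice", 50000)); // false
```

A coordinator could consult such a check when choosing which peer should serve a request, skipping clients that have reached their limit.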

SLIDE 26

How much bandwidth can Maygh save?

Median bandwidth used drops

  • From 50.3 Mb/s to 11.7 Mb/s (a 77% drop)
  • Even with significant churn

75% reduction in 95th-percentile bandwidth

Only requires one 4-core coordinator

[Figure: CDF of five-minute average bandwidth (Mb/s) for Normal, 10% Plug-in, and Maygh]

SLIDE 29

Real-world deployment

Set up special version of our department’s web server

  • Set up coordinator within our department

Invite graduate students

  • 18 users for 3 days
  • Total of 374 photos viewed, 24% served from other Maygh clients
  • Lower than simulation because more users on Etsy

Take-away: Compatible with today’s browsers (Firefox, Safari, Chrome) and websites

SLIDE 30

16.11.12 WPI Alan Mislove

Summary

Substantial monetary burden to host popular Web site

  • Site operators typically resort to advertising to pay bills

Idea: Recruit web clients to help distribute content

  • Without requiring any additional client-side software

Maygh

  • Serves as cache for static Web content
  • Operator runs coordinator, allows clients to communicate

Evaluation demonstrated practicality, efficacy

  • Open-source and available to research community

SLIDE 31

Questions?

http://github.com/leoliangzhang/maygh

SLIDE 32

Cacheable Web content

Dynamically generated web pages are popular

  • So, how much content is static, cacheable?
  • I.e., what is the potential for a system like Maygh?

Conduct an experiment

  • Consider top 100 websites from Alexa’s ranking
  • Simulate web browsing via random walk of five pages per site
  • Consider content with Cache-Control: public cacheable

Result:

  • On average, 74.2% of bytes are cacheable
  • Maygh could serve a significant fraction of bytes

SLIDE 33

Potential cacheable content

74.2% of the bytes requested are marked as cacheable

  • Most static content, such as images, videos, and SWF, is still cacheable

Content Type   % Requests   % Bytes   % Cacheable
Image             70.5        40.3       85.7
JavaScript        13.1        29.0       84.8
HTML              10.7        19.9       30.1
CSS                3.5         8.7       86.5
Flash              0.9         1.3       96.0
Other              1.3         1.0       45.7
Overall          100         100         74.2

Breakdown of browsing trace from the top 100 Alexa web sites.
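The overall figure can be cross-checked from the per-type rows as a byte-weighted average. The short calculation below assumes that the "% Cacheable" column gives the share of each type's bytes that is cacheable, which the slide does not state explicitly; the per-type byte shares are normalized because they sum to 100.2 after rounding.

```javascript
// [content type, % of bytes, % cacheable] copied from the table above.
const rows = [
  ["Image", 40.3, 85.7],
  ["JavaScript", 29.0, 84.8],
  ["HTML", 19.9, 30.1],
  ["CSS", 8.7, 86.5],
  ["Flash", 1.3, 96.0],
  ["Other", 1.0, 45.7],
];

// Sum of byte shares and of the cacheable part of each share.
const totalBytes = rows.reduce((sum, [, bytes]) => sum + bytes, 0);
const cacheableBytes = rows.reduce((sum, [, bytes, pct]) => sum + (bytes * pct) / 100, 0);

const overall = (100 * cacheableBytes) / totalBytes;
console.log(overall.toFixed(1)); // 74.2
```

Images and JavaScript dominate the result: together they account for roughly 69% of bytes and are both about 85% cacheable, while HTML pulls the average down.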

SLIDE 34

Scalability of Maygh coordinators?

Single coordinator

  • Dual 8-core 2.67 GHz Intel Xeon E5-2670 processors
  • 454 transactions per second with under 15 ms latency

More details in the paper

[Figure: average response time (ms) vs. transactions/second]

SLIDE 35

Scalability of multiple coordinators

Multiple coordinators on

  • Single machines, using multiple cores (with hyperthreading)
  • Multiple machines, using only one core

Close-to-linear scaling

  • Single machine performance decreases after 16 coordinators
  • Due to hyperthreading

A single machine with 4 CPU cores can support Etsy workload

[Figure: transactions per second (before response time > 15 ms) vs. number of coordinators, for one machine and N machines]