Problem: Scaling Content Delivery

10/22/2019. Problem: Scaling Content Delivery. Millions of clients → server and network meltdown. 15-441/641: Content Delivery and Peer-to-Peer, 15-441 Fall 2019, Profs Peter Steenkiste & Justine Sherry.


  1. 15-441/641: Content Delivery and Peer-to-Peer
     15-441 Fall 2019, Profs Peter Steenkiste & Justine Sherry
     https://computer-networks.github.io/fa19/

     Problem: Scaling Content Delivery
     • Millions of clients → server and network meltdown

     Outline
     • Peer-to-peer
     • Overlays: naming, addressing, and routing
     • CDNs
     • (Load balancing – consistent hashing)

     P2P System
     • Leverage the resources of client machines (peers)
       • Computation, storage, bandwidth

  2. P2P Definition
     Distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority.
     – A Survey of Peer-To-Peer Content Distribution Technologies, Androutsellis-Theotokis and Spinellis

     Why p2p?
     • Harness lots of spare capacity
       • 1 Big Fast Server: $10k/month++ versus 1000s .. 1000000s clients: $ ??
       • Capacity grows with the number of users!
     • Build very large-scale, self-managing systems
       • Same techniques useful for companies,
       • E.g. Akamai’s 14,000+ nodes, Google’s 100,000+ nodes
       • But: servers vs. arbitrary nodes, hard vs. soft state (backups vs caches), ….
       • Also: security, fairness, freeloading, ..
     • No single point of failure
       • Some nodes go down – others take over
       • … government shuts down nodes – peers in other countries are available

     P2P Construction
     [Figure: P2P Overlay Network. Labels: Clients, Servers, SPRINT, Verizon, CMU, AT&T]

     Key Idea: Network Overlay
     • A network overlay is a network that is layered on top of the Internet
     • Simplified picture: overlays use IP as their datalink layer
     • Overlays need the equivalent of all the functions IP networks need:
       • Naming and addressing
       • Routing
       • Bootstrapping
       • Security, error recovery, etc.

  3. Names, addresses, and routing
     The Internet
     • Endpoint: host
     • Name: hierarchical domain name
     • Routing: how to reach host, e.g., BGP, …
     Content retrieval
     • End-point: content
     • Name: identifies content you are looking for
       • E.g., hash of file, key words
     • Address: the IP address of node that has the content, plus content name
     • Routing: how to find the data

     Common P2P Framework
     [Figure: peers N1 to N6 connected over the Internet. A new peer performs Join; a peer performs Publish (Key=“title”, Value=MP3 data…); a client performs Search with Lookup(“title”) and then Fetch Content]

     Napster: Central Database
     [Figure labels:]
     • Join: contact server
     • Publish: insert(X, 123.2.21.23), sent by peer 123.2.21.23 (“I have X, Y, and Z!”)
     • Search: “Where is file A?” Query search(A), Reply --> 123.2.0.18
     • Fetch: from 123.2.0.18

     What is (was) out there?
                    Central      Flood      Super-node flood             Route
     Whole File     Napster      Gnutella                                Freenet
     Chunk Based    BitTorrent              KaZaA (bytes, not chunks)    DHTs, eDonkey 2000
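The Napster-style central database above can be sketched in a few lines. This is a minimal illustration, not Napster's actual protocol: the class name `CentralIndex` and the `remove_peer` helper are hypothetical, while `insert` and `search` mirror the labels on the slide.

```python
from collections import defaultdict


class CentralIndex:
    """Toy Napster-style index: the server maps file names to peer addresses."""

    def __init__(self):
        # file name -> set of peer addresses that announced the file
        self._peers_by_file = defaultdict(set)

    def insert(self, filename, peer_addr):
        """Publish: a peer tells the server that it holds `filename`."""
        self._peers_by_file[filename].add(peer_addr)

    def search(self, filename):
        """Search: return the addresses of peers holding `filename`."""
        return sorted(self._peers_by_file.get(filename, ()))

    def remove_peer(self, peer_addr):
        """Hypothetical helper: drop a departed peer from every entry."""
        for peers in self._peers_by_file.values():
            peers.discard(peer_addr)


index = CentralIndex()
for name in ("X", "Y", "Z"):
    index.insert(name, "123.2.21.23")   # "I have X, Y, and Z!"
index.insert("A", "123.2.0.18")

holders = index.search("A")             # "Where is file A?"
print(holders)                          # the client then fetches directly from a holder
```

The server only stores the index; the file transfer itself happens peer to peer. Every publish and search still goes through the one server, which is where the O(N) state and the single point of failure come from.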

  4. Napster: Discussion
     • Pros:
       • Simple
       • Search scope is O(1)
       • Controllable (pro or con?)
     • Cons:
       • Server maintains O(N) State
       • Server does all processing
       • Single point of failure

     Gnutella: Flooding
     • Join: contact peers
     • Publish: noop
     • Search: flood the Query (“Where is file A?”); peers holding the file send a Reply (“I have file A.”)
     • Fetch: direct p2p

     Gnutella: Discussion
     • Pros:
       • Fully de-centralized
       • Search cost distributed
       • Processing @ each node permits powerful search semantics
     • Cons:
       • Search scope is O(N)
       • Search time is O(???)
       • Nodes leave often, network unstable
     • TTL-limited search works well for haystacks.
       • For scalability, does NOT search every node.
       • May have to re-issue query later

     KaZaA: Query Flooding
     • First released in 2001 and also very popular
     • Join: on startup, client contacts a “supernode” ... may at some point become one itself
     • Publish: send list of files to supernode
     • Search: send query to supernode, supernodes flood query amongst themselves.
     • Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
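To make the TTL-limited flooding discussed above concrete, here is a small sketch over an in-memory graph. It is an illustration under stated assumptions, not the Gnutella wire protocol: the function `flood_query`, the six-peer topology, and the peer names are all hypothetical.

```python
from collections import deque


def flood_query(neighbors, files, start, wanted, ttl):
    """Return (peers that replied, number of peers that received the query)."""
    seen = {start}                      # peers that already saw this query
    frontier = deque([(start, ttl)])
    hits = []
    while frontier:
        peer, hops_left = frontier.popleft()
        if peer != start and wanted in files.get(peer, ()):
            hits.append(peer)           # "I have file A." reply
        if hops_left == 0:
            continue                    # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:         # suppress duplicate deliveries
                seen.add(nxt)
                frontier.append((nxt, hops_left - 1))
    return hits, len(seen) - 1


# Hypothetical 6-peer overlay; only n5 holds file "A".
neighbors = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n4"],
    "n3": ["n1", "n4"],
    "n4": ["n2", "n3", "n5"],
    "n5": ["n4", "n6"],
    "n6": ["n5"],
}
files = {"n5": {"A"}}

print(flood_query(neighbors, files, "n1", "A", ttl=2))   # ([], 3)
print(flood_query(neighbors, files, "n1", "A", ttl=3))   # (['n5'], 4)
```

With a TTL of 2 the query never reaches n5, so the search misses a file that exists in the network; raising the TTL finds it at the cost of contacting more peers. That is the trade-off behind "does NOT search every node" and "may have to re-issue query later".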

  5. KaZaA: Discussion
     • Works better than Gnutella because of query consolidation
     • Several nodes may have requested file... How to tell?
       • Must be able to distinguish identical files
       • Same filename not necessarily same file...
       • Use Hash of file
       • Can fetch bytes [0..1000] from A, [1001...2000] from B
     • Pros: Tries to take into account node heterogeneity:
       • Bandwidth, computational resources, …
     • Cons: Still no guarantees on search scope or time
     • Challenge: want stable superpeers – good prediction
       • Must also be capable platforms

     KaZaA: Intelligent Query Flooding
     [Figure: a group of servers (“Super Nodes”) uses Gnutella-style flooding among themselves; clients attach to a super node in a Napster-style client-server model]

     BitTorrent: Publish/Join
     [Figure: peers contacting the Tracker]

     BitTorrent: Swarming
     • Started in 2001 to efficiently support flash crowds
       • Focus is on fetching, not searching
     • Publish: Run a tracker server.
     • Search: Find a tracker out-of-band for a file, e.g., Google
     • Join: contact central “tracker” server for list of peers.
     • Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
     • Comparison with earlier architectures:
       • Focus on fetching of “few large files”
       • Chunk based downloading
       • Anti-freeloading mechanisms
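The "use hash of file" idea and the byte-range fetch from several peers can be sketched as follows. This is an assumption-laden illustration: SHA-256 stands in for whatever hash a real client uses, and `content_id`, `split_ranges`, and the peers A, B, and C are hypothetical names.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identify a file by a hash of its bytes rather than by its name."""
    return hashlib.sha256(data).hexdigest()   # SHA-256 is a stand-in choice


def split_ranges(size: int, peers: list) -> list:
    """Assign contiguous, inclusive byte ranges (peer, start, end) to peers."""
    per_peer = -(-size // len(peers))         # ceiling division
    plan = []
    for i, peer in enumerate(peers):
        start = i * per_peer
        if start >= size:
            break
        end = min(start + per_peer, size) - 1
        plan.append((peer, start, end))
    return plan


original = bytes(range(256)) * 8              # 2048 bytes of stand-in file data
# A and B hold the same bytes under different names; C reuses the name
# "song.mp3" for different bytes.
shared = {
    "A": ("song.mp3", original),
    "B": ("track01.mp3", original),
    "C": ("song.mp3", b"different bytes"),
}

wanted = content_id(original)
sources = sorted(p for p, (_, data) in shared.items() if content_id(data) == wanted)

plan = split_ranges(len(original), sources)
rebuilt = b"".join(shared[p][1][s:e + 1] for p, s, e in plan)
print(sources)                                # ['A', 'B']
print(plan)                                   # [('A', 0, 1023), ('B', 1024, 2047)]
print(content_id(rebuilt) == wanted)          # True
```

Matching on the hash picks A and B even though their file names differ, and rejects C even though its name matches. Checking the hash of the reassembled bytes confirms that ranges taken from different peers really belong to the same file.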

  6. BitTorrent: Fetch
     [Figure: peers exchanging chunks]

     BitTorrent: Summary
     • Pros:
       • Works reasonably well in practice
       • Gives peers incentive to share resources; avoids freeloaders
     • Cons:
       • Pareto efficiency is a relatively weak condition
       • Central tracker server needed to bootstrap swarm
       • (Tracker is a design choice, not a requirement, as you know from your projects. Could easily combine with other approaches.)

     When are p2p Useful?
     • Works well for caching and “soft-state”, read-only data
       • Works well! BitTorrent, KaZaA, etc., all use peers as caches for hot data
     • Difficult to extend to persistent data
       • Nodes come and go: need to create multiple copies for availability and replicate more as nodes leave
     • Not appropriate for search-engine-style searches
       • Complex intersection queries (“the” + “who”): billions of hits for each term alone
       • Sophisticated ranking: Must compare many results before returning a subset to user
       • Need massive compute power

     Outline
     • Peer-to-peer
     • Overlays: naming, addressing, and routing
     • CDNs
     • (Load balancing – consistent hashing)

  7. Reminder: Caching with Forward Proxies
     • Cache documents close to clients → decrease latency
     • Typically done by ISPs or enterprises → reduce provider traffic load
     • CDNs proactively cache for the content providers (their clients)
     • Typically cache at different levels in the Internet hierarchy:
       • Last mile ISPs for low latency
       • Closer to core for broader coverage
     [Figure: Clients, Forward proxies, ISP-1, ISP-2, Backbone ISP, Server]

     Content Delivery: Possible Bottlenecks
     [Figure: path from End User to Host Server over the Internet, marking the Last Mile Problem, First Mile Problem, Peering Problem, and Backbone Problem]

     What is the CDN?
     • Edge Caches: work with ISP and networks everywhere to install edge caches
       • Edge = close to customers
     • Content delivery: getting content to the edge caches
       • Content can be objects, video, or entire web sites
     • Mapping: find the “closest” edge server for each user and deliver content from that server
       • Network proximity not the same as geographic proximity
       • Focus is on performance as observed by user (quality)

     Content Distribution Networks (CDNs)
     • The content providers are the CDN customers.
     Content replication
     • CDN company installs hundreds of CDN servers throughout Internet
       • Close to users
     • CDN replicates its customers’ content in CDN servers. When provider updates content, CDN updates servers
     [Figure: origin server in North America, CDN distribution node, CDN server in S. America, CDN server in Asia, CDN server in Europe]
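The mapping step above (pick the "closest" edge server, where closeness means network proximity rather than geography) can be illustrated with a toy selection function. Everything here is a hypothetical stand-in: the `Edge` record, the server names, and the distance and round-trip-time values are invented for the example, and a real CDN's mapping system would rely on far more measurements.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    name: str
    distance_km: float   # geographic distance from the user (made-up value)
    rtt_ms: float        # measured round-trip time to the user (made-up value)


def nearest_geographically(edges):
    """Pick the edge server with the smallest geographic distance."""
    return min(edges, key=lambda e: e.distance_km)


def nearest_in_network(edges):
    """Pick the edge server with the lowest measured round-trip time."""
    return min(edges, key=lambda e: e.rtt_ms)


edges = [
    Edge("edge-a", distance_km=50, rtt_ms=80),    # nearby, but a poor network path
    Edge("edge-b", distance_km=400, rtt_ms=12),   # farther away, well connected
    Edge("edge-c", distance_km=2500, rtt_ms=95),
]

print(nearest_geographically(edges).name)   # edge-a
print(nearest_in_network(edges).name)       # edge-b
```

The two criteria disagree in this example: the geographically nearest server is not the one that gives the user the lowest latency, which is why the mapping focuses on performance as observed by the user.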
