  1. CS 744: Big Data Systems Shivaram Venkataraman Fall 2018 With slides from Mosharaf Chowdhury and Ion Stoica

  2. Datacenter ARCHITECTURE - Hardware Trends - Software Implications - Network Design

  3. Why is One Machine Not Enough? Too much data? Too many requests? Not enough memory? Not enough computing capability?

  4. What’s in a Machine? Interconnected compute and storage: Memory Bus, PCIe (v4), SATA, Ethernet. Newer hardware: GPUs, FPGAs, RDMA, NVLink.

  5. Scale Up: Make More Powerful Machines. Moore’s law – stated 52 years ago by Intel founder Gordon Moore – the number of transistors on a microchip doubles every 2 years – today “closer to 2.5 years” (Intel CEO Brian Krzanich)

  6. Dennard Scaling is the Problem. Stated in 1974 by Robert H. Dennard (DRAM inventor): power requirements are proportional to transistor area, since both voltage and current are proportional to transistor length. Broken since 2005. [“Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al.]
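
A compact way to see the claim on this slide, using only the stated proportionalities: with voltage V ∝ L and current I ∝ L for feature length L,

    P = V · I ∝ L² ∝ transistor area,

so power density (power per unit area) stays constant as transistors shrink. Once this scaling broke (around 2005), smaller transistors no longer came with proportionally lower power, which is why per-core performance stalled (next slide).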

  7. Dennard Scaling is the Problem Performance per-core is stalled Number of cores is increasing “Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al

  8. MEMORY CAPACITY: DRAM capacity growing +29% per year

  9. MEMORY BANDWIDTH Growing +15% per year

  10. MEMORY BANDWIDTH: growing +15% per year – data access from memory is getting more expensive!

  11. HDD CAPACITY

  12. HDD BANDWIDTH Disk bandwidth is not growing

  13. SSDs Performance: – Reads: 25us latency – Writes: 200us latency – Erase: 1.5 ms Steady state, when the SSD is full – one erase every 64 or 128 reads (depending on page size) Lifetime: 100,000–1 million writes per page

  14. SSD VS HDD COST

  15. Amazon EC2 (2014)
      Machine     | Compute Units (ECU) | Memory (GB) | Local Storage (GB) | Cost / hour
      t1.micro    | 1                   | 0.615       | 0                  | $0.02
      m1.xlarge   | 8                   | 15          | 1680               | $0.48
      cc2.8xlarge | 88 (Xeon 2670)      | 60.5        | 3360               | $2.40
      1 ECU = CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor

  16. Amazon EC2 (2018)
      Machine      | Compute Units            | Memory (GB) | Local Storage  | Cost / hour
      t2.nano      | 1                        | 0.5         | 0              | $0.0058
      r5d.24xlarge | 96                       | 768         | 4x900 GB NVMe  | $6.912
      x1.32xlarge  | 4 x Xeon E7              | ~2 TB       | 3.4 TB (SSD)   | $13.338
      p3.16xlarge  | 8 Nvidia Tesla V100 GPUs | 488         | 0              | $24.48
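
To make the trend concrete, a quick back-of-the-envelope script (a sketch that uses only the memory sizes and list prices from the two tables above):

    # Dollars per GB of memory per hour, computed from the EC2 tables above.
    instances = {
        "m1.xlarge (2014)":    (15.0,  0.48),   # (memory in GB, $ per hour)
        "cc2.8xlarge (2014)":  (60.5,  2.40),
        "r5d.24xlarge (2018)": (768.0, 6.912),
    }

    for name, (mem_gb, cost) in instances.items():
        print(f"{name}: ${cost / mem_gb:.4f} per GB-hour")
    # The 2014 instances land around $0.03-0.04 per GB-hour of memory;
    # r5d.24xlarge is roughly $0.009 per GB-hour.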

  17. Ethernet Bandwidth: growing 33-40% per year! (figure spans 1995, 1998, 2002, and 2017)

  18. DISCUSSION Scale up vs. scale out: when does scale up win? How do GPUs change the above discussion?

  19. DATACENTER ARCHITECTURE (diagram: servers, each with an internal Memory Bus, PCIe, and SATA, connected to one another by Ethernet)

  20. STORAGE HIERARCHY (PAPER)

  21. STORAGE HIERARCHY Colin Scott: https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html

  22. Scale Out: Warehouse-Scale Computers. Many concerns – Infrastructure – Networking – Storage – Software – Multiplexing across applications and services – Power/Energy – Failure/Recovery – … Defining traits: single organization; homogeneity (to some extent); cost efficiency at scale – rent it out!

  23. DISCUSSION Comparison with supercomputers - Compute vs. Data centric - Shared storage - Highly reliable components

  24. SOFTWARE IMPLICATIONS Reliability Storage Hierarchy Workload Diversity Single organization

  25. Three Categories of Software 1. Platform-level – software and firmware present on every machine 2. Cluster-level – distributed systems that enable everything 3. Application-level – user-facing applications built on top

  26. WORKLOAD: Partition-Aggregate (diagram: a big-data query fans out from a top-level aggregator to mid-level aggregators to workers, and partial results are aggregated back up)
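
A minimal sketch of the partition-aggregate pattern (the search-style example and all names are illustrative, not from the slides):

    from concurrent.futures import ThreadPoolExecutor

    # Each worker answers the query over its own partition of the data.
    def worker(partition, query):
        return [doc for doc in partition if query in doc]

    # Aggregators (mid-level and top-level) simply merge partial results upward.
    def aggregate(partial_results):
        return [doc for partial in partial_results for doc in partial]

    partitions = [["apple pie", "banana bread"], ["apple tart"], ["cherry cake"]]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: worker(p, "apple"), partitions))
    print(aggregate(partials))   # ['apple pie', 'apple tart']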

  27. WORKLOAD: Map-Reduce Map Stage Reduce Stage
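
To make the two stages concrete, a toy word count in plain Python (no framework; the function names are mine):

    from collections import defaultdict
    from itertools import chain

    # Map stage: turn each input record into (key, value) pairs.
    def map_fn(line):
        return [(word, 1) for word in line.split()]

    # Shuffle: group intermediate pairs by key (handled by the framework in practice).
    def shuffle(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    # Reduce stage: combine all values that share a key.
    def reduce_fn(key, values):
        return key, sum(values)

    lines = ["big data systems", "big data"]
    intermediate = chain.from_iterable(map_fn(line) for line in lines)
    print(dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items()))
    # {'big': 2, 'data': 2, 'systems': 1}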

  28. WORKLOAD PATTERNS

  29. SOFTWARE CHALLENGES 1. Fault tolerance in software 2. Tail at Scale – why? 3. Handling traffic variations 4. Comparison with HPC software?

  30. BREAK !

  31. Google Maps: A Planet-Scale Playground for Computer Scientists. Luiz Barroso. Tuesday, September 11, 2018, 4:00pm to 5:00pm, 1240 CS

  32. Datacenter Networks (diagram: servers, each with an internal Memory Bus, PCIe, and SATA, connected by Ethernet)

  33. Datacenter Networks: traditional hierarchical topology (core / aggregation / edge layers) – Expensive – Difficult to scale – High oversubscription – Smaller path diversity – …

  34. Datacenter Networks: Clos topology (core / aggregation / edge layers) – Cheaper – Easier to scale – No/low oversubscription – Higher path diversity – …

  35. Datacenter Topology: Clos aka Fat-tree. k pods, where each pod has two layers of k/2 switches. Each pod consists of (k/2)^2 servers.

  36. Datacenter Topology: Clos aka Fat-tree. Each edge switch connects to k/2 servers & k/2 aggregation switches. Each aggregation switch connects to k/2 edge & k/2 core switches. (k/2)^2 core switches, each connecting to k pods.
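
A small sketch that simply evaluates the counts stated on slides 35-36 for a given k (the function and field names are mine):

    # Element counts for a k-pod fat-tree, using the relations on slides 35-36.
    def fat_tree_counts(k):
        assert k % 2 == 0, "k must be even"
        return {
            "pods": k,
            "edge_switches": k * (k // 2),         # k/2 edge switches per pod
            "aggregation_switches": k * (k // 2),  # k/2 aggregation switches per pod
            "core_switches": (k // 2) ** 2,        # each core switch reaches every pod
            "servers": k * (k // 2) ** 2,          # (k/2)^2 servers per pod = k^3/4 total
        }

    print(fat_tree_counts(4))
    # {'pods': 4, 'edge_switches': 8, 'aggregation_switches': 8,
    #  'core_switches': 4, 'servers': 16}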

  37. Datacenter Traffic (diagram: north-south traffic flows through the core and aggregation layers down to the racks; east-west traffic flows between racks)

  38. East-West Traffic Traffic between servers in the datacenter Communication within “big data” computations Traffic may shift on small timescales (< minutes)

  39. Datacenter Traffic Characteristics

  40. Datacenter Traffic Characteristics Two key characteristics – Most flows are small – Most bytes come from large flows Applications want – High bandwidth (large flows) – Low latency (small flows)

  41. What Do We Want? Want to be able to run applications anywhere Want to be able to migrate applications while they are running Want to balance traffic across all these paths in the network Want to fully utilize all the resources we have …

  42. Using Multiple Paths Well (diagram: fat-tree with edge, aggregation, and core switches and hosts addressed 10.0.*.* through 10.4.*.*)

  43. Forwarding (diagram: flows from A, B, and C, all destined to D) Per-flow load balancing (ECMP, “Equal Cost Multi Path”) – e.g., based on (src and dst IP and port)

  44. Forwarding (same diagram) Per-flow load balancing (ECMP) – A flow follows a single path – Suboptimal load balancing; elephant flows are a problem
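
A minimal sketch of per-flow ECMP path selection (the hash function and field choice are illustrative, not any particular switch's implementation):

    import hashlib

    # Per-flow ECMP: hash the flow's 5-tuple so that every packet of a flow
    # picks the same next hop out of the equal-cost candidates.
    def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
        bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
        return next_hops[bucket % len(next_hops)]

    core_switches = ["core-1", "core-2", "core-3", "core-4"]
    print(ecmp_next_hop("10.0.1.2", "10.2.0.3", 51000, 80, "tcp", core_switches))
    # Every packet of this flow hashes to the same core switch, so a single
    # large ("elephant") flow cannot be spread across the remaining paths.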

  45. Solution 1: Topology-aware addressing (diagram: each pod is assigned an address prefix – 10.0.*.*, 10.1.*.*, 10.2.*.*, 10.3.*.*)

  46. Solution 1: Topology-aware addressing (diagram: each edge switch gets a prefix within its pod – 10.0.0.*, 10.0.1.*, 10.1.0.*, 10.1.1.*, 10.2.0.*, 10.2.1.*, 10.3.0.*, 10.3.1.*)

  47. Solution 1: Topology-aware addressing (diagram: hosts and switches labeled with their topology-derived addresses, e.g., 10.0.1.1, 10.0.1.2, 10.2.0.3)

  48. Solution 1: Topology-aware addressing. Addresses embed location in the regular topology. Maximum #entries/switch: k (= 4 in the example) – constant, independent of #destinations! No route computation / messages / protocols – the topology is hard-coded, but localized link-failure detection is still needed. Problems? – VM migration: ideally a VM keeps its IP address when it moves – Vulnerable to (topology/address) misconfiguration
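
A minimal sketch of why k entries per switch suffice in the 4-pod example, assuming addresses of the form 10.<pod>.<edge>.<host> (the table layout and names are illustrative):

    import ipaddress

    # Hypothetical core-switch table: one /16 entry per pod, i.e. k = 4 entries,
    # regardless of how many hosts the fabric contains.
    CORE_TABLE = {ipaddress.ip_network(f"10.{pod}.0.0/16"): f"downlink-to-pod-{pod}"
                  for pod in range(4)}

    def core_forward(dst_ip):
        dst = ipaddress.ip_address(dst_ip)
        for prefix, port in CORE_TABLE.items():
            if dst in prefix:
                return port
        raise ValueError("destination outside the fat-tree address space")

    print(core_forward("10.2.0.3"))   # -> 'downlink-to-pod-2'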

  49. Solution 2: Centralize + Source Routes Centralized “controller” server knows topology and computes routes Controller hands server all paths to each destination – O(#destinations) state per server, but server memory cheap (e.g., 1M routes x 100B/route=100MB) Server inserts entire path vector into packet header (“source routing”) – E.g., header=[dst=D | index=0 | path={S5,S1,S2,S9}] Switch forwards based on packet header – index++; next-hop = path[index]
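
A minimal sketch of the per-switch forwarding rule on this slide (the packet layout is illustrative):

    # The sender inserts the entire path into the packet header (source routing).
    packet = {"dst": "D", "index": 0, "path": ["S5", "S1", "S2", "S9"]}

    # A switch keeps no routing table: it advances the index and forwards to
    # path[index], i.e. the "index++; next-hop = path[index]" rule above.
    def switch_forward(packet):
        packet["index"] += 1
        if packet["index"] >= len(packet["path"]):
            return "deliver-to-destination-host"
        return packet["path"][packet["index"]]

    print(switch_forward(packet))   # 'S1' -- the hop after entering the fabric at S5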

  50. Solution 2: Centralize + Source Routes #entries per switch? – None! #routing messages? – Akin to a broadcast from controller to all servers Pro: – Switches very simple and scalable – Flexibility: end-points control route selection Cons: – Scalability / robustness of controller (SDN issue) – Clean-slate design of everything

  51. VL2 SUMMARY 1. High capacity: Clos topology + Valiant Load Balancing 2. Flat addressing: Directory service

  52. NEXT STEPS 9/13 class on Storage Systems Presentations due day before! Fill out preference form
