CS 744: DATACENTER AS A COMPUTER - Shivaram Venkataraman, Fall 2020

  1. CS 744: DATACENTER AS A COMPUTER Shivaram Venkataraman Fall 2020

  2. ANNOUNCEMENTS - Assignments (Piazza) - Assignment zero is due! - Form groups for Assignment 1 on Piazza (Thursday) - Class format - Review - Lecture - Discussion

  3. Applications (Machine Learning, SQL, Streaming, Graph) - Computational Engines - Scalable Storage Systems - Resource Management - Datacenter Architecture (hardware, architecture)

  4. OUTLINE - Hardware Trends - Datacenter design - WSC workloads - Discussion

  5. Why is One Machine Not Enough? - Limited parallelism - Not enough resources - Cost could be high - Redundancy - Data volumes are high - Slow

  6. What’s in a Machine? Interconnected compute and storage - Processor and DRAM (Memory Bus) - PCIe v4 - Ethernet - SATA (SSD, HDD) - Newer Hardware: GPUs, FPGAs, RDMA, NVLink

  7. Scale Up: Make More Powerful Machines - Moore’s law – Stated 52 years ago by Intel founder Gordon Moore – Number of transistors on a microchip doubles every 2 years – Today “closer to 2.5 years” (Intel CEO Brian Krzanich)

  8. Dennard Scaling is the Problem - Suggested that power requirements are proportional to the area for transistors – Both voltage and current being proportional to length – Stated in 1974 by Robert H. Dennard (DRAM inventor) – Broken since 2005. Source: “Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al.

  9. Dennard Scaling is the Problem - Performance per core is stalled - Number of cores is increasing. Source: “Adapting to Thrive in a New Economy of Memory Abundance,” Bresniker et al.
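
The scaling argument on the Dennard slides can be written out as a short derivation. This is a sketch under the slide's own assumptions (voltage and current both proportional to transistor length); the symbols $\ell$ for length, $A$ for area, $V$, $I$, and $P$ are introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
V &\propto \ell, \qquad I \propto \ell \\
P &= V \cdot I \;\propto\; \ell^{2} \;\propto\; A \\
\frac{P}{A} &\approx \text{constant as transistors shrink}
\end{align*}
```

While this held, packing more transistors into the same area did not raise power per unit area. Once it broke, per-core performance stalled and the extra transistors went into more cores instead.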

  10. MEMORY TRENDS [figure: DRAM trends; handwritten annotations illegible]

  11. MEMORY TAKEAWAY - Growing +15% per year - Data access from memory is getting more expensive!

  12. HDD CAPACITY [figure; annotation: Backblaze, backup storage]

  13. HDD BANDWIDTH [figure: read bandwidth, roughly 100-200 MB/s] - Disk bandwidth is not growing

  14. SSDs - Performance: – Reads: 25 us latency – Write: 200 us latency – Erase: 1.5 ms - Steady state, when SSD full: – One erase every 64 or 128 writes (depending on page size) - Lifetime: 100,000-1 million writes per page - Annotations: compare with HDD latency; deleting or overwriting data is expensive
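
The steady-state numbers on the SSD slide can be combined into an effective write cost. The sketch below is only illustrative: it assumes the 1.5 ms erase is spread evenly over the 64 or 128 page writes between erases, and the constant and function names are made up for this example.

```python
# Latencies taken from the slide, in microseconds.
READ_US = 25.0
WRITE_US = 200.0
ERASE_US = 1500.0  # 1.5 ms


def amortized_write_us(writes_per_erase: int) -> float:
    """Average cost of one page write when every `writes_per_erase`
    writes also pay for one block erase (assumption: even amortization)."""
    return WRITE_US + ERASE_US / writes_per_erase


for n in (64, 128):
    cost = amortized_write_us(n)
    print(f"{n} writes per erase: {cost:.1f} us per write "
          f"({cost / READ_US:.1f}x a read)")
```

Under this assumption the erase adds roughly 12 to 23 us per write, so the write itself still dominates; the larger concern on the slide is the limited number of writes per page.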

  15. SSD VS HDD COST [figure]

  16. Ethernet Bandwidth [figure: 1995, 1998, 2002, 2017] - Growing 33-40% per year! (annotation: disk is about 100 MB/s)
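
The annual growth rates quoted on the slides (+15% per year on the memory slide, 33-40% per year for Ethernet) are easier to compare as doubling times. A minimal sketch, assuming steady compound growth; the function name is illustrative.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate,
    e.g. annual_growth=0.15 for +15% per year."""
    return math.log(2) / math.log(1 + annual_growth)


for label, rate in [("memory slide, +15%", 0.15),
                    ("Ethernet, +33%", 0.33),
                    ("Ethernet, +40%", 0.40)]:
    print(f"{label}: doubles about every {doubling_time_years(rate):.1f} years")
```

At these rates Ethernet bandwidth doubles in a little over two years, while a quantity growing 15% per year takes about five.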

  17. AMAZON EC2 (2019) [figure; annotation: flash drive]

  18. TRENDS SUMMARY - CPU speed per core is flat - Memory bandwidth growing slower than capacity - SSD, NVMe replacing HDDs - Ethernet bandwidth growing (annotation: limitations of a single machine)

  19. DATACENTER ARCHITECTURE - Servers (Memory Bus, PCIe, Ethernet, SATA) - Racks with switches

  20. STORAGE HIERARCHY (DC AS A COMPUTER v2) [figure; handwritten annotations illegible]

  21. Warehouse-Scale Computers - Single organization - Homogeneity (to some extent) - Cost efficiency at scale – Multiplexing across applications and services – Rent it out! - Many concerns – Infrastructure – Networking – Storage – Software – Power/Energy – Failure/Recovery – …

  22. SOFTWARE IMPLICATIONS - Reliability (component failures) - Storage Hierarchy - Workload Diversity - Single organization

  23. WORKLOAD: Partition-Aggregate - Top-level Aggregator - Mid-level Aggregators - Workers (sharded index) - Annotations: big data, low latency
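
The partition-aggregate pattern named on this slide can be sketched in a few lines. The code below is a toy illustration, not the system discussed in the lecture: the shard contents, the substring-count scoring, and all function names are assumptions chosen to keep the example self-contained.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query, k):
    """Worker: score documents in one index shard, return the local top-k.
    Scoring (substring count) is a stand-in for a real ranking function."""
    hits = [(doc.count(query), doc) for doc in shard if query in doc]
    return heapq.nlargest(k, hits)


def aggregate(partial_results, k):
    """Aggregator: merge several partial top-k lists into one top-k list."""
    merged = [hit for part in partial_results for hit in part]
    return heapq.nlargest(k, merged)


def partition_aggregate(shards, query, k=2, fan_in=2):
    # Workers search their shards in parallel.
    with ThreadPoolExecutor() as pool:
        worker_results = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Each mid-level aggregator merges results from `fan_in` workers.
    mid_level = [aggregate(worker_results[i:i + fan_in], k)
                 for i in range(0, len(worker_results), fan_in)]
    # The top-level aggregator merges the mid-level results.
    return aggregate(mid_level, k)


shards = [["cat cat dog", "dog"], ["cat", "bird"],
          ["cat cat cat", "cat dog"], ["fish"]]
print(partition_aggregate(shards, "cat"))
```

Because the top level must wait for every branch before answering, the response time is set by the slowest worker, which is why low latency is a concern for this workload.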

  24. WORKLOAD: SCHOLAR SIMILARITY [figure] - Map Stage - Reduce Stage
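
The slide names only a map stage and a reduce stage. One way such a similarity job could be structured is co-citation counting, sketched below. This is an assumption made for illustration: the slide does not spell out the computation, and the function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_stage(cited_articles):
    """Map: for one citing article, emit ((a, b), 1) for every pair of
    articles it cites together (assumed similarity signal: co-citation)."""
    for a, b in combinations(sorted(set(cited_articles)), 2):
        yield (a, b), 1


def reduce_stage(emitted_pairs):
    """Reduce: sum the counts for each pair key."""
    counts = Counter()
    for key, value in emitted_pairs:
        counts[key] += value
    return counts


def co_citation_counts(citations):
    emitted = (kv for cited in citations.values() for kv in map_stage(cited))
    return reduce_stage(emitted)


# Hypothetical input: citing article -> list of cited articles.
citations = {"p1": ["A", "B", "C"], "p2": ["A", "B"], "p3": ["B", "C"]}
print(co_citation_counts(citations))
```

The map stage is independent per citing article and the reduce stage is independent per pair, so both parallelize across many machines.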

  25. VIDEO ENCODING - Compute intensive - Parallelize across fragments (annotation: YouTube)

  26. MACHINE LEARNING [handwritten annotations illegible]

  27. DISCUSSION https://forms.gle/CrrrhCPYHerwXNEt5

  28. DISCUSSION Scale-up vs Scale-out - Notes: your app may not have parallelism; communication; scale-out is overkill for a small dataset; fault tolerance; pay as you go

  29. DISCUSSION Microsoft Word vs. online document editor like Google Docs - Word: yearly release; machine/hardware compatibility - Docs: collaboration (consistency is a challenge); access it from anywhere; monthly patches/releases online; permanent storage, redundancy; 99.99% uptime
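
The 99.99% uptime figure noted for the online editor translates into a concrete downtime budget. A minimal sketch of the arithmetic, assuming a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability (0.9999 = 99.99%)."""
    return (1 - availability) * MINUTES_PER_YEAR


print(f"{downtime_minutes_per_year(0.9999):.1f} minutes per year")
```

That is under an hour of total outage per year, which is why redundancy appears next to the uptime note.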

  30. DISCUSSION - Even with 99% of servers working well, parallelism makes tail latencies worse
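
The point about tail latency can be made concrete with a small calculation. The sketch assumes each server is independently slow 1% of the time and that a request must wait for every server it fans out to; both are simplifying assumptions, and the function name is illustrative.

```python
def prob_request_is_slow(fan_out: int, p_server_ok: float = 0.99) -> float:
    """Probability that at least one of `fan_out` servers is slow,
    assuming independent servers that each respond well with p_server_ok."""
    return 1 - p_server_ok ** fan_out


for n in (1, 10, 100):
    print(f"fan-out {n}: {prob_request_is_slow(n):.1%} of requests hit a slow server")
```

With a fan-out of 100, well over half of all requests wait on at least one slow server, even though each individual server is fine 99% of the time.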

  31. NEXT STEPS Next class: Storage Systems Assignment 1 out Thursday. Submit groups before that! Wait list
