
Improving Resource Efficiency in Cloud Computing. Christina Delimitrou, Stanford University. Defense, May 26th, 2015.


  1. Improving Resource Efficiency in Cloud Computing. Christina Delimitrou, Stanford University. Defense, May 26th, 2015.

  2. Resource efficiency is a first-order system constraint. How efficiently do we utilize resources? How efficiently do we design systems?

  3. Why Care about Resource Efficiency? [Figure: two plots of Performance/Cost over Time.]

  4. ~10K commodity servers; sophisticated cluster managers; ~10s of MWatts; $100,000,000s. Private clouds: Google, Microsoft, Twitter, eBay. Public clouds: Amazon EC2, Windows Azure, GCE.

  5. The Promise of Cloud Computing. Flexibility: provision and launch new services in seconds. High performance: high throughput & low tail latency. Cost effectiveness: low capital & operational expenses. Cloud computing scalability: high performance AND low cost.

  6. The Reality of Cloud Computing

  7. Scaling Datacenters. Switch to commodity servers (one-time trick). Improve cooling/power distribution (< 10%). Build more datacenters (> $300M per datacenter). Add more servers (power limit). Rely on processor technology (end of voltage scaling). Use existing systems more efficiently.

  8. Datacenter Underutilization. [Figure: CPU Utilization (%), axis 0 to 100, for Twitter (Mesos) [1] and Google (Borg) [2]; annotations: 4-5x, 3-5x.] [1] C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management, ASPLOS 2014. [2] L. A. Barroso, U. Hölzle. The Datacenter as a Computer, 2013.

  9. Datacenter Underutilization… Is the cluster manager’s fault; is the user’s fault!

  10. Reserved vs. Used Resources. [Figure annotations: 1.5-2x, 3-5x.] Twitter: up to 5x CPU & up to 2x memory overprovisioning.

  11. Reserved vs. Used Resources. ~25,000 jobs, 936 distinct users [ASPLOS’14]. [Figure label: Reservation = Usage.] 20% of jobs under-sized, ~70% of jobs over-sized.
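The under-sized/over-sized split on this slide can be illustrated with a minimal sketch that compares each job's reservation to its measured usage. The `Job` type, the sample jobs, and the 10% tolerance band are assumptions made for illustration; the slides do not state how the thresholds were chosen.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record: cores reserved by the user vs. cores actually used."""
    name: str
    reserved_cores: float
    used_cores: float


def classify(job: Job, tolerance: float = 0.1) -> str:
    """Label a job by its reserved/used ratio.

    The tolerance band around Reservation = Usage is an assumed parameter.
    """
    if job.used_cores <= 0:
        raise ValueError("used_cores must be positive")
    ratio = job.reserved_cores / job.used_cores
    if ratio > 1 + tolerance:
        return "over-sized"
    if ratio < 1 - tolerance:
        return "under-sized"
    return "right-sized"


# Made-up sample jobs, not data from the talk.
jobs = [
    Job("web-frontend", reserved_cores=16, used_cores=4),   # ratio 4.0
    Job("batch-analytics", reserved_cores=2, used_cores=4), # ratio 0.5
    Job("cache", reserved_cores=8, used_cores=8),           # ratio 1.0
]
labels = {job.name: classify(job) for job in jobs}
```

A ratio of 4.0, as in the first sample job, falls within the "up to 5x CPU" overprovisioning range quoted for Twitter.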

  12. Datacenter Underutilization… Is the user’s fault! (not really…)

  13. Resource Management is Hard

  14. Performance Depends on Scale-up. [Figure: Performance vs. Cores.]

  15. Performance Depends on Heterogeneity. [Figure: Performance vs. Cores.]

  16. Performance Depends on Heterogeneity, Scale-out. [Figure panels: Performance vs. Servers; Performance vs. Cores.]

  17. Performance Depends on Heterogeneity, Scale-out, Input load. [Figure panels: Performance vs. Servers; Performance vs. Cores; Performance vs. Input size.]

  18. Performance Depends on Heterogeneity, Scale-out, Input load, Interference. [Figure panels: Performance vs. Servers; Performance vs. Cores; Performance vs. Input size; Performance vs. Interference.] Overprovision Reservations! When sw changes, when platforms change, etc.

  19. Can we improve resource efficiency while preserving application QoS guarantees? Potential: 3-5x efficiency; $10Ms in cost savings.

  20. Requirements. Automate resource management: large, multi-dimensional space → leverage big data. General solution: different application types (batch, latency-critical); different types of hardware. Cross-layer design: architecture → OS → scheduler → application design.

  21. Contributions

  22. Contributions. Paragon [ASPLOS’13, TopPicks’14] [IISWC’13]. [Diagram labels: Users, Resource reservations, Scheduler, Cluster.] (1) Practical data mining.

  23. Contributions. Quasar [ASPLOS’14]. [Diagram labels: Users, Resource reservations, Scheduler, Cluster.] (1) Practical data mining. (2) High level interface.

  24. Contributions. Systems: Application assignment: Paragon [ASPLOS’13, TopPicks’14, CAL’13, IISWC’13]. Cluster management: Quasar [ASPLOS’14].

  25. Contributions. Systems: Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]. Cluster management: Quasar [ASPLOS’14]. Scalable scheduling: Tarcil [SOCC’15].

  26. Contributions. Systems: Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]. Cluster management: Quasar [ASPLOS’14]. Scalable scheduling: Tarcil [SOCC’15]. Cloud provisioning: Hybrid Cloud [in submission].

  27. Contributions. Systems: Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]. Cluster management: Quasar [ASPLOS’14]. Scalable scheduling: Tarcil [SOCC’15]. Cloud provisioning: Hybrid Cloud [in submission]. Admission control: ARQ [ICAC’13].

  28. Contributions. Systems: Application assignment: Paragon [ASPLOS’13, TopPicks’14], iBench [IISWC’13]. Cluster management: Quasar [ASPLOS’14]. Scalable scheduling: Tarcil [SOCC’15]. Cloud provisioning: Hybrid Cloud [in submission]. Admission control: ARQ [ICAC’13]. Datacenter application modeling: ECHO [IISWC’12]. Storage application modeling [CAL’12, IISWC’11, ISPASS’11].

  29. Paragon [ASPLOS’13, TopPicks’14]. [Diagram labels: Users, Resource reservations, Scheduler, Cluster.] Practical data mining techniques.

  30. Heterogeneity & Interference Matter. Heterogeneity: DCs provisioned over 15 years; multiple server generations & configurations. Interference: apps contend on shared resources (CPU & cache hierarchy; memory system; storage & network I/O). [Figure labels: Ignore Heterogeneity, Ignore Both.]

  31. Extracting Resource Preferences. Naïve: exhaustive characterization (~10-20 platforms x 1,000 apps). Looks like a recommendation problem. [Diagram labels: Users, Resource reservations, Apps, Scheduler, Cluster, Data, Mine big data.]

  32. Recommendation Systems. Content-based systems: description of items (keywords, feature vector, etc.); profile of user preferences (history, model, user-system interaction, etc.). Collaborative filtering: uncover similarities between users and items; no need to know item features or explicit user preferences in advance.

  33. Recommendation Systems. Content-based systems: description of items (keywords, feature vector, etc.); profile of user preferences (history, model, user-system interaction, etc.). Collaborative filtering: uncover similarities between users and items; no need to know item features or explicit user preferences in advance.

  34. Something familiar… Collaborative filtering: similar to Netflix Challenge system. Singular Value Decomposition (SVD) + PQ reconstruction (SGD). [Figure: a sparse utility matrix (users x movies) passes through SVD and PQ reconstruction to yield a dense utility matrix, which produces the recommendations.]
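The PQ reconstruction step can be sketched as a small matrix factorization trained with stochastic gradient descent: the sparse utility matrix is approximated by the product of a user-factor matrix P and a movie-factor matrix Q, and the product then supplies the missing ratings. This is a minimal sketch under stated assumptions, not the talk's implementation: the ratings, the rank `k`, the learning rate, the regularization weight, and the random initialization (rather than an SVD-based one) are all illustrative choices.

```python
import random


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user's and the item's factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def rmse(P, Q, ratings):
    """Root-mean-square error over the observed ratings only."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items())
    return (total / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Fit P (n_users x k) and Q (n_items x k) to the observed ratings with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    observed = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(observed)
        for (u, i), r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical sparse utility matrix: (user, movie) -> rating; absent keys are unknown.
ratings = {
    (0, 0): 5, (0, 1): 4, (0, 3): 1,
    (1, 0): 4, (1, 2): 1, (1, 3): 1,
    (2, 1): 1, (2, 2): 5, (2, 3): 4,
    (3, 0): 1, (3, 2): 4, (3, 3): 5,
}
P, Q = factorize(ratings, n_users=4, n_items=4)

# Dense utility matrix: every (user, movie) cell now has a predicted rating.
dense = [[predict(P, Q, u, i) for i in range(4)] for u in range(4)]
```

Only observed cells contribute to the updates, so the sparsity of the input is never treated as a rating of zero.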

  35. SVD. Rows are users $u_1, \dots, u_m$, columns are movies $m_1, \dots, m_n$, and each entry $a_{ij}$ is a rating:

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
u_{11} & \cdots & u_{1r} \\
\vdots & \ddots & \vdots \\
u_{m1} & \cdots & u_{mr}
\end{bmatrix}
\begin{bmatrix}
\sigma_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & \sigma_r
\end{bmatrix}
\begin{bmatrix}
v_{11} & \cdots & v_{1r} \\
\vdots & \ddots & \vdots \\
v_{n1} & \cdots & v_{nr}
\end{bmatrix}^{T}
$$

  36. SVD (annotated). Same decomposition as the previous slide, with labels on the three factors: the matrix with entries $u_{ik}$ gives the correlation of each user to each similarity concept; the diagonal matrix with entries $\sigma_1, \dots, \sigma_r$ holds the similarity concepts; the matrix with entries $v_{jk}$ gives the correlation of each movie to each similarity concept.
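Written element by element, the decomposition says that each rating is a sum over the $r$ similarity concepts. The expansion below follows from the matrix product, taking $u_{ik}$ as the user-to-concept entries, $\sigma_k$ as the diagonal entries, and $v_{jk}$ as the movie-to-concept entries; the numbers in the one-concept example are made up for illustration.

$$
a_{ij} = \sum_{k=1}^{r} u_{ik}\,\sigma_k\,v_{jk}
$$

With a single concept ($r = 1$) and assumed values $u_{i1} = 0.5$, $\sigma_1 = 10$, $v_{j1} = 0.8$, the predicted rating is $a_{ij} = 0.5 \times 10 \times 0.8 = 4$.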

  37. Heterogeneity Classification. [Table: rows User A, User B, …, User N; columns Movie 1 through Movie 5, …, Movie M.]

  38. Heterogeneity Classification. [Table: rows User A, User B, …, User N; columns Platform 1 through Platform 5, …, Platform M.]

  39. Heterogeneity Classification. [Table: rows App A, App B, …, App N; columns Platform 1 through Platform 5, …, Platform M.]

  40. Heterogeneity Classification. [Table: rows App A, App B, …, App N; columns Platform 1 through Platform 5, …, Platform M. Entries are app performance. Sample entries: App A: 1,500 QPS, 843 QPS; App B: 458 QPS, 946 QPS; App N: 1,016 QPS, 186 QPS.]

  41. Heterogeneity Classification. [Table: rows App A, App B, …, App N; columns Platform 1 through Platform 5, …, Platform M. App A: 1,500 QPS, 843 QPS. Legend: Profiled Performance, Inferred Performance.]

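  42. Heterogeneity Classification. Legend: Profiled Performance, Inferred Performance.

     |       | Platform 1 | Platform 2 | Platform 3 | Platform 4 | Platform 5 | … | Platform M |
     |-------|------------|------------|------------|------------|------------|---|------------|
     | App A | 1,500 QPS  | 843 QPS    | 675 QPS    | 843 QPS    | 1,786 QPS  | … | 8,675 QPS  |
     | App B |            |            |            |            |            |   |            |
     | …     |            |            |            |            |            |   |            |
     | App N |            |            |            |            |            |   |            |

  43. Heterogeneity Classification. Legend: Profiled Performance, Inferred Performance.

     |       | Platform 1 | Platform 2 | Platform 3 | Platform 4 | Platform 5 | … | Platform M |
     |-------|------------|------------|------------|------------|------------|---|------------|
     | App A | 1,500 QPS  | 843 QPS    | 675 QPS    | 843 QPS    | 1,786 QPS  | … | 8,675 QPS  |
     | App B | 987 QPS    | 458 QPS    | 773 QPS    | 1,073 QPS  | 986 QPS    | … | 1,836 QPS  |
     | …     |            |            |            |            |            |   |            |
     | App N |            |            |            |            |            |   |            |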
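Once every cell of the app-by-platform matrix has a profiled or inferred value, one plausible use is to rank platforms per application. The sketch below does only that. It assumes the six QPS values on each row of this slide correspond, in order, to Platforms 1 through 5 and Platform M, and the `rank_platforms` helper is a hypothetical name, not part of Paragon.

```python
# Column order is an assumption: six values per row map to Platforms 1-5 and M.
platforms = ["Platform 1", "Platform 2", "Platform 3",
             "Platform 4", "Platform 5", "Platform M"]

# QPS values as they appear on the slide (profiled and inferred together).
performance_qps = {
    "App A": [1500, 843, 675, 843, 1786, 8675],
    "App B": [987, 458, 773, 1073, 986, 1836],
}


def rank_platforms(app):
    """Return (platform, qps) pairs for one app, best-performing platform first."""
    pairs = zip(platforms, performance_qps[app])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


best_for_a = rank_platforms("App A")[0]
best_for_b = rank_platforms("App B")[0]
```

A real scheduler would also have to weigh interference and platform availability, which this ranking ignores.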
