Datacenter application interference

  1. Datacenter application interference. CMPs (popular in datacenters) offer increased throughput and reduced power consumption. They also increase resource sharing between applications, which can result in negative interference.

  2. Resource contention is well studied … at least on single machines. Three main methods: (1) gladiator-style match-ups; (2) static analysis to predict application resource usage; (3) measuring benchmark resource usage and applying the results to live applications.

  3. A new methodology for understanding datacenter interference is needed, one that can handle the complexities of a datacenter: (tens of) thousands of applications, real user inputs, production hardware, financial feasibility, and low overhead. Our answer: hardware counter measurements of live applications.

  4. Our contributions: (1) identifying the complexities of measuring interference in datacenters; (2) a new measurement methodology; (3) the first large-scale study of measured interference on live datacenter applications.

  5. Complexities of understanding application interference in a datacenter

  6. Large chips and high core utilization. Profiling 1,000 12-core, 24-hyperthread Google servers running production workloads revealed that the average machine had more than 14 of its 24 hardware threads in use.

  7. Heterogeneous application mixes. Applications often have more than one co-runner on a machine; we observed up to 19 unique co-runner threads (out of 24 HW threads). [Chart legend: 0-1 co-runners, 2-3 co-runners, 4+ co-runners.]

  8. Application complexities: fuzzy definitions; varying and sometimes unpredictable inputs; unknown optimal performance.

  9. Hardware & economic complexities: varying micro-architectural platforms; the need for low overhead, which limits measurement capabilities; corporate policies.

  10. Measurement methodology

  11. Measurement Methodology. The goal: a generic methodology for collecting application interference data on live production datacenter servers.

  12. Measurement Methodology. [Diagram: execution of App. A and App. B on separate hardware threads over time.]

  13. Measurement Methodology. Step 1: use sample-based monitoring to collect per-machine, per-core event (HW counter) samples.

  14. Measurement Methodology. [Diagram: App. A's and App. B's instruction streams divided into 2M-instruction samples, A:1-A:6 and B:1-B:4.]
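
To make this concrete, here is a minimal sketch of what one such sample record could look like. Every field name below is an illustrative assumption, not the actual collection format used at Google.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    """One hardware-counter sample covering a fixed instruction window."""
    machine: str       # machine identifier (assumed field)
    cpu: int           # hardware thread the sample was collected on
    app: str           # application (binary) running during the window
    start_ns: int      # timestamp when the instruction window began
    end_ns: int        # timestamp when the window ended
    instructions: int  # retired instructions in the window (e.g., 2_000_000)
    cycles: int        # unhalted cycles over the same window
```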

  15. Measurement Methodology. Step 2: identify sample-sized co-runner relationships …

  16. Measurement Methodology. Samples A:1-A:6 are co-runners with App. B; samples B:1-B:4 are co-runners with App. A.

  17. Measurement Methodology. Now say a new App. C starts running on CPU 1: sample B:4 no longer has a co-runner, since no single application ran alongside it for its entire window.
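
A sketch of step 2 over the hypothetical `Sample` records above: an application on another hardware thread counts as a co-runner only if it ran for the sample's whole window, which is why B:4 loses its co-runner once App. C appears mid-sample. This assumes all samples come from one machine, and the coverage test is simplified (it ignores gaps between consecutive samples).

```python
from collections import defaultdict

def find_corunners(samples):
    """Map each sample to {other_cpu: co-runner app or None}. An app is a
    co-runner only if it ran on that CPU for the sample's whole window."""
    by_cpu = defaultdict(list)
    for s in samples:
        by_cpu[s.cpu].append(s)
    for per_cpu in by_cpu.values():
        per_cpu.sort(key=lambda s: s.start_ns)

    corunners = {}
    for s in samples:
        rel = {}
        for cpu, per_cpu in by_cpu.items():
            if cpu == s.cpu:
                continue
            # Samples on the other thread that overlap s's window.
            overlap = [o for o in per_cpu
                       if o.start_ns < s.end_ns and o.end_ns > s.start_ns]
            full_cover = (bool(overlap)
                          and overlap[0].start_ns <= s.start_ns
                          and overlap[-1].end_ns >= s.end_ns)
            single_app = len({o.app for o in overlap}) == 1
            rel[cpu] = overlap[0].app if full_cover and single_app else None
        corunners[s] = rel
    return corunners
```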

  18. Measurement Methodology. Step 3: filter relationships by architecture-independent interference classes …

  19. Measurement Methodology. One class: the co-running threads are on opposite sockets.

  20. Measurement Methodology. Another class: the co-running threads share only I/O.
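
A sketch of the class test for step 3. The CPU numbering below (sibling hyperthreads numbered consecutively, six cores per socket, matching the dual-socket Westmere machines described later) is an assumption; a real tool would read each machine's actual topology.

```python
def interference_class(cpu_a, cpu_b, threads_per_core=2, cores_per_socket=6):
    """Classify a pair of hardware threads into an interference class."""
    core_a, core_b = cpu_a // threads_per_core, cpu_b // threads_per_core
    sock_a, sock_b = core_a // cores_per_socket, core_b // cores_per_socket
    if core_a == core_b:
        return "shared core"      # share pipeline resources and L1/L2 caches
    if sock_a == sock_b:
        return "shared socket"    # share the last-level cache
    return "opposite socket"      # share only memory, disk, and I/O
```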

  21. Measurement Methodology. Step 4: aggregate equivalent co-schedules.

  22. Measurement Methodology. For example: aggregate all samples of App. A that have App. B as a shared-core co-runner, or all samples of App. A that have App. B as a shared-core co-runner and App. C as a shared-socket co-runner.
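
Building on the hypothetical helpers above, step 4 could key each sample by its base app plus the sorted (interference class, co-runner app) pairs, so that equivalent co-schedules land in the same bucket:

```python
from collections import defaultdict

def aggregate_coschedules(samples, corunners):
    """Group samples whose co-schedules are equivalent."""
    buckets = defaultdict(list)
    for s in samples:
        pairs = sorted((interference_class(s.cpu, cpu), app)
                       for cpu, app in corunners[s].items()
                       if app is not None)
        buckets[(s.app, tuple(pairs))].append(s)
    return buckets
```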

  23. Measurement Methodology. Step 5: finally, calculate statistical indicators (means, medians) to get a midpoint performance figure for application interference comparisons.

  24. Measurement Methodology. [Example: App. A's average IPC is 2.0; App. B's average IPC is 1.5.]
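
And step 5, still in the same hypothetical setting: compute per-sample IPC and reduce each aggregated co-schedule to its mean and median, the midpoints used for comparisons like the one above.

```python
from statistics import mean, median

def coschedule_stats(buckets):
    """Mean and median IPC per aggregated co-schedule."""
    stats = {}
    for key, group in buckets.items():
        ipcs = [s.instructions / s.cycles for s in group]
        stats[key] = {"mean_ipc": mean(ipcs), "median_ipc": median(ipcs)}
    return stats
```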

  25. Applying the measurement methodology at Google.

  26. Applying the Methodology @ Google. Experiment details (method step 1, collect): sampled event: instructions, used to derive IPC; sampling period: 2.5 million instructions; number of machines*: 1,000. *All had Intel Westmere chips (24 hyperthreads, 12 cores) and matching clock speed, RAM, and O/S.

  27. Applying the Methodology @ Google. Method steps 2 (identify sample-sized relationships) and 3 (filter by interference classes). Collection results: unique binary apps: 1,102. Co-runner relationships (top 8 apps): avg. shared-core relationships 1M (min 2K); avg. shared-socket 9.5M (min 12K); avg. opposite-socket 11M (min 14K).

  28. Applying the Methodology @ Google. Method steps 4 (aggregate equivalent co-schedules) and 5 (calculate statistical indicators).

  29. Analyze Interference. [Chart: streetview's IPC changes with its top co-runners, compared against the overall median IPC across 1,102 applications.]

  30. Beyond noisy interferers (shared core). [Heatmap: base application vs. co-running applications; cells marked as less or positive interference, noisy data, or negative interference.]

  31. Beyond noisy interferers (shared core). [Heatmap: base applications vs. co-running applications over the full grid.] *Recall the minimum pair has 2K samples; medians are across the full grid of 1,102 apps.

  32. Performance Strategies: restrict negative beyond-noisy interferers (or encourage positive interferers as co-runners); isolate sensitive or antagonistic applications.

  33. Takeaways. 1. New datacenter application interference studies can use our identified complexities as a checklist. 2. Our measurement methodology, verified at Google in the first large-scale measurement of live datacenter interference, is generally applicable and shows promising initial performance opportunities.

  34. Questions? melanie@cs.columbia.edu http://arcade.cs.columbia.edu/
