

SLIDE 1

ECHO: Recreating Network Traffic Maps for Datacenters with Tens of Thousands of Servers

Christina Delimitrou1, Sriram Sankar2, Aman Kansal3, Christos Kozyrakis1

1Stanford University 2Microsoft 3Microsoft Research

IISWC – November 5th 2012

SLIDE 2

Motivation

Network performance and efficiency → critical for DC operation

- Scalable topologies
  - Dragonfly, Fat tree, Clos, etc.
  - Hotspot detection & elimination
- Flow control
  - Load balancing
  - Speculative flow control: Hedera, etc.
- Network switch design
  - Low-latency RPCs: RAMCloud, etc.
- Software-defined DC networks
  - OpenFlow, Nicira, etc.

SLIDE 3

Challenge

Where can we find representative traffic patterns?

SLIDE 4

Executive Summary

- Network workload model: a scheme that accurately and concisely captures the traffic of a DC workload
  - User patterns only emerge at large scale → scalability
  - Different level of detail per application → modularity/configurability
- Prior work on network modeling → mostly single-node, temporal behavior
  - No spatial patterns, scalability, or modularity
- ECHO addresses the limitations of previous schemes:
  - System-wide network modeling: not confined to a single node
  - Locality-aware: accounts for spatial network traffic patterns
  - Hierarchical: adjusts the level of granularity to the needs of each app/study
  - Scalable: scales to DCs with ~30,000 servers
  - Lightweight: low, upper-bounded modeling overheads
  - Validated: against real traces from applications in production DCs

SLIDE 5

Outline

- Simple Temporal Model
- DC Network Traffic Characterization
- ECHO Design
- Model Validation

SLIDE 6

Distribution Fitting Model

- Most well-known modeling approach for networks
- Single-node, as opposed to system-wide!
- Capture temporal patterns in per-server network traffic
- Identify known distributions (e.g., Gaussian, Poisson, Zipf) in network activity traces
- Represent server network activity as a superposition of the identified distributions
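As a minimal sketch of this idea (hypothetical Python, not the authors' tooling; the toy trace and the component mix are made up for illustration), fitting a Gaussian to a per-server bandwidth trace and replaying it synthetically might look like:

```python
import random
import statistics

def fit_gaussian(trace):
    # Maximum-likelihood Gaussian fit: sample mean and standard deviation.
    return statistics.fmean(trace), statistics.pstdev(trace)

def synthesize(mu, sigma, n, seed=0):
    # Draw a synthetic trace from the fitted distribution (bandwidth >= 0).
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mu, sigma)) for _ in range(n)]

# Toy per-server trace: a Gaussian baseline plus an exponential burst
# component, echoing the "superposition of distributions" idea above.
rng = random.Random(42)
trace = [rng.gauss(100.0, 10.0) + rng.expovariate(1 / 5.0) for _ in range(10_000)]

mu, sigma = fit_gaussian(trace)
synthetic = synthesize(mu, sigma, len(trace))
```

A full model would fit each candidate distribution (Poisson, Zipf, etc.) and keep the best-scoring superposition rather than a single Gaussian.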

SLIDE 7

Distribution Fitting Model

- Capture temporal patterns in per-server network traffic
- Identify known distributions (e.g., Gaussian, Poisson, Zipf) in network activity traces
- Represent server network activity as a superposition of identified distributions
- Model = Gaussian_1 + Exponential_2 + Gaussian_3 + Gaussian_4 + Constant_5

Validation: Deviation between original and synthetic is 4.9% on average

SLIDE 8

Distribution Fitting Model

Positive:
+ Simple, accurate, and concise
+ Captures temporal patterns in network activity
+ Facilitates traffic characterization (traffic is expressed as well-studied distributions)

Negative:
× Does not track spatial patterns
× Bursts in network activity are not easily emulated by known distributions → they would complicate the model
× Non-modular design

SLIDE 9

Outline

- Simple Temporal Model
- DC Network Traffic Characterization
- ECHO Design
- Model Validation

SLIDE 10

Methodology

- Workloads:
  - Entire Websearch application
  - Combine → Websearch query results aggregator
  - Render → Websearch query results display
- Experimental systems are production DCs with:
  - 30,000 servers running Websearch
  - 360 servers running Combine
  - 1350 servers running Render
- We collect per-server bandwidth traces of data sent and received over a period of 5 months (at 5 ms granularity)

SLIDE 11

Understanding Network-wide Behavior

- Temporal variations of network traffic
  - Fluctuation over time
  - Differences between workloads
- Average spatial patterns in network activity
  - Locality in network traffic
  - Impact of application functionality on locality
- Temporal variations in spatial patterns
  - Changes over different time scales
  - Changes for different types of workloads

SLIDE 12

Temporal Variations in Network Traffic

- Most servers are greatly underutilized → significant overprovisioning for latency-critical apps
- Some servers have higher utilization → mostly well load-balanced
- Similar network activity patterns over time
- The model should capture the fluctuation and remove information redundancy

SLIDE 13

Temporal Variations in Network Traffic

- Clearer diurnal patterns → 31 dark and 31 light vertical bands

SLIDE 14

Temporal Variations in Network Traffic

- Clearer diurnal patterns → 31 dark and 31 light vertical bands
- Higher utilization → not as much overprovisioning for servers that aggregate query results

SLIDE 15

Temporal Variations in Network Traffic

- Clearer diurnal patterns → 31 dark and 31 light vertical bands
- Higher utilization → not as much overprovisioning for servers that aggregate query results
- Not equally load-balanced → impact of the queries serviced by each server

SLIDE 16

Spatial Patterns in Network Activity

- High spatial locality → most accesses are confined within the same rack
- The model should preserve this spatial locality (within racks & hotspots)

SLIDE 17

Spatial Patterns in Network Activity

- High spatial locality → most accesses are confined within the same rack
- The model should preserve this spatial locality (within racks & hotspots)
- A few servers communicate with most of the machines → cluster scheduler, aggregators, monitoring servers

SLIDE 18

Spatial Patterns in Network Activity

- In contrast, Combine has less spatial locality → most servers talk to many machines
- Consistent with its functionality → query aggregation

SLIDE 19

Fluctuations in Spatial Patterns

- At first glance, spatial locality is very similar across months

SLIDE 20

Fluctuations in Spatial Patterns

- At first glance, spatial locality is very similar across months
- However, at finer granularity there are differences

SLIDE 21

Fluctuations in Spatial Patterns

- At first glance, spatial locality is very similar across months
- However, at finer granularity there are differences:
  - Software updates
  - Changes in traffic due to user load
  - Background processes (e.g., garbage collection, logging)

SLIDE 22

Fluctuations in Spatial Patterns

- At first glance, spatial locality is very similar across months
- However, at finer granularity there are differences:
  - Software updates
  - Changes in traffic due to user load
  - Background processes (e.g., garbage collection, logging)
- Fine-grain patterns are important for studies focused on specific hours of the day

SLIDE 23

Outline

- Simple Temporal Model
- DC Network Traffic Characterization
- ECHO Design
- Model Validation

SLIDE 24

Model Requirements

Don’t just model a node. Model the whole DC!

Requirements:
1. Average activity over time and space
2. Per-server activity fluctuation over time
3. Spatial patterns in network traffic
4. Individual server-to-server communication

SLIDE 25

Model Design – Spatial Aspects

- Hierarchical Markov chain: groups of racks → racks → individual servers
- Configurable granularity based on app/study requirements
- Captures spatial patterns in network traffic: fine-grain transitions are explored within each coarse state → most locality is confined within a rack
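A minimal sketch of such a two-level walk (hypothetical Python; `rack_T` and `server_dist` are illustrative stand-ins for the model's learned transition matrices, not ECHO's actual data structures):

```python
import random

def sample_walk(rack_T, server_dist, start_rack, steps, seed=0):
    """Two-level Markov walk: a coarse step picks the destination rack,
    then a fine step picks a server inside that rack. Fine-grain states
    are only expanded within the chosen coarse state."""
    rng = random.Random(seed)
    rack, walk = start_rack, []
    for _ in range(steps):
        rack = rng.choices(range(len(rack_T)), weights=rack_T[rack])[0]
        server = rng.choices(range(len(server_dist[rack])),
                             weights=server_dist[rack])[0]
        walk.append((rack, server))
    return walk

# Two racks; traffic is mostly rack-local (0.9 self-transition probability).
rack_T = [[0.9, 0.1], [0.1, 0.9]]
server_dist = [[0.5, 0.5], [0.8, 0.2]]
walk = sample_walk(rack_T, server_dist, start_rack=0, steps=2000)
locality = sum(a[0] == b[0] for a, b in zip(walk, walk[1:])) / (len(walk) - 1)
```

With a 0.9 self-transition probability per rack, roughly 90% of consecutive requests stay in the same rack, mirroring the rack-level locality described above.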

SLIDE 26

Model Design – Temporal Aspects

- Captures temporal patterns in network traffic → multiple models used over time
- The number of models is a function of the workload's activity fluctuations
- Switching between models allows compression in replay → fast experimentation

SLIDE 27

Hierarchical vs. Flat Model

- Hierarchical: explore fine-grain transitions within coarse states
- Flat: explore all fine-grain states → exponential increase in transition count
- Even for problems with a few hundred servers, the flat model becomes intractable
- No loss in accuracy with the hierarchical model, since locality is mostly confined within racks
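Back-of-envelope arithmetic on the transition counts (the rack size here is my illustrative assumption, not a figure from the talk):

```python
def flat_transitions(n_servers):
    # A flat model tracks a transition probability between every server pair.
    return n_servers ** 2

def hierarchical_transitions(n_racks, servers_per_rack):
    # Coarse rack-to-rack transitions, plus per-rack server-to-server ones.
    return n_racks ** 2 + n_racks * servers_per_rack ** 2

# 30,000 servers (the Websearch deployment), assuming 40 servers per rack.
flat = flat_transitions(30_000)            # 900 million transitions
hier = hierarchical_transitions(750, 40)   # ~1.8 million transitions
```

Under these assumptions the two-level model shrinks the transition count by about 500x, which is why the flat model becomes intractable long before DC scale.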


SLIDE 28

Model Construction

- Collect system-wide network activity traces
- Cluster network requests based on:
  - Sender/receiver server IDs
  - Type (rd/wr) and size of request (MB)
  - Inter-arrival time between requests (ms)
- Compute transition probabilities between states (e.g., S1 → S2: 90% 8KB read requests, 10 ms inter-arrival time)

p12 = 90%, 8KB, rd, 10 ms
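A sketch of that construction step (hypothetical Python; the field names and toy trace are mine), counting sender→receiver transitions per request cluster and normalizing into probabilities:

```python
from collections import Counter, defaultdict

def build_transition_probs(requests):
    """Count observed sender->receiver transitions, clustered by request
    type and size, then normalize the counts into probabilities."""
    counts = defaultdict(Counter)
    for src, dst, rtype, size_kb in requests:
        counts[(src, rtype, size_kb)][dst] += 1
    probs = {}
    for key, dsts in counts.items():
        total = sum(dsts.values())
        probs[key] = {dst: c / total for dst, c in dsts.items()}
    return probs

# Toy trace: server 1 sends 8KB reads, 90% to server 2 and 10% to server 3,
# echoing the slide's example transition (p12 = 90%, 8KB, rd).
trace = [(1, 2, "rd", 8)] * 9 + [(1, 3, "rd", 8)]
probs = build_transition_probs(trace)
print(probs[(1, "rd", 8)])  # {2: 0.9, 3: 0.1}
```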

SLIDE 29

Cloud Node: Modeling Server Subsets

- Focus on specific interesting activity patterns
  - Validate the model on server subsets (a few hundred servers)
- Network activity is not necessarily self-contained in those server subsets
- Cloud Node: emulate all network activity to and from servers external to the studied server subset
- Maintains the accuracy of per-server load while enabling more fine-grain validation
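One way to realize the Cloud Node idea (a sketch under my own naming; `cloud_id` is an arbitrary sentinel, not something specified in the talk):

```python
def remap_to_cloud_node(events, subset, cloud_id=-1):
    """Replace any endpoint outside the studied server subset with a single
    synthetic 'cloud node', so per-server load inside the subset is preserved
    while all external traffic is aggregated into one endpoint."""
    out = []
    for src, dst in events:
        src2 = src if src in subset else cloud_id
        dst2 = dst if dst in subset else cloud_id
        out.append((src2, dst2))
    return out

subset = {1, 2, 3}
events = [(1, 2), (1, 9), (7, 3)]
print(remap_to_cloud_node(events, subset))  # [(1, 2), (1, -1), (-1, 3)]
```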

SLIDE 30

Outline

- Simple Temporal Model
- DC Network Traffic Characterization
- ECHO Design
- Model Validation

SLIDE 31

Validation

1. Temporal variations of network activity
2. Spatial patterns of network activity
3. Individual server interactions (one-to-one communication patterns)

SLIDE 32

Validation – Temporal Patterns

- Less than 8% deviation between original and synthetic workload, on average across server subsets
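The deviation metric can be sketched as a mean absolute percentage difference between traces (my formulation of the comparison; the paper's exact metric may differ, and the sample values are made up):

```python
def mean_abs_deviation_pct(original, synthetic):
    """Average absolute per-sample deviation of the synthetic trace from
    the original, as a percentage of the original value."""
    assert len(original) == len(synthetic)
    devs = [abs(o - s) / o for o, s in zip(original, synthetic) if o > 0]
    return 100.0 * sum(devs) / len(devs)

original  = [100.0, 110.0, 95.0, 105.0]   # measured bandwidth samples
synthetic = [ 92.0, 115.0, 99.0, 100.0]   # model-generated samples
deviation = mean_abs_deviation_pct(original, synthetic)  # ~5.4%
```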

(Figure: original vs. synthetic bandwidth over time for three server subsets.)

SLIDE 33

Validation – Spatial Patterns

- Less than 10% deviation between original and synthetic workload, on average across server subsets

(Figure: original vs. synthetic spatial traffic patterns for three server subsets.)

SLIDE 34

Validation – Indiv. Server Interactions

- 12% deviation between original and synthetic for a weekday
- 9% deviation between original and synthetic for a weekend day

SLIDE 35

Validation – Benefits of Hierarchy

1 level:  28% deviation
2 levels: 9.1% deviation
3 levels: 4.4% deviation

SLIDE 36

Motivation: Revisited

- Scalable topologies ✓
  - Dragonfly, Fat tree, Clos, hotspot detection & elimination
- Flow control ✓
  - Load balancing, speculative flow control, Hedera, etc.
- Network switch design ✓
  - High port count designs, low-latency RPCs, RAMCloud, etc.
- Software-defined DC networks ✓
  - OpenFlow, Nicira, etc.
- Security attacks ✓
  - Real-time deviation from modeled behavior
- Retraining for major software updates and major system configuration changes
  - Low-overhead process

SLIDE 37

Conclusions

- ECHO leverages validated analytical models to capture the temporal and spatial access patterns in DC network activity
- It preserves the intensity and characteristics of DC network traffic
- It adjusts the granularity of representation to the app/study demands
- It is scalable and lightweight
- It decouples network system studies from access to large-scale applications

Future work:
- Use ECHO for network system studies without requiring full application deployment
- Expand similar concepts to other subsystems

SLIDE 38

Thank you! Questions?