
Contributions to Large Scale Distributed Systems: The Infrastructure Viewpoint. Adrien Lebre, September 1, 2017. President and Examiner: Claude Jard, Nantes Univ. Reviewers: Erik Elmroth, Umeå Univ.; Frédéric Desprez, Inria; Manish …


1-3. Utility Computing Infrastructures
• A common objective: provide computing resources (both hardware and software) in a flexible, transparent, efficient, secure and reliable way.
• Distributed infrastructures (since the 1990s).
• A lot of challenges: data sharing, software/hardware heterogeneity, workload placement, isolation between applications, performance…
[Diagram: a frontend (resource management system) dispatches Alice's and then Bob's workloads onto compute nodes and storage nodes (distributed file system); each new request raises the question of where to place it]

4-5. (Dynamic VM) Placement Contributions
• Research activities mainly supported by IMT Atlantique and Inria.
• PhD of F. Quesnel (co-supervised with M. Südholt): Distributed VM Scheduler.
• PhD of J. Pastor (co-supervised with F. Desprez): Locality-Aware Placement.
• With J.-M. Menaud and F. Hermenier: Cluster-Wide Context Switch.
[Timeline: 10/2008 to 10/2017]

6. Placement Problem
• Jobs 1, 2, 3, and 4 arrive in the queue and have to be scheduled (processors vs. running time).
• FCFS + EASY backfilling: although Jobs 2 and 3 have been backfilled, some resources remain unused (dark gray areas).
• EASY backfilling with preemption: Job 4 can be started earlier without impacting Job 1's performance.
• However, jobs cannot be easily preempted (OS internal states), and even with preemption some resources are still wasted.
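As a concrete illustration of the policies above, here is a minimal sketch of FCFS with EASY backfilling (hypothetical job structure and helper names, not the batch scheduler discussed in the manuscript; preemption, used in the third scenario of the slide, is not modelled): a waiting job may jump ahead only if it fits in the currently idle processors and finishes before the reserved start time of the job at the head of the queue.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs: int      # processors requested
    runtime: int    # estimated runtime (time units)

def schedule_pass(queue, total_procs, running):
    """One FCFS + EASY backfilling pass at the current instant.

    queue   : list of waiting Jobs (head first), modified in place
    running : list of (Job, remaining_time) pairs already on the machine
    Returns the jobs started now."""
    started = []
    free = total_procs - sum(j.procs for j, _ in running)

    # 1. FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        free -= job.procs
        started.append(job)

    if not queue:
        return started

    # 2. Reservation: earliest time at which the head job gets enough processors.
    head = queue[0]
    avail, reserved_start = free, None
    occupants = running + [(j, j.runtime) for j in started]
    for job, finish in sorted(occupants, key=lambda p: p[1]):
        avail += job.procs
        if avail >= head.procs:
            reserved_start = finish
            break

    # 3. EASY backfilling: a later job may start now only if it fits in the idle
    #    processors and finishes before the head job's reserved start time.
    for job in list(queue[1:]):
        if job.procs <= free and (reserved_start is None or job.runtime <= reserved_start):
            queue.remove(job)
            free -= job.procs
            started.append(job)
    return started

# Example: J1 must wait for J0 to release its processors; J4 (3 time units)
# fits in the gap without delaying J1's reservation.
queue = [Job("J1", procs=4, runtime=10), Job("J2", procs=2, runtime=4),
         Job("J3", procs=2, runtime=4), Job("J4", procs=2, runtime=3)]
print([j.name for j in schedule_pass(queue, total_procs=4,
                                     running=[(Job("J0", 2, 3), 3)])])  # -> ['J4']
```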

7-17. Virtual Machine: The New Building Block
• System virtualization: one to multiple OSes on a physical machine (PM) thanks to a hypervisor (an operating system of OSes).
• VM capabilities: suspend/resume, live migration.
[Animation: VM 1, VM 2 and VM 3 run on top of the hypervisor of one PM; VM 3 is then live-migrated to the hypervisor of a second PM]

18-21. From Jobs to Virtualised Jobs
• A job is now encapsulated in one or several VMs.
[State diagram: jobs move between waiting, running, sleeping, ready and terminated states through run, stop, suspend, resume and migrate operations]
• Challenge: maintain viable mappings between VMs and PMs, knowing that each VM consumes CPU, RAM…
credits: F. Hermenier, OSDI poster session 2008

22. From Jobs to Virtualised Jobs
• Maintain viable mappings between VMs and PMs.
• MAPE control loop (leveraging the Entropy framework).
• Make the reconfiguration phase automatic: cluster-wide context switch. [VTDC2010]
[Diagram: the infrastructure (PM 1, PM 2, PM 3, each with a hypervisor) moves from its current, non-viable status to a correct status; candidate reconfiguration plans have different costs (e.g., 3 vs. 2) and the cheapest viable one is applied]
credits: F. Hermenier, OSDI poster session 2008
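A minimal, self-contained sketch of such a MAPE-style loop (all names are illustrative; the actual system delegates the plan phase to the Entropy constraint solver rather than the greedy first-fit used here):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VM:
    name: str
    cpu: int
    ram: int

@dataclass
class PM:
    name: str
    cpu_capacity: int
    ram_capacity: int
    vms: List[VM] = field(default_factory=list)

def viable(pms):
    """A configuration is viable if every PM can satisfy its VMs' demands."""
    return all(sum(v.cpu for v in pm.vms) <= pm.cpu_capacity and
               sum(v.ram for v in pm.vms) <= pm.ram_capacity
               for pm in pms)

def plan_reconfiguration(pms):
    """Plan phase: return a list of (vm, source, destination) migrations.
    Entropy uses a constraint solver; this greedy first-fit only stands in for it."""
    moves = []
    for src in pms:
        while (sum(v.cpu for v in src.vms) > src.cpu_capacity or
               sum(v.ram for v in src.vms) > src.ram_capacity):
            vm = src.vms[-1]
            dst = next((p for p in pms if p is not src and
                        sum(v.cpu for v in p.vms) + vm.cpu <= p.cpu_capacity and
                        sum(v.ram for v in p.vms) + vm.ram <= p.ram_capacity), None)
            if dst is None:
                break                      # no viable destination: give up on this PM
            src.vms.remove(vm)
            dst.vms.append(vm)
            moves.append((vm.name, src.name, dst.name))
    return moves

def mape_iteration(pms):
    """One Monitor/Analyze/Plan/Execute iteration (Execute would trigger live migrations)."""
    if viable(pms):                        # Monitor + Analyze
        return []
    return plan_reconfiguration(pms)       # Plan

# Example: PM1 is overloaded; one iteration migrates a VM away.
pms = [PM("PM1", cpu_capacity=4, ram_capacity=8,
          vms=[VM("vm1", 2, 4), VM("vm2", 2, 4), VM("vm3", 2, 4)]),
       PM("PM2", cpu_capacity=4, ram_capacity=8)]
print(mape_iteration(pms))                 # -> [('vm3', 'PM1', 'PM2')]
```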

23. Cluster-Wide Context Switch: Evaluations
• Scheduling policy: a FIFO queue (priority between jobs to prevent starvation).
• Testbed (further details in the manuscript): 11 working nodes (22 CPUs); a queue of 8 vjobs (NASGrid benchmarks); each job uses 9 VMs (9 CPUs).
• Cumulated completion times have been reduced by 40%.

24-25. Cluster-Wide Context Switch: Evaluations
• Same setup as above (FIFO queue, 11 working nodes / 22 CPUs, 8 vjobs of 9 VMs each); cumulated completion times reduced by 40%.
• But what about scalability/reactivity beyond 11 nodes / 72 VMs? A Google data center is orders of magnitude larger…
credits: A. Simonet, Introduction to Cloud Computing Lecture - Inside a Google DC

26. Scalability/Reactivity Challenge
• Computing phase: an NP-hard problem in most cases.
• Most works have focused on heuristics to reduce the computing phase, but reconfiguring the infrastructure is time consuming too!
• Timer: 1. Monitoring, 2. Computing, 3. Reconfiguring, repeated over time.
credits: F. Quesnel, PhD defense 2013

27-29. Scalability/Reactivity Challenge
• Computing phase: an NP-hard problem in most cases.
• Most works have focused on heuristics to reduce the computing phase, but reconfiguring the infrastructure is time consuming too!
[Chart: CPU load of VM 1 and VM 2 over time; while the monitoring, computing and reconfiguring phases (1, 2, 3) run, the question "is the configuration still viable?" keeps arising]
• Can we reduce all phases?
credits: F. Quesnel, PhD defense 2013

30. Leverage P2P Algorithms
• Make dynamic partitioning of the system according to the effective usage of resources.
• Make direct cooperation between hypervisors (no service node).
• Distributed Virtual Machine Scheduler (DVMS): an event-driven, P2P-like system. [CCPE2012]
• Local interactions between nodes: when an event occurs on a node, the current partition tries to compute a valid schedule; if it cannot, it contacts a neighbour and asks it to join and solve the problem (see the sketch below).
• Scheduling is performed on partitions of the system, created dynamically (nodes are reserved for exclusive use by a scheduler, to prevent several schedulers from migrating the same VMs).
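A minimal sketch of the event-driven partition growth described above, on a ring of nodes (toy viability test and data structures; the real DVMS protocol also handles node reservations, concurrent events and partition destruction):

```python
def solvable(partition, capacity=10):
    """Toy viability test: the partition can host its load if the total load fits
    the aggregated capacity (stands in for the real constraint-solver call)."""
    return sum(load for _, load in partition) <= capacity * len(partition)

def handle_event(nodes, origin):
    """An overload event occurs on `origin`; grow a partition around it,
    node by node along the ring, until a valid schedule can be computed."""
    n = len(nodes)
    partition = [(origin, nodes[origin])]
    nxt = (origin + 1) % n
    while not solvable(partition):
        if nxt == origin:                 # whole ring reserved, still no solution
            return None
        # Reserve the neighbour and ask it to (try to) solve the problem.
        partition.append((nxt, nodes[nxt]))
        nxt = (nxt + 1) % n
    return partition                      # nodes involved in the reconfiguration

# Example: node 2 is overloaded (load 25 > capacity 10); the partition grows
# until the aggregated capacity can absorb the load.
loads = [3, 4, 25, 2, 6, 5]               # per-node load, indexed by node id
print(handle_event(loads, origin=2))      # -> [(2, 25), (3, 2), (4, 6), (5, 5)]
```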

31-38. Understanding DVMS
[Animation over a ring of nine nodes: overload events create partitions that grow node by node until a schedule is found, with several partitions handled in parallel and released once solved]
credits: J. Pastor, PhD Defense 2016

39-40. DVMS Evaluations
• Development of a proof of concept (PoC).
• Evaluations (in vivo) up to 5K VMs: 2000 VMs on 251 PMs, 3325 VMs on 309 PMs, 4754 VMs on 467 PMs.
• IEEE SCALE Challenge 2013.
[Charts: number of PMs per Grid'5000 cluster (Griffon, Graphene, Paradent, Parapide, Parapluie, Sol, Suno, Pastel); time to apply a reconfiguration, duration of an iteration and time to solve an event (mean and standard deviation) for DVMS vs. Entropy at the three scales. Overlay annotations ask how the approach compares with other proposals and how it can be tested at larger scales.]

41-45. VM Placement: A (Hot Topic) Problem
• Lots of articles (too many?).
• Evaluations are performed either at a small scale for in vivo experiments or with ad hoc simulators.
• How can we evaluate/compare them?

46-47. VM Simulator Toolkits
• Research activities mainly supported by IMT Atlantique, Inria, the French ANR SONGS project, the Hemera Inria Large-Scale Initiative, the Discovery Inria Project Lab and the EU BigStorage project.
• T. Hirofuchi (postdoc/invited researcher): SimGridVM, VM abstractions.
• A. Simonet (postdoc): VMPlaceS, energy dimension.
• T. L. Nguyen (PhD): boot time model.
[Timeline 10/2008-10/2017, building on the earlier contributions: cluster-wide context switch, distributed VM scheduler, locality-aware placement]

48. Toward a VM PLACEment Simulator
• A dedicated simulator to:
• Evaluate/compare VM placement policies at large scale (and in a reproducible manner).
• Relieve researchers of the burden of dealing with VM creation and workload generation/injection.
• SimGrid as a base: a scientific instrument to study the behaviour of large-scale distributed systems.
• Design abstractions and models that let researchers control VMs in the same manner as in the real world (e.g., create/destroy, start/shutdown, suspend/resume and migrate).
• Focus on the migration model.

49-55. Accurate Live Migration Model
• Migration time is not a linear function of the size of the VM: the more memory-intensive the VM, the longer the migration.
[Chart: observed migration time (s) vs. memory update speed (MB/s), diverging from a naive linear approximation as the update speed grows]
• Live migration transfers the VM's state to the destination without a perceptible interruption of service (pre-copy algorithm):
1. Transfer all memory pages of the VM (the VM is still running at the source).
2. Transfer the memory pages updated during the previous step.
3. Iterate until the set of remaining dirty pages becomes small enough to meet an acceptable downtime (30 ms in KVM).
4. Stop the VM; transfer the remaining memory pages and the device states, then restart the VM on the destination.
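A back-of-the-envelope sketch of how migration time grows with the memory update speed under the pre-copy algorithm above (illustrative parameter values; the published model additionally accounts for network contention and the actual memory refresh behaviour):

```python
def precopy_migration_time(mem_mb, bw_mbps, dirty_mbps,
                           downtime_threshold_mb=1.0, max_rounds=30):
    """Estimate pre-copy live-migration duration (s) and transferred data (MB).

    mem_mb     : VM memory size
    bw_mbps    : migration bandwidth (MB/s)
    dirty_mbps : memory update (page dirtying) speed (MB/s)
    """
    total_time, total_data = 0.0, 0.0
    to_send = mem_mb                        # round 1: the whole memory
    for _ in range(max_rounds):
        t = to_send / bw_mbps               # time to push this round
        total_time += t
        total_data += to_send
        dirtied = dirty_mbps * t            # pages re-dirtied meanwhile
        if dirtied <= downtime_threshold_mb or dirty_mbps >= bw_mbps:
            to_send = dirtied               # final stop-and-copy round
            break
        to_send = dirtied
    total_time += to_send / bw_mbps         # downtime: VM stopped for the last copy
    total_data += to_send
    return total_time, total_data

# An idle VM migrates in roughly mem/bandwidth; a memory-intensive one takes longer.
print(precopy_migration_time(mem_mb=2048, bw_mbps=125, dirty_mbps=0))
print(precopy_migration_time(mem_mb=2048, bw_mbps=125, dirty_mbps=80))
```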

56. Accurate Live Migration Model
• Application memory footprints can be considered as linear functions (e.g., Apache, PostgreSQL).
• First accurate live migration model implementing the pre-copy strategy, validated on Grid'5000 against a naive simulation. [CloudCom 2013]
[Chart: migration time (s) vs. CPU utilization (%) for the Grid'5000 measurements, the pre-copy simulation and the naive simulation]
• The time and the resulting traffic of a migration should be computed by taking into account the competition arising in the presence of resource sharing and the memory refresh rate.

57. SimGrid VM [TCC 2015]
• SimGrid VM allows users to launch hundreds of thousands of VMs in their simulation programs and to control VMs in the same manner as in the real world.
• Users can execute computation and communication tasks on physical machines (PMs) and VMs through the same SimGrid API, which provides a seamless migration path to IaaS simulations for hundreds of SimGrid users.
• SimGrid without VM support: solve all the constraint problems at once, e.g. X1 + X2 + X3 < C for three tasks on a PM of capacity C.
• SimGrid with VM support: solve the constraint problems at the physical machine layer first (X1 + X2 < C for VM1 and VM2), then at the virtual machine layer (X1,1 + X1,2 < X1 for the tasks inside VM1, X2,1 < X2 for the task inside VM2).
• All extensions have been integrated into SimGrid.

58. VMPlaceS
• A three-step engine to evaluate VM placement strategies.
• Input: infrastructure topology, number of VMs, workloads.
• Initialization phase, then injector/scheduling phase: the injector injects events (CPU variations, node crashes…) while the scheduler reacts to them.
• Researchers should (only) develop their scheduling algorithm in Java (or Scala) using the SimGrid MSG API and a more abstract interface provided by VMPlaceS.
• Analysis phase. Output: a JSON trace file, consumed by an R statistics system to deliver tables/graphs (VMPlaceS records several metrics during the simulation execution).
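The overall injector/scheduler structure can be pictured with the sketch below (written in Python for brevity; none of these names belong to the real VMPlaceS Java/Scala API, and the "scheduler" is reduced to a simple violation check):

```python
import heapq
import random

def run_simulation(pms, duration=1800, scheduling_period=30, seed=0):
    """Injector/scheduling phase: interleave injected load events with
    periodic scheduler invocations, and record violation metrics."""
    rng = random.Random(seed)
    events = [(t, "scheduler") for t in range(0, duration, scheduling_period)]
    # Injector: random CPU-variation events on random PMs.
    events += [(rng.uniform(0, duration), "load_change") for _ in range(10 * len(pms))]
    heapq.heapify(events)

    trace = []                                   # would become the JSON trace file
    while events:
        t, kind = heapq.heappop(events)
        if kind == "load_change":
            pm = rng.choice(pms)
            pm["load"] = max(0.0, min(1.0, rng.gauss(0.6, 0.2)))
        else:                                    # periodic scheduler invocation
            overloaded = [pm["name"] for pm in pms if pm["load"] > 1.0 - 1e-9]
            trace.append({"time": t, "violations": overloaded})
    return trace

pms = [{"name": f"PM{i}", "load": 0.5} for i in range(8)]
print(len(run_simulation(pms)), "scheduler invocations recorded")
```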

59. VMPlaceS: A First Use-Case [EuroPar2015]
• To illustrate how different strategies can be evaluated/compared: centralized (Entropy [VEE'09]), hierarchical (Snooze [CCGRID'12], with a group leader, group managers and local controllers) and distributed (DVMS [CCPE'12]).
• Each strategy iterates over the same three phases: 1. resource monitoring, 2. computing a viable scheduling, 3. applying the reconfiguration actions.
• Simulation input parameters:
• PMs: 8 cores, 32 GB, 1 Gbps; 7 cores are considered.
• VMs: 1 core, 1 GB, 1 Gbps; memory footprint varies between 0 and 80%.
• VM CPU load: Gaussian (μ=60, σ=20).
• 10 VMs per PM; cluster infrastructure composed of 128/256/512/1024 PMs.
• Duration: 1800 seconds; period of scheduling invocations: 30 seconds.

60. Entropy/Snooze/DVMS Analysis
[Chart: cumulated violation time (s) for the Centralized, Hierarchical and Distributed strategies and without scheduling, on 128/256/512/1024 nodes hosting 1280/2560/5120/10240 VMs]
• The centralized strategy looks useless?

61. Entropy/Snooze/DVMS Analysis
[Chart: duration of each violation (s) over time (0-3500 s), distinguishing first detections and false positives for Entropy and DVMS]
• Another view focusing on Entropy and DVMS.

62-65. Entropy/Snooze/DVMS Analysis
[Table: average and standard deviation of the monitoring, computation and reconfiguration times for each strategy]
• DVMS outperforms the others!?
• While the centralized approach does not scale, both phases are constant from the time viewpoint for the two other approaches.
• 1. Can we find a good partitioning size for Snooze? 2. What would be the benefit for Snooze of a reactive approach?

66-70. Investigate Variants
• Evaluate the impact of having smaller partitions in Snooze: same numbers of PMs, but partitions grow from 2 LCs to 32 LCs per GM.
[Chart: cumulated violation time (s) for Hierarchical with 2/4/8/32 LCs per GM, on 128/256/512/1024 nodes hosting 1280/2560/5120/10240 VMs]
• The smaller the partition, the higher the probability of not finding a viable solution.
• Other variants and possible improvements (for instance, contact neighbours two by two in DVMS).

71. VMPlaceS / VM Simulator Toolkits
• Difficulties in conducting relevant evaluations of VM placement strategies (in vivo conditions, lots of metrics to monitor, scalability/reactivity, …).
• VMPlaceS, a framework providing:
• Programming support for the definition of new VM placement strategies.
• Execution support for their accurate simulation at large scale.
• Means to analyze the collected traces.
• Validated up to 10K PMs / 100K VMs. [TPDS submission under review]
• Available online: http://beyondtheclouds.github.io/VMPlaceS/
• On-going and future work:
• Collect energy metrics.
• VM boot time. [PDP2017]
• VM image migrations (storage challenge).
• Workloads reproducing real traces (complex to get real traces).
• Provide similar abstractions for container technologies (must have).

72. Beyond the Clouds
• Research activities mainly supported by IMT Atlantique, Inria, the French ANR SONGS project, the Hemera Inria Large-Scale Initiative, the Discovery Inria Project Lab and the EU BigStorage project.
• Research engineers R.-A. Cherrueau and M. Simonin: EnOS; OpenStack: from SQL to NoSQL backends; Discovery vision.
[Timeline 10/2008-10/2017, building on VMPlaceS, SimGridVM, locality-aware placement, distributed VM scheduler and cluster-wide context switch]

  73. UTILITY COMPUTING From mainframes to …

  74. UTILITY COMPUTING From mainframes to … …larger “mainframes” Microsoft DC, Quincy, WA state

75. Major Brakes for the Adoption of the CC Model (2012-2013)
• Jurisdiction concerns.
• Reliability.
• Distance (network overheads).

76-77. Discovery Vision
• Bring clouds back to the cloud. [VHPC2011] [Discovery2013]
• Leverage the concept of µDC/nDC to extend any point of presence of network backbones (aka PoP) with servers.
• From network hubs up to major DSLAMs that are operated by telecom companies, network institutions… (e.g., the Géant, RENATER and Internet2 backbones).
• How can such a massively distributed infrastructure be operated and used from the software viewpoint?

78-80. What about Brokering Approaches?
• Sporadic usage (hybrid computing / cloud bursting) is almost ready for production.
• Brokers are rather limited to simple usages, not advanced administration operations.
• Advanced brokers must reimplement standard IaaS mechanisms while facing the API limitations.
[Diagram: Alice, Bob and Charles each access several clouds through a broker]

81. Would OpenStack Be the Solution?
• Do not reinvent the wheel… it's too late: OpenStack is about 20 million LOC, 3M just for the core services.
• Discovery objectives (overview):
• Study to what extent the current OpenStack mechanisms can handle such massively distributed infrastructures.
• Propose revisions/extensions of internal mechanisms when appropriate.
• From SQL to NoSQL backends in OpenStack (a research PoC, just the tip of the iceberg, numerous challenges). [IC2E 2017]
• Toward a holistic framework for conducting scientific evaluations of OpenStack: EnOS, a tool for diving into OpenStack and performing scientific investigations. [CCGRID 2017]

82. Conclusion / Future Work
• Research activities mainly supported by IMT Atlantique, Inria, the French ANR SONGS project, the Hemera Inria Large-Scale Initiative, the Discovery Inria Project Lab and the EU BigStorage project.
[Timeline 10/2008-10/2017 recapping the contributions: cluster-wide context switch, distributed VM scheduler, locality-aware placement, SimGridVM, VMPlaceS, Discovery vision, OpenStack from SQL to NoSQL backends, EnOS, STACK proposal]

83-84. Conclusion / Future Work
• Virtualization technologies play a key role in the adoption of cloud computing (flexibility, portability), but at a cost:
• Complexity of the software stack.
• Difficulty to guarantee performance.
• Placement challenges:
• How to express placement constraints? [plasma2013] is a good starting point.
• Can we consider the network and storage dimensions (e.g., a Map/Reduce framework leveraging attached storage facilities)?
• People expect container technologies to help, but:
• Similar consolidation issues ("what you expect" vs. "what you may have").
• Naive use (containers on top of VMs on top of PMs).
• Current trend: server densification (more cores per PM, more RAM…).
