
slide-1
SLIDE 1

Contributions to 
 Large Scale Distributed Systems
 The infrastructure view point

Adrien Lebre - September 1, 2017
President and Examiner: Claude Jard, Nantes Univ.; Frédéric Desprez, Inria
Reviewers: Erik Elmroth, Umeå Univ.; Manish Parashar, Rutgers Univ.; Pierre Sens, UPMC

slide-2
SLIDE 2

MY RESEARCH TOPIC

John McCarthy, speaking at the MIT centennial in 1961: "If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility..."

slide-3
SLIDE 3

Computation: 1946 ENIAC; 1947 transistor; 1950/1990 mainframes (1950 batch mode, 1960 interactive, 1970 terminals and client/server concepts); 1967 first virtualisation attempt; 1971 microprocessor; 1975 personal computers; 1995 commodity clusters; 1999 the Grid; 2002 virtualised infrastructures; 2007 smartphones

Communication: 1838 telegraph; 1876 telephone; 1896 radio; 1957 satellite; 1969 ARPANET; 1973 Ethernet; 1985 TCP/IP adoption

Toward utility computing: 1999 Salesforce SaaS concept; 2002 Amazon initial compute/storage services; 2006 Amazon EC2 (IaaS); 2010 cloud democratisation; 2015 network/computer convergence, Software Defined XXX

slide-4
SLIDE 4


slide-5
SLIDE 5


MSc/PhD - 2001/2006 Cluster (Storage/HPC)

slide-6
SLIDE 6


MSc/PhD - 2001/2006 Cluster (Storage/HPC) PostDoctoral fellow - 2006/2008
 (Storage/Grid)

slide-7
SLIDE 7


MSc/PhD - 2001/2006 Cluster (Storage/HPC) PostDoctoral fellow - 2006/2008
 (Storage/Grid)

  • Ass. Prof. IMT (2008-20XX)

Inria Researcher (2013-2017) Clouds and Beyond

slide-8
SLIDE 8


MSc/PhD - 2001/2006 Cluster (Storage/HPC) PostDoctoral fellow - 2006/2008
 (Storage/Grid)

  • Ass. Prof. IMT (2008-20XX)

Inria Researcher (2013-2017) Clouds and Beyond

INFRASTRUCTURE

STORAGE

VM PLACEMENT RESOURCE MGMT SYSTEMS

slide-9
SLIDE 9


MSc/PhD - 2001/2006 Cluster (Storage/HPC) PostDoctoral fellow - 2006/2008
 (Storage/Grid)

  • Ass. Prof. IMT (2008-20XX)

Inria Researcher (2013-2017) Clouds and Beyond

INFRASTRUCTURE

STORAGE

VM PLACEMENT RESOURCE MGMT SYSTEMS

PROOF OF CONCEPT
 OPEN-SOURCE

IN VIVO EXPERIMENTS

slide-10
SLIDE 10

Utility Computing Infrastructures

  • A common objective: provide computing resources (both hardware and software) in a flexible, transparent, efficient, secure and reliable way
  • Distributed infrastructures (since the 1990s)
  • A lot of challenges (data sharing, software/hardware heterogeneity, workload placement, isolation between applications, performance…)

5

slide-11 … slide-19
SLIDES 11-19

(The bullets of slide 10 are repeated while the accompanying figure is built up: compute nodes, storage nodes hosting a distributed file system, a frontend running the resource management system, then Alice's jobs and later Bob's jobs being submitted, until no resources remain for Bob ("? ? ?").)

slide-20
SLIDE 20

(Dynamic VM) Placement Contributions

Timeline 10/2008 to 10/2017:
  • Cluster-wide context switch (J.M. Menaud, F. Hermenier)
  • Distributed VM Scheduler (PhD of F. Quesnel, co-supervised with M. Südholt)
  • Locality-aware placement (PhD of J. Pastor, co-supervised with F. Desprez)

IMT Atlantique; research activities mainly supported by Inria

slide-21
SLIDE 21


slide-22
SLIDE 22

Placement Problem

  • Jobs 1, 2, 3, and 4 arrive in the queue and have to be scheduled
  • FCFS + EASY backfilling: although Jobs 2 and 3 have been backfilled, some resources are unused (dark gray areas); a minimal sketch of EASY backfilling follows at the end of this slide
  • EASY backfilling with preemption: Job 4 can be started earlier without impacting Job 1's performance

7

(Figure: Gantt charts of processors vs. time showing the queued jobs under plain FCFS, EASY backfilling, and EASY backfilling with preemption.)

Jobs cannot be easily preempted (OS internal states). Even with preemption, some resources are still wasted.
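To make the backfilling policy concrete, here is a minimal sketch of FCFS with EASY backfilling on a single resource dimension. It is only an illustration under simplified assumptions (the Job structure, a single scheduling pass, and the conservative "finish before the head job's reservation" rule are mine, not the scheduler used in the manuscript):

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpus: int       # processors requested
    walltime: int   # requested duration (time units)

def easy_backfill_pass(queue, free_cpus, running):
    """One EASY-backfilling pass at time 0.

    queue   : list of Job, in FCFS order (mutated in place)
    running : list of (finish_time, cpus) for jobs already executing
    Returns the jobs started now without delaying the head of the queue.
    """
    started = []
    if not queue:
        return started
    head = queue[0]
    if head.cpus <= free_cpus:
        # Plain FCFS: the head job fits, start it.
        started.append(queue.pop(0))
        return started
    # The head job does not fit: compute its reservation, i.e. the earliest
    # time at which enough processors will have been released.
    available, reservation = free_cpus, None
    for finish_time, cpus in sorted(running):
        available += cpus
        if available >= head.cpus:
            reservation = finish_time
            break
    # Backfill: start a later job only if it fits now AND is short enough
    # to finish before the reservation (so the head job is never delayed).
    for job in list(queue[1:]):
        if job.cpus <= free_cpus and reservation is not None and job.walltime <= reservation:
            started.append(job)
            queue.remove(job)
            free_cpus -= job.cpus
    return started
```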

slide-23
SLIDE 23

Virtual Machine 
 The New Building Block

  • System virtualization: One to multiple OSes on a

physical machine thanks to a hypervisor (an operating system of OSes)

  • VM Capabilities
  • Suspend/Resume
  • Live Migration

8

Hypervisor Virtual Machines (VMs) Physical Machine 
 (PM)

slide-24 … slide-33
SLIDES 24-33

(The content of slide 23 is repeated while the figure animates VM 1, VM 2 and VM 3 running on top of hypervisors, being suspended/resumed and live-migrated between two physical machines.)

slide-34
SLIDE 34

From Jobs to Virtualised Jobs

  • A job is now encapsulated in one or

several VMs

9

(Figure: VM life-cycle state machine; states waiting, ready, running, sleeping and terminated; transitions run, stop, suspend, resume and migrate.)

slide-35
SLIDE 35


slide-36
SLIDE 36


slide-37
SLIDE 37

From Jobs to Virtualised Jobs

  • A job is now encapsulated in one or

several VMs

9

(VM life-cycle state machine, as on slide 34.)

credits: F. Hermenier, OSDI poster session 2008

  • Challenge: Maintain viable mappings between VMs and PMs
  • Each VM consumes CPU, RAM…
slide-38
SLIDE 38

From Jobs to Virtualised Jobs

(Figure: three PMs (PM 1, PM 2, PM 3) running hypervisors; reconfiguration plans of cost 3 and cost 2 lead from the current status to a correct status while avoiding non-viable manipulations.)

credits: F. Hermenier, OSDI poster session 2008

  • Maintain viable mappings between VMs and PMs.
  • MAPE Control loop 


(leveraging the Entropy framework)

  • Make the reconfiguration phase automatic


Cluster-Wide context switch

10

[VTDC2010]
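As a rough illustration of the control loop described above (monitor, check viability, compute a reconfiguration plan, apply it), here is a hedged Python skeleton; the node/VM attributes and the monitor/compute_plan/apply_plan callbacks are hypothetical placeholders, not the Entropy API:

```python
import time

def is_viable(nodes):
    """A mapping is viable if no physical machine is overloaded."""
    return all(sum(vm.cpu_demand for vm in node.vms) <= node.cpu_capacity
               for node in nodes)

def cluster_wide_context_switch_loop(monitor, compute_plan, apply_plan, period=30):
    """MAPE-style loop.

    monitor()           -> fresh list of node objects with their hosted VMs
    compute_plan(nodes) -> ordered list of actions (migrate, suspend, resume, ...)
    apply_plan(actions) -> executes the reconfiguration on the infrastructure
    """
    while True:
        nodes = monitor()                   # Monitor: collect CPU/RAM demands per node
        if not is_viable(nodes):            # Analyze: is the VM-to-PM mapping still viable?
            actions = compute_plan(nodes)   # Plan: the NP-hard packing/scheduling step
            apply_plan(actions)             # Execute: the costly reconfiguration step
        time.sleep(period)
```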

slide-39
SLIDE 39

Cluster-Wide Context Switch - Evaluations

  • Scheduling policy: A FIFO queue 


(priority between jobs to prevent starvation)

  • Testbed (further details in the manuscript)


11 working nodes (22 CPUs)
 A queue of 8 vjobs (NASGrid benchmarks)
 Each job uses 9 VMs (9 CPUs)

11

Cumulated completion times have been reduced by 40%

Hypervisor Hypervisor Hypervisor

PM 1

PM 2

Infrastructure

PM 3

slide-40
SLIDE 40

(Same evaluation bullets and figure as slide 39.)

credits: A. Simonet, Introduction to Cloud Computing Lecture - Inside a Google DC

11 nodes / 72 VMs… What about scalability/reactivity?

slide-41
SLIDE 41

(Same as slide 40, now set against a picture of a Google data center.)

credits: A. Simonet, Introduction to Cloud Computing Lecture - Inside a Google DC

A Google Data Center…

11 nodes / 72 VMs… What about scalability/reactivity?

slide-42
SLIDE 42

Scalability/Reactivity challenge

  • Computing phase: an NP-hard problem in most cases
  • Most works have focused on heuristics to reduce the computing phase, but… reconfiguring the infrastructure is time-consuming too!

13

  • 1. Monitoring
  • 2. Computing
  • 3. Reconfiguring

(Figure: timer-triggered loop over the three phases along the time axis.)

credits: F. Quesnel, PhD defense 2013

slide-43
SLIDE 43
  • Computing phase: an NP-hard problem in most cases
  • Most works have focused on heuristics to reduce the computing phase, but… reconfiguring the infrastructure is time-consuming too!

Scalability/Reactivity challenge

14

(Figure: CPU load of VM 1 and VM 2 over time during the Monitor, Compute and Reconfigure phases; is the configuration still viable?)

credits: F. Quesnel, PhD defense 2013

slide-44 … slide-45
SLIDES 44-45

(Same content as slide 43.) Can we reduce all phases?

slide-46
SLIDE 46

Leverage P2P Algorithms

  • Direct cooperation between hypervisors (no service node)
  • Distributed Virtual Machine Scheduler (DVMS)
  • Event-driven / P2P-like system
  • Local interactions between nodes
  • Scheduling performed on partitions of the system, created dynamically (nodes are reserved for exclusive use by one scheduler, to prevent several schedulers from migrating the same VMs)
  • Dynamic partitioning of the system according to the effective usage of resources

15

[CCPE2012]
(Flowchart: an event occurs on a node; if the current partition's scheduler can compute a valid schedule, it applies it; otherwise it contacts a neighbour and asks it to help solve the problem. A sketch of this event-driven partition growth follows below.)
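A toy sketch of this event-driven partition growth (hypothetical callbacks; this is not the DVMS code, only the idea of reserving neighbours along the ring until a viable schedule is found):

```python
def handle_event(ring, origin, try_schedule, apply_schedule):
    """Grow a partition around the ring until some scheduler can solve the event.

    ring          : list of node identifiers ordered as on the DVMS ring
    origin        : index of the node where the (e.g. overload) event occurred
    try_schedule  : partition -> schedule or None (local scheduling on the partition only)
    apply_schedule: schedule -> None (performs the corresponding migrations)
    """
    partition = [ring[origin]]
    next_idx = (origin + 1) % len(ring)
    while True:
        schedule = try_schedule(partition)
        if schedule is not None:
            apply_schedule(schedule)        # solved locally: apply, then release the nodes
            return True
        if len(partition) == len(ring):
            return False                    # went all the way around without a solution
        # Ask the next neighbour to join; the node is reserved for this scheduler
        # so that no other partition migrates its VMs concurrently.
        partition.append(ring[next_idx])
        next_idx = (next_idx + 1) % len(ring)
```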

slide-47
SLIDE 47

Understanding DVMS

16

9 4 2 7 1 3 5 8 6

Partition

credits: J. Pastor, PhD Defense 2016

slide-48 … slide-54
SLIDES 48-54

(Animation steps of the DVMS example: on the ring of nodes 9, 4, 2, 7, 1, 3, 5, 8, 6, a partition grows node by node until its scheduler finds a viable solution. credits: J. Pastor, PhD Defense 2016)

slide-55
SLIDE 55

DVMS Evaluations

  • Development of a PoC
  • Evaluations (in vivo) up to 5K VMs
  • IEEE SCALE challenge 2013

17

(Figures: number of PMs per Grid'5000 cluster (Griffon, Graphene, Paradent, Parapide, Parapluie, Sol, Suno, Pastel); time to apply a reconfiguration, time to solve an event, and duration of an iteration (mean and standard deviation) for Entropy and DVMS on 251 PMs/2000 VMs, 309 PMs/3325 VMs and 467 PMs/4754 VMs.)

slide-56
SLIDE 56

DVMS Evaluations

(Same results as slide 55.)

It looks promising… How can we test it at scale? How can we compare with other approaches?

slide-57
SLIDE 57

VM Placement
 (Hot Topic) Problem

slide-58
SLIDE 58

VM Placement
 (Hot Topic) Problem

slide-59
SLIDE 59

VM Placement
 (Hot Topic) Problem

slide-60
SLIDE 60

VM Placement
 (Hot Topic) Problem

Lots of articles (too many ?)

slide-61
SLIDE 61

VM Placement
 (Hot Topic) Problem

Lots of articles (too many?). Evaluations are performed either at small scale for in vivo experiments or with ad hoc simulators.
 How can we evaluate/compare them?

slide-62
SLIDE 62

VM Simulator Toolkits

Timeline 10/2008 to 10/2017: Cluster-wide context switch, Distributed VM Scheduler, Locality-aware placement, French ANR SONGS project, VMPlaceS.

  • T. Hirofuchi (Postdoc/Invited researcher): SimGrid VM, VM abstractions
  • A. Simonet (Postdoc): energy dimension
  • T. L. Nguyen (PhD): boot time model

Research activities at IMT Atlantique, mainly supported by Inria (Hemera Large Scale Initiative, Discovery Inria Project Lab) and the EU BigStorage project.

slide-63
SLIDE 63


slide-64
SLIDE 64
Toward a VM PLACEment Simulator

20

  • SimGrid as a base
  • A scientific instrument to study the behaviour of large-scale distributed systems
  • Design abstractions and models to enable researchers to control VMs in the same manner as in the real world (e.g., create/destroy, start/shutdown, suspend/resume and migrate)
  • A dedicated simulator to
  • Evaluate/compare VM placement policies at large scale (and in a reproducible manner)
  • Relieve researchers of the burden of dealing with VM creation and workload generation/injection

Focus on the migration model

slide-65
SLIDE 65

Accurate Live Migration Model

Source PM, destination PM, VM (running), memory pages.

21

(Figure: migration time (s) as a function of memory update speed (MB/s); naive approximation vs. observed behaviour.)

  • Transfer the VM's state to the destination without perceptible shutdown (pre-copy algorithm):
  • 1. Transfer all memory pages of the VM (keeping in mind the VM is still running at the source)
  • 2. Transfer the memory pages updated during the previous step
  • 3. Iterate this step until the remaining set of memory pages becomes small enough to meet an acceptable downtime (30 ms in KVM)
  • 4. Stop the VM; transfer the rest of the memory pages and the device states

  • Migration time is not a linear function of the size of the VM
  • The more memory-intensive your VM is, the longer the migration will be (a numerical sketch follows below)
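To see why the migration time is non-linear in the memory update speed, here is a back-of-the-envelope sketch of the pre-copy rounds. It is an idealised model with assumed parameter values; the actual model in the manuscript also accounts for resource-sharing contention:

```python
def precopy_migration_time(mem_mb, bw_mbps, dirty_mbps,
                           downtime_ms=30, max_rounds=30):
    """Estimate live-migration time with an idealised pre-copy model.

    mem_mb     : VM memory size (MB)
    bw_mbps    : bandwidth available for the migration (MB/s)
    dirty_mbps : memory update ("dirtying") speed of the workload (MB/s)
    Returns total migration time in seconds.
    """
    threshold = bw_mbps * downtime_ms / 1000.0  # what fits within the allowed downtime
    total, remaining = 0.0, float(mem_mb)
    for _ in range(max_rounds):
        round_time = remaining / bw_mbps        # send the currently dirty pages
        total += round_time
        remaining = dirty_mbps * round_time     # pages dirtied meanwhile must be resent
        if remaining <= threshold:
            break
    return total + remaining / bw_mbps          # final stop-and-copy round

# The higher the memory update speed, the more rounds are needed and the longer it takes:
for dirty in (10, 50, 100):
    print(dirty, "MB/s ->", round(precopy_migration_time(1024, 125, dirty), 1), "s")
```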
slide-66 … slide-71
SLIDES 66-71

(Animation steps of the pre-copy live migration described on slide 65: memory pages are transferred in successive rounds while the VM keeps running on the source PM; in the final step the VM is stopped and restarted on the destination PM.)
slide-72
SLIDE 72

22

Accurate Live Migration Model

(Figure: migration time (s) vs. CPU utilization (%) for Apache and PostgreSQL workloads; Grid'5000 measurements vs. pre-copy simulation vs. naive simulation.)

  • First accurate live migration model (implementing the pre-copy strategy): the time and the resulting traffic of a migration should be computed by taking into account the competition arising in the presence of resource sharing, and the memory refresh rate.
  • Application memory footprints can be considered as linear functions

[CloudCom 2013]

slide-73
SLIDE 73

SimGrid VM

23

[TCC 2015]

SimGrid without VM support: solve all the constraint problems at once.
  Physical machine (capacity C), Task1 (X1), Task2 (X2)
  Eq1: X1 + X2 < C

SimGrid with VM support (extension): solve the constraint problems layer by layer.
  1. At the physical machine layer: Eq1: X1 + X2 + X3 < C
  2. At the virtual machine layer: Eq2: X1,1 + X1,2 < X1 and Eq3: X2,1 < X2
  (VM1 (X1) hosts Task11 (X1,1) and Task12 (X1,2); VM2 (X2) hosts Task21 (X2,1); Task3 (X3) runs directly on the PM. A small sketch of this two-level sharing follows below.)

  • SimGrid VM allows users to launch hundreds of thousands of VMs in their simulation programs and to control VMs in the same manner as in the real world
  • Users can execute computation and communication tasks on physical machines (PMs) and VMs through the same SimGrid API, which provides a seamless migration path to IaaS simulations for hundreds of SimGrid users

All extensions have been integrated into SimGrid
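To illustrate the two-level resolution principle (first share the PM capacity among VMs and bare-metal tasks, then share each VM's allocation among its own tasks), here is a small max-min-sharing sketch with made-up demands; it only illustrates the idea, not SimGrid's actual solver:

```python
def fair_share(capacity, demands):
    """Max-min fair share of `capacity` among entities with the given demands."""
    alloc = dict.fromkeys(demands, 0.0)
    active = set(demands)
    remaining = capacity
    while active:
        share = remaining / len(active)
        satisfied = {e for e in active if demands[e] <= share}
        if not satisfied:                       # everybody is capped at an equal share
            for e in active:
                alloc[e] = share
            break
        for e in satisfied:                     # fully satisfy the small demands first
            alloc[e] = demands[e]
            remaining -= demands[e]
        active -= satisfied
    return alloc

# Level 1: share the PM capacity C among VM1, VM2 and the bare-metal Task3.
pm = fair_share(100.0, {"VM1": 80.0, "VM2": 30.0, "Task3": 20.0})
# Level 2: share each VM's allocation among the tasks it hosts.
vm1 = fair_share(pm["VM1"], {"Task11": 40.0, "Task12": 40.0})
vm2 = fair_share(pm["VM2"], {"Task21": 30.0})
print(pm, vm1, vm2)
```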

slide-74
SLIDE 74

VMPlaceS

  • A three-step engine to evaluate VM placement strategies

24

1. Initialization phase. Input: infrastructure topology, number of VMs, workloads.
2. Injector/scheduling phase. The injector injects events (CPU variations, node crashes…); researchers should (only) develop their scheduling algorithm in Java (or Scala) using the SimGrid MSG API and a more abstract interface provided by VMPlaceS.
3. Analysis phase. Output: a JSON trace file which is then consumed by the R statistics system to deliver tables/graphs (VMPlaceS records several metrics during the simulation execution).

slide-75
SLIDE 75

VMPlaceS: A First Use-Case

  • To illustrate how different strategies can be evaluated/compared

25

MAPE loop, repeated every period Ti:
  • 1. Resource monitoring
  • 2. Computing a viable schedule
  • 3. Applying reconfiguration actions

Strategies compared:
  • Centralized: Entropy [VEE'09]
  • Hierarchical: Snooze [CCGRID'12] (GL, GMs, LCs)
  • Distributed: DVMS [CCPE'12]

[EuroPar2015]

  • Simulation input parameters (a sketch of the implied load injection follows below)
  • PMs: 8 cores, 32 GB, 1 Gbps; 7 cores are considered.
  • VMs: 1 core, 1 GB, 1 Gbps; memory footprint varies between 0 and 80%; VM CPU load (μ=60, σ=20)
  • 10 VMs per PM; cluster infrastructure composed of 128/256/512/1024 PMs
  • Duration: 1800 seconds; period of scheduling invocations: 30 seconds.
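For illustration, a hedged sketch of the load injection these parameters imply (per-VM CPU load drawn from a Gaussian with μ=60 and σ=20, clipped to [0, 100], every 30-second period); the event tuple format is a hypothetical simplification, not the VMPlaceS injector API:

```python
import random

def generate_load_events(n_pms, vms_per_pm=10, duration=1800, period=30,
                         mu=60.0, sigma=20.0, seed=0):
    """Yield (time, vm_id, cpu_load_percent) events for the whole simulation."""
    rng = random.Random(seed)
    n_vms = n_pms * vms_per_pm
    for t in range(0, duration, period):
        for vm in range(n_vms):
            load = min(100.0, max(0.0, rng.gauss(mu, sigma)))
            yield (t, f"vm-{vm}", load)

# Example: 128 PMs / 1280 VMs, as in the smallest simulated infrastructure.
events = list(generate_load_events(128))
print(len(events), events[0])
```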

slide-76
SLIDE 76

26

Entropy/Snooze/DVMS Analysis

(Figure: cumulated violation time (s) for the centralized, hierarchical and distributed strategies, and without scheduling, on infrastructures of 128 nodes/1280 VMs, 256 nodes/2560 VMs, 512 nodes/5120 VMs and 1024 nodes/10240 VMs.)

The centralized strategy looks useless?

slide-77
SLIDE 77

(Figure: duration of each violation (s) over time (s) for Entropy first, Entropy first false positives, DVMS and DVMS false positives.)

27

Another view focusing on Entropy and DVMS

Entropy/Snooze/DVMS Analysis

slide-78 … slide-81
SLIDES 78-81

Entropy/Snooze/DVMS Analysis

28

(Table: results given as average | standard deviation for each strategy and infrastructure size.)

DVMS outperforms the others!?
While the centralized approach does not scale, both phases are constant from the time viewpoint for the two other approaches.

1. Can we find a good partitioning size for Snooze?
2. What would be the benefit for Snooze of a reactive approach?

slide-82
SLIDE 82

Investigate Variants

  • Evaluate the impact of having smaller partitions in Snooze
  • Same number of PMs, but partitions grow from 2 LCs to 32 LCs per GM

29

(Figure: cumulated violation time (s) for the hierarchical strategy with 2, 4, 8 and 32 LCs per GM, on infrastructures from 128 nodes/1280 VMs up to 1024 nodes/10240 VMs.)

slide-83 … slide-86
SLIDES 83-86

(Same evaluation as slide 82.)

The smaller the partition, the higher the probability of not finding a viable solution.
Other variants and possible improvements (for instance, contact neighbours two by two in DVMS).

slide-87
SLIDE 87

VMPlaceS / VM Simulator toolkits

  • Difficulties to conduct relevant evaluations of VM placement strategies (in vivo conditions, lots of metrics to monitor, scalability/reactivity, …)
  • VMPlaceS, a framework providing
  • Programming support for the definition of new VM placement strategies
  • Execution support for their accurate simulation at large scale
  • Means to analyze the collected traces
  • Validated up to 10K PMs/100K VMs
  • Available online: http://beyondtheclouds.github.io/VMPlaceS/
  • On-going and future work
  • Collect energy metrics
  • VM boot time
  • VM image migrations (storage challenge)
  • Workloads reproducing real traces (complex to get real traces)
  • Provide similar abstractions for container technologies (must have)

30

[TPDS submission under review]

[PDP2017]

slide-88
SLIDE 88

Beyond the Clouds

Timeline 10/2008 to 10/2017: Cluster-wide context switch, Distributed VM Scheduler, Locality-aware placement, French ANR SONGS project, VMPlaceS, SimGridVM, Discovery vision, OpenStack: from SQL to NoSQL backends, EnOS.

Discovery Inria Project Lab; research engineers R-A. Cherrueau and M. Simonin.

Research activities at IMT Atlantique, mainly supported by Inria (Hemera Large Scale Initiative, Discovery Inria Project Lab) and the EU BigStorage project.

slide-89
SLIDE 89

UTILITY COMPUTING

From mainframes to …

slide-90
SLIDE 90

…larger “mainframes”

Microsoft DC, Quincy, WA state

UTILITY COMPUTING

From mainframes to …

slide-91
SLIDE 91

33

3 major brakes for the adoption of the CC model:
1. Jurisdiction concerns
2. Reliability
3. CC distance (network overheads)

slide-92
SLIDE 92

Discovery Vision

  • Bring Clouds back to the cloud
  • Leverage the concept of µDC/nDC to extend any point of presence of

network backbones (aka PoP) with servers

  • From network hubs up to major DSLAMs that are operated by

telecom companies, network institutions…

34

[Discovery2013] [VHPC2011] Geant Internet 2 RENATER

slide-93
SLIDE 93

Discovery Vision

(Same bullets as slide 92.)

How to operate/use such a massively distributed infrastructure from the software viewpoint?

slide-94
SLIDE 94
  • Sporadic usage (hybrid computing/cloud bursting) is almost ready for production
  • Brokers are rather limited to simple usages and not to advanced administration operations

35

Alice  Bob  Charles

What about Brokering Approaches?

slide-95
SLIDE 95
(Repeat of slide 94.)

slide-96
SLIDE 96
(Same bullets as slide 94.)

Advanced brokers must reimplement standard IaaS mechanisms while facing API limitations.

What about Brokering Approaches?

slide-97
SLIDE 97
  • Do not reinvent the wheel… it's too late: OpenStack (20 million LOC, 3M just for the core services)
  • Discovery objectives (overview)
  • Study to what extent the current OpenStack mechanisms can handle such massively distributed infrastructures
  • Propose revisions/extensions of internal mechanisms when appropriate

Would OpenStack be the solution?

36

  • From SQL to NoSQL backend in OpenStack


(a research PoC, just the top of the iceberg, numerous challenges)

  • Toward a Holistic Framework for Conducting Scientific Evaluations of OpenStack


EnOS, A tool for diving into OpenStack and performing scientific investigations

[IC2E 2017] [CCGRID 2017]

slide-98
SLIDE 98

Conclusion / Future Work

(Same timeline as slide 88, with the STACK proposal added at 10/2017.)

slide-99
SLIDE 99
Conclusion / Future Work

  • Virtualization technologies play a key role in the Cloud Computing adoption (flexibility, portability), but at a cost…
  • Complexity of the software stack
  • Difficulty to guarantee performance
  • Placement challenges
  • How to express placement constraints? [plasma2013] is a good starting point.
  • Can we consider the network and storage dimensions?
  • People expect container technologies to help, but…
  • Similar consolidation issues
  • Naive use (containers on top of VMs on top of PMs)
  • Current trend: server densification (more cores and more RAM per PM)

38

(Figure: what you expect / what you may expect / what you may have; Alice's Map/Reduce framework, leveraging attached storage facilities, deployed through the frontend.)

slide-100 … slide-106
SLIDES 100-106

(The conclusion bullets of slide 99 are repeated on each of these slides while the figure evolves: Bob's VMs join Alice's on the consolidated infrastructure, illustrating the gap between what you expect, what you may expect, and what you may have.)

Take-aways:
  • Propose tools that help us to understand this complexity
  • Propose models to capture this complexity in advanced algorithms

slide-107
SLIDE 107
  • Utility Computing: a constant switch between centralization and distribution (Mainframe/Cluster vs Grid vs Cloud)
  • A new decentralization phase (no longer debated) due to the locality requirements of IoT and NFV applications
  • Are the challenges similar to the Grid ones?
  • Topology (static vs dynamic)
  • Federation of a few sites vs massively distributed
  • Heterogeneity in terms of ICT resources
  • Some research groups work on federated clouds (in particular for science), but edge/fog computing infrastructures significantly differ

40

Industrial Internet / Internet of Skills

Conclusion / Future Work

slide-108 … slide-110
SLIDES 108-110

Fog/Edge - What are the challenges?

(The points of slide 107 are repeated, with the Internet backbone added to the figure.)

slide-111
SLIDE 111

STACK Proposal

slide-112
SLIDE 112

STACK Proposal

Where should I deploy micro DCs? On each PoP? What about control services, and at what scale?

slide-113
SLIDE 113

STACK Proposal

W h e r e s h

  • u

l d I d e p l

  • y

m i c r

  • D

C s ? O n e a c h P

  • P

? W h a t ’ s a b

  • u

t c

  • n

t r

  • l

s e r v i c e s ? a t w h a t s c a l e ? W h a t ’ s a b

  • u

t t h e p e r f

  • r

m a n c e / r e l i a b i l i t y c r i t e r i a ?

slide-114
SLIDE 114

STACK Proposal

W h e r e s h

  • u

l d I d e p l

  • y

m i c r

  • D

C s ? O n e a c h P

  • P

? W h a t ’ s a b

  • u

t c

  • n

t r

  • l

s e r v i c e s ? a t w h a t s c a l e ? W h a t ’ s a b

  • u

t t h e p e r f

  • r

m a n c e / r e l i a b i l i t y c r i t e r i a ? H

  • w

c

  • n

t r

  • l

s e r v i c e s s h

  • u

l d b e d e s i g n e d ? 
 C e n t r a l i s e d / H i e r a r c h i c a l / P 2 P b a s e d ?

slide-115
SLIDE 115

STACK Proposal

W h e r e s h

  • u

l d I d e p l

  • y

m i c r

  • D

C s ? O n e a c h P

  • P

? W h a t ’ s a b

  • u

t c

  • n

t r

  • l

s e r v i c e s ? a t w h a t s c a l e ? W h a t ’ s a b

  • u

t t h e p e r f

  • r

m a n c e / r e l i a b i l i t y c r i t e r i a ? H

  • w

c

  • n

t r

  • l

s e r v i c e s s h

  • u

l d b e d e s i g n e d ? 
 C e n t r a l i s e d / H i e r a r c h i c a l / P 2 P b a s e d ? G l

  • b

a l v s p a r t i a l v i e w s

  • f

t h e s y s t e m ?

slide-116
SLIDE 116

STACK Proposal

W h e r e s h

  • u

l d I d e p l

  • y

m i c r

  • D

C s ? O n e a c h P

  • P

? W h a t ’ s a b

  • u

t c

  • n

t r

  • l

s e r v i c e s ? a t w h a t s c a l e ? W h a t ’ s a b

  • u

t t h e p e r f

  • r

m a n c e / r e l i a b i l i t y c r i t e r i a ?

P l a c e m e n t a l g

  • r

i t h m s h a v e b e e n d e s i g n e d w i t h s t r

  • n

g a s s u m p t i

  • n

s ( i n fi n i t y

  • f

r e s

  • u

r c e s , d a t a l

  • c

a l i t y ) . 
 H e r e r e s

  • u

r c e s a r e b

  • u

n d e d , a p p l i c a t i

  • n

s h a v e m

  • r

e c

  • n

s t r a i n t s t

  • d

e a l w i t h …

H

  • w

c

  • n

t r

  • l

s e r v i c e s s h

  • u

l d b e d e s i g n e d ? 
 C e n t r a l i s e d / H i e r a r c h i c a l / P 2 P b a s e d ? G l

  • b

a l v s p a r t i a l v i e w s

  • f

t h e s y s t e m ?

slide-117
SLIDE 117

STACK Proposal

W h e r e s h

  • u

l d I d e p l

  • y

m i c r

  • D

C s ? O n e a c h P

  • P

? W h a t ’ s a b

  • u

t c

  • n

t r

  • l

s e r v i c e s ? a t w h a t s c a l e ? W h a t ’ s a b

  • u

t t h e p e r f

  • r

m a n c e / r e l i a b i l i t y c r i t e r i a ?

P l a c e m e n t a l g

  • r

i t h m s h a v e b e e n d e s i g n e d w i t h s t r

  • n

g a s s u m p t i

  • n

s ( i n fi n i t y

  • f

r e s

  • u

r c e s , d a t a l

  • c

a l i t y ) . 
 H e r e r e s

  • u

r c e s a r e b

  • u

n d e d , a p p l i c a t i

  • n

s h a v e m

  • r

e c

  • n

s t r a i n t s t

  • d

e a l w i t h …

H

  • w

c

  • n

t r

  • l

s e r v i c e s s h

  • u

l d b e d e s i g n e d ? 
 C e n t r a l i s e d / H i e r a r c h i c a l / P 2 P b a s e d ? G l

  • b

a l v s p a r t i a l v i e w s

  • f

t h e s y s t e m ? H

  • w

c a n d e v e l

  • p

e r s e x p r e s s t h

  • s

e c

  • n

s t r a i n t s ? 
 H

  • w

c a n t h e s y s t e m g u a r a n t e e t h e m d u r i n g r e c

  • n

fi g u r a t i

  • n
  • p

e r a t i

  • n

s ?

slide-118
SLIDE 118

STACK Proposal

Where should I deploy micro DCs? On each PoP?
What about control services? At what scale?
What about the performance/reliability criteria?
Placement algorithms have been designed with strong assumptions (infinite resources, data locality); here, resources are bounded and applications have more constraints to deal with…
How should control services be designed? Centralised / Hierarchical / P2P-based?
Global vs partial views of the system?
How can developers express those constraints?
How can the system guarantee them during reconfiguration operations?
How should the system address big data applications (considering a significant number of geo-distributed data sources)?

slide-119
SLIDE 119

STACK Proposal

Where should I deploy micro DCs? On each PoP?
What about control services? At what scale?
What about the performance/reliability criteria?
Placement algorithms have been designed with strong assumptions (infinite resources, data locality); here, resources are bounded and applications have more constraints to deal with…
How should control services be designed? Centralised / Hierarchical / P2P-based?
Global vs partial views of the system?
How can developers express those constraints?
How can the system guarantee them during reconfiguration operations?
How should the system address big data applications (considering a significant number of geo-distributed data sources)?
What about security aspects?

slide-120
SLIDE 120

STACK Proposal

Where should I deploy micro DCs? On each PoP?
What about control services? At what scale?
What about the performance/reliability criteria?
Placement algorithms have been designed with strong assumptions (infinite resources, data locality); here, resources are bounded and applications have more constraints to deal with…
How should control services be designed? Centralised / Hierarchical / P2P-based?
Global vs partial views of the system?
How can developers express those constraints?
How can the system guarantee them during reconfiguration operations?
How should the system address big data applications (considering a significant number of geo-distributed data sources)?
What about security aspects?
Energy footprint of such infrastructures?

slide-121
SLIDE 121

STACK Proposal

Where should I deploy micro DCs? On each PoP?
What about control services? At what scale?
What about the performance/reliability criteria?
Placement algorithms have been designed with strong assumptions (infinite resources, data locality); here, resources are bounded and applications have more constraints to deal with…
How should control services be designed? Centralised / Hierarchical / P2P-based?
Global vs partial views of the system?
How can developers express those constraints?
How can the system guarantee them during reconfiguration operations?
How should the system address big data applications (considering a significant number of geo-distributed data sources)?
What about security aspects?
Can μDCs benefit from renewable energy sources?
Energy footprint of such infrastructures?

slide-122
SLIDE 122

STACK Proposal

Where should I deploy micro DCs? On each PoP?
What about control services? At what scale?
What about the performance/reliability criteria?
Placement algorithms have been designed with strong assumptions (infinite resources, data locality); here, resources are bounded and applications have more constraints to deal with…
How should control services be designed? Centralised / Hierarchical / P2P-based?
Global vs partial views of the system?
How can developers express those constraints?
How can the system guarantee them during reconfiguration operations?
How should the system address big data applications (considering a significant number of geo-distributed data sources)?
What about security aspects?
Can μDCs benefit from renewable energy sources?
Energy footprint of such infrastructures?

STACK, a new research team to address 
 Fog/Edge Infrastructures’ challenges

slide-123
SLIDE 123

It does not matter how slowly you go as long as you do not stop. 
 — Confucius

slide-124
SLIDE 124

Back up

slide-125
SLIDE 125

A lot of challenges

  • A huge gap between the open-source software stacks used to run production systems and academic proposals
  • Do not let Google/Amazon be the only actors addressing infrastructure challenges just because they operate them.
  • The scientific community (both the networking and distributed-systems ones) should take part in the evolution of major software stacks such as OpenStack, as was once done for Linux.
  • We need dedicated infrastructures to conduct such scientific studies


45

slide-126
SLIDE 126

A lot of challenges

  • A huge gap between the open-source software stacks used to run production systems and academic proposals
  • Do not let Google/Amazon be the only actors addressing infrastructure challenges just because they operate them.
  • The scientific community (both the networking and distributed-systems ones) should take part in the evolution of major software stacks such as OpenStack, as was once done for Linux.
  • We need dedicated infrastructures to conduct such scientific studies

“Just good enough to publish a good paper” Francesco Lo Presti, Assoc. Prof., University of Roma, Italy
“…Optimality is not needed, it should just run…” Johan Ecker, Principal Researcher, Cloud Technology, Ericsson Research, Sweden
CloudControl WS, June 2017

slide-127
SLIDE 127

VM Placement
Still a Hot Topic

slide-128
SLIDE 128

47

  • Implementation of a dedicated version of VMPlaceS on Grid’5000
  • Implementation of the Entropy proposal [VEE’09] in both systems
  • Comparison between in vivo and simulated executions


32 PMs (4 cores / 16 GB / 1 Gbps per PM); 192 VMs (6 per node), each with 1 core, 1 GB, 1 Gbps
Memory footprint between 0 and 80% of 1 Gbps; average CPU load of 60%
The scheduling algorithm was invoked every 60 seconds over a 3600-second execution
A dedicated tool injects the load in each VM running on top of G5K, according to the events consumed by the injector (see the sketch below)
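To make the protocol above concrete, here is a hypothetical, self-contained Python sketch of the invocation loop: loads are (re)injected and a placement algorithm is called every 60 simulated seconds over a 3600-second run. It is only an illustration of the protocol described on this slide; the load model, names, and the stub scheduler are invented, and it is not VMPlaceS or Entropy code.

```python
# Hypothetical sketch of the experiment protocol: a stub placement algorithm is
# invoked every 60 s over a 3600 s run on loads produced by a stand-in injector.
import random

DURATION_S, PERIOD_S = 3600, 60
NB_PMS, VMS_PER_PM = 32, 6

def inject_load(vms):
    # Stand-in for the dedicated injection tool: per-VM CPU load, ~60% on average.
    return {vm: max(0.0, min(100.0, random.gauss(60, 20))) for vm in vms}

def placement_algorithm(loads):
    # Stub for the Entropy-like scheduler: flag heavily loaded VMs for migration.
    return [vm for vm, load in loads.items() if load > 90]

vms = [f"vm-{i}" for i in range(NB_PMS * VMS_PER_PM)]
for t in range(0, DURATION_S, PERIOD_S):
    loads = inject_load(vms)
    flagged = placement_algorithm(loads)
    print(f"t={t:4d}s: {len(flagged)} VMs flagged for reconfiguration")
```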

Accuracy of VMPlaceS

[Figure: computation and reconfiguration durations per scheduling invocation, Grid’5000 (in vivo) vs VMPlaceS (simulation)]

slide-129
SLIDE 129

47

  • Implementation of a dedicated version of VMPlaceS on Grid’5000
  • Implementation of the Entropy proposal [VEE’09] in both systems
  • Comparison between in vivo and simulated executions


32 PMs (4 cores / 16 GB / 1 Gbps per PM); 192 VMs (6 per node), each with 1 core, 1 GB, 1 Gbps
Memory footprint between 0 and 80% of 1 Gbps; average CPU load of 60%
The scheduling algorithm was invoked every 60 seconds over a 3600-second execution
A dedicated tool injects the load in each VM running on top of G5K, according to the events consumed by the injector

Accuracy of VMPlaceS

[Figure: computation and reconfiguration durations per scheduling invocation, Grid’5000 (in vivo) vs VMPlaceS (simulation)]

Difference between 6% and 18% (median: 12%); the worst case occurs when multiple migrations are performed simultaneously to the same node

slide-130
SLIDE 130

Understanding DVMS

48


Partition

credits: J. Pastor, PhD Defense 2016

slide-138
SLIDE 138

Locality-aware Cooperation

49

Partition

credits: J. Pastor, PhD Defense 2016


slide-139
SLIDE 139

A Ring to Rule them All

From SQL to NoSQL backend in OpenStack

slide-140
SLIDE 140
  • Austin Summit - May 2016 - Nova PoC

51

Looking back to the Future


slide-142
SLIDE 142


Leveraging a Key/Value Store DB

52

[Diagram: Nova (compute service) software architecture. Nova Network, Nova Compute, Nova Scheduler and Nova Conductor call db.api, which targets either the relational backend (SQLAlchemy over a MySQL DB) or the non-relational one (ROME over a key/value DB)]

slide-143
SLIDE 143


ROME

  • Relational Object Mapping Extension for key/value stores (Jonathan Pastor’s PhD)
  https://github.com/BeyondTheClouds/rome
  • Enables querying a key/value store DB with the same interface as SQLAlchemy (see the sketch below)
  • Enables OpenStack Nova to switch to a KVS without being too intrusive
  • The KVS is distributed over (dedicated) nodes
  • Nova services connect to the key/value store cluster
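Since ROME is described above as exposing the same query interface as SQLAlchemy, the following self-contained SQLAlchemy snippet shows the kind of call chain that would stay unchanged when the backend is swapped from MySQL to a key/value store. The model and field names are illustrative only, not taken from Nova or ROME.

```python
# Minimal SQLAlchemy example of the query style used behind Nova's db.api layer.
# ROME's goal is to accept the same session/query calls while persisting the
# objects in a key/value store. The Instance model below is illustrative.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Instance(Base):
    __tablename__ = "instances"
    id = Column(Integer, primary_key=True)
    host = Column(String)
    vm_state = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add_all([Instance(host="node-1", vm_state="active"),
                 Instance(host="node-2", vm_state="stopped")])
session.commit()

# This call chain is what a drop-in KVS-backed ORM would have to reproduce.
active = session.query(Instance).filter_by(host="node-1", vm_state="active").all()
print([i.id for i in active])
```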

53

[Diagram: deployment over four sites. Nova Controllers 1-5 (each running n-sched, n-cond, n-api, n-net, n-cpu and horizon) and Nova compute nodes communicate over AMQP buses and share a single distributed key/value store]

slide-144
SLIDE 144


Experiments

  • Experiments have been conducted on Grid’5000
  • Mono-site experiments ⟹ evaluate the overhead of using ROME/Redis and the network impact
  • Multi-site experiments ⟹ determine the impact of latency; validate compatibility with higher-level mechanisms

54

www.grid5000.fr
1,500 servers spread across 10 sites; full admin rights

slide-145
SLIDE 145


Mono-Site Experiments

55

[Figure: MySQL/SQLAlchemy vs ROME/Redis]

  • Creation of 500 VMs
  • Comparison of MySQL/SQLAlchemy vs ROME/Redis (one dedicated node for the DB server / the Redis server)

slide-146
SLIDE 146

56

  • Evaluate the overhead of using ROME/Redis
  • ROME stores objects in a JSON format: serialization/deserialization cost (illustrated below)
  • ROME reimplements some mechanisms: join, transaction/session, …

ROME requests are faster for 80% of the requests; SQLAlchemy is faster for the remaining 20%

Mono-Site Experiments
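The serialization/deserialization cost mentioned above can be illustrated with a tiny, self-contained sketch: storing objects as JSON (as ROME does in the key/value store) means every read and write pays a dumps/loads round trip that a native SQL row does not. The record below is made up and is not an actual Nova/ROME object.

```python
# Measure the JSON round-trip cost paid for each object stored in the KVS.
import json
import timeit

record = {"id": 42, "host": "node-1", "vm_state": "active",
          "metadata": {"image": "cirros", "flavor": "m1.tiny"}}

def roundtrip():
    # Serialize on write, deserialize on read.
    return json.loads(json.dumps(record))

n = 100_000
seconds = timeit.timeit(roundtrip, number=n)
print(f"JSON round trip: {seconds / n * 1e6:.2f} microseconds per object")
```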

slide-147
SLIDE 147

57

Mono-Site Experiments

  • Evaluate the overhead of using ROME/Redis
  • ROME stores objects in a JSON format: serialization/deserialization cost
  • ROME reimplements some mechanisms: join, transaction/session, …
slide-148
SLIDE 148

58

[Figure: ROME+Redis vs SQLAlchemy+MySQL]

Multi-site Experiments

  • Creation of 500 VMs, fairly distributed across the controllers
  • From 2 to 8 sites (emulation of virtual clusters by adding latency with tc; see the sketch below)
  • Each cluster contained 1 controller and 6 compute nodes (plus 1 dedicated node in the case of Redis)
  • MySQL and Redis used in their default configuration
  • To compare fairly with MySQL, data replication was not activated in Redis
  • Galera experiments have been performed, but due to reproducible issues with more than 4 sites, the results are not satisfactory enough to be discussed (RR available on demand)
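The latency emulation mentioned in the list above relies on tc; the sketch below shows one way such a constant inter-site delay can be injected with tc/netem from Python. The interface name and delay value are examples only; the exact delays used in the experiments are not given on the slide, and the command requires root privileges.

```python
# Illustrative helper: add a constant egress delay on an interface with tc/netem.
import subprocess

def add_latency(interface: str, delay_ms: int) -> None:
    # Equivalent to: tc qdisc add dev <interface> root netem delay <delay_ms>ms
    subprocess.run(
        ["tc", "qdisc", "add", "dev", interface, "root",
         "netem", "delay", f"{delay_ms}ms"],
        check=True,
    )

if __name__ == "__main__":
    add_latency("eth0", 50)  # example values, not those of the experiments
```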

slide-149
SLIDE 149

Multi-Site Experiments

59

(one SQL server for
 the whole infrastructure)

SQL scalability 
 bottleneck

Increasing the number of nodes leads to better reactivity; from 8 clusters onward, MySQL becomes a bottleneck

slide-150
SLIDE 150

60

  • Assess the usage of advanced OpenStack features: host aggregates / availability zones
  • As we targeted a low-level component, ROME is compatible with most of the existing features.
  • Performance is not impacted (same order of magnitude)
  • VM repartition is correctly achieved (without availability zones, the distribution was respectively 26%, 20%, 22%, and 32% of the created VMs for a 4-cluster experiment).

[Diagram: three availability zones (1-3) mapped onto three geographical sites, each hosting IaaS services]

Can we go beyond a research PoC?

Compatibility with Higher Level Features

slide-151
SLIDE 151

60

  • Assess the usage of advanced OpenStack features: host aggregates / availability zones
  • As we targeted a low-level component, ROME is compatible with most of the existing features.
  • Performance is not impacted (same order of magnitude)
  • VM repartition is correctly achieved (without availability zones, the distribution was respectively 26%, 20%, 22%, and 32% of the created VMs for a 4-cluster experiment).

[Diagram: three availability zones (1-3) mapped onto three geographical sites, each hosting IaaS services]

Can we go beyond a research PoC?

Compatibility with Higher Level Features

Interesting, but… just the tip of the iceberg! Glance, Neutron, Cinder…? Scalability of the AMQP bus? HA? Reify locality aspects at every level of the stack?

slide-152
SLIDE 152

STACK Proposal

ASCOLA Follow-up

slide-153
SLIDE 153

Tomorrow STACK ?

62

  • Designing a tightly-coupled software stack to operate and use massively geo-distributed ICT infrastructures.
  • Delivering appropriate system abstractions, from low (system) to high levels (applications), and addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.

slide-154
SLIDE 154

Tomorrow STACK ?

62

Compute

Storage

Networking

  • Designing a tightly-coupled software stack to operate and use massively geo-distributed ICT infrastructures.
  • Delivering appropriate system abstractions, from low (system) to high levels (applications), and addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.

slide-155
SLIDE 155

Tomorrow STACK ?

62

Compute

Storage

Networking

Building blocks

  • Designing a tightly-coupled software stack to operate and use massively geo-distributed ICT infrastructures.
  • Delivering appropriate system abstractions, from low (system) to high levels (applications), and addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.

slide-156
SLIDE 156

Tomorrow STACK ?

62

Compute

Storage

Networking
Application management: Programming Model/API, deployment and reconfiguration engines (self-*)
Resource Management: Capacity Planning / Deployment and reconfiguration

Building blocks

  • Designing a tightly-coupled software stack to operate and use massively geo-distributed ICT infrastructures.
  • Delivering appropriate system abstractions, from low (system) to high levels (applications), and addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.

slide-157
SLIDE 157

Tomorrow STACK ?

62

security / energy
Compute

Storage

Networking
Application management: Programming Model/API, deployment and reconfiguration engines (self-*)
Resource Management: Capacity Planning / Deployment and reconfiguration

Building blocks

  • Designing a tightly-coupled software stack to operate and use massively geo-distributed ICT infrastructures.
  • Delivering appropriate system abstractions, from low (system) to high levels (applications), and addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.

slide-158
SLIDE 158

Tomorrow STACK ?

62

security / energy
Compute

Storage

Networking
Application management: Programming Model/API, deployment and reconfiguration engines (self-*)
Resource Management: Capacity Planning / Deployment and reconfiguration

Building blocks

General: Revising such a stack to deal with geo-distributed constraints/opportunities

OpenStack: 3 million LoC (core components), 20 million overall

  • Designing a tightly-coupled software stack to operate and use massively geo-distributed ICT infrastructures.
  • Delivering appropriate system abstractions, from low (system) to high levels (applications), and addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.

slide-159
SLIDE 159

STACK Proposal

[Diagram: three STACK building blocks, each composed of Compute / Storage / Networking, App Mgmt (programming model/API, deployment and reconfiguration engines (self-*)) and Resource Mgmt (capacity planning, deployment and reconfiguration), with security / energy as cross-cutting concerns]
slide-160
SLIDE 160

STACK Proposal

[Diagram: three STACK building blocks, each composed of Compute / Storage / Networking, App Mgmt (programming model/API, deployment and reconfiguration engines (self-*)) and Resource Mgmt (capacity planning, deployment and reconfiguration), with security / energy as cross-cutting concerns]

Building blocks

slide-161
SLIDE 161

STACK Proposal

[Diagram: three STACK building blocks, each composed of Compute / Storage / Networking, App Mgmt (programming model/API, deployment and reconfiguration engines (self-*)) and Resource Mgmt (capacity planning, deployment and reconfiguration), with security / energy as cross-cutting concerns]

Building blocks
Resource management

slide-162
SLIDE 162

STACK Proposal

[Diagram: three STACK building blocks, each composed of Compute / Storage / Networking, App Mgmt (programming model/API, deployment and reconfiguration engines (self-*)) and Resource Mgmt (capacity planning, deployment and reconfiguration), with security / energy as cross-cutting concerns]

Building blocks
Resource management
Application management

slide-163
SLIDE 163

STACK Proposal

[Diagram: three STACK building blocks, each composed of Compute / Storage / Networking, App Mgmt (programming model/API, deployment and reconfiguration engines (self-*)) and Resource Mgmt (capacity planning, deployment and reconfiguration), with security / energy as cross-cutting concerns]

Building blocks
Scalability
Efficiency / Security / Reliability
Resource management
Application management

slide-164
SLIDE 164

STACK Proposal

[Diagram: three STACK building blocks, each composed of Compute / Storage / Networking, App Mgmt (programming model/API, deployment and reconfiguration engines (self-*)) and Resource Mgmt (capacity planning, deployment and reconfiguration), with security / energy as cross-cutting concerns]

Building blocks
Scalability
Efficiency / Security / Reliability
Resource management
“Just” a distributed resource management system?
Application management
Synergy

slide-165
SLIDE 165
  • Leverage “green” energy (solar, wind turbines…)
  Transfer the green micro/nano DC concept to the network PoP
  Take advantage of the geographical distribution


Energy Dimension

  • From sustainable data centers to a new source of energy


A promising way to deliver highly efficient and sustainable UC services
 is to provide UC platforms as close as possible to the end-users and to...


64

  • Leveraging the data furnaces concept

Deploy UC servers in medium and large institutions 
 and use them as sources of heat inside public 
 buildings such as hospitals or universities

http://parasol.cs.rutgers.edu https://www.qarnot.com

CPER SeDuCe, ANR GRECO

slide-166
SLIDE 166

65

Challenges & Foundations

  • Challenges


Identify and revise core mechanisms/algorithms to enable scalability, distribution… while taking into account fog/edge specifics.

Extend APIs and software programming abstractions (high level) and identify missing mechanisms (low level) to benefit from geo-distribution opportunities.

Tightly coupled: synergy between all mechanisms composing the system.

  • Foundations

Distributed systems
Software programming (component-based models, DSLs, composition)
Self-* mechanisms
Performance evaluations (experiment-driven research)

slide-167
SLIDE 167

66

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-168
SLIDE 168

66

Deployment/Reconfiguration

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-169
SLIDE 169

66

Deployment/Reconfiguration
Storage/Big Data

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-170
SLIDE 170

66

Deployment/Reconfiguration
Storage/Big Data
Large-scale platforms management

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-171
SLIDE 171

66

Deployment/Reconfiguration
Storage/Big Data
Large-scale platforms management
Application life-cycle management

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-172
SLIDE 172

66

Deployment/Reconfiguration
Storage/Big Data
Large-scale platforms management
Application life-cycle management
Energy

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-173
SLIDE 173

66

Deployment/Reconfiguration
Storage/Big Data
Large-scale platforms management
Application life-cycle management
Energy
Software programming models

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-174
SLIDE 174

66

Deployment/Reconfiguration
Storage/Big Data
Large-scale platforms management
Application life-cycle management
Energy
Software programming models
Security

STACK Proposal

  • STACK is a new research group focusing on challenges related to the management and advanced usage of Utility Computing infrastructures (i.e., Cloud, Fog, Edge, and beyond). The team is interested in delivering appropriate system abstractions, from low (system) to high levels (applications), and in addressing cross-cutting dimensions such as energy or security, to operate massively geo-distributed infrastructures.
slide-175
SLIDE 175

STACK - Positioning

  • Inria
  • Several research groups address challenges related to large-scale infrastructures (ASAP, AVALON, DATAMOVE, KERDATA…)
  • STACK members regularly collaborate/work with AVALON, MYRIADS, KERDATA, MADYNES, REGAL, CTRL-A (IPL Discovery, SCALUS/BigStorage EU MCITN, EPOC CominLabs, ANR MyCloud, I/O Lab…)
  • Nantes
  Historical links with TASC/AtlanModels
  CRE Orange Entreprise du future (Helene Coullon)
  Complementarity and collaboration opportunities with GDD, STR, RESTO/RIO (co-supervision of a PhD on Fog/Edge storage backends)

67

slide-176
SLIDE 176
  • France
  • ERODS-LIG (Distributed systems), SEPIA-IRIT (Energy)

  • International
  • Prof. Erik Elmroth, Umeå University (Control, autonomous mechanisms)
  • Prof. Mira Mezini, TU Darmstadt, Germany (Software Defined XXX)
  • Assoc. Prof. Paolo Bellavista, University of Bologna (Mobile Computing)
  • Prof. Manish Parashar, Rutgers University (Fog/Edge + Energy)
  • Prof. Weisong Shi, Wayne State University (Edge + Mobile Computing)
  • Prof. Hai Jin, Huazhong University (I/O + Virtualization)

STACK - Positioning

68

slide-177
SLIDE 177
  • Resource management of virtualised distributed systems
  • VM Placements


Entropy (Energy) [VEE09]
 DVMS (Scalability) [FGCS12]
 VMPlaceS (SimGrid) [TCC15, Europar16]

  • Synergy between applications and the resource manager


Autonomic models for Cloud Computing applications [IGI12, Closer17]
 DSLs (Quality of Service, Elasticity) [FGCS16, SAC17]
 Proportional Energy (use of renewable energy) [Computing17]

  • Security


Compose security/privacy mechanisms [CloudCom13, RATSP15]

  • Distributed Clouds


FSN HOSANNA (2015-2017)
 Inria Project Lab DISCOVERY (2015-2019)


69

Looking Back…


slide-178
SLIDE 178
  • Projects


Ongoing


IPL Discovery (2015-2019)
 EU BigStorage (2015-2018)
 CPER SeDuCE (2016-2020)
 CominLabs PrivGen (2016-2019)
 ANR GRECO (2017-2020) 
 FSN/PIA Hydda HPC/Cloud between distinct sites (2017-2020). 


Proposal
 SILECS (TGIR/ESFRI, national/EU consortium)

  • Animation


GDR RSD: co-chair of the Virtualisation action (CloudDays, ResCom17)


IEEE ICFEC conference 
 Chairs of Fog/Edge related tracks in EuroPar’16, CloudCom’16/’17
 Chair of the Massively Distributed WG - OpenStack.

70

… to the Future