From Static Scheduling Towards Understanding Uncertainty



slide-1
SLIDE 1

Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems

Andrei Tchernykh

CICESE Research Center, Ensenada, Baja California, México

chernykh@cicese.mx

http://usuario.cicese.mx/~chernykh/

Dagstuhl – July 7, 2015

From Static Scheduling

Towards Understanding Uncertainty

slide-2
SLIDE 2

Baja California, México

slide-3
SLIDE 3

Ensenada, Baja California, México

slide-4
SLIDE 4
slide-5
SLIDE 5

Sonora

Yaqui deer dancer

slide-6
SLIDE 6

Research Areas

  • HPC
  • Cloud Computing
  • Scheduling (online, offline)
  • Resource optimization
  • Real Time Systems
  • Grid Computing
  • Knowledge Free Scheduling with Uncertainty
  • Multiobjective Optimization
  • Computational Intelligence
  • List Scheduling
  • Stealing
  • Scheduling with Service Levels
  • Approximation Algorithms
  • Workflow Scheduling

slide-7
SLIDE 7

Collaboration

Mexico, USA, Uruguay, Spain, France, Russia, Luxembourg, Germany

Universidad Autónoma de Baja California

Universidad Autónoma de Nuevo León

Tecnológico de Monterrey

Instituto Tecnológico de Morelia

Centro de Estudios Superiores del Estado de Sonora

Dortmund University

  • Prof. Uwe Schwiegelshohn

University of Göttingen

  • Prof. Ramin Yahyapour

Institute for System Programming, RAS

  • Prof. Arutyun Avetisyan
  • Prof. Nikolay Kuzurin

Moscow Institute of Physics and Technology

  • Prof. Alexander Drozdov

Institute of Informatics and Applied Mathematics of Grenoble

  • Prof. Denis Trystram

INRIA Lille - Nord Europe

  • Prof. El-ghazali Talbi

University of Notre Dame

  • Dr. Jarek Nabrzyski

University of California – Irvine, CA, USA

  • Prof. Isaac Scherson
  • Prof. Jean Luc Gaudiot

University of Luxembourg

  • Prof. Pascal Bouvry
  • Dr. Dzmitry Kliazovich

Universidad de la República

  • Dr. Sergio Nesmachnow

BSC

  • Prof. Vassil Alexandrov
slide-8
SLIDE 8

CICESE Parallel Computing Laboratory

Team


slide-9
SLIDE 9


Towards Understanding Uncertainty in Cloud Computing Resource Provisioning

Andrei Tchernykh, CICESE Research Center, Mexico
Uwe Schwiegelshohn, University of Dortmund, Germany
El-ghazali Talbi, University of Lille, France
Vassil Alexandrov, Barcelona Supercomputing Centre, Spain

ICCS-SPU 2015. Procedia Computer Science, Elsevier, 2015

slide-10
SLIDE 10


Uncertainty

Uncertainty can be classified in several ways according to its nature:

1. Long-term uncertainty arises because the object is poorly understood and inadvertent factors can influence its behavior.
2. Retrospective uncertainty is due to the lack of information about the behavior of the object in the past.
3. Technical uncertainty is a consequence of the impossibility of predicting the exact results of decisions.
4. Stochastic uncertainty is a result of the probabilistic (stochastic) nature of the studied processes and phenomena. Three cases are possible:

  • reliable statistical information is available;
  • statistical information is not available;
  • the hypothesis on the stochastic nature requires verification.

Tychinsky 2006

slide-11
SLIDE 11

Uncertainty

5. Constraint uncertainty: partial or complete ignorance of the conditions.

6. Participant uncertainty

  • conflict of main stakeholders: cloud providers, users and administrators.
  • own preferences, and incomplete or inaccurate information about the motives and behavior of opposing parties.

7. Goal uncertainty

  • inability to select one goal
  • conflicts in building a multi-objective optimization model
  • competing interests

8. Condition uncertainty arises from faulty or completely missing information about the conditions under which decisions are made.

slide-12
SLIDE 12

Uncertainty

9. Action uncertainty occurs when the choice among solutions is ambiguous.

  • In the single objective case, the task is to determine the best solution among all feasible ones.
  • In the multiple objective case, there exists a (possibly infinite) number of Pareto optimal solutions, and the problem is to find a good element of this set.
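A minimal sketch of what "Pareto optimal" means in the multiple objective case, assuming two objectives that are both minimized. The function name `pareto_front` and the sample points are illustrative and do not come from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, assuming both objectives are minimized."""
    front = []
    for p in points:
        # p is dominated if another point is at least as good in both objectives.
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (cost, energy) pairs for five candidate schedules.
candidates = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (4.0, 2.0), (6.0, 2.5)]
front = pareto_front(candidates)
```

Three candidates remain in `front`, and none of them is better than the others in both objectives. Choosing among them is the "good element of this set" problem named above.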
slide-13
SLIDE 13

Uncertainty

Can be grouped into:

Parameter (parametric) uncertainties

1. arise from the incomplete knowledge and variation of the parameters
2. estimated using statistical techniques

System uncertainties

1. arise from an incomplete understanding of the processes that control service provisioning
2. incomplete information about a system

slide-14
SLIDE 14


Services and resources are subject to considerable uncertainty during provisioning. Uncertainty brings additional challenges to

  • End-users
  • Resource providers
  • Brokering

It requires

  • waiving habitual computing paradigms
  • adapting current computing models
  • designing novel resource management strategies to handle uncertainty in an effective way

Uncertainty in Clouds

The question is: how to deliver scalable and robust cloud behavior under uncertainties and specific constraints, such as budgets, QoS, SLA, energy costs, etc.?

slide-15
SLIDE 15


  • dynamic elasticity
  • dynamic performance changing
  • virtualization, loosely coupling application to the infrastructure
  • resource provisioning time variation
  • inaccuracy of application runtimes, variation of processing times
  • variation in data transmission, variable data streams,
  • release time and workload uncertainty
  • effective bandwidth variation,

and other phenomena.

Sources of uncertainty

  • workload is not predictable and can change dramatically
  • performance can change due to sharing of common resources with other VMs

slide-16
SLIDE 16


Providers might not know the

  • Quantity of transmitted data
  • Amount of computation

Example: every time a user checks the status of an e-mail or bank account, the request can

  • generate a different amount of data and
  • take a different time to deliver.

Sources of uncertainty

slide-17
SLIDE 17


It is impossible to get exact knowledge about the system. Parameters such as

  • effective processor speed,
  • number of available processors,
  • actual bandwidth

change over time. The topology is unknown. In general, the execution environment differs for each program/service invocation.

Sources of uncertainty

slide-18
SLIDE 18

Sources of uncertainty

  • Data (volume, variety, value)
  • Virtualization
  • Jobs arrival
  • Migration
  • Energy minimization
  • Fault tolerance
  • Scalability
  • Cost (dynamic pricing)
  • Resource availability
  • Elasticity
  • Consolidation
  • Communication
  • Replication
  • Cloud infrastructure
  • Resource provisioning time

Cloud computing parameters

  • Effective performance
  • Effective bandwidth
  • Processing time
  • Available memory
  • Number of processors
  • Available storage
  • Data transfer time
  • Resource capacity
  • Network capacity
slide-19
SLIDE 19


Approaches

To treat uncertainty and dynamism we need sophisticated solutions.

  • Fuzzy
  • Robust
  • Non-clairvoyant
  • Knowledge-free
  • Stochastic
  • Randomized algorithms
  • Dynamic priority
  • Adaptive strategies (reactive)
  • Dynamic load balancing
slide-20
SLIDE 20

Preliminary results

slide-21
SLIDE 21

Scheduling for Cloud Computing with Different Service Levels

IPDPS 2012, IEEE 26th International Parallel and Distributed Processing Symposium

Uwe Schwiegelshohn University of Dortmund, Germany Andrei Tchernykh CICESE Research Center, Mexico

slide-22
SLIDE 22

Quality of Service

  • Deadline
  • Service level (slack factor): response time in relation to the requested processing time
  • Execution time
  • Profit: price per time unit
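A hedged sketch of how a slack factor can drive job acceptance on a single machine. It assumes the deadline is `release + slack_factor * processing` and that income is a price per time unit multiplied by the accepted processing time. Both formulas, the greedy first-come rule and all names are assumptions for illustration, not the algorithm analyzed in the cited paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    release: float     # submission time
    processing: float  # requested processing time


def deadline(job: Job, slack_factor: float) -> float:
    # Assumed rule: response time may be at most slack_factor * processing time.
    return job.release + slack_factor * job.processing


def greedy_accept(jobs: List[Job], slack_factor: float) -> List[Job]:
    """Accept a job only if it can finish by its deadline after the accepted work."""
    busy_until = 0.0
    accepted = []
    for job in sorted(jobs, key=lambda j: j.release):
        finish = max(busy_until, job.release) + job.processing
        if finish <= deadline(job, slack_factor):
            accepted.append(job)
            busy_until = finish
    return accepted


def income(accepted: List[Job], price_per_time_unit: float) -> float:
    return price_per_time_unit * sum(job.processing for job in accepted)


jobs = [Job(0, 4), Job(1, 2), Job(2, 4)]
tight = greedy_accept(jobs, slack_factor=1.0)  # no slack at all
loose = greedy_accept(jobs, slack_factor=2.0)  # response time up to twice the runtime
```

With no slack only the first job is accepted. With a slack factor of 2 the third job also fits, so the income at the same price doubles.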

slide-23
SLIDE 23

Competitive Factor

$$\rho = \frac{\text{obtained income}}{\text{optimal income}}$$

slide-24
SLIDE 24

Competitive Factor

SSL-SM

$$\rho \le 1 - \left(1 - \frac{p_{\min}}{p_{\max}}\right)\frac{1}{f}$$

SSL-MM

$$\rho \le \frac{f}{1 + f\left(1 - \frac{p_{\min}}{p_{\max}}\right)}$$

Das Gupta and Palis, 2001
Schwiegelshohn, Tchernykh, 2012

slide-25
SLIDE 25

Competitive Factor

MSL-SM

$$\rho \le \max\left\{ \frac{p_{\min}/p_{\max}}{f_I - 1},\ \frac{f_I - 1 + \frac{p_{\min}}{p_{\max}}}{f_I - 1 + \frac{u_I}{u_{II}}} \right\}$$

MSL-MM

$$\rho \le \frac{u_{II}}{u_I}\left(1 - \frac{1}{f_I}\right)$$

Schwiegelshohn, Tchernykh, 2012

slide-26
SLIDE 26

On-line Scheduling in Distributed Systems

Multiple strip packing

Job Stealing

non-clairvoyant

Uwe Schwiegelshohn, University of Dortmund, Germany
Andrei Tchernykh, CICESE Research Center, Mexico
Ramin Yahyapour, University of Göttingen, Germany

IEEE IPDPS 2008

slide-27
SLIDE 27

Any machine applies a priority order when selecting jobs for execution:

1. Jobs of its group A
2. Jobs of its group B
3. Jobs that are enabled for execution on its previous machine.

Grid Scheduling Algorithm

slide-28
SLIDE 28
Performance of the Algorithm

  • Theoretical evaluation

– Cmax(LIST)/Cmax* < 3 in the offline case
– Cmax(LIST)/Cmax* < 5 in the online case

IEEE IPDPS’08, 2008

Improved by (Klaus Jansen, Denis Trystram et al.): 5/2, 2 + ε, 2-approximations
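To make the ratio Cmax(LIST)/Cmax* concrete, here is a deliberately simplified sketch: classic list scheduling of sequential jobs on `m` identical processors, compared against a standard lower bound on the optimal makespan. It is not the grid algorithm evaluated in the slides, which handles parallel jobs on machines of different sizes. The function names and job data are illustrative.

```python
import heapq
from typing import List


def list_schedule(processing_times: List[float], m: int) -> float:
    """Assign each job, in list order, to the processor that becomes free first."""
    loads = [0.0] * m
    heapq.heapify(loads)
    for p in processing_times:
        earliest = heapq.heappop(loads)      # processor that is free soonest
        heapq.heappush(loads, earliest + p)
    return max(loads)                        # makespan Cmax(LIST)


def makespan_lower_bound(processing_times: List[float], m: int) -> float:
    """No schedule can beat the average load or the longest single job."""
    return max(sum(processing_times) / m, max(processing_times))


jobs = [2, 3, 4, 6, 2, 2]
cmax_list = list_schedule(jobs, m=3)
ratio = cmax_list / makespan_lower_bound(jobs, m=3)
```

Because the denominator is only a lower bound on Cmax*, `ratio` overestimates the true performance ratio, which keeps the comparison conservative.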

slide-29
SLIDE 29

On-line Scheduling in Distributed Systems

Multiple strip packing

Adaptive Admissible Allocation

Future Generation Computer Systems, 2012
Journal of Scheduling, 2010

Andrei Tchernykh, José Luis González-García, Vanessa Miranda-López, CICESE Research Center, Mexico
Uwe Schwiegelshohn, University of Dortmund, Germany
Ramin Yahyapour, University of Göttingen, Germany

slide-30
SLIDE 30

Allocation

[Figure: machines m_1, m_2, m_3, m_4, m_5, …, m_m. For job J_j, first(J_j) = 2. The M-available set ends at last(J_j) = m and the M-admissible set ends at last(J_j) = 5.]

last(J_j) is the minimum r such that

$$\sum_{i=\mathrm{first}(J_j)}^{r} m_i \ \ge\ a \sum_{i=\mathrm{first}(J_j)}^{m} m_i$$

slide-31
SLIDE 31

Adaptive optimization

For a set of machines with identical processors, and for a set of rigid jobs with admissible range $a \le 1$:

Approximation factor (off-line), Min_LB-a + Best_PS

$$1 + \frac{2}{a^2} \quad \text{or} \quad 1 + \frac{2}{a(1-a)}$$

depending on the relation between $a$ and $m_{f,r}/m_{f,m}$.

Competitive factor (on-line), Min_LB-a + Best_PS

$$3 + \frac{2}{a^2} \quad \text{or} \quad 3 + \frac{2}{a(1-a)}$$

depending on the relation between $a$ and $m_{f,r}/m_{f,m}$.

Tchernykh, et al. 2012, Future Generation Computer Systems, Elsevier
Tchernykh, et al. 2010, Journal of Scheduling, Springer

slide-32
SLIDE 32

List Scheduling in the Grid

[Figure: schedule over time on machines with different numbers of processors; a = 1, Cmax(LIST) = 4]

slide-33
SLIDE 33

Admissible Allocation in Grid

[Figure: schedule over time on machines with different numbers of processors; a = 0.5 and a = 1, Cmax(LIST) = 2]

slide-34
SLIDE 34

Theoretical Evaluation

Grid scheduling

Off-line (non-clairvoyant and clairvoyant; equal and different machine sizes)

  • (Schwiegelshohn et al. 2008) 3-approximation
  • (Pascual et al. 2008) 4-approximation
  • (Klaus Jansen, Denis Trystram) 5/2, 2 + ε, 2-approximation
  • (Zhuk et al. 2004) 10-approximation
  • (Tchernykh et al. 2005) 10-approximation
  • (Tchernykh et al. 2012) 3-approximation

On-line (non-clairvoyant, different machine sizes)

  • (Tchernykh et al. 2008) 5-competitive
  • (Tchernykh et al. 2010) 17-competitive
  • (Schwiegelshohn 2010) (2e+1)-competitive
  • (Tchernykh et al. 2012) 5-competitive

  • Future Generation Computer Systems, Elsevier
  • Journal of Scheduling, Springer
  • Discrete Applied Mathematics, Elsevier
  • Tran Fund Elec, Comm. & Comp. Science, IEICE
  • Parallel and Distributed Processing, IEEE
  • Computers & Industrial Engineering, Elsevier
slide-35
SLIDE 35

Job Allocation Strategies with User Run Time Estimates

Journal of Grid Computing, Springer, 2011

Juan Manuel Ramírez, Andrei Tchernykh, José Luis González, Adán Hirales-Carbajal, CICESE Research Center, Mexico
Uwe Schwiegelshohn, University of Dortmund, Germany
Ramin Yahyapour, University of Göttingen, Germany

slide-36
SLIDE 36

Multiple Workflow Scheduling Strategies with User Run Time Estimates on a Grid

Journal of Grid Computing, Springer, 2012

Adán Hirales-Carbajal, Andrei Tchernykh, José Luis González, Juan Manuel Ramírez, CICESE Research Center, Mexico
Thomas Röblitz, University of Dortmund, Germany
Ramin Yahyapour, University of Göttingen, Germany

slide-37
SLIDE 37

Adaptive Resource Allocation in Computational Grids with Runtime Uncertainty

Andrei Tchernykh, CICESE Research Center
Raul Ramírez-Velarde, Carlos Barba-Jimenez, Juan Nolazco, Tecnológico de Monterrey, México
Adán Hirales-Carbajal, CETYS University, Mexico

The model uses the notions of

  • heavy tails
  • self-similarity

for the predictability of run-time estimates

slide-38
SLIDE 38

Energy-Aware Online Scheduling: Ensuring Quality of Service for IaaS Clouds

Andrei Tchernykh, Luz Lozano, CICESE Research Center, Mexico
Uwe Schwiegelshohn, University of Dortmund, Germany
Johnatan Pecero, Pascal Bouvry, University of Luxembourg, Luxembourg
Sergio Nesmachnow, Universidad de la República, Uruguay
Alexander Yu. Drozdov, Moscow Institute of Physics and Technology

Journal of Grid Computing, Springer, 2015

slide-39
SLIDE 39

Solution space, Pareto optimal solutions

[Figure: solution space and Pareto optimal solutions; axes: income degradation and power consumption degradation]

slide-40
SLIDE 40

Adaptive energy efficient scheduling in Peer-to-Peer desktop grids

Knowledge Free Scheduling

Future Generation Computer Systems. 2013

Andrei Tchernykh, Aritz Barrondo, CICESE Research Center, Mexico
Johnatan E. Pecero, University of Luxembourg, Luxembourg
Elisa Schaeffer, Universidad Autónoma de Nuevo León, Mexico

slide-41
SLIDE 41


Work Queue with Replication (WQR)

Time Resources Time+Resources OurGrid, BOINC

SETI@home, folding@home, Rosetta@home Einstein@home, +50 projects

slide-42
SLIDE 42

A VoIP Service for Cloud Infrastructure

Andrei Tchernykh, Jorge Mario Cortez, CICESE Research Center, Mexico
Johnatan E. Pecero, Pascal Bouvry, Ana-Maria Simionovici, Dzmitry Kliazovich, University of Luxembourg, Luxembourg
Loic Didelot, MIXvoip S.A., Luxembourg
Denis Trystram, Grenoble Institute of Technology, France

slide-43
SLIDE 43

Problem

Two objectives:

  • Provider cost optimization
  • Voice quality

Bin-packing approach (well-known)

  • one-dimensional, on-line
  • classic NP-hard optimization problem

The principal novelty

  • the state of the bin is determined not only by actions of the decision maker during item allocations,
  • but also by item completions after their lifespan.

Unlike in the standard formulation,

  • bins are always open
  • bins are dynamic
  • items in bins can be terminated (call termination)
  • utilization can change
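The difference from standard bin packing can be sketched as follows: items (calls) leave bins when they end, so a bin's load depends on departures as well as placements. The first-fit placement rule, the class name `DynamicBins` and the load values are assumptions for illustration. The slides do not say which placement heuristic the authors use.

```python
from typing import Dict, List


class DynamicBins:
    """One-dimensional online bin packing in which items can depart."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.bins: List[Dict[str, int]] = []  # per bin: call id -> load

    def place(self, call_id: str, load: int) -> int:
        """First fit: use the first bin with enough free capacity, else open one."""
        if load > self.capacity:
            raise ValueError("a single call exceeds the bin capacity")
        for index, contents in enumerate(self.bins):
            if sum(contents.values()) + load <= self.capacity:
                contents[call_id] = load
                return index
        self.bins.append({call_id: load})
        return len(self.bins) - 1

    def terminate(self, call_id: str) -> None:
        """Call termination frees capacity, but the bin itself stays open."""
        for contents in self.bins:
            if call_id in contents:
                del contents[call_id]
                return
        raise KeyError(call_id)

    def loads(self) -> List[int]:
        return [sum(contents.values()) for contents in self.bins]


servers = DynamicBins(capacity=10)
placements = [servers.place("a", 6), servers.place("b", 5), servers.place("c", 4)]
servers.terminate("a")                    # call "a" ends, bin 0 drops from 10 to 4
placements.append(servers.place("d", 5))  # the freed space is reused
```

After call `a` terminates, call `d` fits into the first bin again. A static formulation, where bins only fill up, could not do this.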
slide-44
SLIDE 44

Cloud Infrastructure Cost Optimization: to buy or to lease

Uwe Schwiegelshohn, Stephan Schlagkamp, University of Dortmund, Germany
Andrei Tchernykh, Fermin Armenta, CICESE Research Center, Mexico

slide-45
SLIDE 45

Cloud Provider Cost

Our objective is to:

  • avoid overprovisioning
  • find the resource capacity of the private cloud
  • minimize total investment and leasing costs with respect to the demand forecast
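A toy illustration of the buy-or-lease trade-off under an assumed cost model. Owning one unit of private capacity costs `own_cost` per period whether it is used or not. Demand above the private capacity is leased at `lease_cost` per unit and period. The model, parameter names and forecast values are hypothetical and are not taken from the slides.

```python
from typing import List


def total_cost(capacity: int, forecast: List[int],
               own_cost: float, lease_cost: float) -> float:
    """Investment for owned capacity plus leasing for demand that exceeds it."""
    investment = own_cost * capacity * len(forecast)
    leasing = lease_cost * sum(max(0, demand - capacity) for demand in forecast)
    return investment + leasing


def best_capacity(forecast: List[int], own_cost: float, lease_cost: float) -> int:
    """Exhaustive search over all capacities from 0 up to the peak demand."""
    candidates = range(0, max(forecast) + 1)
    return min(candidates,
               key=lambda c: total_cost(c, forecast, own_cost, lease_cost))


forecast = [4, 6, 10, 6, 4, 20]  # forecast demand per period, in resource units
capacity = best_capacity(forecast, own_cost=1.0, lease_cost=4.0)
cost = total_cost(capacity, forecast, own_cost=1.0, lease_cost=4.0)
```

In this example the cheapest private cloud covers the second-highest demand and leases only for the single peak. Buying for the peak would be overprovisioning.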

slide-46
SLIDE 46

Modeling applications with communications and uncertainty

Dzmitry Kliazovich, Johnatan E. Pecero, Pascal Bouvry, University of Luxembourg, Luxembourg
Andrei Tchernykh, CICESE Research Center, Mexico
Samee U. Khan, North Dakota State University, U.S.A.
Albert Y. Zomaya, University of Sydney, Australia

  • IEEE CLOUD 2013 - IEEE 6th International Conference on Cloud Computing.
  • Journal of Grid Computing, Springer, 2015
slide-47
SLIDE 47

Modeling Applications

  • Proposed CA-DAG: Communication-Aware DAG model

– Two types of vertices: one for computing and one for communications
– Edges define dependences between tasks and order of execution

  • Main advantage

– Allows separate resource allocation decisions, assigning processors to handle computing jobs and network resources for information transmissions

[Figure: example CA-DAG with vertices 1 to 4; legend: communication task, computing task, ordinary edge]
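One possible in-memory representation of a communication-aware DAG, written as a sketch rather than the authors' implementation. The class `CADag`, its method names and the example graph are assumptions. It shows how two vertex types allow computing tasks and communication tasks to be handed to different allocators while edges fix the execution order.

```python
from collections import deque
from typing import Dict, List, Tuple


class CADag:
    """DAG whose vertices are either computing or communication tasks."""

    KINDS = ("computing", "communication")

    def __init__(self) -> None:
        self.kind: Dict[str, str] = {}               # task -> vertex type
        self.successors: Dict[str, List[str]] = {}   # task -> dependent tasks

    def add_task(self, name: str, kind: str) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown task kind: {kind}")
        self.kind[name] = kind
        self.successors.setdefault(name, [])

    def add_edge(self, before: str, after: str) -> None:
        # `after` may start only once `before` has finished.
        if before not in self.kind or after not in self.kind:
            raise KeyError("both endpoints must be added as tasks first")
        self.successors[before].append(after)

    def topological_order(self) -> List[str]:
        indegree = {name: 0 for name in self.kind}
        for targets in self.successors.values():
            for target in targets:
                indegree[target] += 1
        ready = deque(name for name, degree in indegree.items() if degree == 0)
        order = []
        while ready:
            task = ready.popleft()
            order.append(task)
            for target in self.successors[task]:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
        if len(order) != len(self.kind):
            raise ValueError("graph contains a cycle")
        return order

    def split_by_resource(self) -> Tuple[List[str], List[str]]:
        """Computing tasks go to processors, communication tasks to the network."""
        order = self.topological_order()
        computing = [t for t in order if self.kind[t] == "computing"]
        communication = [t for t in order if self.kind[t] == "communication"]
        return computing, communication


dag = CADag()
for name in ("c1", "c2", "c3"):
    dag.add_task(name, "computing")
dag.add_task("x1", "communication")
dag.add_edge("c1", "x1")   # c1 sends its result through transfer x1
dag.add_edge("x1", "c2")   # c2 waits for the transfer
dag.add_edge("c1", "c3")   # dependence without a modeled transfer
for_processors, for_network = dag.split_by_resource()
```

Here `for_processors` and `for_network` can be passed to separate schedulers, which is the main advantage listed above.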

slide-48
SLIDE 48

Thanks for your attention!
