SLIDE 1

On Self-adaptive Resource Allocation through Reinforcement Learning

Jacopo Panerati†, Filippo Sironi‡, Matteo Carminati‡, Martina Maggio§, Giovanni Beltrame†, Piotr J. Gmytrasiewicz¶, Donatella Sciuto‡ and Marco D. Santambrogio‡

†Polytechnique Montréal, ‡Politecnico di Milano, §Lund University, ¶University of Illinois at Chicago

Politecnico di Torino - Turin, 25 June 2013

SLIDE 2

POLYTECHNIQUE MONTRÉAL · Rationale · Reinforcement Learning · Self-adaptive Computing · Case Study · Conclusions · References

Rationale

Methodology: (1) Reinforcement Learning (RL).

Objective: (2) Self-adaptive Computing.

Research Question: Is RL a suitable approach for self-adaptive computing?

  • J. Panerati et al. – On Self-adaptive Resource Allocation through Reinforcement Learning

2/31 – mistlab.ca

SLIDE 8

A Typical Machine Learning Problem

Generic (Informal) Steps

  • given a (labelled or unlabelled) training set D ⊆ ℝ^d
  • pick, from a hypothesis set H, a function f : ℝ^d → ℝ (or C)
  • such that, for a new data point X ∈ ℝ^d, f(X) is the actual label of X
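The generic recipe above can be illustrated with a toy instance: take H to be the set of linear functions f(x) = w·x + b on ℝ and pick the member minimizing squared error over D. The training set below is made up for illustration; a minimal pure-Python sketch:

```python
# Given a labelled training set D, pick from the hypothesis set H
# (here: lines f(x) = w*x + b) the function that best fits D, then
# apply it to a new data point X.

def fit_line(D):
    """Least-squares fit of f(x) = w*x + b over D = [(x, y), ...]."""
    n = len(D)
    sx = sum(x for x, _ in D)
    sy = sum(y for _, y in D)
    sxx = sum(x * x for x, _ in D)
    sxy = sum(x * y for x, y in D)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return lambda x: w * x + b

D = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # labels follow y = 2x + 1
f = fit_line(D)
print(round(f(3.0), 6))  # -> 7.0: the learned f labels the new point X = 3.0
```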


SLIDE 13

Machine Learning Methodologies

Supervised Learning

  • Classification Algorithms: when labels are known to belong to a finite set C
  • Regression Algorithms: when labels are known to belong to ℝ

Unsupervised Learning

  • Clustering Algorithms: when labels are unknown but their cardinality K is assumed to be fixed



SLIDE 16

Example of Classification Problem

Hand-Writing

Recognition of hand-written digits is a typical classification problem. Data points are matrices of pixels (∈ ℝ^d) and the label set C is {0, 1, 2, ..., 9}.

[Figure: sample images of hand-written digits]

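A toy nearest-neighbour classifier makes the setting concrete: flattened pixel matrices are points in ℝ^d, and a new point gets the label of the closest training point. The 3x3 binary "digit images" below are invented for illustration, not a real dataset:

```python
# 1-nearest-neighbour classification: data points are flattened pixel
# matrices (vectors in R^d); labels come from C = {0, 1, ..., 9}.

def dist2(a, b):
    """Squared Euclidean distance between two pixel vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def classify(train, x):
    """Label x with the label of its nearest training point."""
    return min(train, key=lambda pair: dist2(pair[0], x))[1]

train = [
    ((1, 1, 1, 1, 0, 1, 1, 1, 1), 0),  # crude 3x3 "0"
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), 1),  # crude 3x3 "1"
    ((1, 1, 1, 0, 1, 0, 1, 1, 1), 2),  # crude, stylised 3x3 "2"
]

x = (0, 1, 0, 0, 1, 0, 0, 1, 1)  # a "1" with one noisy pixel
print(classify(train, x))  # -> 1
```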


SLIDE 22

Example of Clustering Problem

Space Exploration

Clustering algorithms can be used to identify patterns in remotely sensed data (e.g. from space) and improve the scientific return by sending only statistically significant data to the ground station [1].

¹ http://nssdc.gsfc.nasa.gov/

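With K fixed in advance, the classic algorithm for this setting is k-means: alternate between assigning points to the nearest centre and recomputing each centre as its cluster's mean. A minimal sketch, with made-up 1-D "sensor readings" standing in for remotely sensed data:

```python
# Minimal k-means: labels are unknown, but their number K is fixed.

def kmeans(points, K, iters=20):
    centers = points[:K]  # naive initialisation: first K points
    for _ in range(iters):
        clusters = [[] for _ in range(K)]
        for p in points:  # assignment step: nearest centre wins
            k = min(range(K), key=lambda k: (p - centers[k]) ** 2)
            clusters[k].append(p)
        # update step: each centre becomes the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers

data = [0.9, 1.1, 1.0, 9.8, 10.2, 10.0]  # two obvious groups
print(sorted(kmeans(data, K=2)))  # -> [1.0, 10.0]
```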


SLIDE 24

Reinforcements in Behavioural Psychology

Definition

In behavioural psychology, reinforcement is the strengthening of a behaviour associated with a stimulus through repetition.

Pioneers

B.F. Skinner (1904-1990), together with E. Thorndike (1874-1949), is considered to be one of the fathers of current theories on reinforcement and conditioning [2].


SLIDE 32

Pavlov’s Dog

A precursor of Skinner's theories

Ivan Pavlov (1849-1936) made conditioning famous with his experiments on drooling dogs.



SLIDE 34

Reinforcement learning in computer science is somewhat different from both supervised/unsupervised learning and reinforcement in behavioural psychology...


SLIDE 35

Why Reinforcement Learning is Different (I)

Supervised/Unsupervised Machine Learning

data point → label (or a cluster)

Reinforcements in Behavioural Psychology

stimulus → behaviour

Reinforcement Learning

state of the world → action



SLIDE 45

Why Reinforcement Learning is Different (II)

Reinforcement Learning

state of the world → action → new state of the world → action → ...

Because the performance metric of RL (i.e., the collected rewards) is computed over time, solving an RL problem enables

  • planning
  • complex, sequential decisions
  • even counterintuitive decisions
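The sequential, counterintuitive flavour shows up already in a tiny two-state world: "working" in the start state pays nothing immediately but leads to a state whose reward is large, so the long-run optimal action is the short-term unattractive one. For reproducibility this sketch applies the Q update with a known model (i.e., Q-value iteration); in actual RL the transitions and rewards would be sampled. The world itself is invented for illustration:

```python
# Q-value iteration on a two-state world: accumulated, discounted
# reward over time makes a short-term sacrifice the optimal choice.

GAMMA = 0.9  # discount factor: how much future reward matters

# states: 0 = "start", 1 = "goal"; actions: 0 = "idle", 1 = "work"
def step(s, a):
    """Deterministic model: returns (next state, immediate reward)."""
    if s == 0:
        return (0, 1.0) if a == 0 else (1, 0.0)  # idling pays 1 now
    return (1, 10.0)                             # the goal pays 10 per step

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
for _ in range(200):  # repeated Bellman backups until convergence
    for s in (0, 1):
        for a in (0, 1):
            s2, r = step(s, a)
            Q[(s, a)] = r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])

print(round(Q[(0, 1)], 1), round(Q[(0, 0)], 1))  # -> 90.0 82.0
```

Despite "idle" earning more now, Q(start, work) = 90 beats Q(start, idle) = 82: exactly the farsighted behaviour the slide describes.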


SLIDE 52

Why Reinforcement Learning is Different (III)

If today were a sunny day

  • a classification algorithm would label it as "go to the seaside"
  • RL would tell you "you might as well study now and, later in the summer, enjoy the fact that you did not fail your exams"

RL is not an epicurean carpe diem methodology, but a more farsighted and judicious approach.

"The point is, not how long you live, but how nobly you live." – Lucius Annaeus Seneca


SLIDE 58

Moving on to self-adaptive computing...


SLIDE 59

Typical Properties of Self-adaptive Computing

Self-configuration

The system requires limited or no human intervention in order to set up.

Self-optimization

The system is able to achieve user-defined goals autonomously, without human interaction.

Self-healing

The system can detect and recover from faults without human intervention.

Together with self-protection, these are the properties identified in [3] for autonomic systems.

  • J. Panerati et al. – On Self-adaptive Resource Allocation through Reinforcement Learning

15/31 – mistlab.ca

slide-61
SLIDE 61

Self-configuration Example

Multi-platform software

Software that runs seamlessly on different hardware configurations is a good example of self-configuration.

[Diagram: Hardware / Inst. Tools / Software – Detect, Config., Install, Run]

slide-65
SLIDE 65

Self-optimization Example

Smart Video Players

Players that can adjust media encoding to maintain a certain Quality of Service (QoS) can be considered self-optimizing applications.

[Diagram: Video / Manager / Encoder – Detect Quality, Control, Play]

slide-69
SLIDE 69

Self-healing Example

Reconfigurable Logic

FPGAs are a good playground for implementing self-healing. Part of the hardware resources can be used to verify the correct functioning of the rest of the logic and to force reconfiguration when a fault is detected.

[Diagram: Prog. Logic / Listener / µContr. – Detect Fault, Inform, Reconfigure]

slide-73
SLIDE 73

Research Question

Is RL a suitable approach for self-adaptive computing?

slide-74
SLIDE 74

Case Study

Testing Environment

  • Desktop workstation
  • Multi-core Intel i7 processor
  • Linux-based operating system

Objective of our Experiments

Enabling self-adaptive properties in applications of the PARSEC [4] benchmark suite through reinforcement learning algorithms.

slide-77
SLIDE 77

Tests Set-Up

Reinforcement Learning Framework

  • A finite set of states S
→ the heart rate of the PARSEC benchmark application, measured through the Heart Rate Monitor (HRM) APIs [5]
  • A finite set of actions A
→ (1) the number of cores on which the PARSEC benchmark application is scheduled (via the sched_setaffinity system call) and (2) the CPU frequency (via the cpufrequtils package)
  • A reward function R(s) : S → ℝ
→ whether a user-defined target (in heartbeats/s) is met or not
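The S/A/R framework above maps naturally onto a tabular learning loop. The sketch below is a hedged illustration only: it uses Q-learning as a stand-in for the ADP algorithm used in the experiments, `measure_heart_rate` is a hypothetical plant model replacing the real HRM measurement, and the core/frequency grids are illustrative values.

```python
import random

# Hedged sketch of the framework on this slide: tabular Q-learning as a
# stand-in for the paper's ADP algorithm. State = bucketized heart rate,
# action = (core count, CPU frequency), reward = target met or not.
# `measure_heart_rate` is a hypothetical plant model, not the HRM API.

CORES = [1, 2, 3, 4]          # core counts a sched_setaffinity call could pin
FREQS = [1.2, 1.8, 2.4, 3.0]  # illustrative GHz steps for frequency scaling
ACTIONS = [(c, f) for c in CORES for f in FREQS]

def bucket(heart_rate, width=2.0):
    """Discretize a continuous heart rate (heartbeats/s) into a state id."""
    return int(heart_rate // width)

class QController:
    def __init__(self, target, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = {}           # (state, action) -> estimated value
        self.target = target  # user-defined goal in heartbeats/s
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def reward(self, heart_rate):
        """R(s): 1 when the user-defined heart-rate target is met, else 0."""
        return 1.0 if heart_rate >= self.target else 0.0

    def choose(self, state):
        """Epsilon-greedy action selection over the (cores, freq) grid."""
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, next_heart_rate):
        """One temporal-difference update; returns the successor state."""
        next_state = bucket(next_heart_rate)
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        target = self.reward(next_heart_rate) + self.gamma * best_next
        self.q[(state, action)] = old + self.alpha * (target - old)
        return next_state

def measure_heart_rate(cores, freq):
    """Hypothetical plant: throughput grows with cores and frequency."""
    return cores * freq

if __name__ == "__main__":
    random.seed(0)
    ctrl = QController(target=6.0)
    state = bucket(measure_heart_rate(1, FREQS[0]))
    for _ in range(2000):  # monitor-decide-act loop
        action = ctrl.choose(state)
        heart_rate = measure_heart_rate(*action)
        state = ctrl.update(state, action, heart_rate)
```

In the real set-up, `measure_heart_rate` would read the HRM counters, and applying an action would call sched_setaffinity and reconfigure the CPU frequency.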

slide-85
SLIDE 85

Self-configuration

[Figure: blackscholes managed exploiting ADP and core allocation – performance (M options/s) and allocated cores (1-4) over time (0-1000 s)]
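For reference, the core-allocation action behind this trace can be sketched with Python's standard `os.sched_setaffinity` wrapper around the system call used in the experiments. Pinning to the first n eligible CPUs is an illustrative simplification, not the paper's allocation policy.

```python
import os

# Hedged sketch: applying the "number of cores" action on Linux through
# os.sched_setaffinity, the standard wrapper around the sched_setaffinity
# system call used in the experiments. Pinning to the first n eligible
# CPUs is an illustrative policy, not the paper's allocation scheme.
def allocate_cores(n):
    """Restrict the current process to n of its eligible CPUs (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity control requires Linux")
    eligible = sorted(os.sched_getaffinity(0))  # CPUs we may run on now
    chosen = set(eligible[:max(1, n)])          # keep at least one CPU
    os.sched_setaffinity(0, chosen)             # 0 = the calling process
    return chosen
```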

slide-96
SLIDE 96

Self-optimization

[Figure: canneal managed exploiting ADP and core allocation – performance (M exchanges/s) and allocated cores (1-4) over time (0-800 s)]
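The performance signal in these traces comes from application heartbeats. The class below is a hedged stand-in for the idea behind the HRM APIs [5], a sliding-window heart rate plus a target check; the names and window policy are illustrative, not the actual HRM interface.

```python
import time

# Hedged stand-in for the heartbeat idea behind the HRM APIs [5]: an
# instrumented application emits one heartbeat per unit of useful work,
# and the monitor derives a heart rate over a sliding window. Names and
# the window policy are illustrative, not the actual HRM interface.
class Heartbeats:
    def __init__(self, window=1.0):
        self.window = window  # trailing window length in seconds
        self.stamps = []      # timestamps of recent heartbeats

    def beat(self, now=None):
        """Emit one heartbeat, e.g. after each processed work item."""
        self.stamps.append(time.monotonic() if now is None else now)

    def rate(self, now=None):
        """Heart rate in heartbeats/s over the trailing window."""
        now = time.monotonic() if now is None else now
        self.stamps = [t for t in self.stamps if now - t <= self.window]
        return len(self.stamps) / self.window

    def meets(self, target, now=None):
        """The reward signal: is the user-defined target met?"""
        return self.rate(now) >= target
```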

slide-98
SLIDE 98

Self-healing

[Figure: canneal managed exploiting ADP, core allocation, and frequency scaling – performance (M exchanges/s), allocated cores (1-4), and frequency level over time (0-800 s)]
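The frequency-scaling action in this trace is applied through cpufrequtils, which ultimately drives the kernel's cpufreq sysfs knobs. A hedged sketch follows: the paths are the standard Linux sysfs locations, the helper names are our own, and writing the knob requires root.

```python
# Hedged sketch of the "CPU frequency" action: cpufrequtils drives the
# kernel cpufreq subsystem, whose per-core tunables live in sysfs. The
# paths below are the standard Linux locations; writing them needs root.

SYSFS = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/{knob}"

def cpufreq_knob(cpu, knob):
    """Sysfs path of one core's cpufreq tunable, e.g. scaling_max_freq."""
    return SYSFS.format(cpu=cpu, knob=knob)

def cap_frequency(cpu, khz):
    """Cap a core's maximum frequency in kHz (like `cpufreq-set -u`)."""
    with open(cpufreq_knob(cpu, "scaling_max_freq"), "w") as handle:
        handle.write(str(khz))
```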

slide-102
SLIDE 102

Conclusions

  • Reinforcement learning and its relation to other machine learning methodologies and to behavioural psychology
  • Properties of self-adaptive computing
  • How to exploit reinforcement learning for self-adaptive computing
  • Experimental results showing reinforcement learning enabling self-adaptive computing properties

slide-107
SLIDE 107

Q&A

[Comic: http://www.dilbert.com/]

slide-108
SLIDE 108

References I

  [1] D. S. Hayden, S. Chien, D. R. Thompson, and R. Castaño, "Using clustering and metric learning to improve science return of remote sensed imagery," ACM Trans. Intell. Syst. Technol., vol. 3, no. 3, pp. 51:1-51:19, May 2012. [Online]. Available: http://doi.acm.org/10.1145/2168752.2168765
  [2] B. F. Skinner, Science and Human Behavior. Free Press, 1965.
  [3] J. Kephart and D. Chess, "The vision of autonomic computing," Computer, vol. 36, no. 1, pp. 41-50, 2003.
  [4] C. Bienia, "Benchmarking modern multiprocessors," Ph.D. dissertation, Princeton, NJ, USA, 2011, AAI3445564.

slide-109
SLIDE 109

References II

  [5] F. Sironi, D. B. Bartolini, S. Campanoni, F. Cancare, H. Hoffmann, D. Sciuto, and M. D. Santambrogio, "Metronome: operating system level performance management via self-adaptive computing," in Proceedings of the 49th Annual Design Automation Conference, ser. DAC '12. New York, NY, USA: ACM, 2012, pp. 856-865. [Online]. Available: http://doi.acm.org/10.1145/2228360.2228514