Evaluating the Performance of Reinforcement Learning Algorithms

SLIDE 1

Evaluating the Performance of Reinforcement Learning Algorithms

Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas

SLIDE 2

Why do we care?

Performance evaluations:

1. Justify novel algorithms or enhancements
2. Tell us which algorithms to use

If done correctly, they:

  • Can identify solved problems
  • Place emphasis on areas that need more research

SLIDE 3

RL Algorithms for the Real-world

Want:

1. High levels of performance
2. No expert knowledge required

As a result:

1. Less time tuning algorithms
2. More time solving harder problems

SLIDE 4

Algorithm Performance Evaluations

Typical evaluation procedure:

1. Tune each algorithm’s hyperparameters (e.g., policy structure, learning rate)
2. Run several trials using the tuned hyperparameters
3. Report performance (metrics, learning curves, etc.)

This does not fit our needs:

  • Ignores the difficulty of applying algorithms

Need a new evaluation procedure!

SLIDE 5

Evaluation Pipeline

  • Account for the difficulty of applying an algorithm
  • Balance the importance of each environment

SLIDE 6

A General Evaluation Question

Which algorithm(s) perform well across a wide variety of environments with little or no environment-specific tuning?

Existing evaluation procedures cannot answer this question. We develop techniques for:

1. Sampling performance metrics that reflect knowledge of how to use the algorithm
2. Normalizing scores to account for the intrinsic difficulties of each environment
3. Balancing the importance of each environment in the aggregate measure
4. Computing uncertainty over the whole process

SLIDE 7

Sampling Performance Without Tuning

  • Formalize the knowledge needed to use an algorithm
  • Result: a complete algorithm definition

SLIDE 8

Sampling Performance Without Tuning

An algorithm is complete on an environment when it is defined such that the only required input to the algorithm is the environment.

$Y \sim \text{alg}(M)$

where $Y$ is the performance sample, $\text{alg}$ is the complete algorithm, and $M$ is the environment.

No hyperparameters!
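
Below is a minimal sketch (not from the talk) of what a complete algorithm can look like in code: every hyperparameter is drawn inside the algorithm, so the environment is the only input and each call returns one performance sample. The `BernoulliBandit` environment and `complete_epsilon_greedy` agent are hypothetical stand-ins.

```python
import random

class BernoulliBandit:
    """Hypothetical toy environment: k-armed Bernoulli bandit."""
    def __init__(self, probs, rng):
        self.probs, self.rng = probs, rng
    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0

def complete_epsilon_greedy(env, steps=1000, seed=None):
    """A 'complete' algorithm: the environment is the only required input.
    Hyperparameters are sampled internally, so each call is one sample
    Y ~ alg(M) with no user-facing tuning."""
    rng = random.Random(seed)
    epsilon = rng.uniform(0.01, 0.3)  # internal hyperparameter draw
    k = len(env.probs)
    counts, values = [0] * k, [rng.random() for _ in range(k)]
    total = 0.0
    for _ in range(steps):
        arm = (rng.randrange(k) if rng.random() < epsilon
               else max(range(k), key=values.__getitem__))
        r = env.pull(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    return total / steps  # the performance sample Y

# One performance sample, with the environment as the only input:
env = BernoulliBandit([0.2, 0.5, 0.8], random.Random(0))
y = complete_epsilon_greedy(env, seed=1)
```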

SLIDE 9

Making Complete Algorithm Definitions

  • Open research question!

Candidate approaches range from manual tuning to random sampling methods, smart heuristics, and adaptive methods.
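
As one illustration of the random-sampling approach, the sketch below (names and distributions are hypothetical, not from the paper) draws a fresh hyperparameter configuration for every trial, so the resulting performance samples also reflect how sensitive the algorithm is to its settings.

```python
import random

# Hypothetical distributions over hyperparameters; sampling from them
# per run removes the user-facing tuning step.
def sample_hyperparameters(rng):
    return {
        "step_size": 10 ** rng.uniform(-5, 0),  # log-uniform learning rate
        "discount": rng.uniform(0.9, 1.0),
        "epsilon": rng.uniform(0.0, 0.5),
    }

def run_trial(algorithm, env, seed):
    """One performance sample: draw a configuration, then run."""
    rng = random.Random(seed)
    return algorithm(env, sample_hyperparameters(rng), rng)

# Usage with a stand-in algorithm that just scores a configuration:
score = run_trial(lambda env, p, rng: -abs(p["step_size"] - 0.01),
                  env=None, seed=0)
```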

SLIDE 10

Performance of Complete Algorithms

[Figure: performance distribution of a complete algorithm, spanning diverging runs to well-tuned runs; a better algorithm shifts the distribution toward higher performance]

Can measure improvements in usability!

SLIDE 11

Comparisons Over Multiple Environments

Problem:

  • No common measure of performance

Desired normalization properties:

  • Same scale and center
  • Capture intrinsic difficulty

Use the cumulative distribution function (CDF) of performance (sketch below).
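
A short sketch of CDF-based normalization (the helper below is illustrative, not the paper's code): mapping a raw score through the empirical CDF of some reference algorithm's scores sends every environment's performance to a common [0, 1] scale.

```python
from bisect import bisect_right

def empirical_cdf(reference_scores):
    """Return F where F(x) = fraction of reference scores <= x.
    Scores from any environment land on a common [0, 1] scale."""
    xs = sorted(reference_scores)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

# Example: normalize scores against a reference algorithm's distribution.
F = empirical_cdf([10.0, 12.0, 15.0, 20.0, 30.0])
normalized = [F(x) for x in [11.0, 25.0]]  # -> [0.2, 0.8]
```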

SLIDE 12–14

Normalizing Scores

[Figure slides: step-by-step illustration of mapping raw performance scores through CDFs]

SLIDE 15

Normalizing Scores

[Figure annotations: large change in difficulty vs. small change in difficulty]

Which algorithm should we normalize against? Use a weighted combination of all algorithms' CDFs.

SLIDE 16

Aggregating Performance Measures

  • Need to weight the normalization functions
  • Need to weight the environments
  • Avoid unintentional bias in the weightings

Use game theory!

$a_i = \sum_{j \in \mathcal{M}} q_j \sum_{k \in \mathcal{A}} r_k \, \mathbb{E}\left[G_{k,j}(Y_{i,j})\right]$

where $a_i$ is the aggregate performance of algorithm $i$, $q_j$ are the environment weights, $r_k$ are the normalization weights, and $G_{k,j}$ is the normalization function.
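
A sketch of this aggregation in code, under the assumption that the normalization functions are empirical CDFs of reference algorithms' scores (the function name and data layout are hypothetical):

```python
import numpy as np

def aggregate_performance(raw_scores, ref_scores, q, r):
    """a_i = sum_j q_j sum_k r_k E[G_kj(Y_ij)].

    raw_scores[j]    : samples of algorithm i's performance on environment j
    ref_scores[k][j] : samples of reference algorithm k on environment j,
                       defining the empirical CDF G_kj
    q[j], r[k]       : environment and normalization weights (each sums to 1)
    """
    total = 0.0
    for j, y in enumerate(raw_scores):
        for k, ref in enumerate(ref_scores):
            xs = np.sort(ref[j])
            # Empirical CDF at each sample; the mean estimates E[G_kj(Y_ij)].
            g = np.searchsorted(xs, y, side="right") / len(xs)
            total += q[j] * r[k] * g.mean()
    return total

# Two environments, one reference algorithm, uniform weights:
rng = np.random.default_rng(0)
raw = [rng.normal(1.0, 0.2, 100), rng.normal(5.0, 1.0, 100)]
refs = [[rng.normal(0.8, 0.2, 100), rng.normal(4.0, 1.0, 100)]]
a_i = aggregate_performance(raw, refs, q=[0.5, 0.5], r=[1.0])
```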

SLIDE 17

Two-Player Game

Players $p$ and $q$ play a zero-sum game: one player selects which algorithm to execute on which environment (weights $q$), while the opponent selects the normalizing distribution (weights $r$):

$\max_{q} \min_{r} \sum_{j \in \mathcal{M}} q_j \sum_{k \in \mathcal{A}} r_k \, \mathbb{E}\left[G_{k,j}(Y_j)\right]$

Use the $r$ from the equilibrium solution to evaluate each algorithm.

Environments: Gridworld, Chain, Cart-Pole, Mountain Car, Acrobot, Bicycle

SLIDE 18

Quantifying Uncertainty

  • Sources of uncertainty
  • Confidence intervals


SLIDE 19

Quantifying Uncertainty

[Table residue comparing interval methods: "valid for any distribution" vs. "assumptions of normality" vs. "no guarantee"; other cells: "adapt step sizes", "lots of hyperparameters"]
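
The "valid for any distribution" entry suggests a distribution-free interval. The sketch below implements one classical construction of that kind (Anderson-style bounds derived from the DKW inequality) for the mean of normalized scores in [0, 1]; it illustrates the idea and is not necessarily the exact interval used in the paper.

```python
import math

def mean_confidence_interval(samples, delta=0.05, lo=0.0, hi=1.0):
    """Distribution-free CI for the mean of samples bounded in [lo, hi],
    built from the DKW band around the empirical CDF. Coverage is at
    least 1 - delta for ANY distribution on [lo, hi]."""
    xs = sorted(samples)
    n = len(xs)
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # DKW band width
    grid = [lo] + xs + [hi]
    lower = upper = hi
    # E[X] = hi - integral of the CDF; bound the CDF by Fhat +/- eps.
    for i in range(n + 1):
        width = grid[i + 1] - grid[i]
        lower -= width * min(i / n + eps, 1.0)  # upper CDF -> lower mean
        upper -= width * max(i / n - eps, 0.0)  # lower CDF -> upper mean
    return lower, upper

# Example on 100 normalized scores:
ci = mean_confidence_interval([0.2, 0.5, 0.7, 0.9] * 25)
```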

SLIDE 20

Takeaways

  • No need to tune hyperparameters
  • Can measure improvements in usability
  • Reliable estimates of uncertainty

SLIDE 21

Acknowledgements

  • Daniel Cohen
  • Prof. Philip S. Thomas
  • Yash Chandak
  • Mengxue Zhang

SLIDE 22

Questions?

Scott Jordan sjordan@cs.umass.edu

http://cics.umass.edu/sjordan | @UMassScott