slide-1
SLIDE 1

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/271642128

Presentation

Data · February 2015

CITATIONS: 0    READS: 62

2 authors:

Mohamed K. Gunady, University of Maryland, College Park (7 publications, 14 citations)
Walid Gomaa, Egypt-Japan University of Science and Technology (121 publications, 549 citations)

All content following this page was uploaded by Walid Gomaa on 01 February 2015.

slide-2
SLIDE 2

Reinforcement Learning Generalization Using State Aggregation with a Maze-Solving Problem

Mohamed K. Gunady, Walid Gomaa
Department of Computer Science and Engineering
Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt

Presented by: Mohamed K. Gunady

slide-3
SLIDE 3

Overview

  • Introduction.
  • Q-Learning.
  • State Aggregation.
  • SAQL Algorithm.
  • Experimental Results.
  • Future Work.
  • Conclusion.

JEC-ECC March 2012, Alexandria, Egypt · SAQL - M. Gunady - EJUST

slide-4
SLIDE 4

Introduction

  • Reinforcement Learning:
    – Learn by trial and error.
    – Interaction with the environment.
    – Best strategy to maximize the total reward.
    – Based on the MDP model:
      • current state s, action a, next state s', reward R(s, a).
    – Many algorithms: Q-learning, TD(λ), Sarsa, …

slide-5
SLIDE 5

Q-Learning

  • Observe <s, a, r, s'>.
  • Learn an optimal policy π* : S → A.
  • Maximize the cumulative discounted reward:

      V^π(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + …        (1)

  • Let V*(s) = max_a Q(s,a), then

      Q(s,a) = r(s,a) + γ max_{a'} Q(s',a')              (2)
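The tabular update in Eq. (2) can be sketched in a few lines. The 4×4 grid size, goal position, ε-greedy exploration, and episode count below are illustrative assumptions, not settings from the slides; the rewards follow the +10 / −0.02 scheme given later on the experimental-results slide.

```python
import random

# Deterministic Q-learning update, Eq. (2): Q(s,a) = r(s,a) + γ·max_a' Q(s',a').
# Grid size, goal cell, and ε-greedy schedule are illustrative assumptions.
GAMMA = 0.9
ACTIONS = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}

def step(state, action, size=4, goal=(3, 3)):
    """Move on the grid (walls clip the move); +10 at the goal, -0.02 otherwise."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (max(0, min(size - 1, r + dr)), max(0, min(size - 1, c + dc)))
    return nxt, (10.0 if nxt == goal else -0.02), nxt == goal

def q_learning(episodes=200, size=4, goal=(3, 3), eps=0.1):
    # Lookup table over the full state-action space (the cost slide 6 warns about).
    Q = {((r, c), a): 0.0 for r in range(size) for c in range(size) for a in ACTIONS}
    for _ in range(episodes):
        s, done = (0, 0), False
        while not done:
            if random.random() < eps:                      # explore
                a = random.choice(list(ACTIONS))
            else:                                          # exploit
                a = max(ACTIONS, key=lambda act: Q[s, act])
            s2, r, done = step(s, a, size, goal)
            Q[s, a] = r + GAMMA * max(Q[s2, b] for b in ACTIONS)  # Eq. (2)
            s = s2
    return Q
```

After enough episodes, following the greedy policy argmax_a Q(s,a) from the start cell reaches the goal.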

slide-6
SLIDE 6

Q-Learning (Contd.)

  • Guaranteed to converge; however, the convergence rate is slow.
  • The lookup table for Q(s,a) has high computation and space complexity.
  • Curse of dimensionality in the state-action space.
  • Thus, more compact representations are needed:
    – hence, Generalization Techniques.
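The lookup-table cost is easy to quantify for the mazes used later in the deck: with the four actions {N, S, E, W}, a 60×60 maze already needs 3600 × 4 = 14 400 Q-entries, and the table grows linearly in |S| × |A|:

```python
# Size of the tabular Q(s,a) representation for a square maze with 4 actions.
def q_table_entries(rows, cols, n_actions=4):
    return rows * cols * n_actions

for n in (10, 30, 60):  # the 100-, 900-, and 3600-state mazes from the experiments
    print(n * n, "states ->", q_table_entries(n, n), "Q-entries")
```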

slide-7
SLIDE 7

State Aggregation

  • Generalization Techniques:
    – Function approximators, e.g. neural networks.
    – Hierarchical learning.
    – State aggregation.
  • Reduce time and storage requirements.
  • Mostly assume a smooth state space, i.e. the values of adjacent states are nearly equal:
    – Combine similar states.

slide-8
SLIDE 8

State Aggregation (Contd.)

  • Then, two questions:
    – How to determine the similarity between states?
    – How to learn over the new state-action space?
  • Terminology:
    – Ground space: the actual MDP.
    – Abstract space: the hypothetical reduced MDP.
  • Denote X for the abstract state space, Ax for the abstract action space, and Rx for the new reward function.

slide-9
SLIDE 9

SAQL Algorithm

  • Main steps:
    – Discover similar states.
    – Group them into one abstract state.
    – Learn over this single abstract state instead of the many similar ground states.
  • How to decide that a group of states is similar, i.e. consistent, enough to be grouped?
    – If some neighbouring states have consistent reward payoffs.

slide-10
SLIDE 10

SAQL Algorithm

Consistency Test

  • Let x be an abstract state, and s, s' ground states to be grouped into x.
  • Consistency rule for x:

      IF (3) holds, THEN the group is consistent: construct abstract state x.
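The condition in Eq. (3) is not reproduced in these slides, so the sketch below substitutes a hypothetical stand-in that follows the stated idea of "consistent reward payoffs": ground states are grouped only when their rewards agree within a tolerance ε. Both the check and the tolerance value are assumptions, not the paper's actual rule.

```python
# Hypothetical consistency test: the slides' actual condition (Eq. 3) is not
# shown here, so we substitute a reward-agreement check within tolerance EPS.
EPS = 1e-3

def is_consistent(group_rewards, eps=EPS):
    """A candidate group is consistent if all its ground-state rewards
    lie within eps of each other."""
    return max(group_rewards) - min(group_rewards) <= eps

# Four free maze cells, each with step reward -0.02, can form one abstract state:
print(is_consistent([-0.02, -0.02, -0.02, -0.02]))  # a goal cell (+10) would break it
```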

slide-11
SLIDE 11

SAQL Algorithm

Abstract-Ground Mapping

  • Abstract action space Ax.
  • ax is an abstract action from abstract state x to x', one of the neighboring abstract states of x.
  • Map ax to the equivalent ground actions within x, i.e. the internal actions Ain.
  • Internal actions have to be planned algorithmically, not learned:
    – Use a simple group topology, e.g. square-shaped groups.
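For the square-group topology the slide mentions, the internal plan for one abstract action can be computed geometrically rather than learned: from any cell inside a k×k group, walking straight in the direction of the abstract action crosses the group boundary. The local-coordinate convention and the function below are an illustrative sketch, not the paper's exact mapping.

```python
# Map an abstract action to internal ground actions inside a k x k square group.
# `cell` is the (row, col) of the ground state *within its group* (0-based);
# this local-coordinate convention is an assumption for illustration.
def internal_actions(cell, abstract_action, k):
    """Ground actions that take `cell` out of its k x k group in the
    direction of `abstract_action` in {'N', 'S', 'E', 'W'}."""
    r, c = cell
    if abstract_action == 'E':
        return ['E'] * (k - c)   # walk east until crossing the right boundary
    if abstract_action == 'W':
        return ['W'] * (c + 1)   # walk west until crossing the left boundary
    if abstract_action == 'S':
        return ['S'] * (k - r)
    return ['N'] * (r + 1)

# From local cell (1, 0) in a 3x3 group, abstract 'E' needs three eastward steps:
print(internal_actions((1, 0), 'E', 3))
```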

slide-12
SLIDE 12

SAQL Algorithm

System Architecture

  • Check the paper for the full algorithm.

slide-13
SLIDE 13

Experimental Results

  • Maze-solving problem as a test bed.
  • Problem settings:
    – '*' marks the starting state, at location (row, col).
    – 'X' marks the absorbing state (goal).
    – Actions: {N, S, E, W}.
    – Reward function:
      • +10 for the goal state.
      • -10 for an obstacle state.
      • -0.02 otherwise.
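The reward function above translates directly into code; the character encoding of the maze ('X' goal, '#' obstacle, anything else free) is an assumed convention, not one given in the slides.

```python
# Slide's reward function: +10 at the goal, -10 at an obstacle, -0.02 otherwise.
# The maze encoding ('X' goal, '#' obstacle) is an assumption for illustration.
def reward(maze, state):
    cell = maze[state[0]][state[1]]
    if cell == 'X':
        return 10.0
    if cell == '#':
        return -10.0
    return -0.02

maze = ["*..",
        ".#.",
        "..X"]
print(reward(maze, (2, 2)))  # goal cell
```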


slide-14
SLIDE 14

Experimental Results

Convergence Rate

  • Learning episodes, each containing many iterations.
  • 60×60 maze.

slide-15
SLIDE 15

Experimental Results

State Space Size

  • Different maze sizes: 100, 900, and 3600 states.
  • QL suffers a steep increase in the number of iterations as the state space grows.
  • SAQL suffers much less.
  • Speedup of 3.9× for the 3600-state space, within the first 100 episodes.

slide-16
SLIDE 16

Experimental Results

State Space Size


slide-17
SLIDE 17

Experimental Results

Maze Complexity

  • Aggregation depends on the reward function w.r.t. the neighboring states.
  • The more consistent regions there are, the higher the aggregation efficiency, hence more reduction.
  • i.e. it depends on the number and distribution of obstacles.

slide-18
SLIDE 18

Experimental Results

Maze Complexity

  • 30×30 maze, with different obstacle configurations.
slide-19
SLIDE 19

Experimental Results

Iteration Complexity

  • Tradeoff: each learning iteration becomes more complex.
  • Extra computational work:
    – Consistency test.
    – State merging.
    – Mapping between ground/abstract states and actions.
  • But usually, learning actions are far more costly than these extra computations.

slide-20
SLIDE 20

Future Work

  • The current grouping technique and group shapes are simple.
  • Arbitrary shapes rather than square ones:
    – How to plan internal actions, and quickly?
  • More complex grouping techniques:
    – e.g. allow group breaking and regrouping.
  • Extend to probabilistic and dynamic environments.

slide-21
SLIDE 21

Conclusion

  • RL generalization with state aggregation is promising.
  • A modified QL algorithm, SAQL, and its system architecture are introduced.
  • Speedup of up to 4×.
  • 60% state space reduction.

slide-22
SLIDE 22

شكرًا (Thank You)
