CS325 Artificial Intelligence, Ch. 17: Planning Under Uncertainty



SLIDE 1

CS325 Artificial Intelligence
Ch. 17: Planning Under Uncertainty

Cengiz Günay, Emory Univ., Spring 2013

SLIDES 2-5

Is This AI Course a Bit Schizo?

Classical AI vs. Machine Learning:

  • Classical AI: symbolic logic (propositional, first-order); algorithms; thinking and programming
  • Machine Learning: probabilities; math; automated methods, the power of math

SLIDES 6-7

Planning Under Uncertainty

Into Thrun territory. The aim is to use more math and probabilities to achieve learnability for hard-to-program scenarios (that is, real life).

[Diagram: Planning, Uncertainty, and Learning as overlapping areas; plan + execution, MDPs, and RL placed by which areas they combine]

SLIDE 8

Entry/Exit Surveys

Exit survey: Planning

  • Why do we need to alternate between planning and execution?
  • Why do we need a belief state?

Entry survey: Planning Under Uncertainty (0.25 points of final grade)

  • What algorithm would you use to plan under uncertain conditions?
  • How do you think machine learning can be used in planning?

SLIDES 9-10

So What's Wrong with Classical Planning?

Grid World (S: start, G: goal, x: blocked cell):

      1    2    3    4
  a   .    .    .    G
  b   .    x    .    .
  c   S    .    .    .

It's too slow:

  • Branching factor can get large
  • Search tree gets too deep (may have loops)
  • Same states can be repeated multiple times (although this can be avoided with dynamic programming)

SLIDES 11-14

Start with Certainty: Deterministic Grid World

      1    2    3    4
  a   .    .    →   +1
  b   .    x    ↑   −1
  c   S    .    ↑    ←

Reward function: R(s) = +1 @ a4, −1 @ b4

  • Remember utility values?
  • State s, action a
  • Optimal policy π(s) → a? @ a3? @ b3? @ c4? (answers shown as arrows above; a small Python encoding follows)
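The arrows answer the policy questions directly. As a minimal Python encoding (the dict form and the (row, column) cell naming are ours, not the slides'):

```python
# Deterministic optimal policy read off the slide's arrows: state -> action.
policy = {
    ("a", 3): "right",  # a3: step right into the +1 at a4
    ("b", 3): "up",     # b3: head up toward a3
    ("c", 3): "up",
    ("c", 4): "left",   # c4: move away from the -1 at b4
}
```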

SLIDES 15-20

Value Iteration: Movement Cost

      1    2    3    4
  a   .    .   0.9  +1
  b   .    x   0.8  −1
  c   S    .   0.7  0.6

Reward function: R(s) = +1 @ a4; −1 @ b4; −0.1 everywhere else

Optimal policy π(s) → a? @ a3? @ b3? @ c4?

Value function:

  V(s) ← max_a V(s′) + R(s),

where s′ is the neighboring state reached by action a.
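A minimal sketch of this deterministic value iteration in Python. The grid, rewards, wall, and terminal values follow the slides; the helper names, the bump-stays-in-place rule, and the stopping threshold are our assumptions:

```python
# Deterministic value iteration on the 3x4 grid: V(s) <- max_a V(s') + R(s).
ROWS, COLS = "abc", (1, 2, 3, 4)
WALL = {("b", 2)}                        # blocked cell b2
TERMINALS = {("a", 4): +1.0, ("b", 4): -1.0}
STEP_REWARD = -0.1                       # movement cost everywhere else
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def neighbor(s, move):
    """Deterministic successor; bumping into an edge or the wall stays put."""
    r, c = ROWS.index(s[0]) + MOVES[move][0], s[1] + MOVES[move][1]
    s2 = (ROWS[r], c) if 0 <= r < 3 and 1 <= c <= 4 else s
    return s if s2 in WALL else s2

states = [(r, c) for r in ROWS for c in COLS if (r, c) not in WALL]
V = {s: TERMINALS.get(s, 0.0) for s in states}

while True:                              # sweep until values stop changing
    delta = 0.0
    for s in states:
        if s in TERMINALS:
            continue
        new = max(V[neighbor(s, a)] for a in MOVES) + STEP_REWARD
        delta = max(delta, abs(new - V[s]))
        V[s] = new
    if delta < 1e-9:
        break

print({s: round(V[s], 2) for s in [("a", 3), ("b", 3), ("c", 3), ("c", 4)]})
```

This reproduces the slide's values: 0.9 at a3, 0.8 at b3, 0.7 at c3, and 0.6 at c4.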

SLIDE 21

Value Iteration Video

SLIDES 22-23

Value Iteration: Discount Factor

      1    2    3    4
  a   .    .   0.9  +1
  b   .    x   0.8  −1
  c   S    .   0.7  0.6

Reward function: R(s) = +1 @ a4; −1 @ b4; 0 everywhere else

The recursive definition

  V(s) ← max_a V(s′) + R(s)

can also be written as an expected reward:

  V(s) = max_π E[ Σ_{t=0}^∞ γ^t R_t | s₀ = s ].

Instead of a movement cost, this uses a discount factor, γ, to decay future reward. It also keeps the values bounded: V(s) ≤ (1 / (1 − γ)) · |R_max|.
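A quick sanity check of that bound (our worked example, not from the slides): the discounted return can never exceed the geometric series Σ_{t=0}^∞ γ^t |R_max| = |R_max| / (1 − γ), so with γ = 0.9 and |R_max| = 1, every value satisfies V(s) ≤ 10.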

SLIDES 24-25

Value Iteration: Bellman Equation

The general case (Bellman, 1957) is stochastic:

  V(s) ← max_a [ γ Σ_{s′} P(s′ | s, a) V(s′) ] + R(s)

  • Recursive
  • Used iteratively
  • Converges to the solution

Why stochastic? Remember, we want to plan under uncertainty.
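To make one update concrete, here is a single Bellman backup in Python; the two-action toy transition table is an illustrative assumption, not from the slides:

```python
# One Bellman backup: V(s) <- max_a [ gamma * sum_s' P(s'|s,a) V(s') ] + R(s)
gamma = 0.9
R_s = -0.1                        # reward at the state s being backed up
V = {"s1": 0.5, "s2": 1.0}        # current value estimates of the successors
P = {                             # toy transition model P(s'|s,a)
    "go":   {"s1": 0.2, "s2": 0.8},
    "stay": {"s1": 0.9, "s2": 0.1},
}

expected = {a: sum(p * V[s2] for s2, p in P[a].items()) for a in P}
V_s = gamma * max(expected.values()) + R_s
print(round(V_s, 2))  # "go" has expected value 0.9, so 0.9 * 0.9 - 0.1 = 0.71
```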

SLIDES 26-28

Markov Decision Processes

  • Andrey Andreyevich Markov (1856–1922): Russian mathematician, stochastic processes
  • Markov Decision Processes (MDPs): value iteration with stochasticity (Bellman, 1957)
  • Later, Q-learning (1989) → (next class)

SLIDE 29

Robots in Real Life

Video: Robots gone wild

SLIDES 30-34

Uncertain Movement in Grid World

Movement is now noisy: each action goes in the intended direction 80% of the time and slips to each perpendicular side 10% of the time.

Reward function: R(s) = +1 @ a4; −1 @ b4

Optimal policy π(s) → a? @ a3? @ b3? @ c4? The final build shows the full policy:

      1    2    3    4
  a   →    →    →   +1
  b   ↑    x    ←   −1
  c   ↑    ←    ←    ↓

Note b3 and c4: the policy detours away from the −1, even bumping into the bottom edge at c4 rather than risking a slip into b4.
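The 80/10/10 slip model is easy to write down explicitly. A small sketch (the direction encoding and helper names are ours):

```python
# 80/10/10 movement noise: the intended direction with probability 0.8,
# each perpendicular direction with probability 0.1.
LEFT_OF = {"up": "left", "left": "down", "down": "right", "right": "up"}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}

def outcome_distribution(action):
    """Map an intended action to {actual movement direction: probability}."""
    return {action: 0.8, LEFT_OF[action]: 0.1, RIGHT_OF[action]: 0.1}

print(outcome_distribution("up"))  # {'up': 0.8, 'left': 0.1, 'right': 0.1}
```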

SLIDES 35-38

Stochastic Value Iteration

      1    2    3    4
  a   .    .   .77  +1
  b   .    x   .48  −1
  c   .    .    .    .

Reward function: R(s) = +1 @ a4; −1 @ b4; −0.03 everywhere else

With the same 80/10/10 movement model, apply

  V(s) ← max_a [ γ Σ_{s′} P(s′ | s, a) V(s′) ] + R(s),  with γ = 1.

Optimal policy π(s) → a? @ a3? @ b3?
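A sketch of stochastic value iteration on this grid. The rewards, γ = 1, and the 80/10/10 slips follow the slides; the bump-stays-in-place rule and the in-place sweep order are our assumptions, under which the first backups of a3 and b3 give 0.77 and ≈0.48, matching the values shown:

```python
# One in-place sweep of stochastic value iteration with 80/10/10 slips.
ROWS, COLS = "abc", (1, 2, 3, 4)
WALL = {("b", 2)}
TERMINALS = {("a", 4): +1.0, ("b", 4): -1.0}
STEP_REWARD, GAMMA = -0.03, 1.0
DELTAS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIDES = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}

def step(s, d):
    """Cell reached by actually moving in direction d; edges and wall bounce back."""
    r, c = ROWS.index(s[0]) + DELTAS[d][0], s[1] + DELTAS[d][1]
    s2 = (ROWS[r], c) if 0 <= r < 3 and 1 <= c <= 4 else s
    return s if s2 in WALL else s2

def q_value(V, s, a):
    """Expected successor value: 0.8 intended + 0.1 for each perpendicular slip."""
    left, right = SIDES[a]
    return 0.8 * V[step(s, a)] + 0.1 * V[step(s, left)] + 0.1 * V[step(s, right)]

states = [(row, c) for row in ROWS for c in COLS if (row, c) not in WALL]
V = {s: TERMINALS.get(s, 0.0) for s in states}
for s in states:                      # a single sweep, row a first
    if s not in TERMINALS:
        V[s] = GAMMA * max(q_value(V, s, a) for a in DELTAS) + STEP_REWARD

print(round(V[("a", 3)], 2), round(V[("b", 3)], 2))  # 0.77 0.49 (slide: .77, .48)
```

Repeating the sweep to convergence fills in the remaining cells; the max over actions is what steers the policy around the −1.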

SLIDES 39-45

Values and Policy Examples

[Figures: example value functions and the resulting policies]

SLIDE 46

Markov Decision Processes Summary

  • Fully observable states: s₁, …, sₙ
  • Actions: a₁, …, aₘ
  • Stochastic transitions: P(s′ | s, a)
  • Reward: R(s)
  • Objective: max_π E[ Σ_{t=0}^∞ γ^t R_t | s₀ = s ]
  • Value iteration: V(s)
  • Converges to the optimal policy: π(s) = argmax_a Σ_{s′} P(s′ | s, a) V(s′)
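Once the values have converged, the optimal policy falls out greedily, as in the last bullet above. A minimal sketch, reusing a q_value(V, s, a) helper like the one in the sweep earlier (the signature is ours):

```python
# Greedy policy extraction from a converged value function V.
def greedy_policy(V, states, actions, q_value, terminals):
    """pi(s) = argmax_a of the expected successor value under P(s'|s,a)."""
    return {s: max(actions, key=lambda a: q_value(V, s, a))
            for s in states if s not in terminals}
```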

SLIDE 47

Partially Observable MDPs