[PPT] - Decision Theory: Sequential PowerPoint Presentation

SLIDE 1

Decision Theory: Sequential Decisions

Computer Science cpsc322, Lecture 34 (Textbook Chpt 9.3)

April 12, 2010

SLIDE 2

“Single” Action vs. Sequence of Actions

“Single” action: a set of primitive decisions that can be treated as a single macro decision to be made before acting

Sequence of actions:
Agent makes observations
Decides on an action
Carries out the action

SLIDE 3

Lecture Overview

Sequential Decisions
Representation
Policies
Finding Optimal Policies

SLIDE 4

Sequential decision problems

A sequential decision problem consists of a sequence of decision variables D1, …, Dn.

Each Di has an information set of variables pDi, whose value will be known at the time decision Di is made.

SLIDE 5

Sequential decisions: simplest possible

Only one decision! (but different from one-off decisions)
Early in the morning: shall I take my umbrella today? (I’ll have to go for a long walk at noon.)

Relevant Random Variables?

SLIDE 6

Policies for Sequential Decision Problem: Intro

A policy specifies what an agent should do under each circumstance (for each decision, consider the parents of the decision node).

In the umbrella “degenerate” case:

D1    pD1
How many policies?
One possible policy
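A policy for a single decision is one decision function from dom(pD1) to dom(D1), so counting policies means counting such functions. The slide leaves the relevant random variables open; the sketch below assumes, purely for illustration, that the only parent of the umbrella decision is an observed Forecast with three values (the names `FORECAST`, `UMBRELLA` and their values are hypothetical) and enumerates every policy.

```python
from itertools import product

# Hypothetical domains (assumed, not from the slide): the decision Umbrella
# has a single parent, the observed Forecast.
FORECAST = ["sunny", "cloudy", "rainy"]
UMBRELLA = ["take_it", "leave_it"]


def all_decision_functions(parent_values, actions):
    """Enumerate every function parent_values -> actions, each as a dict."""
    # One action per parent value, so there are |actions| ** |parent_values| functions.
    return [dict(zip(parent_values, choice))
            for choice in product(actions, repeat=len(parent_values))]


policies = all_decision_functions(FORECAST, UMBRELLA)
print(len(policies))  # 8 = 2 ** 3
print(policies[0])    # {'sunny': 'take_it', 'cloudy': 'take_it', 'rainy': 'take_it'}
```

Any single entry of `policies`, such as "take the umbrella only when the forecast is rainy", is one possible policy.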

SLIDE 7

Sequential decision problems: “complete” Example

A sequential decision problem consists of a sequence of decision variables D1, …, Dn.

Each Di has an information set of variables pDi, whose value will be known at the time decision Di is made.

No-forgetting decision network:

decisions are totally ordered
if a decision Db comes before Da, then
Db is a parent of Da
any parent of Db is a parent of Da
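The no-forgetting conditions can be checked mechanically. The sketch below is a minimal representation, assuming decisions are stored as a Python list in their total order and information sets as a dict of sets (the class and method names are hypothetical). The example parent sets are read off the CheckSmoke and Call decision-function tables used later in this lecture.

```python
from dataclasses import dataclass, field


@dataclass
class SequentialDecisionProblem:
    # Decisions in the (total) order they are made: D1, ..., Dn.
    decisions: list
    # parents[D] is the information set of D: the variables observed before D.
    parents: dict = field(default_factory=dict)

    def is_no_forgetting(self):
        """True if every earlier decision Db, and every parent of Db,
        is also a parent of each later decision Da."""
        for i, later in enumerate(self.decisions):
            later_parents = set(self.parents.get(later, ()))
            for earlier in self.decisions[:i]:
                if earlier not in later_parents:
                    return False  # Db is not a parent of Da
                if not set(self.parents.get(earlier, ())) <= later_parents:
                    return False  # some parent of Db was "forgotten" at Da
        return True


fire = SequentialDecisionProblem(
    decisions=["CheckSmoke", "Call"],
    parents={"CheckSmoke": {"Report"},
             "Call": {"Report", "CheckSmoke", "SeeSmoke"}},
)
print(fire.is_no_forgetting())  # True
```

Dropping Report from the parents of Call would make the check fail, because Call would then forget what was known when CheckSmoke was decided.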

SLIDE 8

Policies for Sequential Decision Problems

A policy is a sequence δ1, …, δn of decision functions

δi : dom(pDi) → dom(Di)

This policy means that when the agent has observed O ∈ dom(pDi), it will do δi(O).

Example:

Decision function for CheckSmoke: a table over Report, CheckSmoke
Decision function for Call: a table over Report, CheckSmoke, SeeSmoke, Call

How many policies?
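For the example above, the number of policies is the product over decisions of |dom(D)| raised to the number of parent assignments. A minimal sketch, assuming all variables are Boolean (as the true/false tables suggest), with the parent sets taken from the table headers (the helper names are hypothetical):

```python
from math import prod

# All variables in the fire example are Boolean.
DOMAIN_SIZE = {"Report": 2, "SeeSmoke": 2, "CheckSmoke": 2, "Call": 2}

# Information sets, read off the decision-function tables.
PARENTS = {
    "CheckSmoke": ["Report"],
    "Call": ["Report", "CheckSmoke", "SeeSmoke"],
}


def num_decision_functions(decision):
    # Rows of the table = number of assignments to the parents.
    n_rows = prod(DOMAIN_SIZE[p] for p in PARENTS[decision])
    # Each row independently picks one value of the decision.
    return DOMAIN_SIZE[decision] ** n_rows


def num_policies():
    # A policy picks one decision function for each decision.
    return prod(num_decision_functions(d) for d in PARENTS)


print(num_decision_functions("CheckSmoke"))  # 4   = 2 ** 2
print(num_decision_functions("Call"))        # 256 = 2 ** 8
print(num_policies())                        # 1024
```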

SLIDE 9

Lecture Overview

Recap
Sequential Decisions
Finding Optimal Policies

SLIDE 10

When does a possible world satisfy a policy?

A possible world specifies a value for each random variable and each decision variable.

Possible world w satisfies policy δ, written w ⊨ δ, if the value of each decision variable is the value selected by its decision function in the policy (when applied in w).

Report  CheckSmoke
true    true
false   false

Report  CheckSmoke  SeeSmoke  Call
(decision function table for Call)

VARs  Fire  Tampering  Alarm  Leaving  Report  Smoke  SeeSmoke  CheckSmoke  Call
      true  false      true   true     false   true   true      true        true

Decision function for… Decision function for…

SLIDE 11

When does a possible world satisfy a policy?

Possible world w satisfies policy δ, written w ⊨ δ, if the value of each decision variable is the value selected by its decision function in the policy (when applied in w).

Report  CheckSmoke
true    true
false   false

Report  CheckSmoke  SeeSmoke  Call
(decision function table for Call)

Decision function for… Decision function for…

VARs  Fire  Tampering  Alarm  Leaving  Report  Smoke  SeeSmoke  CheckSmoke  Call
      true  false      true   true     true    true   true      true        true
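Checking w ⊨ δ amounts to applying each decision function to the parent values in w and comparing with the decision value stored in w. The sketch below uses the CheckSmoke table shown on these slides and the two possible worlds from Slides 10 and 11. The Call table is not recoverable here, so `delta_call` ("call iff smoke was seen") is a hypothetical stand-in for illustration only.

```python
# Decision function for CheckSmoke, from the Report/CheckSmoke table.
delta_check_smoke = {True: True, False: False}


# HYPOTHETICAL decision function for Call (illustration only): call iff smoke was seen.
def delta_call(report, check_smoke, see_smoke):
    return see_smoke


def satisfies(world):
    """w |= delta: each decision variable has the value its decision function selects in w."""
    ok_check = world["CheckSmoke"] == delta_check_smoke[world["Report"]]
    ok_call = world["Call"] == delta_call(world["Report"], world["CheckSmoke"],
                                          world["SeeSmoke"])
    return ok_check and ok_call


world_slide10 = dict(Fire=True, Tampering=False, Alarm=True, Leaving=True,
                     Report=False, Smoke=True, SeeSmoke=True,
                     CheckSmoke=True, Call=True)
world_slide11 = dict(world_slide10, Report=True)  # identical except Report

print(satisfies(world_slide10))  # False: Report is false, yet CheckSmoke is true
print(satisfies(world_slide11))  # True (given the hypothetical delta_call)
```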

SLIDE 12

Expected Value of a Policy

Each possible world w has a probability P(w) and a utility U(w).

The expected utility of policy δ is $\mathcal{E}(\delta) = \sum_{w \models \delta} U(w) \times P(w)$.
The optimal policy is one with the highest expected utility.
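The definition suggests a brute-force method: for each policy, sum U(w) × P(w) over the worlds that satisfy it, then keep the best. A minimal sketch on a hypothetical umbrella model (Weather, an observed Forecast, the decision Umbrella; all numbers are illustrative assumptions, not from the lecture). A world assigns Weather, Forecast and Umbrella, and it satisfies δ exactly when Umbrella = δ(Forecast).

```python
from itertools import product

# HYPOTHETICAL umbrella model (illustrative numbers only).
P_WEATHER = {"norain": 0.7, "rain": 0.3}
P_FORECAST = {  # P(Forecast | Weather)
    "norain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "rain": {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
}
UTILITY = {("norain", "take_it"): 20, ("norain", "leave_it"): 100,
           ("rain", "take_it"): 70, ("rain", "leave_it"): 0}
FORECASTS = ["sunny", "cloudy", "rainy"]
ACTIONS = ["take_it", "leave_it"]


def expected_utility(policy):
    """E(delta) = sum over worlds w |= delta of U(w) * P(w)."""
    total = 0.0
    for weather, forecast in product(P_WEATHER, FORECASTS):
        # Only the world with Umbrella = policy[forecast] satisfies the policy.
        umbrella = policy[forecast]
        p_w = P_WEATHER[weather] * P_FORECAST[weather][forecast]
        total += UTILITY[(weather, umbrella)] * p_w
    return total


# Brute force over all 2 ** 3 policies (functions Forecast -> Umbrella).
policies = [dict(zip(FORECASTS, acts))
            for acts in product(ACTIONS, repeat=len(FORECASTS))]
best = max(policies, key=expected_utility)
print(best)  # {'sunny': 'leave_it', 'cloudy': 'leave_it', 'rainy': 'take_it'}
print(round(expected_utility(best), 2))  # 77.0
```

Enumeration works here only because there are 8 policies; the number of policies grows very quickly with the number of parents and decisions.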

SLIDE 13

Lecture Overview

Recap
Sequential Decisions
Finding Optimal Policies (Efficiently)

SLIDE 14

Complexity of finding the optimal policy: how many policies?

If a decision D has k binary parents, how many assignments of values to the parents are there?

If there are b possible actions (possible values for D), how many different decision functions are there?

If there are d decisions, each with k binary parents and b possible actions, how many policies are there?

How many assignments to parents?
How many decision functions? (binary decisions)
How many policies?
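The three questions build on each other: 2^k parent assignments, then b^(2^k) decision functions (one action per assignment), then (b^(2^k))^d policies (one decision function per decision). A small sketch of these counts; the values k = 3, b = 2, d = 2 in the calls are illustrative.

```python
def num_parent_assignments(k):
    # k binary parents -> 2^k joint assignments.
    return 2 ** k


def num_decision_functions(k, b):
    # Each of the 2^k parent assignments maps to one of b actions.
    return b ** num_parent_assignments(k)


def num_policies(k, b, d):
    # One decision function per decision, chosen independently.
    return num_decision_functions(k, b) ** d


print(num_parent_assignments(3))     # 8
print(num_decision_functions(3, 2))  # 256
print(num_policies(3, 2, 2))         # 65536
```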

SLIDE 15

Finding the optimal policy more efficiently: VE

1. Create a factor for each conditional probability table and a factor for the utility.
2. Sum out random variables that are not parents of a decision node.
3. Eliminate (by maximizing, not summing) the decision variables.
4. Sum out the remaining random variables.
5. Multiply the factors: this is the expected utility of the optimal policy.
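The five steps can be sketched with a tiny factor implementation. This is a minimal illustration on the same kind of hypothetical umbrella network (illustrative numbers; the helpers `make_factor`, `multiply` and `eliminate` are hypothetical names, not a library API). A factor is a pair (variables, table) mapping value tuples to numbers; `eliminate` sums out a random variable or maxes out a decision and also records the arg max.

```python
from itertools import product

# HYPOTHETICAL umbrella network (illustrative numbers only).
DOMAINS = {"Weather": ["norain", "rain"],
           "Forecast": ["sunny", "cloudy", "rainy"],
           "Umbrella": ["take_it", "leave_it"]}


def make_factor(variables, fn):
    """Build (variables, table) by evaluating fn on every joint assignment."""
    rows = product(*(DOMAINS[v] for v in variables))
    return tuple(variables), {vals: fn(*vals) for vals in rows}


def multiply(f, g):
    (fv, ft), (gv, gt) = f, g
    vs = fv + tuple(v for v in gv if v not in fv)

    def value(*vals):
        a = dict(zip(vs, vals))
        return ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]

    return make_factor(vs, value)


def eliminate(f, var, pick):
    """Remove var from f using pick (sum for random variables, max for decisions).
    Also returns the value of var attaining the max (meaningful only for max)."""
    fv, ft = f
    i = fv.index(var)
    groups = {}
    for vals, x in ft.items():
        groups.setdefault(vals[:i] + vals[i + 1:], []).append((x, vals[i]))
    table = {k: pick(x for x, _ in g) for k, g in groups.items()}
    best = {k: max(g)[1] for k, g in groups.items()}
    return (fv[:i] + fv[i + 1:], table), best


# Step 1: one factor per CPT plus one for the utility.
p_weather = make_factor(["Weather"], lambda w: {"norain": 0.7, "rain": 0.3}[w])
p_forecast = make_factor(["Weather", "Forecast"], lambda w, fc: {
    "norain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "rain": {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6}}[w][fc])
utility = make_factor(["Weather", "Umbrella"], lambda w, u: {
    ("norain", "take_it"): 20, ("norain", "leave_it"): 100,
    ("rain", "take_it"): 70, ("rain", "leave_it"): 0}[(w, u)])

# Step 2: Weather is not a parent of the decision; multiply the factors that
# mention it (here all three) and sum it out.
f = multiply(multiply(p_weather, p_forecast), utility)
f, _ = eliminate(f, "Weather", sum)

# Step 3: max out the decision; the arg max is its optimal decision function.
f, decision_fn = eliminate(f, "Umbrella", max)

# Steps 4-5: sum out the remaining random variable; one number is left.
f, _ = eliminate(f, "Forecast", sum)
print({k[0]: v for k, v in decision_fn.items()})
print(round(f[1][()], 2))  # 77.0
```

Only one factor remains at the end here, so step 5 has nothing left to multiply; with several decisions, step 3 is repeated from the latest decision backwards.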

SLIDE 16

Eliminate the decision variables: step 3 details

Select a variable D that corresponds to the latest decision to be made

this variable will appear in only one factor with its parents
Eliminate D by maximizing. This returns:
The optimal decision function for D, arg maxD f
A new factor to use in VE, maxD f
Repeat till there are no more decision nodes.

Example: Eliminate CheckSmoke

Report  CheckSmoke  Value
true    true        5.0
true    false       5.6
false   true        23.7
false   false       17.5

New factor:
Report  Value
true
false

Decision function:
Report  CheckSmoke
true
false
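Maxing out a decision D from a factor f(parent, D) produces the two outputs listed in step 3: the new factor max_D f and the decision function arg max_D f. A minimal sketch, assuming a single parent and a factor stored as a dict keyed by (parent value, decision value); the numbers are hypothetical and are not the table above.

```python
def max_out(factor):
    """Max out the decision from {(parent_value, decision_value): value}.

    Returns (new_factor, decision_function):
      new_factor[parent_value]        = max over the decision of the value
      decision_function[parent_value] = the decision value attaining that max
    """
    new_factor, decision_function = {}, {}
    for (parent, choice), value in factor.items():
        if parent not in new_factor or value > new_factor[parent]:
            new_factor[parent] = value
            decision_function[parent] = choice
    return new_factor, decision_function


# HYPOTHETICAL values for a factor over Report, CheckSmoke.
f = {(True, True): 10.0, (True, False): 4.0,
     (False, True): 2.0, (False, False): 7.0}
new_factor, delta_check_smoke = max_out(f)
print(new_factor)         # {True: 10.0, False: 7.0}
print(delta_check_smoke)  # {True: True, False: False}
```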

SLIDE 17

Variable elimination reduces the complexity of finding the optimal policy

We have seen that if a decision D has k binary parents, there are b possible actions, and there are d decisions, then there are $(b^{2^k})^d$ policies.

Doing variable elimination lets us find the optimal policy after considering only $d \cdot b^{2^k}$ decision functions (we eliminate one decision at a time).

VE is much more efficient than searching through policy space.

However, this complexity is still doubly exponential: we'll only be able to handle relatively small problems.
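To see the gap, compare the two counts as k grows. The choice b = 2, d = 3 below is illustrative; both columns still grow doubly exponentially in k, which is why VE only helps up to a point.

```python
def policy_space_size(k, b, d):
    # Brute force: every combination of one decision function per decision.
    return (b ** (2 ** k)) ** d


def ve_decision_functions_considered(k, b, d):
    # VE: the decision functions of one decision at a time.
    return d * b ** (2 ** k)


for k in range(1, 5):
    print(k, policy_space_size(k, 2, 3), ve_decision_functions_considered(k, 2, 3))
```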

SLIDE 18

SLIDE 19

CPSC 322, Lecture 4 Slide 19

Learning Goals for today’s class

You can:

Represent sequential decision problems as decision networks, and explain the no-forgetting property
Verify whether a possible world satisfies a policy, and define the expected value of a policy
Compute the number of policies for a decision problem
Compute the optimal policy by Variable Elimination

SLIDE 20

Last class

Value of Information and Control – textbook sect 9.4
Course summary

Assign4 due
Q4 not required – solution has been provided. Try to solve it as you prepare for the final.
Solutions will be provided on Thur. @4
After that, start preparing for the final
Tomorrow I will post a set of review questions and

two practice exercises on decision networks