Computer Science CPSC 322
Lecture 24: Decision Networks and Sequential Decision Problems
Lecture Overview
Recap
Computing single-stage optimal decision
Sequential Decision Problems
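Expected utility of each single-stage decision D = (Which Way, Wear Pads):

Which Way  Wear Pads  P(accident)  P(no accident)  U(accident)  U(no accident)  E[U|D]
long       true       0.01         0.99            30           75              74.55
long       false      0.01         0.99            0            80              79.2
short      true       0.2          0.8             35           95              83
short      false      0.2          0.8             3            100             80.6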
$E(U \mid D = d) = \sum_{w \models (D = d)} P(w)\, U(w) = P(w_1) \times U(w_1) + \dots + P(w_n) \times U(w_n)$
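The expected-utility formula can be checked numerically. The sketch below is only an illustration: it assumes the probabilities and utilities of the Which Way / Wear Pads example, and the utility 0 for (long, accident, no pads) is inferred from the listed value E[U|D] = 79.2 rather than read directly. The names `P_ACCIDENT`, `UTILITY` and `expected_utility` are hypothetical.

```python
# Single-stage decision: D = (which_way, wear_pads); Accident is the only random variable.
P_ACCIDENT = {"long": 0.01, "short": 0.2}  # P(Accident = true | Which Way)

# Utility(which_way, accident, wear_pads); the 0 entry is inferred, see the note above.
UTILITY = {
    ("long", True, True): 30, ("long", True, False): 0,
    ("long", False, True): 75, ("long", False, False): 80,
    ("short", True, True): 35, ("short", True, False): 3,
    ("short", False, True): 95, ("short", False, False): 100,
}


def expected_utility(which_way, wear_pads):
    """E(U | D = d): sum of P(w) * U(w) over the worlds consistent with d."""
    p = P_ACCIDENT[which_way]
    return (p * UTILITY[(which_way, True, wear_pads)]
            + (1 - p) * UTILITY[(which_way, False, wear_pads)])


decisions = [(w, pads) for w in ("long", "short") for pads in (True, False)]
eu = {d: expected_utility(*d) for d in decisions}
best = max(eu, key=eu.get)  # decision with the highest expected utility

for d in decisions:
    print(d, round(eu[d], 2))
print("best:", best)
```

Under these numbers the best single-stage decision is to take the short way and wear pads.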
macro decision to be made before acting
its parents
represent information available when the decision is made
Which Way  Accident  Wear Pads  Utility
long       true      true       30
long       true      false      0
long       false     true       75
long       false     false      80
short      true      true       35
short      true      false      3
short      false     true       95
short      false     false      100

Which Way W  Accident A  P(A|W)
long         true        0.01
long         false       0.99
short        true        0.2
short        false       0.8
Explicitly shows dependencies. E.g., which variables affect the probability
The Belief and Decision Networks applet we have seen previously allows you to load predefined decision networks for various domains and run queries on them. Select one of the available examples via “File -> Load Sample Problem”.
For Decision Networks:
View the CPT/Decision table/Utility table for a chance/decision/utility node
at the bottom, and then click “Brief”.
the toolbar and select Brief in the dialogue box that will appear
network
See available help pages and video tutorials for more details on how to use the Bayes applet (http://www.aispace.org/bayes/)
Denote
1. This creates a factor on D that gives the expected utility for each di
$E(U) = \sum_{X_1, \dots, X_n} P(X_1, \dots, X_n \mid D)\, U(pa(U)) = \sum_{X_1, \dots, X_n} \prod_{i=1}^{n} P(X_i \mid pa(X_i))\, U(pa(U))$
Includes decision vars
[Tables repeated from above: Utility(W, A, P) and P(A | W) for the Which Way example]
Abbreviations: W = Which Way P = Wear Pads A = Accident
the domain of the product is the union of the multiplicands’ domains
Which Way W  Accident A  f1(A,W)
long         true        0.01
long         false       0.99
short        true        0.2
short        false       0.8

Which Way W  Accident A  Pads P  f2(A,W,P)
long         true        true    30
long         true        false   0
long         false       true    75
long         false       false   80
short        true        true    35
short        true        false   3
short        false       true    95
short        false       false   100

f(A,W,P) is defined over the same eight (W, A, P) rows as f2:
f (A=a,P=p,W=w) = f1(A=a,W=w) × f2(A=a,W=w,P=p)
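As a rough illustration of this product, the sketch below multiplies the two factors entry by entry. It is specialised to these two factors rather than a general factor-multiplication routine; the dictionary layout and the name `multiply` are assumptions, and the 0 entry of f2 is inferred from the expected utilities given earlier in the lecture.

```python
# f1(A, W): the CPT P(A | W), keyed by (accident, which_way).
f1 = {
    (True, "long"): 0.01, (False, "long"): 0.99,
    (True, "short"): 0.2, (False, "short"): 0.8,
}

# f2(A, W, P): the utility, keyed by (accident, which_way, pads).
f2 = {
    (True, "long", True): 30, (True, "long", False): 0,
    (False, "long", True): 75, (False, "long", False): 80,
    (True, "short", True): 35, (True, "short", False): 3,
    (False, "short", True): 95, (False, "short", False): 100,
}


def multiply(f1, f2):
    """f(A=a, W=w, P=p) = f1(A=a, W=w) * f2(A=a, W=w, P=p).

    The product is defined over the union of the two domains, here {A, W, P}.
    """
    return {(a, w, p): f1[(a, w)] * value for (a, w, p), value in f2.items()}


f = multiply(f1, f2)
for key in sorted(f, key=str):
    print(key, round(f[key], 2))
```

Summing the resulting factor over A would then give the expected utility for each (W, P) pair.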
information for future actions. For example: diagnostic tests
the parents of decision nodes can include random variables
Decision node: agent decides. Chance node: chance decides.
smoke, but this takes time and will delay calling
depend on a random variable (e.g. SeeSmoke )
Di is a parent of Dj
any parent of Di is a parent of Dj
pa(CheckSmoke) = {Report}
pa(Call) = {Report, CheckSmoke, SeeSmoke}
What is observed often depends on previous actions
This conditional plan is referred to as a policy
Definition (Policy): A policy specifies, for each decision node, which value it should take for every possible combination of values for its parents.
For instance, in our Alarm problem, specifying a policy means selecting specific values for the two decision nodes, given all possible combinations of values of their parents.
That is, selecting a policy means selecting either T or F for each of these entries.
Why do we want to do that? Because we want to enable an agent to know what to do under every “possible world” that can be observed.
This policy means that when the agent has observed
what to do for the decision to check smoke
with the symbol δcs followed by a specific number
Decision functions for CheckSmoke (each column gives the value chosen for CheckSmoke):

Report  δcs1  δcs2  δcs3  δcs4
T       T     T     F     F
F       T     F     T     F

Definition (Policy): A policy π is a sequence δ1, ..., δn of decision functions δi : dom(pa(Di)) → dom(Di)
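One way to make this definition concrete is to store each decision function as a lookup table from parent values to a decision value. The sketch below is an assumption-laden illustration, not the lecture's own code: the helper `decision_functions` and the dictionary representation are hypothetical, and the parent sets are the ones stated for the Alarm problem.

```python
from itertools import product

BOOL = (True, False)


def decision_functions(parent_domains, decision_domain):
    """Yield every function dom(pa(D)) -> dom(D) as a dict keyed by parent values."""
    parent_values = list(product(*parent_domains))
    for choices in product(decision_domain, repeat=len(parent_values)):
        yield dict(zip(parent_values, choices))


# pa(CheckSmoke) = {Report}
check_smoke_fns = list(decision_functions([BOOL], BOOL))
# pa(Call) = {Report, CheckSmoke, SeeSmoke}
call_fns = list(decision_functions([BOOL, BOOL, BOOL], BOOL))

# A policy picks one decision function per decision node.
policy = {"CheckSmoke": check_smoke_fns[1], "Call": call_fns[0]}

for i, fn in enumerate(check_smoke_fns, start=1):
    print(f"delta_cs{i}:", fn)
```

With this enumeration order, the second CheckSmoke function checks for smoke exactly when there is a report.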
Decisions for One-Off decisions
for any step of the decision sequence given any possible combination of available information
One-off decisions:

Decision  Which Way  Wear Pads
d1        long       true
d2        long       false
d3        short      true
d4        short      false

CheckSmoke decision functions:

Report  δcs1  δcs2  δcs3  δcs4
T       T     T     F     F
F       T     F     T     F

Call decision functions δcall1, ..., δcallk, ..., δcalln: each assigns T or F to Call for every instantiation of its parents:

Report  CheckSmoke  SeeSmoke
T       T           T
T       T           F
T       F           T
T       F           F
F       T           T
F       T           F
F       F           T
F       F           F
each such instantiation, the decision function could pick any of the b values
– because there are $b^{2^k}$ possible decision functions for each decision, and a policy is a combination of d such decision functions
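The counting argument can be written out as a small calculation. This is a sketch under the stated assumptions (k binary parents, b values per decision, d decisions); the function names are hypothetical.

```python
def num_decision_functions(b, k):
    """A decision with k binary parents has 2**k parent instantiations,
    and the function can pick any of b values for each of them."""
    return b ** (2 ** k)


def num_policies(b, k, d):
    """A policy combines d decision functions, one per decision variable."""
    return num_decision_functions(b, k) ** d


# Alarm problem: CheckSmoke has 1 binary parent, Call has 3, both are binary decisions.
check_smoke_count = num_decision_functions(2, 1)
call_count = num_decision_functions(2, 3)
alarm_policies = check_smoke_count * call_count

print(check_smoke_count, call_count, alarm_policies)
```

Because the two Alarm decisions have different numbers of parents, the total is the product of the two counts rather than a single power.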
Decision function δcs2:

Report  CheckSmoke
true    true
false   false

Decision function δcall20:

Report  CheckSmoke  SeeSmoke  Call
true    true        true      false
true    true        false     false
true    false       true      true
true    false       false     false
false   true        true      true
false   true        false     false
false   false       true      false
false   false       false     false

Possible world:

VARs   Fire  Tampering  Alarm  Leaving  Report  Smoke  SeeSmoke  CheckSmoke  Call
Value  true  false      true   true     true    true   true      true        true
Definition (Satisfaction of a policy) A possible world w satisfies a policy π, written w ⊧ π, if the value of each decision variable in w is the value selected by its decision function in policy π (when applied to w)
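The satisfaction test can be sketched as a direct lookup. The example below assumes the sample policy (δcs2, δcall20) and the sample world as read from the lecture's tables; that reading of the flattened tables, the `PARENTS` mapping and the function name `satisfies` are all assumptions made for illustration.

```python
# Parents of each decision node in the Alarm problem.
PARENTS = {
    "CheckSmoke": ("Report",),
    "Call": ("Report", "CheckSmoke", "SeeSmoke"),
}

# Sample policy: one decision function per decision node (parent values -> decision).
policy = {
    "CheckSmoke": {(True,): True, (False,): False},
    "Call": {
        (True, True, True): False, (True, True, False): False,
        (True, False, True): True, (True, False, False): False,
        (False, True, True): True, (False, True, False): False,
        (False, False, True): False, (False, False, False): False,
    },
}

# Sample possible world: a value for every variable.
world = {
    "Fire": True, "Tampering": False, "Alarm": True, "Leaving": True,
    "Report": True, "Smoke": True, "SeeSmoke": True,
    "CheckSmoke": True, "Call": True,
}


def satisfies(world, policy):
    """w satisfies pi if every decision variable in w has the value its
    decision function selects for the parent values found in w."""
    return all(
        world[decision] == fn[tuple(world[p] for p in PARENTS[decision])]
        for decision, fn in policy.items()
    )


print(satisfies(world, policy))
print(satisfies({**world, "Call": False}, policy))
```

Under this reading the sample world agrees with δcs2 but not with δcall20, which selects Call = false when Report, CheckSmoke and SeeSmoke are all true.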
max_B f3(A,B,C) = f4(A,C)

B  A  C  f3(A,B,C)
t  t  t  0.03
t  t  f  0.07
f  t  t  0.54
f  t  f  0.36
t  f  t  0.06
t  f  f  0.14
f  f  t  0.48
f  f  f  0.32

A  C  f4(A,C)
t  t  0.54
t  f
f  t
f  f

$(\max_{X_1} f)(X_2, \dots, X_j) = \max_{x \in dom(X_1)} f(X_1 = x, X_2, \dots, X_j)$
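The operation on these two tables can be sketched as follows. The sketch assumes the operation is maximizing out B, which is inferred from the one filled entry f4(t,t) = 0.54 = max(0.03, 0.54); the function name `max_out_b` and the dictionary layout are hypothetical.

```python
# f3 keyed by (B, A, C), following the column order of the table.
f3 = {
    (True, True, True): 0.03, (True, True, False): 0.07,
    (False, True, True): 0.54, (False, True, False): 0.36,
    (True, False, True): 0.06, (True, False, False): 0.14,
    (False, False, True): 0.48, (False, False, False): 0.32,
}


def max_out_b(f3):
    """f4(A, C) = max over b in dom(B) of f3(B=b, A, C)."""
    f4 = {}
    for (b, a, c), value in f3.items():
        key = (a, c)
        f4[key] = max(value, f4.get(key, float("-inf")))
    return f4


f4 = max_out_b(f3)
for key, value in f4.items():
    print(key, value)
```

Maximizing out works like summing out a variable, except that the values for each remaining assignment are combined with max instead of a sum.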
Consider the last decision D to be made.
Find the optimal decision D=d for each instantiation of D’s parents
– For each instantiation of D’s parents, this is just a single-stage decision problem
Create a factor of these maximum values: max out D
– I.e., for each instantiation of the parents, what is the best utility I can achieve by making this last decision optimally?
Recursive call to find optimal policy for reduced network (now with
node.
Keep track of decision function
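1. Create a factor for each CPT and a factor for the utility
2. While there are still decision variables
2a: Sum out random variables that are not parents of a decision node
2b: Max out the last decision variable D in the total ordering
Keep track of decision function
3. Sum out any remaining variable: this is the expected utility of the optimal policy.
Let’s start with a simple example: only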
2a: Sum out random variables that are not parents
Weather 1: Create a factor for
utility function
2b Max out last decision variable D in the total ordering Actual utility of each decision value for Umbrella
Sum out any remaining variable: this is the expected utility of the optimal policy.
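The three steps can be traced on a small umbrella-style network with a single decision. Everything specific in the sketch below is an assumption made for illustration and is not taken from the lecture: the network structure (Weather influences Forecast, and Forecast is the only parent of the Umbrella decision), all probabilities, and all utilities.

```python
# Illustrative numbers only (assumed, not from the lecture).
P_WEATHER = {"rain": 0.3, "norain": 0.7}
P_FORECAST = {  # P(Forecast | Weather)
    "rain": {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
    "norain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
}
UTILITY = {  # Utility(Weather, Umbrella)
    ("rain", "take"): 70, ("rain", "leave"): 0,
    ("norain", "take"): 20, ("norain", "leave"): 100,
}
FORECASTS = ("sunny", "cloudy", "rainy")
ACTIONS = ("take", "leave")

# Steps 1 and 2a: multiply the factors and sum out Weather,
# the random variable that is not a parent of the decision node.
f = {
    (fc, u): sum(P_WEATHER[w] * P_FORECAST[w][fc] * UTILITY[(w, u)]
                 for w in P_WEATHER)
    for fc in FORECASTS for u in ACTIONS
}

# Step 2b: max out the decision variable, keeping track of the decision function.
decision_function = {fc: max(ACTIONS, key=lambda u: f[(fc, u)]) for fc in FORECASTS}
g = {fc: f[(fc, decision_function[fc])] for fc in FORECASTS}

# Step 3: sum out the remaining variable (Forecast).
expected_utility = sum(g.values())

print(decision_function)
print(round(expected_utility, 2))
```

With these assumed numbers the optimal decision function takes the umbrella only when the forecast is rainy.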
Now let’s look at a more complex example
1. Create a factor for each CPT and a factor for the utility
2a: Sum out random variables that are not parents of a decision node (pink nodes in the applet)
After summing out leaving, tampering and smoke
2b
Select a variable D that corresponds to the latest decision to be made this variable will appear in a factor that
condition), no children Here, “call”
Eliminate D (Call here) by maximizing. This returns:
The optimal decision function for D, arg max_D f
A new factor to use in VE, max_D f
2a: Sum out random variables that are not parents of a decision node
2b: Select a variable D that corresponds to the latest decision to be made:
this variable will appear in a factor that
condition), no children
Here, “CheckSmoke”
Sum out any remaining variable: this is the expected utility of the optimal policy.
One-Off decisions
(sequential) decisions
Sequential decision networks
Policies
for sequential decision problems with one decision variable
(just general ideas)
Source: R.E. Neapolitan, 2007
and allocation of resources
Energy: when and where to produce green energy most economically? Which parcels of land to purchase to protect endangered species? Urban planning: how to use budget for best development in 30 years?
Source: http://www.computational-sustainability.org/
models of Patient-Caregiver Interactions During Activities of Daily Living
decision networks that model the temporal evolution of the world)
cognitive disabilities (such as Alzheimer's) when they:
tasks that need to be completed
have already completed
Source: Jesse Hoey UofT 2007
Representation Reasoning Technique
Part short questions, part longer problems.
The list from which I will draw the short questions is posted on Connect.
Practice problems are also posted on Connect (“Final” folder).
See the list of posted learning goals for what you should know.
Material on Sequential Decisions will be covered in the exam only via short questions, not problem-solving questions.
Practice exercises, assignments, short questions, lecture notes, book, …
AISpace and the Practice Exercises there!
Use Office Hours (same schedule).