Learning Structured Decision Problems with Unawareness - Craig Innes (PowerPoint PPT presentation)



SLIDE 1

Learning Structured Decision Problems with Unawareness

Craig Innes (craig.innes@ed.ac.uk), Alex Lascarides (alex@inf.ed.ac.uk)

Institute for Language, Cognition and Computation, University of Edinburgh

SLIDE 2

Why Unawareness?

[Influence diagram: Fertiliser, Grain, Precipitation, Protein, Yield, and reward node R]

X = {Prec, Protein, Yield}
A = {Grain, Fert}
scope(R) = {Yield, Protein}
Pa(Prot) = {Grain}
P(Prot = p | Grain = g) = θ_{p|g}
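The formal setup above can be sketched in Python. This is illustrative only, not the authors' code, and the concrete θ values are made-up assumptions:

```python
# Sketch of the decision problem on this slide: chance variables X,
# action variables A, the reward scope, Protein's parent set, and
# illustrative CPT parameters theta_{p|g}.

X = {"Prec", "Protein", "Yield"}        # chance variables
A = {"Grain", "Fert"}                   # action variables
reward_scope = {"Yield", "Protein"}     # scope(R)
parents = {"Protein": {"Grain"}}        # Pa(Prot) = {Grain}

# theta[(p, g)] = P(Prot = p | Grain = g); the numbers are made up.
theta = {
    (0, 0): 0.8, (1, 0): 0.2,
    (0, 1): 0.3, (1, 1): 0.7,
}
```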

SLIDE 3

Why Unawareness?

[Figure: the true, larger influence diagram, adding variables such as Nitrogen, Pesticide, Infestation, Fungicide, Fungus, Harrow, Weeds, Insect Prevalence, Soil Type, Temperature, Gross Crops, Local Concern, and Bad Press around the reward R]

X^0 ⊆ X^+
A^0 ⊆ A^+
scope^0(R) ⊆ scope^+(R)
Pa(Prot) = {Grain}
P(Prot = p | Grain = g) = θ_{p|g}

SLIDE 4

Contributions

Our agent learns an interpretable model of a decision problem incrementally via evidence from domain trials and expert advice.

SLIDE 5

Contributions

Our agent learns an interpretable model of a decision problem incrementally via evidence from domain trials and expert advice. Evidence may reveal actions/variables the agent was completely unaware of prior to learning.

SLIDE 6

Contextual Advice

Types of Advice

  • 1. Advice on Better Actions
  • 2. Resolving Misunderstandings
  • 3. Unexpected Rewards
  • 4. Unknown Effects

SLIDE 7

Contextual Advice - Better Action

If the agent’s performance over the last k trials falls below a threshold β of the true policy π^+’s performance, the expert says:

SLIDE 8

Contextual Advice - Better Action

If the agent’s performance over the last k trials falls below a threshold β of the true policy π^+’s performance, the expert says: “At time t you should have done a′ = (A1 = 0, A2 = 1, A3 = 0) rather than a_t”

SLIDE 9

Contextual Advice - Better Action

If the agent’s performance over the last k trials falls below a threshold β of the true policy π^+’s performance, the expert says: “At time t you should have done a′ = (A1 = 0, A2 = 1, A3 = 0) rather than a_t”

  • Action variable A3 is part of the problem (A3 ∈ A)
  • A3 is relevant (∃X ∈ scope(R) such that anc(A3, X))
  • There exists a better reward (∃s, s[B_t] = s_t[B_t] ∧ R^+(s) > r_t)
  • a′ has a greater expected utility than a_t (EU(a′ | s) > EU(a_t | s))
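A hedged sketch of how three of these checks (membership, relevance, higher expected utility) might look in code; the child-list graph encoding, the `eu` function, and the names `is_ancestor` and `advice_is_sound` are all illustrative assumptions, not the authors' implementation. The "better reward exists" check needs the true reward R^+ and is left out here.

```python
# Illustrative soundness checks for "better action" advice.

def is_ancestor(graph, a, x):
    """True if node a is an ancestor of node x in the DAG `graph`
    (graph maps each node to a list of its children)."""
    frontier, seen = [a], set()
    while frontier:
        node = frontier.pop()
        if node == x:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(graph.get(node, []))
    return False

def advice_is_sound(a_prime, a_t, actions, graph, reward_scope, eu, state):
    """Every advised action variable is known, at least one is an
    ancestor of a reward-scope variable, and a_prime beats a_t in
    expected utility."""
    known = all(v in actions for v in a_prime)
    relevant = any(is_ancestor(graph, v, x)
                   for v in a_prime for x in reward_scope)
    better = eu(a_prime, state) > eu(a_t, state)
    return known and relevant and better
```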

SLIDE 10

Conserving Previous Beliefs

[Influence diagram as on Slide 2: Fertiliser, Grain, Precipitation, Protein, Yield, reward node R]

P(Pa_Yield | D_0:t) is a distribution over candidate parent sets:
Pa_Yield = ∅, Pa_Yield = {Fert}, ..., Pa_Yield = {Fert, Prec, Grain}

SLIDE 11

Conserving Previous Beliefs

[Influence diagram as before, now with a Fungus node added]

P(Pa_Yield | D_0:t) now ranges over parent sets with and without Fungus:
Pa_Yield = ∅, {Fungus}, {Fert}, {Fert, Fungus}, ..., {Fert, Prec, Grain}, {Fert, Prec, Grain, Fungus}

SLIDE 12

Conserving Previous Beliefs

[Influence diagram with Fungus, as on the previous slide]

P(Pa_Yield | D_0:t) over parent sets with and without Fungus:
Pa_Yield = ∅, {Fungus}, {Fert}, {Fert, Fungus}, ..., {Fert, Prec, Grain}, {Fert, Prec, Grain, Fungus}

P_new(Pa_X) = (1 − ρ) · P_old(Pa_X | D_0:t)   if Fungus ∉ Pa_X
P_new(Pa_X) = ρ · P_old(Pa′_X | D_0:t)        if Pa_X = Pa′_X ∪ {Fungus}
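The conservative update can be sketched as follows; `conserve_beliefs`, the frozenset encoding of parent sets, and the concrete prior weight ρ are illustrative assumptions, not the authors' code.

```python
# When a new variable (e.g. "Fungus") is discovered, each old parent-set
# hypothesis keeps (1 - rho) of its probability mass, and rho of it moves
# to the same set plus the new variable. Total mass is preserved.

def conserve_beliefs(p_old, new_var, rho=0.1):
    p_new = {}
    for pa, prob in p_old.items():
        p_new[pa] = p_new.get(pa, 0.0) + (1 - rho) * prob
        pa_plus = frozenset(pa | {new_var})
        p_new[pa_plus] = p_new.get(pa_plus, 0.0) + rho * prob
    return p_new

p_old = {frozenset(): 0.5, frozenset({"Fert"}): 0.5}
p_new = conserve_beliefs(p_old, "Fungus", rho=0.2)
```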

SLIDE 13

Experiments

Randomly Generated Networks

  • 12 - 36 Variables
  • 3000 Trials
  • ε-greedy exploration strategy
  • Expert aid threshold β = 0.1

[Figure: an example randomly generated network with action nodes A1 to A12, chance nodes O1 to O12 and B1 to B12, and reward node R, annotated with the agent's Start model and the full Learning Goal model]
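The ε-greedy trial strategy from the settings above can be sketched as follows; the function name and the utility-estimate table are illustrative assumptions, not the authors' code.

```python
# eps-greedy action selection: with probability eps pick a random action
# (explore), otherwise pick the action with the highest current
# expected-utility estimate (exploit).

import random

def epsilon_greedy(actions, eu_estimates, eps=0.1, rng=random):
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: eu_estimates.get(a, 0.0))
```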

SLIDE 14

Results

[Plot: cumulative reward (y-axis 10 to 60) vs. trial t (500 to 3000) for default, truePolicy, and random]

SLIDE 15

Results

[Plot: cumulative reward (y-axis 40.0 to 60.0) vs. trial t (500 to 3000) for default, nonCon, and nonRelevant]

SLIDE 16

Results

[Plot: cumulative reward (y-axis 40.0 to 60.0) vs. trial t (500 to 3000) for default, lowTolerance, and highTolerance]

SLIDE 17

Conclusions and Contact Details

Paper

Learning Structured Decision Problems with Unawareness

Authors

Craig Innes (craig.innes@ed.ac.uk)
Alex Lascarides (alex@inf.ed.ac.uk)

Poster Session:

6:30pm-9pm, Pacific Ballroom #35
