Learning Structured Decision Problems with Unawareness
Craig Innes (craig.innes@ed.ac.uk), Alex Lascarides (alex@inf.ed.ac.uk)
Institute for Language, Cognition and Computation, University of Edinburgh
Why Unawareness?
[Figure: the agent's initial decision network, with nodes Fertiliser, Grain, Precipitation, Protein, Yield, and reward R]
X = {Prec, Protein, Yield}, A = {Grain, Fert}, scope(R) = {Yield, Protein}
Pa_Protein = {Grain}, P(Protein = p | Grain = g) = θ_p|g
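As a concrete reading of the conditional probability parameters θ_p|g, here is a minimal Python sketch (a hypothetical representation for illustration; the slides do not prescribe one):

```python
# CPT for P(Protein = p | Grain = g), stored as theta[g][p] = θ_p|g.
theta = {
    0: {0: 0.9, 1: 0.1},  # distribution over Protein when Grain = 0
    1: {0: 0.3, 1: 0.7},  # distribution over Protein when Grain = 1
}
# Each conditional distribution must sum to one.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in theta.values())
```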
[Figure: the true decision network, with nodes Fertiliser, Nitrogen, Pesticide, Fungicide, Harrow, Grain, Insect Prevalence, Precipitation, Soil Type, Temperature, Infestation, Fungus, Weeds, Protein, Yield, Gross Crops, Local Concern, Bad Press, and reward R]
The agent is initially aware of only a fragment of the true problem:
X⁰ ⊆ X⁺, A⁰ ⊆ A⁺, scope⁰(R) ⊆ scope⁺(R)
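One way to picture this relationship is a model object whose awareness grows as evidence arrives; an illustrative Python sketch with hypothetical names, not the authors' code:

```python
from dataclasses import dataclass

@dataclass
class DecisionProblem:
    variables: set     # X: chance variables the agent is aware of
    actions: set       # A: action variables the agent is aware of
    reward_scope: set  # scope(R): variables the reward depends on

# The agent's initial view (X⁰, A⁰) versus the true problem (X⁺, A⁺).
agent = DecisionProblem({"Prec", "Protein", "Yield"},
                        {"Grain", "Fert"},
                        {"Yield", "Protein"})
true_problem = DecisionProblem(
    agent.variables | {"Fungus", "Weeds", "Nitrogen", "Soil Type"},
    agent.actions | {"Fungicide", "Harrow"},
    agent.reward_scope | {"Gross Crops"})

# Unawareness: the agent's sets are strict subsets of the true ones.
assert agent.variables < true_problem.variables
assert agent.actions < true_problem.actions
assert agent.reward_scope < true_problem.reward_scope
```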
Contributions
Our agent learns an interpretable model of a decision problem incrementally via evidence from domain trials and expert advice. Evidence may reveal actions/variables the agent was completely unaware of prior to learning.
Contextual Advice
Types of Advice
1. Advice on Better Actions
2. Resolving Misunderstandings
3. Unexpected Rewards
4. Unknown Effects
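These four message types could be modelled as simple tagged records; the payload fields below are illustrative guesses, not the authors' definitions:

```python
from dataclasses import dataclass

@dataclass
class BetterAction:      # 1. "At time t you should have done a′ rather than a_t"
    t: int
    action: dict         # e.g. {"A1": 0, "A2": 1, "A3": 0}

@dataclass
class Misunderstanding:  # 2. corrects a mismatch between agent and expert vocabulary
    term: str

@dataclass
class UnexpectedReward:  # 3. a reward the agent's current scope(R) cannot explain
    state: dict
    reward: float

@dataclass
class UnknownEffect:     # 4. reveals a dependency the agent was unaware of
    cause: str
    effect: str
```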
Contextual Advice - Better Action

If the agent's performance over the last k trials falls below threshold β of the true policy π⁺, the expert says:
"At time t you should have done a′ = (A1 = 0, A2 = 1, A3 = 0) rather than a_t"

From this single piece of advice the agent can infer:
- Action variable A3 is part of the problem: A3 ∈ A
- A3 is relevant: ∃X ∈ scope(R) such that anc(A3, X)
- There exists a better reward: ∃s, s[B_t] = s_t[B_t] ∧ R⁺(s) > r_t
- a′ has a greater expected utility than a_t: EU(a′ | s) > EU(a_t | s)
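A minimal sketch of how such advice might be consumed (hypothetical helper in Python; here the relevance and expected-utility conditions are simply recorded as constraints rather than checked against a full model):

```python
def process_better_action(actions: set, constraints: list,
                          t: int, better: dict, taken: dict) -> None:
    """Digest 'at time t you should have done a′ rather than a_t'."""
    for var in better:
        # Every action variable the expert mentions is part of the
        # problem (A3 ∈ A) and, by assumption, relevant to scope(R).
        actions.add(var)
    # Record that some state s agreeing with s_t on B_t has R⁺(s) > r_t,
    # and that EU(better | s) > EU(taken | s), for later model fitting.
    constraints.append((t, better, taken))

actions, constraints = {"A1", "A2"}, []
process_better_action(actions, constraints, t=12,
                      better={"A1": 0, "A2": 1, "A3": 0},
                      taken={"A1": 1, "A2": 1})
assert "A3" in actions and constraints
```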
Conserving Previous Beliefs
[Figure: the agent's network, with nodes Fertiliser, Grain, Precipitation, Protein, Yield, and reward R]
P(Pa_Yield | D_0:t) is a distribution over candidate parent sets for Yield: Pa_Yield = ∅, Pa_Yield = {Fert}, ..., Pa_Yield = {Fert, Prec, Grain}
[Figure: the network after discovering Fungus, with nodes Fertiliser, Grain, Precipitation, Protein, Fungus, Yield, and reward R]
Discovering Fungus doubles the hypothesis space: Pa_Yield = ∅, {Fungus}, {Fert}, {Fert, Fungus}, ..., {Fert, Prec, Grain}, {Fert, Prec, Grain, Fungus}
The new prior conserves the agent's old beliefs, moving probability mass ρ onto the parent sets that contain Fungus:
P_new(Pa_X) = (1 − ρ) · P_old(Pa_X | D_0:t)   if Fungus ∉ Pa_X
P_new(Pa_X) = ρ · P_old(Pa′_X | D_0:t)   if Pa_X = Pa′_X ∪ {Fungus}
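This update rule translates directly into code; a minimal Python sketch, with the distribution stored as a dict from frozensets to probabilities:

```python
def conserve_beliefs(prior: dict, new_var: str, rho: float) -> dict:
    """Reweight P(Pa_X | D_0:t) after becoming aware of new_var.

    Mass (1 - rho) stays on each old parent set; mass rho moves to
    its copy extended with new_var, exactly as in the rule above.
    """
    posterior = {}
    for pa, p in prior.items():
        posterior[pa] = (1 - rho) * p          # new_var ∉ Pa_X
        posterior[pa | {new_var}] = rho * p    # Pa_X = Pa′_X ∪ {new_var}
    return posterior

prior = {frozenset(): 0.2,
         frozenset({"Fert"}): 0.5,
         frozenset({"Fert", "Prec", "Grain"}): 0.3}
posterior = conserve_beliefs(prior, "Fungus", rho=0.3)
assert abs(sum(posterior.values()) - 1.0) < 1e-9   # still a distribution
assert posterior[frozenset({"Fert", "Fungus"})] == 0.3 * 0.5
```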
Experiments
- Randomly generated networks: 12–36 variables
- 3000 trials
- ε-greedy exploration strategy (sketched below)
- Expert aid threshold β = 0.1
[Figure: an example randomly generated network, from the agent's initial model ("Start") to the full true network ("Learning Goal"), with action variables A1–A12, variables B1–B12 and O1–O12, and reward R]
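The ε-greedy strategy above, in a generic Python sketch (the authors' exact action-selection details are not reproduced here):

```python
import random

def epsilon_greedy(candidate_actions: list, expected_utility, epsilon: float):
    """Explore a random joint action with probability epsilon,
    otherwise exploit the one with highest estimated EU."""
    if random.random() < epsilon:
        return random.choice(candidate_actions)
    return max(candidate_actions, key=expected_utility)

# Hypothetical usage over two joint actions of the running example.
candidates = [{"Fert": 0, "Grain": 1}, {"Fert": 1, "Grain": 1}]
chosen = epsilon_greedy(candidates, lambda a: 0.7 * a["Fert"], epsilon=0.1)
```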
Results

[Plot: cumulative reward over trials t (500–3000), comparing default, truePolicy, and random]
[Plot: cumulative reward over trials t, comparing default, nonCon, and nonRelevant]