Learning more from end-users and teachers
Oregon State University AI and EUSES Groups
End-Users and Teachers 1
Tom Dietterich, on behalf of Alan Fern, Kshitij Judah, Saikat Roy, Joe Selman, Weng-Keen Wong, Ian Oberst, Shubumoy Das, Travis Moore, Simone Stumpf, Kevin
Features are words, n-grams, etc.
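As a rough illustration of what word and n-gram features look like, the sketch below counts unigrams and bigrams in a short message. The function name `extract_features`, the whitespace tokenizer, and the sample text are assumptions made for illustration; they are not the feature pipeline used in the study.

```python
from collections import Counter

def extract_features(text, max_n=2):
    """Count word n-grams (n = 1..max_n) in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the message.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features

# Hypothetical sample message: 4 unigrams plus 3 bigrams.
features = extract_features("Meeting moved to Friday")
```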
[Figure: LWLR-FL Gains Over Baseline. x-axis: participants (not in the same order); y-axis: gain over baseline (Macro-F1), from -0.2 to 0.2.]
[Figure: SVM-M1M2 Gains Over Baseline. x-axis: participants (not in the same order); y-axis: gain over baseline (Macro-F1), from -0.2 to 0.2.]
[Figure: Variation in Macro-F1 with r for SVM-M1M2. x-axis: r (0.002 to 4.5); y-axis: Macro-F1; one curve each for Participant 23165, Participant 19162, and Participant 19094.]
[Figure: Variation in Macro-F1 with k for LWLR-FL. x-axis: k2 (0.1 to 1); y-axis: Macro-F1; one curve each for Participant 23165, Participant 19162, and Participant 19094.]
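The plots above report Macro-F1, which is the unweighted mean of the per-class F1 scores. A minimal sketch of that computation follows; taking the class set as the union of true and predicted labels is an assumption, and the labels in the example are made up. A "gain over baseline" would then be the difference between two such scores.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical labels for two classes.
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```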
Set of Horn clauses
No functions
No constants (only variables)
Subsumption is required to be a one-to-one mapping. For example:
Theory: P(X,Y), P(Y,Z) ⇒ Q(X,Z)
D-subsumes P(1,2), P(2,3) ⇒ Q(1,3)
Does not D-subsume P(a,b), P(b,b) ⇒ Q(a,b)
Every theory under normal semantics has an equivalent theory that
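The one-to-one subsumption example can be checked mechanically. The sketch below reads "one-to-one" as requiring distinct variables to map to distinct constants, and tries every such injective substitution by brute force. The function name `d_subsumes` and the representation of atoms as `(predicate, args)` tuples are assumptions for illustration, not the talk's implementation.

```python
from itertools import permutations

def d_subsumes(clause, example):
    """True if some injective variable-to-constant substitution maps the clause
    onto the ground example: same head, and every body atom present in the example."""
    body, head = clause
    ex_body, ex_head = example
    variables = sorted({v for _, args in body + [head] for v in args})
    constants = sorted({c for _, args in ex_body + [ex_head] for c in args}, key=str)
    ex_body_set = set(ex_body)
    # permutations() yields only injective assignments; it yields nothing
    # when there are more variables than constants.
    for image in permutations(constants, len(variables)):
        theta = dict(zip(variables, image))
        ground = lambda atom: (atom[0], tuple(theta[v] for v in atom[1]))
        if ground(head) == ex_head and all(ground(a) in ex_body_set for a in body):
            return True
    return False

theory = ([("P", ("X", "Y")), ("P", ("Y", "Z"))], ("Q", ("X", "Z")))
example_1 = ([("P", (1, 2)), ("P", (2, 3))], ("Q", (1, 3)))
example_2 = ([("P", ("a", "b")), ("P", ("b", "b"))], ("Q", ("a", "b")))
result_1 = d_subsumes(theory, example_1)
result_2 = d_subsumes(theory, example_2)
```

The second example fails because matching it would force Y and Z onto the same constant b, which an injective mapping does not allow.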
Propositional Horn theories can be learned in polynomial time using
Equivalence Query (EQ):
Ask teacher if theory T is equivalent to the correct theory
If No, returns a counter-example
Membership Query (MQ):
Ask teacher if example X is a positive example of the correct theory
Non-recursive function free first-order Horn definitions (single
General first-order Horn theories can be learned in polynomial time
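To make the two query types concrete, here is a minimal sketch of a teacher for a small propositional Horn theory. It assumes examples are truth assignments and that a positive example is one that satisfies the target theory. The class `Teacher`, the clause encoding, and the brute-force equivalence check are illustrative assumptions; the learning algorithms referred to above are not shown.

```python
from itertools import product

def satisfies(theory, assignment):
    """True if the truth assignment satisfies every Horn clause in the theory.

    A clause is (body, head): body is a set of variable indices, head is a
    variable index, or None for a clause whose head is "false".
    """
    for body, head in theory:
        body_true = all(assignment[i] for i in body)
        head_true = head is not None and assignment[head]
        if body_true and not head_true:
            return False
    return True

class Teacher:
    """Hypothetical oracle that answers queries about a hidden target theory."""

    def __init__(self, target, n_vars):
        self.target = target
        self.n_vars = n_vars

    def membership_query(self, example):
        """MQ: is the example a positive example of the correct theory?"""
        return satisfies(self.target, example)

    def equivalence_query(self, hypothesis):
        """EQ: return None if equivalent, otherwise a counter-example.

        Brute force over all 2^n assignments, so only usable for tiny n.
        """
        for example in product([False, True], repeat=self.n_vars):
            if satisfies(self.target, example) != satisfies(hypothesis, example):
                return example
        return None

teacher = Teacher(target=[({0, 1}, 2)], n_vars=3)  # x0 and x1 imply x2
counter_example = teacher.equivalence_query([])    # empty hypothesis accepts everything
```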
Given two positive examples and , returns if there is no
: 1, , 1, ,
: 2, , 2, ,
Mapping:
, … , ,
, … , ,
If visits state with non-zero probability, then return action Else, return , which means “bad state”
Sample Average approximation (Pegasus-approach) for stochastic
Executes policy until confidence falls below an automatically-adjusted
Learner has access to an MDP and can learn via standard exploration
Good actions in state : Bad actions in state : Feedback: , , Either or can be empty
Wargus: Open Source
We provide a GUI
= Critiquing examples Observed , ,
, tuples along the Learner’s
Θ, , Θ, 1 , where
Θ, is the estimated expected return of policy
Evaluated via off-policy importance sampling [Peshkin & Shelton, 2002]
Θ, is the log likelihood of the T
Θ, ∑ log 1
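The symbols in the objective above were lost in extraction, so the following is only a sketch of one plausible reading: a weighted sum of an importance-sampled return estimate and a critique log-likelihood, with a mixing weight (called `lam` here) where 0 uses critiques only and 1 uses practice only. The two-action logistic policy, the normalized importance-sampling estimator, the "probability mass on good actions" likelihood, and all names and data are assumptions, not the authors' formulation.

```python
import math

def action_prob(theta, state, action):
    """Hypothetical two-action logistic policy: P(action = 1 | state) = sigmoid(theta * state)."""
    p1 = 1.0 / (1.0 + math.exp(-theta * state))
    return p1 if action == 1 else 1.0 - p1

def estimated_return(theta, trajectories):
    """Normalized importance-sampling estimate of the policy's expected return.

    Each trajectory is (steps, total_reward); steps is a list of
    (state, action, behavior_prob), where behavior_prob is the probability
    with which the data-collecting policy chose that action.
    """
    weights, returns = [], []
    for steps, total_reward in trajectories:
        w = 1.0
        for state, action, behavior_prob in steps:
            w *= action_prob(theta, state, action) / behavior_prob
        weights.append(w)
        returns.append(total_reward)
    return sum(w * r for w, r in zip(weights, returns)) / sum(weights)

def critique_log_likelihood(theta, critiques):
    """Sum of log probabilities that the policy picks a good action.

    Each critique is (state, good_actions); this form is an assumption.
    """
    return sum(
        math.log(sum(action_prob(theta, state, a) for a in good_actions))
        for state, good_actions in critiques
    )

def combined_objective(theta, trajectories, critiques, lam):
    """lam = 1 keeps only the return term; lam = 0 keeps only the critique term."""
    return (lam * estimated_return(theta, trajectories)
            + (1.0 - lam) * critique_log_likelihood(theta, critiques))

# Made-up data: two one-step trajectories and one critique.
trajectories = [([(1.0, 1, 0.5)], 1.0), ([(1.0, 0, 0.5)], 3.0)]
critiques = [(1.0, [1])]
pure_rl = combined_objective(0.0, trajectories, critiques, lam=1.0)
pure_supervised = combined_objective(0.0, trajectories, critiques, lam=0.0)
```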
[Figure: Map 1 and Map 2]
Where shrinks the weights, which causes the policy to become
Pure Supervised = no practice session (0)
Pure RL = no critiques (1)
Combined = includes practice and critiques (0.3)
6 with CS background
4 no CS background
30 minutes total for supervised
60 minutes for combined (30 minutes of practice)
[Figure: Number of Users (1 to 10) by Health Difference (50, 80, 100), Supervised vs. Combined]