 
Learning Bayesian Networks With Hidden Variables for User Modeling

Barbara Großmann-Hutter, Anthony Jameson, and Frank Wittig
Department of Computer Science, University of Saarbrücken, Germany
http://w5.cs.uni-sb.de/~ready/ (Slides, etc.)

Introduction

Overview
1. Example domain and experiment
2. Modeling the results by learning Bayes nets
   Proposal 1 and issues
   Proposal 2 and issues
   ...
3. Conclusions
4. (Optional:) Why learn about users-in-general?

Table of Contents
Introduction
   [Title Page]
   Table of Contents
Experiment: Method
   Experimental Setup
   Stepwise vs. Bundled Instructions
   Variables in Experiment
Experiment: Results
   Main Results
Learning Bayes Nets
   1: Modeling Only Observable Variables
   2: Hidden Theoretical Variable
   3: Modeling Individual Differences
   4: Constraining the Nature of Relationships
   5: Choosing Learning Methods Flexibly
Conclusions
   Conclusions
Why Learn About Users-in-General?
   Learning About Individual Users
   Learning About Users in General
   Which Approach to Use?
Experiment: Method

Experimental Setup (1)
[Figure]

Experimental Setup (2)
[Figure]
Stepwise vs. Bundled Instructions

Stepwise:
   System: Set X to 3
   User: ... OK
   System: Set M to 1
   User: ... OK
   System: Set V to 4
   User: ... Done

Bundled:
   System: Set X to 3, set M to 1, set V to 4
   User: ... ... ... Done

Variables in Experiment

Independent variables
1. Presentation mode: stepwise vs. bundled
2. Number of steps in task: 2, 3, or 4 steps
3. Distraction by secondary task: no secondary task vs. monitoring the flashing lights

Dependent variables (selection)
1. Total time to execute an instruction sequence (including "OK"s, etc.)
2. Error in main task: buttons not pressed, or wrongly pressed
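Assuming the three independent variables were fully crossed (the slide lists their levels but does not spell out the design), the experiment has 2 × 3 × 2 = 12 conditions. This short sketch just enumerates them; the variable names and value labels are illustrative, not taken from the authors' materials.

```python
from itertools import product

# Independent variables from the slide; the value labels are this sketch's own.
PRESENTATION_MODES = ["stepwise", "bundled"]
NUMBERS_OF_STEPS = [2, 3, 4]
DISTRACTION = [False, True]  # False: no secondary task; True: monitor the flashing lights


def experimental_conditions():
    """Return every combination of the three independent variables (full crossing assumed)."""
    return list(product(PRESENTATION_MODES, NUMBERS_OF_STEPS, DISTRACTION))


for mode, steps, distracted in experimental_conditions():
    print(f"{mode:<8} {steps} steps, distraction={distracted}")
```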
Experiment: Results

Main Results (1)
[Figure: Three-step sequences. Execution time (msec) and errors (%), without vs. with distraction.]

Main Results (2)
[Figure: Two-, three-, and four-step sequences. Execution time (msec) and errors (%), without vs. with distraction.]
Learning Bayes Nets

1: Modeling Only Observable Variables (1)

Definition
- The structure is specified on the basis of theoretical considerations (this holds for all nets discussed here)
- Only the observable variables of the experiment are included in the network

Positive points
- Learning can be done straightforwardly with many BN tools
- Learning is very fast (e.g., < 1 sec)

Negative points
- Little theoretical interpretability
- Relatively inefficient evaluation (too many parents per node)
- Systematic individual differences are not taken into account

1: Modeling Only Observable Variables (2)
[Figure]
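When every variable is observed, each CPT can be estimated by counting how often each child value co-occurs with each parent configuration, which is why learning is so fast. The following is a minimal sketch of that, assuming records are dicts with hypothetical keys (mode, steps, distraction, time), execution time discretized into two classes, and add-alpha smoothing (a choice of this sketch, not of the slides). The second function illustrates one reason why "too many parents per node" hurts: the number of free CPT entries grows multiplicatively with the parents' cardinalities.

```python
from collections import Counter, defaultdict
from math import prod


def learn_cpt(records, child, parents, child_values, alpha=1.0):
    """Estimate P(child | parents) from fully observed records by counting.

    alpha is an add-alpha (Laplace) smoothing constant, an assumption of this
    sketch: it keeps child values that never occur with a given parent
    configuration from getting probability zero. Parent configurations that
    never occur in the data get no entry at all.
    """
    counts = defaultdict(Counter)
    for record in records:
        key = tuple(record[p] for p in parents)
        counts[key][record[child]] += 1
    cpt = {}
    for key, c in counts.items():
        total = sum(c.values()) + alpha * len(child_values)
        cpt[key] = {v: (c[v] + alpha) / total for v in child_values}
    return cpt


def num_free_parameters(child_card, parent_cards):
    """Independent CPT entries: (|child| - 1) times the number of parent configurations."""
    return (child_card - 1) * prod(parent_cards)


# Hypothetical toy records; execution time is discretized into two classes here.
data = [
    {"mode": "stepwise", "steps": 3, "distraction": True, "time": "slow"},
    {"mode": "stepwise", "steps": 3, "distraction": True, "time": "slow"},
    {"mode": "bundled", "steps": 3, "distraction": True, "time": "fast"},
]
cpt = learn_cpt(data, "time", ["mode", "steps", "distraction"], ["fast", "slow"])
print(cpt[("stepwise", 3, True)])
print(num_free_parameters(5, [2, 3, 2]))
```

For example, a hypothetical five-bin time node whose parents are the three independent variables already has 4 × 12 = 48 free entries, and every additional parent multiplies that number again.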
2: Hidden Theoretical Variable (1)

Definition
- Hidden variable "Working Memory Load" added
- Basis: psychological theory; previous experimental results
- Learning with Russell et al.'s APN algorithm ⇒ gradient descent

Positive points
- Better theoretical interpretability
   ⇒ Easier to leverage existing psychological knowledge
   ⇒ Possible to add or replace variables without relearning everything from scratch
- Relatively efficient evaluation

2: Hidden Theoretical Variable (2)

Negative points
- Learning times several orders of magnitude greater (hours, or even overnight)
   Note: partly due to current limitations of Netica, which are soon to be removed
- Some aspects of the CPTs involving the hidden variable are implausible, e.g., strangely nonmonotonic relationships
- Individual differences are still not taken into account
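APN-style learning follows the gradient of the log-likelihood, ∂ ln P(D)/∂w = Σ_d P(x, u | d)/w, where w = P(x | u) is a single CPT entry. Evaluating it needs probabilistic inference for every case in every iteration, which is one likely reason learning is so much slower than counting. This is a minimal sketch in the spirit of that algorithm, not the authors' implementation: it assumes a toy net S → H → T (steps → hidden working-memory load → time) with binary H and T, hypothetical data, exact inference by enumeration instead of Netica, a fixed step size, and projection onto valid CPT rows by subtracting each row's mean gradient and then clipping and renormalizing. It ascends the log-likelihood, which is the same as descending the negative log-likelihood.

```python
import math


def posterior_h(p_h_given_s, p_t_given_h, s, t):
    """P(H = h | S = s, T = t) for the hidden binary variable H, by enumeration."""
    joint = [p_h_given_s[s][h] * p_t_given_h[h][t] for h in range(2)]
    z = sum(joint)
    return [j / z for j in joint]


def log_likelihood(data, p_h_given_s, p_t_given_h):
    """Sum of log P(t | s) over all cases; the P(S) term is constant here and omitted."""
    return sum(
        math.log(sum(p_h_given_s[s][h] * p_t_given_h[h][t] for h in range(2)))
        for s, t in data
    )


def projected_step(table, grad, lr, eps=1e-6):
    """Move each CPT row along its gradient projected onto the sum-to-one plane."""
    for row, g in zip(table, grad):
        mean_g = sum(g) / len(g)
        for k in range(len(row)):
            row[k] = max(eps, row[k] + lr * (g[k] - mean_g))
        total = sum(row)  # renormalize in case clipping at eps changed the sum
        for k in range(len(row)):
            row[k] /= total


def apn_style_fit(data, p_h_given_s, p_t_given_h, lr=0.05, iterations=200):
    """Gradient ascent on the log-likelihood; the CPTs are updated in place."""
    n = len(data)
    for _ in range(iterations):
        g_hs = [[0.0, 0.0] for _ in p_h_given_s]
        g_th = [[0.0, 0.0] for _ in p_t_given_h]
        for s, t in data:
            post = posterior_h(p_h_given_s, p_t_given_h, s, t)
            for h in range(2):
                # d ln P(case) / d w = P(child value, parent config | case) / w, averaged
                g_hs[s][h] += post[h] / p_h_given_s[s][h] / n
                g_th[h][t] += post[h] / p_t_given_h[h][t] / n
        projected_step(p_h_given_s, g_hs, lr)
        projected_step(p_t_given_h, g_th, lr)
    return p_h_given_s, p_t_given_h


# Hypothetical data: (steps index 0..2 for 2..4 steps, time 0 = fast / 1 = slow).
data = [(0, 0)] * 8 + [(0, 1)] * 2 + [(1, 0)] * 5 + [(1, 1)] * 5 + [(2, 0)] * 2 + [(2, 1)] * 8
p_hs = [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]]  # P(H | S), H: 0 = low load, 1 = high load
p_th = [[0.7, 0.3], [0.3, 0.7]]              # P(T | H)
before = log_likelihood(data, p_hs, p_th)
apn_style_fit(data, p_hs, p_th)
after = log_likelihood(data, p_hs, p_th)
print(round(before, 3), round(after, 3))
```

Nothing in the likelihood alone keeps H interpretable as "load"; the asymmetric starting values are what tie H = 1 to slower execution here, which hints at why some learned CPTs can come out implausible.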
2: Hidden Theoretical Variable (3)
[Figure]

3: Modeling Individual Differences (1)

Procedure
- Add to each observation in the dataset a new observable feature: "Overall average execution time of the user in question"

Distinction
- Variables that are naturally observable in an application setting
- Variables that can be made observable in an experimental setting

How to do this
- Exploit possibilities for measuring and controlling variables
- Ensure an appropriate number of observations from each subject and/or in each condition
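The procedure amounts to computing one extra column per subject and copying it into each of that subject's observations. This is a minimal sketch assuming hypothetical record keys ("user", "time_ms") and a hypothetical discretization into speed categories, since a discrete BN node needs discrete values. The average uses all of the user's observations in the dataset, which is possible in the experimental setting; in an application it would have to be estimated from whatever data are available so far.

```python
from collections import defaultdict


def add_user_speed_feature(records, bins):
    """Return copies of the records with a discretized per-user average time added.

    records: dicts with the hypothetical keys "user" and "time_ms".
    bins: ascending upper bounds in ms for the speed categories (hypothetical).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r["user"]] += r["time_ms"]
        counts[r["user"]] += 1
    averages = {u: totals[u] / counts[u] for u in totals}

    def category(avg):
        # Index of the first bin whose upper bound is not exceeded; the last bin is open-ended.
        for i, upper in enumerate(bins):
            if avg <= upper:
                return i
        return len(bins)

    return [dict(r, user_speed=category(averages[r["user"]])) for r in records]


records = [
    {"user": "a", "time_ms": 3000},
    {"user": "a", "time_ms": 4000},
    {"user": "b", "time_ms": 6000},
]
augmented = add_user_speed_feature(records, bins=[4000, 5000])
print([r["user_speed"] for r in augmented])
```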
3: Modeling Individual Differences (2)

Positive points
- The accuracy of the learned net is greater
   Here: 50% (vs. 44%) accurate prediction of the user's execution time in the training set (not in itself surprising or significant)
- When the individual-speed variable can be assessed (with uncertainty) in an application situation, prediction accuracy will improve

Negative points
- CPTs are still sometimes implausible

3: Modeling Individual Differences (3)
[Figure]
3: Modeling Individual Differences (4)
[Figure]

3: Modeling Individual Differences (5)
[Figure]
4: Constraining the Nature of Relationships (1)

Basic idea
- Formulate theoretically motivated qualitative constraints, e.g., "More steps ⇒ Higher WM load"
- Ensure that only networks that (almost) satisfy these constraints can be learned

Procedure
1. Translate qualitative formulations of constraints into quantitative inequalities concerning conditional probabilities (see Druzdzel & van der Gaag, UAI95)
2. Define a corresponding penalty term for nets that violate a constraint
3. Factor in the penalty term when determining the next step in the gradient descent
4. (Strategy tried up to now:) Give the penalty term less weight as the search proceeds
   Motivation: otherwise it might take forever to find a solution

4: Constraining the Nature of Relationships (2)

Positive points
- The learned nets do satisfy the constraints better

Negative points
- There are still some constraint violations
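For a binary working-memory-load node H, the constraint "More steps ⇒ Higher WM load" can be written as the inequality P(H = high | S = s+1) ≥ P(H = high | S = s) for each ordered step count s (with more values of H, the usual formalization of a positive influence compares cumulative distributions instead). This minimal sketch covers steps 2 to 4 of the procedure under stated assumptions: the squared-hinge penalty and the exponentially decaying weight (initial value 10, decay 0.95) are hypothetical choices, not the ones used in the study. In a gradient-based learner, the returned direction would replace the plain log-likelihood gradient for the entries P(H = high | S = s).

```python
def monotonicity_penalty(p_high):
    """Penalty and gradient for the constraint P(high | s+1) >= P(high | s).

    p_high[s] is P(H = high | S = s) for step counts ordered s = 0, 1, 2, ...
    Each violated pair contributes its squared violation (a squared hinge).
    """
    penalty = 0.0
    grad = [0.0] * len(p_high)
    for s in range(len(p_high) - 1):
        violation = p_high[s] - p_high[s + 1]
        if violation > 0:
            penalty += violation ** 2
            grad[s] += 2 * violation
            grad[s + 1] -= 2 * violation
    return penalty, grad


def penalty_weight(iteration, initial=10.0, decay=0.95):
    """Hypothetical schedule: the penalty counts for less as the search proceeds."""
    return initial * decay ** iteration


def penalized_direction(ll_grad, p_high, iteration):
    """Ascent direction for (log-likelihood - weight * penalty)."""
    _, pen_grad = monotonicity_penalty(p_high)
    w = penalty_weight(iteration)
    return [g - w * pg for g, pg in zip(ll_grad, pen_grad)]


# Hypothetical CPT column that violates the constraint between s = 0 and s = 1.
print(monotonicity_penalty([0.6, 0.5, 0.7]))
print(penalized_direction([0.0, 0.0, 0.0], [0.6, 0.5, 0.7], iteration=0))
```

Because the penalty is soft and its weight shrinks, the search can trade a small violation for a better fit, which is consistent with nets that satisfy the constraints better but not perfectly.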
4: Constraining the Nature of Relationships (3)
[Figure]

4: Constraining the Nature of Relationships (4)
[Figure]
5: Choosing Learning Methods Flexibly (1)

Basic idea
- Each CPT can be seen as a learning problem with its own specific features
- So why not choose the most suitable learning technique for each CPT (cf. Musick, KDD96)?
   Example: If you think that A and B have a linear influence on C, use linear regression to estimate the parameters

Simple application here
1. For CPTs that involve only observable variables, use simple methods
2. Then fix these CPTs before starting to use gradient descent

Positive points
- Saves a lot of learning time (here: about 1/3)
- Perhaps better prediction of extreme observations?

5: Choosing Learning Methods Flexibly (2)
[Figure]
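The slide's example can be made concrete under one assumption the slide does not state: C is continuous and depends linearly on A and B with Gaussian noise, i.e., a linear-Gaussian conditional distribution. Its parameters then follow from ordinary least squares in a single pass over the data, with no iterative search, so such a CPT could be estimated first and held fixed while gradient descent handles the CPTs involving the hidden variable. This sketch solves the normal equations in pure Python to stay dependency-free (numpy.linalg.lstsq would be the usual choice); the data are hypothetical and noise-free.

```python
def fit_linear_gaussian(a_vals, b_vals, c_vals):
    """Least-squares fit of C = w_a * A + w_b * B + w_0 + Gaussian noise.

    Returns (w_a, w_b, w_0, variance). For a linear-Gaussian conditional
    distribution, the least-squares weights are the maximum-likelihood ones,
    and the mean squared residual is the maximum-likelihood noise variance.
    """
    rows = [(a, b, 1.0) for a, b in zip(a_vals, b_vals)]
    # Normal equations (X^T X) w = X^T y as an augmented 3 x 4 matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, c_vals))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    w_a, w_b, w_0 = (m[i][3] / m[i][i] for i in range(3))
    residuals = [c - (w_a * a + w_b * b + w_0) for a, b, c in zip(a_vals, b_vals, c_vals)]
    variance = sum(e * e for e in residuals) / len(residuals)
    return w_a, w_b, w_0, variance


# Hypothetical, noise-free data generated from C = 2A + 3B + 1.
A = [0, 1, 2, 3, 1, 2]
B = [1, 0, 2, 1, 3, 0]
C = [2 * a + 3 * b + 1 for a, b in zip(A, B)]
print(fit_linear_gaussian(A, B, C))
```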
Conclusions

What have we done?
- First(?) example of learning a BN with a hidden variable for user modeling
- Example of using BN learning to explain the results of a psychological experiment
- Identification of several problems that seem especially important for BN learning in this context
- Outline of possible solutions to these problems, each only briefly tested

What do we have to do now?
- Investigate the possible solutions more thoroughly, in particular by performing thorough and systematic evaluations
- Look into further issues of this sort, e.g.: What is the best criterion here for evaluating a learned net? Should it be evaluated in terms of its success at the particular tasks for which the net is to be used? (Cf. Greiner et al., UAI97; Kontkanen et al., UAI99)

Why Learn About Users-in-General?

Learning About Individual Users
[Diagram: usage data from a single user → learning about the user's preferences or behavioral regularities → application of the learned knowledge → decision-relevant predictions for the user]
Learning About Users in General
[Diagram: usage data from a representative sample of users → learning about users in general → model embodying knowledge of users in general; usage data from a single user → interpretation of the user's data on the basis of the general model → decision-relevant predictions for the user]

Which Approach to Use?

When to learn for users in general?
- Useful generalizations can be made about all users
- These generalizations are not obvious but must be learned from data
- Only limited data is available about any given user

When to learn for each individual user?
- There are few nontrivial generalizations
- Individual users differ not only in details but in their overall structure, strategies, etc.
- A reasonably large amount of data is available for each user