SLIDE 1

Introduction

Learning Bayesian Networks With Hidden Variables for User Modeling

Barbara Großmann-Hutter, Anthony Jameson, and Frank Wittig
Department of Computer Science, University of Saarbrücken, Germany
http://w5.cs.uni-sb.de/~ready/ (Slides, etc.)

Overview

  • 1. Example domain and experiment
  • 2. Modeling the results by learning Bayes nets
       Proposal 1 and issues
       Proposal 2 and issues
       ...
  • 3. Conclusions
  • 4. (Optional:) Why learn about users-in-general?

Table of Contents

Introduction
  • [Title Page]
  • Table of Contents

Experiment: Method
  • Experimental Setup
  • Stepwise vs. Bundled Instructions
  • Variables in Experiment

Experiment: Results
  • Main Results

Learning Bayes Nets
  • 1: Modeling Only Observable Variables
  • 2: Hidden Theoretical Variable
  • 3: Modeling Individual Differences
  • 4: Constraining the Nature of Relationships
  • 5: Choosing Learning Methods Flexibly

Conclusions
  • Conclusions

Why Learn About Users-in-General?
  • Learning About Individual Users
  • Learning About Users in General
  • Which Approach to Use?

SLIDE 2

Experiment: Method

Experimental Setup (1)

Experimental Setup (2)

SLIDE 3

Stepwise vs. Bundled Instructions

Stepwise:

System: Set X to 3
User: ... OK
System: Set M to 1
User: ... OK
System: Set V to 4
User: ... Done

Bundled:

System: Set X to 3, set M to 1, set V to 4
User: ... ... ... Done

Variables in Experiment

Independent variables

  • 1. Presentation mode: stepwise vs. bundled
  • 2. Number of steps in task: 2, 3, or 4 steps
  • 3. Distraction by secondary task: no secondary task vs. monitor the flashing lights

Dependent variables (selection)

  • 1. Total time to execute an instruction sequence (including "OK"s, etc.)
  • 2. Error in main task: buttons not pressed, or wrongly pressed
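As a rough illustration of the design space spanned by the three independent variables, the sketch below enumerates every combination of their levels. It assumes a fully crossed design, which the slides do not state explicitly; the field names (`mode`, `steps`, `distraction`) are hypothetical.

```python
from itertools import product

# Levels of the three independent variables listed on the slide.
PRESENTATION_MODES = ("stepwise", "bundled")
STEP_COUNTS = (2, 3, 4)
DISTRACTION = (False, True)  # False = no secondary task

# Assumption: every combination of levels occurs (fully crossed design).
conditions = [
    {"mode": mode, "steps": steps, "distraction": distraction}
    for mode, steps, distraction in product(
        PRESENTATION_MODES, STEP_COUNTS, DISTRACTION
    )
]

for condition in conditions:
    print(condition)
```

Under that assumption there are 2 × 3 × 2 = 12 conditions, each of which would contribute observations of execution time and errors.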

SLIDE 4

Experiment: Results

Main Results (1)

[Figure: execution time (msec) and errors (%) with and without distraction, for three-step sequences]

Main Results (2)

[Figure: execution time (msec) and errors (%) with and without distraction, for two-step, three-step, and four-step sequences]

SLIDE 5

Learning Bayes Nets

1: Modeling Only Observable Variables (1)

Definition

  • Structure is specified on the basis of theoretical considerations
    (This holds for all nets discussed here)
  • Only observable variables of experiment are included in network

Positive points

  • Learning can be done straightforwardly with many BN tools
  • Learning is very fast (e.g., < 1 sec)

Negative points

  • Little theoretical interpretability
  • Relatively inefficient evaluation
    (Too many parents per node)
  • Doesn’t take into account systematic individual differences
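When every variable is observed, each conditional probability table (CPT) can be estimated from frequency counts, which is one reason learning is so fast in this case. The following is a minimal sketch of that idea, not the procedure of any particular BN tool; the function `learn_cpt`, the smoothing parameter `alpha`, and the toy cases are all hypothetical.

```python
from collections import Counter, defaultdict

def learn_cpt(cases, child, parents, child_states, alpha=1.0):
    """Estimate P(child | parents) from fully observed cases.

    Uses relative frequencies with additive (Laplace) smoothing `alpha`.
    Only parent configurations that occur in the data get a row.
    """
    counts = defaultdict(Counter)
    for case in cases:
        parent_config = tuple(case[p] for p in parents)
        counts[parent_config][case[child]] += 1

    cpt = {}
    for parent_config, state_counts in counts.items():
        total = sum(state_counts.values()) + alpha * len(child_states)
        cpt[parent_config] = {
            state: (state_counts[state] + alpha) / total
            for state in child_states
        }
    return cpt

# Made-up observations, for illustration only.
cases = (
    [{"mode": "stepwise", "distraction": False, "error": False}] * 3
    + [{"mode": "stepwise", "distraction": False, "error": True}]
    + [{"mode": "bundled", "distraction": True, "error": True}] * 2
    + [{"mode": "bundled", "distraction": True, "error": False}] * 2
)

cpt = learn_cpt(cases, "error", ("mode", "distraction"), (False, True))
print(cpt[("bundled", True)])
```

The number of rows in such a table grows multiplicatively with the number of parent states, which is the "too many parents per node" problem noted above.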

1: Modeling Only Observable Variables (2)

SLIDE 6

2: Hidden Theoretical Variable (1)

Definition

  • Hidden variable "Working Memory Load" added
    Basis: psychological theory; previous experimental results
  • Learning with Russell et al.’s APN algorithm
    ⇒ Gradient descent

Positive points

  • Better theoretical interpretability
    ⇒ Easier to leverage existing psychological knowledge
    ⇒ Possible to add or replace variables without relearning everything from scratch
  • Relatively efficient evaluation
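The core of gradient-based learning with a hidden variable is that the gradient of the log-likelihood with respect to a CPT entry depends on the posterior over the hidden variable given each observed case. The sketch below illustrates this on a deliberately tiny, made-up network (one binary hidden variable `H` with two binary observed children `X1` and `X2`), not the network from the slides. As a simplification it uses a softmax reparameterization to keep probabilities normalized, which is a swapped-in choice rather than the APN algorithm's own treatment of the constraints; the data, initial values, and learning rate are arbitrary.

```python
import math

def softmax(logits):
    """Map unconstrained logits to a probability distribution."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_with_hidden(params, case):
    """Return P(H=h, X1=x1, X2=x2) for h in {0, 1}."""
    prior = softmax(params["H"])
    result = []
    for h in range(2):
        p = prior[h]
        for name, x in zip(("X1", "X2"), case):
            p *= softmax(params[name][h])[x]
        result.append(p)
    return result

def avg_log_likelihood(params, data):
    """Average log-likelihood of the observed cases (H summed out)."""
    return sum(math.log(sum(joint_with_hidden(params, c))) for c in data) / len(data)

def gradient_step(params, data, lr):
    """One gradient-ascent step on the average log-likelihood."""
    n_h = [0.0, 0.0]  # expected counts of H=h
    n_hx = {name: [[0.0, 0.0], [0.0, 0.0]] for name in ("X1", "X2")}
    for case in data:
        joint = joint_with_hidden(params, case)
        evidence = sum(joint)
        for h in range(2):
            posterior = joint[h] / evidence  # P(H=h | case)
            n_h[h] += posterior
            for name, x in zip(("X1", "X2"), case):
                n_hx[name][h][x] += posterior

    n = len(data)
    prior = softmax(params["H"])
    # Gradient w.r.t. a logit: expected count minus (row total * probability).
    updated = {
        "H": [params["H"][h] + lr * (n_h[h] - n * prior[h]) / n for h in range(2)]
    }
    for name in ("X1", "X2"):
        updated[name] = []
        for h in range(2):
            row = softmax(params[name][h])
            updated[name].append([
                params[name][h][x] + lr * (n_hx[name][h][x] - n_h[h] * row[x]) / n
                for x in range(2)
            ])
    return updated

# Made-up data in which X1 and X2 are positively correlated.
data = [(0, 0)] * 40 + [(1, 1)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10
params = {"H": [0.0, 0.0], "X1": [[0.5, 0.0], [0.0, 0.5]], "X2": [[0.3, 0.0], [0.0, 0.3]]}

before = avg_log_likelihood(params, data)
for _ in range(300):
    params = gradient_step(params, data, lr=0.5)
after = avg_log_likelihood(params, data)
print(before, after)
```

Each step requires inference over the hidden variable for every case, which hints at why learning with hidden variables costs far more than simple counting.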

2: Hidden Theoretical Variable (2)

Negative points

  • Learning times several orders of magnitude greater (hours or nights)
    Note: Partly due to current limitations of Netica, soon to be removed
  • Some aspects of CPTs involving the hidden variable are implausible
    E.g., strangely nonmonotonic relationships
  • Individual differences are still not taken into account

SLIDE 7

2: Hidden Theoretical Variable (3)

3: Modeling Individual Differences (1)

Procedure

  • Add to each observation in the dataset a new observable feature: "Overall average execution time of the user in question"

Distinction

  • Variables that are naturally observable in an application setting
  • Variables that can be made observable in an experimental setting
    How to do this:
    Exploit possibilities for measuring and controlling variables
    Ensure an appropriate number of observations from each subject and/or in each condition
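The procedure of adding a per-user feature to every observation can be sketched as follows. This is only an illustration under assumed names: the record fields `user` and `time_ms`, the new field `user_mean_time_ms`, and the toy data are hypothetical, and a real BN would likely need the resulting average to be discretized.

```python
from collections import defaultdict
from statistics import mean

def add_user_speed(observations):
    """Attach each user's overall average execution time to their observations."""
    times_by_user = defaultdict(list)
    for obs in observations:
        times_by_user[obs["user"]].append(obs["time_ms"])

    averages = {user: mean(times) for user, times in times_by_user.items()}

    # Return new records; the input list is left unchanged.
    return [
        dict(obs, user_mean_time_ms=averages[obs["user"]])
        for obs in observations
    ]

# Made-up observations, for illustration only.
observations = [
    {"user": "u1", "time_ms": 2000},
    {"user": "u1", "time_ms": 3000},
    {"user": "u2", "time_ms": 5000},
]

augmented = add_user_speed(observations)
for record in augmented:
    print(record)
```

In an experimental setting this feature is available because many observations are collected per subject; in an application it would have to be estimated with uncertainty.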

SLIDE 8

3: Modeling Individual Differences (2)

Positive points

  • Accuracy of learned net is greater
    Here: 50% (vs. 44%) accurate prediction of the user’s execution time in training set
    (Not in itself surprising or significant)
  • When the individual-speed variable can be assessed (with uncertainty) in an application situation, prediction accuracy will be improved

Negative points

  • CPTs are still sometimes implausible

3: Modeling Individual Differences (3)

SLIDE 9

3: Modeling Individual Differences (4)

3: Modeling Individual Differences (5)

SLIDE 10

4: Constraining the Nature of Relationships (1)

Basic idea

  • Formulate theoretically motivated qualitative constraints
    E.g., "More steps ⇒ Higher WM load"
  • Ensure that only networks that (almost) satisfy these constraints can be learned

Procedure

  • 1. Translate qualitative formulations of constraints into quantitative inequalities concerning conditional probabilities
       See Druzdzel & van der Gaag (UAI95)
  • 2. Define a corresponding penalty term for nets that violate a constraint
  • 3. Factor in the penalty term when determining the next step in the gradient descent
  • 4. (Strategy tried up to now:) Give the penalty term less weight as the search proceeds
       Motivation: Otherwise it might take forever to find a solution
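The steps above can be sketched in code. A positive qualitative influence such as "More steps ⇒ Higher WM load" is commonly expressed as stochastic dominance: for a larger parent value, the child's cumulative distribution must not lie above the one for a smaller parent value. The specific penalty (sum of squared violations) and the geometric decay schedule below are assumptions made for illustration; the slides do not specify either, and all function names are hypothetical.

```python
from itertools import accumulate

def stochastic_dominance_penalty(cpt_rows):
    """Penalty for violating a positive qualitative influence.

    cpt_rows[i] is the child's distribution (states ordered low to high)
    for the i-th parent value (ordered low to high). The constraint holds
    when each successive row's cumulative distribution is nowhere higher.
    """
    penalty = 0.0
    for lower, higher in zip(cpt_rows, cpt_rows[1:]):
        cdf_lower = list(accumulate(lower))[:-1]   # last entry is always 1
        cdf_higher = list(accumulate(higher))[:-1]
        for c_lo, c_hi in zip(cdf_lower, cdf_higher):
            penalty += max(0.0, c_hi - c_lo) ** 2  # squared violation
    return penalty

def penalty_weight(step, initial=10.0, decay=0.99):
    """Assumed schedule: the penalty counts for less as the search proceeds."""
    return initial * decay ** step

def penalized_objective(log_likelihood, cpt_rows, step):
    """Objective to maximize: fit to the data minus the weighted penalty."""
    return log_likelihood - penalty_weight(step) * stochastic_dominance_penalty(cpt_rows)

# Made-up CPTs for P(WM load | number of steps), rows for 2, 3, 4 steps,
# columns for low, medium, high load.
monotone = [[0.7, 0.2, 0.1], [0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]
nonmonotone = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.4, 0.2]]

print(stochastic_dominance_penalty(monotone))
print(stochastic_dominance_penalty(nonmonotone))
```

Because the weight shrinks over time, late steps are driven mostly by the likelihood, which is consistent with some constraint violations remaining in the learned nets.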

4: Constraining the Nature of Relationships (2)

Positive points

The learned nets do satisfy the constraints better

Negative points

There are still some constraint violations

SLIDE 11

4: Constraining the Nature of Relationships (3)

4: Constraining the Nature of Relationships (4)

SLIDE 12

5: Choosing Learning Methods Flexibly (1)

Basic idea

  • Each CPT can be seen as a learning problem with its own specific features
  • So why not choose the most suitable learning technique for each CPT (cf. Musick, KDD96)?
    Example: If you think that A and B have a linear influence on C, use linear regression to estimate the parameters

Simple application here

  • 1. For CPTs that involve only observable variables, use simple methods
  • 2. Then fix these CPTs before starting to use gradient descent

Positive points

  • Saves a lot of learning time
    Here: about 1/3
  • Perhaps better prediction of extreme observations?
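To make the linear-regression example concrete, the sketch below fits C ≈ w0 + wa·A + wb·B by ordinary least squares using the normal equations. It is a self-contained illustration with made-up, noise-free data; the helper names `solve` and `fit_linear` are hypothetical, and turning the fitted line into CPT entries is left out.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - known) / m[r][r]
    return solution

def fit_linear(samples):
    """Least-squares fit of c = w0 + wa*a + wb*b from (a, b, c) samples."""
    rows = [(1.0, a, b) for a, b, _ in samples]
    targets = [c for _, _, c in samples]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve(xtx, xty)

# Made-up data generated from c = 1 + 2a + 3b, for illustration only.
samples = [(a, b, 1.0 + 2.0 * a + 3.0 * b) for a in (0, 1, 2) for b in (0, 1, 2)]
w0, wa, wb = fit_linear(samples)
print(w0, wa, wb)
```

A fit like this needs only three parameters, whereas an unconstrained CPT for C would need one row per combination of A and B.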

5: Choosing Learning Methods Flexibly (2)

SLIDE 13

Conclusions

What have we done?

  • First(?) example of learning a BN with a hidden variable for user modeling
  • Example of using BN learning to explain results of a psychological experiment
  • Identification of several problems that seem especially important for BN learning in this context
  • Outline of briefly tested possible solutions to these problems

What do we have to do now?

  • Investigate possible answers more thoroughly
    In particular, perform thorough and systematic evaluations
  • Look into further issues of this sort
    E.g., What is the best criterion here for evaluating a learned net? Should it be evaluated in terms of success at the particular tasks for which the net is to be used?
    Cf. Greiner et al. (UAI97); Kontkanen et al. (UAI99)

Why Learn About Users-in-General?

Learning About Individual Users

[Diagram]
Data and knowledge: usage data from a single user; the user’s preferences or behavioral regularities; decision-relevant predictions for the user
Processes: learning about the user; application of learned knowledge about the user

SLIDE 14

Learning About Users in General

[Diagram]
Data and knowledge: usage data from a representative sample of users; model embodying knowledge of users in general; usage data from a single user; generally relevant properties of the user; decision-relevant properties of the user
Processes: learning about users in general; interpretation of the user’s data with general model; predictions for the user on basis of general model

Which Approach to Use?

When to learn for users in general?

  • Useful generalizations can be made about all users
  • These generalizations are not obvious but must be learned from data
  • Only limited data is available about any given user

When to learn for each individual user?

  • There are few nontrivial generalizations
  • Individual users differ not only in details but in their overall structure, strategies, etc.
  • A reasonably large amount of data is available for each user