Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Choosing Sample Size for Knowledge Tracing Models

DERRICK COETZEE

slide-2
SLIDE 2

Motivation

  • BKT parameters are inferred from data
  • But the best solution for a given data set may not quite match the parameters that actually generated it (sampling error)

0,0,0,0,0
0,0,0,0,0
0,1,1,0,1
0,1,0,0,0
0,0,1,1,0

5 students, 5 problems each: 25 bits of data

prior = 0.205
learning = 0.010
guess = 0.142
slip = 0.031

4 parameters, 3 decimal digits each: 39.9 bits of data

Not even possible for all parameter sets to be represented!
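The bit counts in this example can be checked with a few lines of arithmetic. This is only a sketch of the counting argument; the variable names are illustrative and not from the slides.

```python
import math

# Information in the example data set: one bit per (student, problem) response.
students, problems = 5, 5
data_bits = students * problems

# Information needed to write down the parameters at the stated precision:
# 4 parameters, each taking one of 10**3 possible three-digit values.
n_params, digits = 4, 3
param_bits = n_params * digits * math.log2(10)

# Number of distinct data sets versus number of distinct parameter sets.
n_data_sets = 2 ** data_bits
n_param_sets = 10 ** (n_params * digits)

print(data_bits, round(param_bits, 1), n_data_sets < n_param_sets)
```

Because there are fewer possible data sets than possible parameter sets, some parameter sets cannot be the best fit for any data set of this size.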

slide-3
SLIDE 3

Questions

  • So how much data is needed for accurate estimates?
  • And do the parameter values affect how much you need?
  • Can we give confidence intervals for parameters?
slide-4
SLIDE 4
slide-5
SLIDE 5

Normal distribution over samples

  • Mean is almost always near the true generating value
  • Standard deviation can be used to describe variation of estimates
  • Can use the 68–95–99.7 rule for confidence intervals
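As a rough illustration of the 68–95–99.7 rule, the sketch below turns a set of repeated parameter estimates into an interval of mean ± k standard deviations. The function name and the sample numbers are made up for illustration, and the interval is only meaningful to the extent that the estimates are roughly normally distributed.

```python
import statistics

def confidence_interval(estimates, n_sigma=2):
    """Return (low, high) = mean -/+ n_sigma * sample stddev of the estimates.

    Under the normality assumption, n_sigma = 1, 2, 3 covers roughly
    68%, 95%, and 99.7% of estimates respectively.
    """
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    return mean - n_sigma * sd, mean + n_sigma * sd

# Hypothetical guess-parameter estimates from repeated fits (made-up values).
guess_estimates = [0.13, 0.15, 0.14, 0.16, 0.12]
low, high = confidence_interval(guess_estimates)
print(low, high)
```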

slide-6
SLIDE 6
slide-7
SLIDE 7

Variation does depend on parameter values

  • Each parameter behaves differently
  • Best estimates for parameters near zero/one, worst in the 0.5-0.8 range

slide-8
SLIDE 8
slide-9
SLIDE 9

There are interactions between parameter values

  • Can’t just precompute a table of stddevs for each parameter
  • Complex relationship, analytical approach probably infeasible
  • But at least there is continuity with small rates of change

slide-10
SLIDE 10
slide-11
SLIDE 11

Sample size recommendations

  • Stddev is proportional to 1/sqrt(n)
  • Must increase sample size by a factor of 4 to improve error by a factor of 2
  • Small data sets (<1000 students) will not give even one sigfig in all parameters
  • Question systems based on small classes!
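The 1/sqrt(n) relationship can be turned into two small helpers: one that projects a measured stddev to a different sample size, and one that inverts the relationship to find the sample size needed for a target stddev. This is a sketch under the stated proportionality only; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def scaled_stddev(stddev, n, new_n):
    """Project a stddev measured at sample size n to sample size new_n,
    assuming stddev is proportional to 1/sqrt(n)."""
    return stddev * math.sqrt(n / new_n)

def required_sample_size(stddev, n, target_stddev):
    """Smallest sample size expected to reach target_stddev,
    given a stddev measured at sample size n."""
    return math.ceil(n * (stddev / target_stddev) ** 2)

# Hypothetical measurement: stddev of 0.04 with 1000 students.
print(scaled_stddev(0.04, 1000, 4000))         # 4x the students halves the stddev
print(required_sample_size(0.04, 1000, 0.02))  # students needed to halve the stddev
```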

slide-12
SLIDE 12
slide-13
SLIDE 13

No interaction between sample size and parameters

  • Change sample size without changing parameters → predictable variation in error
  • Gives an approach to estimate error on real-world data sets:
  • Take samples with replacement, infer parameters for each, compute stddev
  • Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
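The resampling procedure described here is a bootstrap over students. Below is a minimal sketch, assuming each student is a list of 0/1 responses and that `fit` is any function mapping a list of students to a single numeric estimate. To keep the example self-contained, `fraction_correct` is a stand-in estimator, not a BKT fit; in practice `fit` would run the parameter inference and return one parameter. All names and data are hypothetical.

```python
import math
import random
import statistics

def bootstrap_stddev(students, fit, n_resamples=200, seed=0):
    """Estimate the stddev of fit(students) by resampling students
    with replacement and refitting on each resample."""
    rng = random.Random(seed)
    n = len(students)
    estimates = []
    for _ in range(n_resamples):
        resample = [students[rng.randrange(n)] for _ in range(n)]
        estimates.append(fit(resample))
    return statistics.stdev(estimates)

def fraction_correct(students):
    """Stand-in estimator (NOT a BKT fit): overall fraction of correct responses."""
    total = sum(len(s) for s in students)
    return sum(sum(s) for s in students) / total

# Hypothetical data: 400 students, 5 binary responses each.
data_rng = random.Random(1)
data = [[int(data_rng.random() < 0.3) for _ in range(5)] for _ in range(400)]

sd_400 = bootstrap_stddev(data, fraction_correct)
# Project to 1600 students using the 1/sqrt(n) scaling.
sd_1600 = sd_400 * math.sqrt(400 / 1600)
print(sd_400, sd_1600)
```

Resampling whole students rather than individual responses keeps each student's response sequence intact, which matters for a sequence model such as BKT.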
slide-14
SLIDE 14

Knowledge Tracing for Interacting Student Pairs

DERRICK COETZEE

slide-15
SLIDE 15

Motivation

  • Standard Bayesian knowledge tracing uses a fixed learning rate parameter to capture all learning

slide-16
SLIDE 16

Motivation

  • One way to improve: use information on course materials viewed

slide-17
SLIDE 17

Motivation

  • What about peer interaction (e.g. forums/chat)?
  • Not fixed/static like instructional materials
  • The level of knowledge of the other student is important
  • Use our BKT model of the other student’s knowledge!
slide-18
SLIDE 18

Pair interaction scenario

  • Simple case of student interaction
  • Two students are paired and always interact between each item (no interactions with others)

Do exercise → Learn independently → Interact with partner → Do exercise → Learn independently

slide-19
SLIDE 19

Pair interaction scenario

  • Model independent learning and interaction stages
slide-20
SLIDE 20

Pair interaction scenario

  • Model independent learning and interaction stages
  • New parameters: teach, mislead

Knows   Other student knows   Probability knows after interaction
No      No
Yes     Yes                   1
No      Yes                   teach
Yes     No                    1−mislead
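The interaction table can be read as an update to a student's knowledge probability. The sketch below marginalizes over the four knowledge combinations. It makes two assumptions that are not stated in the slides: the two students' knowledge states are treated as independent, and the case where neither student knows (whose value does not appear in the table text) is taken to contribute probability 0. The function name is hypothetical.

```python
def p_knows_after_interaction(p_self, p_other, teach, mislead=0.0):
    """Probability that a student knows the skill after the interaction stage.

    p_self, p_other: probabilities that the student and the partner know the
    skill before interacting (assumed independent for this sketch).
    """
    return (
        p_self * p_other * 1.0                    # both know: stays known
        + (1 - p_self) * p_other * teach          # partner teaches the student
        + p_self * (1 - p_other) * (1 - mislead)  # student is not misled
        # neither knows: assumed to contribute 0
    )

print(p_knows_after_interaction(0.0, 1.0, teach=0.9))
print(p_knows_after_interaction(0.3, 0.7, teach=0.0))
```

With teach = 0 and mislead = 0 the update leaves the student's probability unchanged, so the interaction stage has no effect.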

slide-21
SLIDE 21

Results: Preliminary simulations

  • 5-parameter system (prior, learn, guess, slip, teach)
  • forget, mislead parameters fixed at zero
  • Generate synthetic data, run EM from generating values
  • Same behavior as classic system when teach = 0
  • Unstable when teach > 0
  • Converges to the trivial solution prior=learn=teach=1, slip=proportion of incorrect responses

  • Occurs for both small and large teach parameters
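For readers who want to reproduce this kind of experiment, here is one way the synthetic data for the pair scenario might be generated, following the exercise → independent learning → interaction cycle with forget and mislead fixed at zero. It is a sketch rather than the presenter's actual generator: the function name, the number of items per student, and the prior value of 0.2 are illustrative assumptions, and the EM fitting step is not shown.

```python
import random

def simulate_pair(n_items, prior, learn, guess, slip, teach, rng):
    """Return two response lists (1 = correct), one per student in the pair."""
    knows = [rng.random() < prior, rng.random() < prior]
    responses = ([], [])
    for _ in range(n_items):
        # Do exercise: correct with probability 1 - slip if known, else guess.
        for i in (0, 1):
            p_correct = 1 - slip if knows[i] else guess
            responses[i].append(int(rng.random() < p_correct))
        # Learn independently.
        for i in (0, 1):
            if not knows[i] and rng.random() < learn:
                knows[i] = True
        # Interact with partner, using the states from before the interaction
        # so that both students are updated symmetrically.
        before = tuple(knows)
        for i in (0, 1):
            if not before[i] and before[1 - i] and rng.random() < teach:
                knows[i] = True
    return responses

rng = random.Random(0)
pairs = [
    simulate_pair(5, prior=0.2, learn=0.09, guess=0.14, slip=0.09, teach=0.9, rng=rng)
    for _ in range(100)
]
print(pairs[0])
```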
slide-22
SLIDE 22

Results: Preliminary simulations

  • 4-parameter system (learn, guess, slip, teach)
  • forget, mislead, prior fixed at zero
  • For small teach values (e.g. 0.05), teach converges to zero
  • Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach:
  • learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach=0.6481
  • learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225

slide-23
SLIDE 23

Results: Preliminary simulations

  • 4-parameter system (learn, guess, slip, teach) with 10000 students and high teach
  • prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000 → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach=0.8793

  • prior and slip have high error, but learning/guess/teach are good
  • teach accuracy increases dramatically with sample size
slide-24
SLIDE 24

Possible solutions

  • Answer items between independent learning and interaction (more observed data)
  • Mentor/mentee model: knowledge flows in only one direction
  • Eliminate different parameters, or combine parameters to create a lower-dimensional space

slide-25
SLIDE 25

Future work

  • Determine whether the interaction model produces better predictions on synthetic data
  • Gather real-world pair interaction data using the MOOCchat tool
  • Determine whether pair interaction produces better predictions
  • Typical values, appropriate interpretations for teach and mislead parameters?

  • Generalize to more complex interactions