
Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE - PowerPoint PPT Presentation



  1. Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE

  2. Motivation ◦ BKT parameters are inferred from data ◦ But the best solution for a given data set may not quite match the parameters that actually generated it (sampling error) ◦ Example: 5 students, 5 problems each = 25 bits of data
       0,0,0,0,0
       0,0,0,0,0
       0,1,1,0,1
       0,1,0,0,0
       0,0,1,1,0
     Parameters: prior = 0.205, learning = 0.010, guess = 0.142, slip = 0.031 (4 parameters, 3 decimal digits each = 39.9 bits of data) ◦ Not even possible for all parameter sets to be represented: 25 bits distinguish at most 2^25 data sets, far fewer than the 10^12 possible parameter sets!
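The bit counts on this slide can be checked directly. A minimal Python sketch, assuming each response is one correct/incorrect bit and each parameter takes one of 1000 values (three decimal digits); `data_bits` and `parameter_bits` are illustrative names, not from the talk:

```python
import math

def data_bits(n_students: int, n_problems: int) -> int:
    # Each binary (correct/incorrect) response carries at most 1 bit.
    return n_students * n_problems

def parameter_bits(n_params: int, decimal_digits: int) -> float:
    # A parameter written to `decimal_digits` digits takes 10**decimal_digits
    # values, i.e. decimal_digits * log2(10) bits of information.
    return n_params * decimal_digits * math.log2(10)

d = data_bits(5, 5)
p = parameter_bits(4, 3)
print(d, round(p, 1))      # 25 39.9
# Only 2**25 distinct data sets exist, versus 10**12 parameter settings,
# so most settings can never be the best fit for any 5x5 data set.
print(2 ** d < 10 ** 12)   # True
```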

  3. Questions ◦ So how much data is needed for accurate estimates? ◦ And do the parameter values affect how much you need? ◦ Can we give confidence intervals for parameters?

  4. Normal distribution over samples ◦ The mean of the estimates is almost always near the true generating value ◦ Their standard deviation can be used to describe the variation of estimates ◦ Can use the 68–95–99.7 rule for confidence intervals
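As a sketch of how the 68–95–99.7 rule turns into an interval: if a parameter's estimates across many samples are approximately normal and centered on the true value, their standard deviation gives a band of k standard deviations around a single estimate. The function names and the five estimates below are made up for illustration; in practice the estimates would come from refitting BKT on many samples.

```python
import statistics

# Multipliers k from the 68-95-99.7 rule for a normal distribution.
SIGMAS = {68: 1, 95: 2, 99.7: 3}

def estimate_sd(estimates):
    """Sample standard deviation of one parameter's estimates across samples."""
    return statistics.stdev(estimates)

def confidence_interval(estimate, sd, level=95):
    """Approximate interval estimate +/- k*sd, assuming normal, unbiased estimates."""
    k = SIGMAS[level]
    return estimate - k * sd, estimate + k * sd

# Made-up estimates of one parameter from five resampled data sets.
sd = estimate_sd([0.19, 0.21, 0.20, 0.22, 0.18])
print(confidence_interval(0.205, sd, level=95))
```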

  5. Variation does depend on parameter values ◦ Each parameter behaves differently ◦ Best estimates for parameters near zero/one, worst in the 0.5-0.8 range

  6. There are interactions between parameter values ◦ Can’t just precompute a table of stddevs for each parameter independently ◦ The relationship is complex; an analytical approach is probably infeasible ◦ But at least the stddevs vary continuously, with small rates of change

  7. Sample size recommendations ◦ Stddev is proportional to 1/sqrt(n) ◦ Must increase sample size by a factor of 4 to halve the error ◦ Small data sets (<1000 students) will not give even one significant figure in all parameters ◦ Question systems based on small classes!
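The 1/sqrt(n) rule makes sample-size planning simple arithmetic. A minimal sketch, assuming the proportionality holds exactly; the function names are illustrative and the 0.04 stddev at 1000 students is a hypothetical value, not a result from the talk:

```python
import math

def scale_sd(sd_at_n0: float, n0: int, n: int) -> float:
    """Predict the stddev at sample size n from one measured at n0,
    assuming stddev is proportional to 1/sqrt(n)."""
    return sd_at_n0 * math.sqrt(n0 / n)

def students_needed(sd_at_n0: float, n0: int, target_sd: float) -> int:
    """Smallest n whose predicted stddev is at most target_sd."""
    return math.ceil(n0 * (sd_at_n0 / target_sd) ** 2)

# 4x the students -> half the stddev.
print(scale_sd(0.04, 1000, 4000))          # 0.02
# Going from 0.04 to 0.01 (a factor of 4) needs 16x the students.
print(students_needed(0.04, 1000, 0.01))   # 16000
```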

  8. No interaction between sample size and parameters ◦ Change sample size without changing parameters → predictable variation in error ◦ Gives an approach to estimate error on real-world data sets: ◦ Take samples with replacement, infer parameters for each, compute stddev ◦ Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
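The resampling procedure above could look like the following Python sketch. It assumes whole students are resampled with replacement (a standard bootstrap), and it uses a hypothetical stand-in estimator, the overall proportion of correct responses, in place of BKT EM, which is not shown here:

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

Student = List[int]  # one student's 0/1 response sequence

def bootstrap_sd(students: Sequence[Student],
                 fit: Callable[[Sequence[Student]], float],
                 n_resamples: int = 200, seed: int = 0) -> float:
    """Stddev of one fitted parameter across resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample whole students, keeping the original sample size.
        resample = rng.choices(students, k=len(students))
        estimates.append(fit(resample))
    return statistics.stdev(estimates)

def rescale(sd: float, n_current: int, n_target: int) -> float:
    # Assumes stddev is proportional to 1/sqrt(n).
    return sd * math.sqrt(n_current / n_target)

def proportion_correct(students: Sequence[Student]) -> float:
    # Hypothetical stand-in estimator; a real analysis would run BKT EM here
    # and return one parameter (e.g. guess).
    responses = [r for seq in students for r in seq]
    return sum(responses) / len(responses)

# The 5-student example data from slide 2.
students = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 1, 0, 1],
            [0, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
sd5 = bootstrap_sd(students, proportion_correct)
print(sd5, rescale(sd5, len(students), 1000))
```

Swapping `proportion_correct` for a function that runs EM and returns one BKT parameter gives the procedure on the slide; the fixed seed keeps repeated runs comparable.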

  9. Knowledge Tracing for Interacting Student Pairs DERRICK COETZEE

  10. Motivation ◦ Standard Bayesian knowledge tracing uses a fixed learning rate parameter to capture all learning
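For reference, a common formulation of the standard BKT update (no forgetting) conditions P(knows) on the observed response with Bayes' rule, then applies the fixed learning rate. This is a sketch with illustrative function names, not the author's code; it assumes responses are coded 0/1 and that guess and slip lie strictly between 0 and 1 so the denominators are nonzero:

```python
def p_correct(p_know: float, guess: float, slip: float) -> float:
    # Probability of a correct answer given P(knows).
    return p_know * (1 - slip) + (1 - p_know) * guess

def posterior(p_know: float, correct: bool, guess: float, slip: float) -> float:
    # Bayes' rule: update P(knows) after observing one response.
    if correct:
        num = p_know * (1 - slip)
        den = num + (1 - p_know) * guess
    else:
        num = p_know * slip
        den = num + (1 - p_know) * (1 - guess)
    return num / den

def learn_step(p_know: float, learn: float) -> float:
    # Fixed learning rate, no forgetting: an unknown skill becomes known
    # with probability `learn` after each opportunity.
    return p_know + (1 - p_know) * learn

def trace(responses, prior, learn, guess, slip):
    """Return P(knows) before each response, plus P(knows) after the last one."""
    p = prior
    history = []
    for r in responses:
        history.append(p)
        p = learn_step(posterior(p, bool(r), guess, slip), learn)
    return history, p

# Slide 2's parameter values, applied to one student's responses.
history, final = trace([0, 1, 1, 0, 1], prior=0.205, learn=0.010, guess=0.142, slip=0.031)
print([round(p_correct(h, 0.142, 0.031), 3) for h in history], round(final, 3))
```

Every bit of learning here flows through the single `learn` parameter, which is the limitation the following slides address.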

  11. Motivation ◦ One way to improve: use information about which course materials were viewed

  12. Motivation ◦ What about peer interaction (e.g. forums/chat)? ◦ Not fixed/static like instructional materials ◦ The level of knowledge of the other student is important ◦ Use our BKT model of the other student’s knowledge!

  13. Pair interaction scenario ◦ Simple case of student interaction ◦ Two students are paired and always interact between items (no interactions with others) ◦ Sequence: Learn → Do exercise independently → Interact with partner → Learn → Do exercise independently

  14. Pair interaction scenario ◦ Model independent learning and interaction stages

  15. Pair interaction scenario ◦ Model independent learning and interaction stages ◦ New parameters: teach, mislead
       Knows | Other knows | Probability student knows after interaction
       No    | No          | 0
       Yes   | Yes         | 1
       No    | Yes         | teach
       Yes   | No          | 1 − mislead
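Summing over the four rows of the teach/mislead table gives the probability a student knows after interaction. A minimal sketch with hypothetical function names, assuming the two students' knowledge states are independent before the interaction (after interacting their states are generally correlated, so this marginal update is an approximation):

```python
def after_interaction(p_self: float, p_other: float,
                      teach: float, mislead: float) -> float:
    """P(student knows after interacting), from the four rows of the table.
    Assumes the two students' knowledge states are independent."""
    return (p_self * p_other * 1.0                    # both know -> 1
            + (1 - p_self) * p_other * teach          # only partner knows -> teach
            + p_self * (1 - p_other) * (1 - mislead)  # only student knows -> 1 - mislead
            + (1 - p_self) * (1 - p_other) * 0.0)     # neither knows -> 0

def pair_interaction(p_a, p_b, teach, mislead):
    # Both students update simultaneously from their pre-interaction beliefs.
    return (after_interaction(p_a, p_b, teach, mislead),
            after_interaction(p_b, p_a, teach, mislead))

print(pair_interaction(0.2, 0.8, teach=0.5, mislead=0.1))  # approximately (0.516, 0.756)
```

With teach = mislead = 0 the update leaves P(knows) unchanged, matching the slide 16 observation that the model then behaves like classic BKT.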

  16. Results: Preliminary simulations ◦ 5-parameter system (prior, learn, guess, slip, teach) ◦ forget, mislead parameters fixed at zero ◦ Generate synthetic data, run EM starting from the generating values ◦ Same behavior as classic system when teach = 0 ◦ Unstable when teach > 0 ◦ Converges to trivial solution prior=learn=teach=1, slip=proportion of incorrect responses ◦ Occurs for both small and large teach parameters
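Generating synthetic data for this 5-parameter system can be sketched as follows. The per-item order (learn, answer independently, interact) is an assumption read off the slide 13 diagram, forget and mislead are fixed at zero as on this slide, the function names are hypothetical, and EM fitting is not shown:

```python
import random

def simulate_pair(n_items, prior, learn, guess, slip, teach, rng):
    """Simulate one pair; returns each student's 0/1 responses.
    forget and mislead are fixed at zero, as in the slide."""
    knows = [rng.random() < prior, rng.random() < prior]
    responses = ([], [])
    for _ in range(n_items):
        for i in range(2):
            # Independent learning stage.
            if not knows[i] and rng.random() < learn:
                knows[i] = True
            # Exercise answered independently: slip if known, guess if not.
            if knows[i]:
                correct = rng.random() >= slip
            else:
                correct = rng.random() < guess
            responses[i].append(int(correct))
        # Interaction stage (mislead = 0, so knowers never lose the skill):
        # a non-knower whose partner knows learns with probability `teach`.
        before = list(knows)
        for i in range(2):
            if not before[i] and before[1 - i] and rng.random() < teach:
                knows[i] = True
    return responses

def simulate_dataset(n_pairs, n_items, prior, learn, guess, slip, teach, seed=0):
    """Flat list of response sequences, two per pair."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_pairs):
        a, b = simulate_pair(n_items, prior, learn, guess, slip, teach, rng)
        data.extend([a, b])
    return data

# Generating values from slides 17-18 (prior=0, high teach).
data = simulate_dataset(50, 10, prior=0.0, learn=0.09, guess=0.14, slip=0.09, teach=0.9)
print(len(data), data[0])
```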

  17. Results: Preliminary simulations ◦ 4-parameter system (learn, guess, slip, teach) ◦ forget, mislead, prior fixed at zero ◦ For small teach values (e.g. 0.05), teach converges to zero ◦ Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach effect: ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach=0.6481 ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225

  18. Results: Preliminary simulations ◦ 4-parameter system (learn, guess, slip, teach) with 10000 students and high teach ◦ prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000 → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach=0.8793 ◦ prior and slip have high error, but learn, guess, and teach are close to their generating values ◦ teach accuracy increases dramatically with sample size
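The recovery errors on slides 17 and 18 can be tabulated directly from the reported numbers. This sketch compares only the four parameters reported in all three runs; the 10,000-student run additionally reports prior=0.2184 against a generating value of 0:

```python
PARAMS = ("learn", "guess", "slip", "teach")

# Generating values and recovered estimates, copied from slides 17-18.
generating = {"learn": 0.0900, "guess": 0.1400, "slip": 0.0900, "teach": 0.9000}
recovered = {
    100:   {"learn": 0.1586, "guess": 0.1648, "slip": 0.0856, "teach": 0.6481},
    1000:  {"learn": 0.1643, "guess": 0.1940, "slip": 0.1102, "teach": 0.7225},
    10000: {"learn": 0.0841, "guess": 0.1239, "slip": 0.2658, "teach": 0.8793},
}

def abs_errors(fit):
    """Absolute error of each recovered parameter, rounded to 4 decimals."""
    return {p: round(abs(fit[p] - generating[p]), 4) for p in PARAMS}

for n_students, fit in recovered.items():
    print(n_students, abs_errors(fit))
```

The teach error falls from about 0.25 at 100 students to about 0.02 at 10,000, while the slip error is largest in the 10,000-student run.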

  19. Possible solutions ◦ Answer items between independent learning and interaction (more observed data) ◦ Mentor/mentee model: knowledge flows in only one direction ◦ Eliminate some parameters, or combine parameters to create a lower-dimensional space

  20. Future work ◦ Determine whether the interaction model produces better predictions on synthetic data ◦ Gather real-world pair interaction data using the MOOCchat tool ◦ Determine whether pair interaction produces better predictions on that data ◦ Typical values and appropriate interpretations for teach and mislead parameters? ◦ Generalize to more complex interactions
