Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE

Motivation ◦ BKT parameters are inferred from data ◦ But best solution for a given data set may not quite match the parameters that actually generated it (sampling error) ◦ Example data: 5 students, 5 problems each = 25 bits of data (responses: 0,0,0,0,0 / 0,0,0,0,0 / 0,1,1,0,1 / 0,1,0,0,0 / 0,0,1,1,0) ◦ Example parameters: 4 parameters, 3 decimal digits each = 39.9 bits of data (prior = 0.205, learning = 0.010, guess = 0.142, slip = 0.031) ◦ Not even possible for all parameter sets to be represented!
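
The two bit counts on this slide can be checked with a short calculation. This is a minimal sketch, assuming each response is one bit and each decimal digit carries log2(10) bits; the variable names are illustrative and not from the slides.

```python
import math

students, problems = 5, 5
params, digits = 4, 3

# One binary response (correct/incorrect) per student per problem.
data_bits = students * problems

# Each decimal digit carries log2(10) bits of information.
param_bits = params * digits * math.log2(10)

print(data_bits)             # 25
print(round(param_bits, 1))  # 39.9
```

With fewer bits of data than bits of parameters, there are fewer distinct data sets than distinct parameter settings, so some parameter settings cannot be the best fit for any data set.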

Questions ◦ So how much data is needed for accurate estimates? ◦ And do the parameter values affect how much you need? ◦ Can we give confidence intervals for parameters?

Normal distribution over samples ◦ Mean is almost always near true generating value ◦ Standard deviation can be used to describe variation of estimates ◦ Can use 68-95-99.7 rule for confidence intervals
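
As a rough illustration of the rule above, the sketch below turns a set of repeated parameter estimates into approximate intervals at 1, 2, and 3 standard deviations. It assumes the estimates are roughly normally distributed; the function name and the numbers are hypothetical, not results from the talk.

```python
import statistics

def confidence_intervals(estimates):
    """Return (mean, {level: (low, high)}) using the 68-95-99.7 rule."""
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)  # sample standard deviation
    intervals = {
        level: (mean - k * sd, mean + k * sd)
        for level, k in ((0.68, 1), (0.95, 2), (0.997, 3))
    }
    return mean, intervals

# Hypothetical guess estimates from repeated fits (made-up numbers).
guess_estimates = [0.13, 0.15, 0.14, 0.16, 0.12]
mean, intervals = confidence_intervals(guess_estimates)
for level, (low, high) in intervals.items():
    print(f"{level:.1%}: [{low:.3f}, {high:.3f}]")
```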

Variation does depend on parameter values ◦ Each parameter behaves differently ◦ Best estimates for parameters near zero/one, worst in 0.5-0.8 range

There are interactions between parameter values ◦ Can’t just precompute a table of stddevs for each parameter ◦ Complex relationship, analytical approach probably infeasible ◦ But at least there is continuity with small rates of change

Sample size recommendations ◦ Stddev proportional to 1/sqrt(n) ◦ Must increase sample size by factor of 4 to improve error by factor of 2 ◦ Small data sets (<1000 students) will not give even one sigfig in all parameters ◦ Question systems based on small classes!
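
The 1/sqrt(n) relationship can be turned into a small planning calculation. This is a sketch under the assumption that the scaling holds exactly; the function names and the reference point (stddev 0.04 at 1000 students) are hypothetical.

```python
import math

def scaled_stddev(sd_ref, n_ref, n_new):
    """Scale a standard deviation measured at n_ref to sample size n_new."""
    return sd_ref * math.sqrt(n_ref / n_new)

def required_sample_size(sd_ref, n_ref, sd_target):
    """Smallest n whose predicted standard deviation is at most sd_target."""
    return math.ceil(n_ref * (sd_ref / sd_target) ** 2)

# Hypothetical reference point: stddev 0.04 measured with 1000 students.
sd_at_4000 = scaled_stddev(0.04, 1000, 4000)       # 4x the data
n_needed = required_sample_size(0.04, 1000, 0.02)  # target: half the error
print(sd_at_4000, n_needed)
```

Halving the error requires four times the students, which is why small classes fall well short of one significant figure in every parameter.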

No interaction between sample size and parameters ◦ Change sample size without changing parameters → predictable variation in error ◦ Gives an approach to estimate error on real-world data sets: ◦ Take samples with replacement, infer parameters for each, compute stddev ◦ Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
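
The resampling procedure above might look like the following sketch. The `fit` argument stands for whatever routine infers parameters from a data set; `toy_fit` is a deliberately simple stand-in (fraction of correct responses), not a BKT fit, and the response data are synthetic. All names here are illustrative assumptions.

```python
import math
import random
import statistics

def bootstrap_stddev(students, fit, n_resamples=200, seed=0):
    """Stddev of each fitted parameter over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(students)
    fits = []
    for _ in range(n_resamples):
        resample = [students[rng.randrange(n)] for _ in range(n)]
        fits.append(fit(resample))
    return {name: statistics.stdev([f[name] for f in fits]) for name in fits[0]}

def toy_fit(students):
    """Stand-in for a real BKT fit (e.g. EM): fraction of correct responses."""
    responses = [r for s in students for r in s]
    return {"p_correct": sum(responses) / len(responses)}

# Synthetic response sequences: 100 students, 5 problems each (made up).
rng = random.Random(1)
students = [[int(rng.random() < 0.6) for _ in range(5)] for _ in range(100)]

sd_100 = bootstrap_stddev(students, toy_fit)
# Scale by 1/sqrt(n) to predict the stddev with 400 students.
sd_400 = {k: v * math.sqrt(100 / 400) for k, v in sd_100.items()}
print(sd_100, sd_400)
```

Replacing `toy_fit` with an actual BKT parameter-inference routine that returns a dict of named parameters would give a per-parameter error estimate for a real data set.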

Knowledge Tracing for Interacting Student Pairs DERRICK COETZEE

Motivation ◦ Standard Bayesian knowledge tracing uses fixed learning rate parameter to capture all learning
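
For reference, one step of the standard BKT update can be sketched as below: the knowledge estimate is conditioned on the observed response, then the same learn probability is applied regardless of what happened between items. The function name is illustrative, and the parameter values are the example values from the sample-size talk above (prior = 0.205, learning = 0.010, guess = 0.142, slip = 0.031), with no forgetting assumed.

```python
def bkt_update(p_know, correct, learn, guess, slip):
    """One step of standard BKT: condition on the response, then apply learning."""
    if correct:
        num = p_know * (1 - slip)
        den = num + (1 - p_know) * guess
    else:
        num = p_know * slip
        den = num + (1 - p_know) * (1 - guess)
    posterior = num / den
    # The same fixed learn probability is applied after every item.
    return posterior + (1 - posterior) * learn

p = 0.205  # prior
for response in [0, 1, 1, 0, 1]:  # one example response sequence
    p = bkt_update(p, response, learn=0.010, guess=0.142, slip=0.031)
print(p)
```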

Motivation ◦ One way to improve: use information on course materials viewed

Motivation ◦ What about peer interaction (e.g. forums/chat)? ◦ Not fixed/static like instructional materials ◦ The level of knowledge of the other student is important ◦ Use our BKT model of the other student’s knowledge!

Pair interaction scenario ◦ Simple case of student interaction ◦ Two students are paired and always interact between each item (no interactions with others) ◦ Sequence: Learn → Do exercise independently → Interact with partner → Learn → Do exercise independently

Pair interaction scenario ◦ Model independent learning and interaction stages ◦ New parameters: teach, mislead

Knows | Other knows | Probability student knows after interaction
No    | No          | 0
Yes   | Yes         | 1
No    | Yes         | teach
Yes   | No          | 1−mislead
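
The interaction table can be applied to probabilistic knowledge estimates by weighting each row by how likely that row is. The sketch below does this under one added assumption that is not stated in the slides: the two students' knowledge states are treated as independent when combining them. Function names are illustrative.

```python
def interact(p_self, p_other, teach, mislead=0.0):
    """Probability the student knows the skill after interacting with the partner."""
    return (
        p_self * p_other * 1.0                      # both know: stays known
        + (1 - p_self) * p_other * teach            # only partner knows: teach
        + p_self * (1 - p_other) * (1 - mislead)    # only self knows: 1 - mislead
        + (1 - p_self) * (1 - p_other) * 0.0        # neither knows: 0
    )

def interact_pair(p_a, p_b, teach, mislead=0.0):
    """Update both partners from their pre-interaction estimates."""
    return interact(p_a, p_b, teach, mislead), interact(p_b, p_a, teach, mislead)

print(interact_pair(0.3, 0.7, teach=0.9))
```

With teach = 0 and mislead = 0 the update leaves each estimate unchanged, so the interaction stage has no effect in that case.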

Results: Preliminary simulations ◦ 5-parameter system (prior, learn, guess, slip, teach) ◦ forget, mislead parameters fixed at zero ◦ Generate synthetic data, run EM from generating values ◦ Same behavior as classic system when teach = 0 ◦ Unstable when teach > 0 ◦ Converges to trivial solution prior=learn=teach=1, slip=proportion incorrect responses ◦ Occurs for both small and large teach parameters

Results: Preliminary simulations ◦ 4-parameter system (learn, guess, slip, teach) ◦ forget, mislead, prior fixed at zero ◦ For small teach values (e.g. 0.05), teach converges to zero ◦ Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach: ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach=0.6481 ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225

Results: Preliminary simulations ◦ 4-parameter system (learn, guess, slip, teach) with 10000 students and high teach ◦ prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000 → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach=0.8793 ◦ prior and slip have high error, but learning/guess/teach are good ◦ teach accuracy increases dramatically with sample size
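
To make the trend in teach accuracy concrete, the sketch below computes the absolute error of the teach estimates reported on these results slides for 100, 1000, and 10000 students (generating value 0.9000). The 10000-student run also reports a prior estimate, so the setups may not be identical and the comparison is indicative only.

```python
true_teach = 0.9000

# Estimated teach values reported on the results slides, keyed by student count.
teach_estimates = {100: 0.6481, 1000: 0.7225, 10000: 0.8793}

teach_errors = {n: abs(est - true_teach) for n, est in teach_estimates.items()}
for n, err in teach_errors.items():
    print(n, round(err, 4))
# 100 0.2519
# 1000 0.1775
# 10000 0.0207
```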

Possible solutions ◦ Answer items between independent learning and interaction (more observed data) ◦ Mentor/mentee model: knowledge flows in only one direction ◦ Eliminate different parameters, or combine parameters to create lower-dimensional space

Future work ◦ Determine whether interaction model produces better predictions on synthetic data ◦ Gather real-world pair interaction data using MOOCchat tool ◦ Determine whether pair interaction produces better predictions ◦ Typical values, appropriate interpretations for teach and mislead parameters? ◦ Generalize to more complex interactions
