Adaptive Testing using a General Diagnostic Model

Jill-Jênn Vie (1, 2, 3), Fabrice Popineau (1), Yolaine Bourda (1), Éric Bruillard (2)
(1) CentraleSupélec, Gif-sur-Yvette; (2) ENS Cachan/Paris-Saclay; (3) Université Paris-Saclay

Context
[Figure: binary response matrix of learners (Alice, Bob, Charles, Daisy, Everett, Filipe, Gwen, Henry, Ian, Jill, Ken) over Questions 1–8]
We consider dichotomous data of learners over questions or tasks.
◮ Tests are too long, students are overtested
◮ Asking all questions to every learner → boredom

How to personalize this process?
[Figure: Non-Adaptive Test vs. Adaptive Test, each shown as a sequence of questions (Q1, Q2, Q3, ...)]

Computerized Adaptive Testing (CAT)
Choose the next question based on previous answers.
⇒ Reduce test length while providing an accurate measurement.
While some termination criterion is not satisfied
    Ask the “best” next question
Psychometrics, item response theory (summative)
◮ Answers can be explained by continuous hidden variables
◮ What parameters can we measure to predict performance?
◮ Infer them directly from student data
Cognitive models (formative)
◮ Answers can be explained by the mastery or non-mastery of some knowledge components (KC)
◮ An expert maps KCs to items
◮ Infer the KCs mastered ⇒ predict performance
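The generic loop on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `run_cat`, `ask`, `score_question` and `max_questions` are hypothetical, and the termination criterion is assumed to be a fixed question budget.

```python
def run_cat(question_ids, ask, score_question, max_questions):
    """Generic adaptive-testing loop (hypothetical helper).

    ask(q) returns the learner's answer to question q (True/False).
    score_question(q, answers) rates how useful q would be next,
    given the answers collected so far; higher is better.
    """
    answers = {}
    remaining = list(question_ids)
    # Termination criterion (assumed): question budget or no question left.
    while remaining and len(answers) < max_questions:
        # Ask the "best" next question according to the scoring rule.
        best = max(remaining, key=lambda q: score_question(q, answers))
        answers[best] = ask(best)
        remaining.remove(best)
    return answers


# Toy usage: a simulated learner and a scoring rule that prefers low ids.
truth = {1: True, 2: False, 3: True, 4: False}
asked = run_cat(truth.keys(), ask=truth.__getitem__,
                score_question=lambda q, answers: -q, max_questions=2)
```

Any of the models in this deck can be plugged in through `score_question`; only the scoring rule and the way answers update the learner estimate change.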

Applications of test-size reduction
◮ How to ask k questions only, that have predictive power over the rest of the test?
◮ i.e., k questions that summarize the question set.
Low-stake self-assessment
◮ Learners get feedback: the KCs that are mastered
◮ Filter the KCs before assessment
◮ Practice testing benefits learning (Dunlosky, 2013)
Adaptive pretest at the beginning of a MOOC
◮ You seem to lack KCs 1 and 3 that are prerequisites of this course.
◮ Personalize course content accordingly
◮ Recommend relevant resources

Our questions
◮ How to use test history data to provide shorter assessments?
◮ What adaptive testing models exist?
◮ How to compare them on the same real data?
Outline
◮ Summative CATs (1983) and formative CATs (2008)
◮ Comparison framework
◮ Our new model: GenMA

Summative CATs for standardized tests (GMAT, GRE)
Rasch model for 20 questions
[Figure: questions Q1, Q2, Q3, ..., Q19, Q20 placed on a difficulty axis; values shown: –0.45, –0.40, –0.35, 0.45, 0.50]
Question 10 is asked. Incorrect. ⇒ Ability estimate = −0.401
Question 2 is asked. Correct! ⇒ Ability estimate = −0.066
Question 9 is asked. Correct! ⇒ Ability estimate = 0.224
Question 14 is asked. Correct! ⇒ Ability estimate = 0.478
Feedback and inference
Your ability estimate is 0.478.
◮ Q1–7 can be solved with probability 0.7
◮ Q8–15 can be solved with probability 0.6
◮ Q16–20 can be solved with probability 0.5
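The ask-then-update cycle of a Rasch-based CAT might look like the sketch below. Several choices are assumptions that the slides do not state: a logistic link, a standard normal prior on ability (so that the estimate stays finite after a single answer), Newton's method for the estimate, and "difficulty closest to the current ability" as the selection rule. All function names are hypothetical.

```python
import math


def rasch_proba(theta, d):
    """Probability of a correct answer for ability theta and difficulty d
    (logistic link, an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - d)))


def estimate_ability(responses, difficulties, n_steps=20):
    """MAP ability estimate under an assumed N(0, 1) prior, via Newton steps.

    responses maps a question id to True (correct) or False (incorrect).
    """
    theta = 0.0
    for _ in range(n_steps):
        grad = -theta  # derivative of the log prior
        hess = -1.0    # second derivative of the log prior
        for q, correct in responses.items():
            p = rasch_proba(theta, difficulties[q])
            grad += (1.0 if correct else 0.0) - p
            hess -= p * (1.0 - p)
        theta -= grad / hess
    return theta


def next_question(theta, difficulties, asked):
    """Pick the unasked question whose difficulty is closest to theta,
    i.e. the one whose outcome is most uncertain under the Rasch model."""
    candidates = [q for q in difficulties if q not in asked]
    return min(candidates, key=lambda q: abs(difficulties[q] - theta))
```

With this rule a wrong answer lowers the estimate and steers the test toward easier questions, while a correct answer does the opposite, which is the behaviour shown in the sequence of ability estimates above.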

Formative CATs for cognitive diagnosis
DINA model for 4 tasks, 4 KCs + slip / guess
Task  Description     Knowledge components
T1    Sending a mail  form, mail
T2    Filling a form  form
T3    Sharing a link  url, copy
T4    Entering a URL  url
Task 1 is assigned. Correct! ⇒ form and mail may be mastered. No need to assign Task 2.
Task 4 is asked. Incorrect. ⇒ url may not be mastered. No need to use Task 3.
Feedback and inference
◮ You master form and mail but not url.
◮ You should read my book on the subject. It’s only $200.
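The DINA inference on this slide can be illustrated with a small Bayesian update over the 16 possible mastery states. This is a sketch under assumptions: the task-to-KC mapping is the one suggested by the slide's narrative (Task 1 needs form and mail, Task 2 needs form, Task 3 needs url and copy, Task 4 needs url), the prior over states is uniform, and the slip and guess values are hypothetical.

```python
from itertools import product

KCS = ["form", "mail", "copy", "url"]
# Assumed mapping of tasks to required KCs, read off the slide's narrative.
Q_MATRIX = {"T1": {"form", "mail"}, "T2": {"form"},
            "T3": {"url", "copy"}, "T4": {"url"}}
SLIP, GUESS = 0.1, 0.2  # hypothetical parameter values


def dina_proba(state, task):
    """P(correct) is 1 - slip if every required KC is mastered, else guess."""
    mastered = {kc for kc, ok in zip(KCS, state) if ok}
    return 1 - SLIP if Q_MATRIX[task] <= mastered else GUESS


def update(belief, task, correct):
    """Bayes update of the distribution over mastery states after one answer."""
    posterior = {}
    for state, p in belief.items():
        like = dina_proba(state, task)
        posterior[state] = p * (like if correct else 1 - like)
    total = sum(posterior.values())
    return {state: p / total for state, p in posterior.items()}


def mastery(belief, kc):
    """Marginal probability that a KC is mastered."""
    i = KCS.index(kc)
    return sum(p for state, p in belief.items() if state[i])


states = list(product([False, True], repeat=len(KCS)))
belief = {s: 1 / len(states) for s in states}   # uniform prior (assumed)
belief = update(belief, "T1", correct=True)     # Task 1: correct
belief = update(belief, "T4", correct=False)    # Task 4: incorrect
```

After these two answers, `mastery(belief, "form")` and `mastery(belief, "mail")` are above 0.5 and `mastery(belief, "url")` is below 0.5, matching the qualitative feedback on the slide.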

Comparison between summative and formative models
Rasch model
◮ Difficulty of questions
◮ Ability of learners
◮ Learners can be ranked
◮ No need of domain knowledge
Cognitive diagnosis
      C1  C2  C3
Q1    1   0   0
Q2    0   1   1
Q3    1   1   0
...   ...
◮ KCs required for each question
◮ Mastery or non-mastery of every KC for each learner
◮ Learners get feedback
◮ No need of prior data

GenMA: combining MIRT and a q-matrix
Rasch model
Probability of success of learner i over question j: $\Phi(\theta_i - d_j)$
◮ Performance depends on the difference between learner ability and question difficulty
◮ Same as Elo ratings
Multidimensional Item Response Theory (MIRT)
$\Phi(\vec{\theta}_i \cdot \vec{d}_j) = \Phi\left(\sum_{k=1}^{d} \theta_{ik} d_{jk}\right)$
$(\theta_{ik})_k$: ability of learner i; $(d_{jk})_k$: difficulty of question j
◮ Depends on correlation between ability and question parameters
◮ Hard to converge
GenMA
$\Phi\left(\sum_{k=1}^{d} \theta_{ik} q_{jk} d_{jk} + \delta_j\right)$
$(q_{jk})_k$: q-matrix entry; $\delta_j$: bias of question j
◮ Depends on correlation between ability and question parameters, but only for non-zero q-matrix entries
◮ Easy to converge
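The GenMA success probability can be written directly from the formula on this slide. In this sketch Φ is assumed to be the logistic function, and the function names are hypothetical; the MIRT expression is recovered by setting every q-matrix entry to 1 and the bias to 0.

```python
import math


def logistic(x):
    """Assumed link function Phi."""
    return 1.0 / (1.0 + math.exp(-x))


def genma_proba(theta, q_row, d_row, delta):
    """P(learner answers the question correctly) under GenMA.

    theta: learner abilities, one per KC
    q_row: q-matrix row of the question (0/1 per KC)
    d_row: question parameters, one per KC
    delta: bias of the question
    """
    return logistic(sum(t * q * d for t, q, d in zip(theta, q_row, d_row)) + delta)


def mirt_proba(theta, d_row):
    """MIRT as the special case with an all-ones q-matrix row and no bias."""
    return genma_proba(theta, [1] * len(theta), d_row, 0.0)
```

Because each term is multiplied by `q_jk`, the ability on a KC that the question does not require has no effect on the prediction, which is what ties each dimension to a KC.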

Experimental protocol
[Figure: the learner-by-question response matrix (Questions 1–8), with learners split into a Train group and a Test group]
◮ Train student set 80%
◮ Test student set 20%
◮ Validation question set 25%
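One plausible reading of this protocol is sketched below: learners are split 80/20 into train and test sets, and 25% of the questions are held out as a validation set on which predictions are scored, while the adaptive test draws from the remaining questions. The function name, the shuffling and the seed are assumptions made for illustration.

```python
import random


def split_protocol(learners, questions, seed=0):
    """Split learners 80/20 and hold out 25% of the questions (hypothetical helper)."""
    rng = random.Random(seed)

    learners = list(learners)
    rng.shuffle(learners)
    n_train = round(0.8 * len(learners))
    train, test = learners[:n_train], learners[n_train:]

    questions = list(questions)
    rng.shuffle(questions)
    n_validation = round(0.25 * len(questions))
    validation_questions = questions[:n_validation]
    adaptive_questions = questions[n_validation:]  # pool the CAT can ask from

    return train, test, validation_questions, adaptive_questions


# Sizes for 536 learners and 20 questions:
train, test, val_q, cat_q = split_protocol(range(536), range(1, 21))
print(len(train), len(test), len(val_q), len(cat_q))  # prints "429 107 5 15"
```

The model parameters would be fitted on the train learners; each test learner then takes the adaptive test, and the predictions on the held-out questions are compared with that learner's true answers.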

Performance evaluation
[Figure: predicted success probabilities compared with the true answers. First example: 2 correct predictions over 5. Second example: 3 correct predictions over 5.]
Actually, we use log loss:
$\mathrm{logloss}(y^*, y) = -\frac{1}{n} \sum_{k=1}^{n} \log\left(1 - |y^*_k - y_k|\right)$

GenMA
Feedback
◮ The estimated ability $\vec{\theta}_i = (\theta_{i1}, \ldots, \theta_{iK})$
◮ Proficiency over several KCs
Inference
◮ Compute the probability of success over the remaining questions
Example
◮ After 4 questions have been asked
◮ Predicted performance: [.62, .12, .42, .13, .12]
◮ True performance: [T, F, T, F, F]
◮ Computed logloss (error) is 0.350.
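The log loss of this example can be checked with a few lines of Python. The sketch assumes a leading minus sign so that the error is positive, and clips the argument of the logarithm to avoid log(0); both choices, and the function name, are assumptions.

```python
import math


def logloss(truth, predicted, eps=1e-15):
    """Mean log loss between true answers (booleans) and predicted probabilities."""
    total = 0.0
    for t, p in zip(truth, predicted):
        agreement = 1.0 - abs(float(t) - p)  # probability given to the true outcome
        total += math.log(max(agreement, eps))
    return -total / len(truth)


predicted = [0.62, 0.12, 0.42, 0.13, 0.12]
truth = [True, False, True, False, False]
error = logloss(truth, predicted)
```

With the rounded probabilities shown on the slide, `error` comes out at about 0.348, close to the reported 0.350; the small gap is presumably due to the rounding of the displayed predictions.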

Real dataset: Fraction subtraction (DeCarlo, 2010)
◮ 536 middle-school students
◮ 20 questions of fraction subtraction
◮ 8 KCs
Description of the KCs
◮ convert a whole number to a fraction
◮ simplify before subtracting
◮ find a common denominator
◮ . . .

Results
[Figure: Comparing models for adaptive testing (dataset: fraction). x-axis: Number of questions asked (0 to 16); y-axis: Incorrect predictions count (0.0 to 3.0); curves: DINA, GenMA, Rasch]
4 questions over 15 are enough to get a mean accuracy of 4/5.

Summing up
Rasch model
◮ Really simple, competitive with other models
◮ But unidimensional, needs prior data, not formative
DINA model
◮ Formative, can work without prior data
◮ Needs a q-matrix
GenMA
◮ Multidimensional
◮ Formative because dimensions match KCs
◮ Needs a q-matrix and prior data
◮ Faster convergence than MIRT

Further work
Considering graphs of prerequisites over KCs
    Attribute Hierarchy Model, Knowledge Space Theory.
Adapting the process according to a group of answers
    Multistage Testing.
Doing a pretest with a group of questions, then a CAT
    So that the first estimate has less bias.
Considering other interfaces for assessment
    Evidence-Centered Design, Stealth Assessment (Shute, 2011)

Thank you for your attention!
github.com/jilljenn
jjv@lri.fr
Do you have any questions?
