 
Bayesian Networks in Educational Testing
Jiří Vomlel
Laboratory for Intelligent Systems
Prague University of Economics
This presentation is available at: http://www.utia.cas.cz/vomlel/slides/lisp2002.pdf
Contents:
• Educational testing is a “big business”.
• What is a fixed test and what is an adaptive test?
• An example: a test of basic operations with fractions.
• Optimal and myopically optimal tests.
• Construction of a myopically optimal fixed test.
• Results of experiments.
• An example showing that modeling dependence between skills is important.
• Conclusions.
Educational Testing Service (ETS)
• Educational Testing Service is the world’s largest private educational testing organization, with 2,300 regular employees.
• Volumes for ETS’s largest exams in 2000-2001:
  3,185,000 SAT I Reasoning Test and SAT II: Subject Area Tests (the SAT is the standard college admission test in the US)
  2,293,000 PSAT: Preliminary SAT/National Merit Scholarship Qualifying Test
  1,421,000 AP: Advanced Placement Program
  801,000 The Praxis Series: Professional Assessments for Beginning Teachers and Pre-Professional Skills Tests
  787,000 TOEFL: Test of English as a Foreign Language
  449,000 GRE: Graduate Record Examinations General Test
  etc.
Fixed Test vs. Adaptive Test
[Figure: a fixed test is a single sequence of questions Q1, ..., Q10; an adaptive test is a tree that starts with Q1 and branches to a different next question depending on whether each answer is wrong or correct.]
Computerized Adaptive Testing (CAT)
Objective: an optimal test for each examinee.
Two basic steps:
(1) the examinee’s knowledge level is estimated,
(2) questions appropriate for that level are selected.
R. Almond and R. Mislevy from ETS proposed using graphical models in CAT:
• one student model (relations between skills, abilities, etc.),
• several evidence models, one for each task or question.
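CAT for basic operations with fractions
Examples of tasks:
$T_1$: $\left(\frac{3}{4} \cdot \frac{5}{6}\right) - \frac{1}{8} = \frac{15}{24} - \frac{1}{8} = \frac{5}{8} - \frac{1}{8} = \frac{4}{8} = \frac{1}{2}$
$T_2$: $\frac{1}{6} + \frac{1}{12} = \frac{2}{12} + \frac{1}{12} = \frac{3}{12} = \frac{1}{4}$
$T_3$: $\frac{1}{4} \cdot 1\frac{1}{2} = \frac{1}{4} \cdot \frac{3}{2} = \frac{3}{8}$
$T_4$: $\left(\frac{1}{2} \cdot \frac{1}{2}\right) \cdot \left(\frac{1}{3} + \frac{1}{3}\right) = \frac{1}{4} \cdot \frac{2}{3} = \frac{2}{12} = \frac{1}{6}$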
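Elementary and operational skills
• CP, Comparison (common numerator or denominator): $\frac{1}{2} > \frac{1}{3}$, $\frac{2}{3} > \frac{1}{3}$
• AD, Addition (common denominator): $\frac{1}{7} + \frac{2}{7} = \frac{1+2}{7} = \frac{3}{7}$
• SB, Subtraction (common denominator): $\frac{2}{5} - \frac{1}{5} = \frac{2-1}{5} = \frac{1}{5}$
• MT, Multiplication: $\frac{1}{2} \cdot \frac{3}{5} = \frac{3}{10}$
• CD, Common denominator: $\left(\frac{1}{2}, \frac{2}{3}\right) = \left(\frac{3}{6}, \frac{4}{6}\right)$
• CL, Cancelling out: $\frac{4}{6} = \frac{2 \cdot 2}{2 \cdot 3} = \frac{2}{3}$
• CIM, Conversion to mixed numbers: $\frac{7}{2} = \frac{3 \cdot 2 + 1}{2} = 3\frac{1}{2}$
• CMI, Conversion to improper fractions: $3\frac{1}{2} = \frac{3 \cdot 2 + 1}{2} = \frac{7}{2}$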
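As a quick check, each skill example can be verified with exact rational arithmetic. A minimal Python sketch using the standard `fractions` module; the helpers `common_denominator`, `to_mixed`, and `to_improper` are hypothetical names. CD returns unreduced (numerator, denominator) pairs because `Fraction` always cancels to lowest terms, which is also why CL becomes a plain equality.

```python
from fractions import Fraction as F
from math import lcm  # multi-argument lcm needs Python 3.9+


def common_denominator(*qs):
    """CD: rewrite fractions over their least common denominator as (num, den) pairs."""
    d = lcm(*(q.denominator for q in qs))
    return [(q.numerator * (d // q.denominator), d) for q in qs]


def to_mixed(q):
    """CIM: improper fraction -> (whole part, proper fractional part); assumes q >= 0."""
    whole = q.numerator // q.denominator
    return whole, q - whole


def to_improper(whole, frac):
    """CMI: mixed number -> improper fraction."""
    return whole + frac


# The slide's examples, one check per skill.
assert F(1, 2) > F(1, 3) and F(2, 3) > F(1, 3)                    # CP
assert F(1, 7) + F(2, 7) == F(3, 7)                               # AD
assert F(2, 5) - F(1, 5) == F(1, 5)                               # SB
assert F(1, 2) * F(3, 5) == F(3, 10)                              # MT
assert common_denominator(F(1, 2), F(2, 3)) == [(3, 6), (4, 6)]   # CD
assert F(4, 6) == F(2, 3)                                         # CL (automatic)
assert to_mixed(F(7, 2)) == (3, F(1, 2))                          # CIM
assert to_improper(3, F(1, 2)) == F(7, 2)                         # CMI
```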
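Misconceptions (label: description, occurrence)
• MAD: $\frac{a}{b} + \frac{c}{d} = \frac{a+c}{b+d}$, 14.8%
• MSB: $\frac{a}{b} - \frac{c}{d} = \frac{a-c}{b-d}$, 9.4%
• MMT1: $\frac{a}{b} \cdot \frac{c}{b} = \frac{a \cdot c}{b}$, 14.1%
• MMT2: $\frac{a}{b} \cdot \frac{c}{b} = \frac{a+c}{b \cdot b}$, 8.1%
• MMT3: $\frac{a}{b} \cdot \frac{c}{d} = \frac{a \cdot d}{b \cdot c}$, 15.4%
• MMT4: $\frac{a}{b} \cdot \frac{c}{d} = \frac{a \cdot c}{b+d}$, 8.1%
• MC: $a\frac{b}{c} = \frac{a \cdot b}{c}$, 4.0%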
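Each misconception is a wrong rewriting rule, so it can be turned into a function that predicts the answer a student holding it would give. The sketch below (hypothetical names `MISCONCEPTIONS` and `explain`) lists which labels reproduce an observed answer to a/b (op) c/d. MC is left out because it acts on mixed numbers, and a rule that yields a zero denominator is treated as giving no answer. It illustrates the table only; it is not claimed to be how the misconceptions were identified in the study.

```python
from fractions import Fraction
from typing import Callable, Optional

Rule = Callable[[int, int, int, int], Optional[Fraction]]


def _frac(num: int, den: int) -> Optional[Fraction]:
    # A wrong rule may produce a zero denominator (e.g. MSB with b == d).
    return None if den == 0 else Fraction(num, den)


# Wrong rules for a/b (op) c/d, following the misconception table.
MISCONCEPTIONS: dict[str, tuple[str, Rule]] = {
    "MAD": ("+", lambda a, b, c, d: _frac(a + c, b + d)),
    "MSB": ("-", lambda a, b, c, d: _frac(a - c, b - d)),
    "MMT1": ("*", lambda a, b, c, d: _frac(a * c, b) if b == d else None),
    "MMT2": ("*", lambda a, b, c, d: _frac(a + c, b * b) if b == d else None),
    "MMT3": ("*", lambda a, b, c, d: _frac(a * d, b * c)),
    "MMT4": ("*", lambda a, b, c, d: _frac(a * c, b + d)),
}


def explain(a: int, b: int, op: str, c: int, d: int, answer: Fraction) -> list[str]:
    """Labels of the misconceptions whose rule reproduces the student's answer."""
    return [label for label, (rule_op, rule) in MISCONCEPTIONS.items()
            if rule_op == op and rule(a, b, c, d) == answer]


print(explain(1, 2, "+", 1, 3, Fraction(2, 5)))  # ['MAD']
```

A match is only evidence, not proof: for particular operands a wrong rule can coincide with the correct result, which is one reason a probabilistic model is used rather than a lookup.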
Process that led to the student model
• decision on which skills would be tested, preparation of paper tests,
• paper tests given to students at Brønderslev high school; 149 students took the test,
• analysis of the results, identification of misconceptions, summarizing the results in a data file,
• learning a Bayesian network model using the PC algorithm and the EM algorithm,
• attempts to explain some relations between skills and misconceptions using hidden variables,
• a new learning phase with the hidden variables included and certain edges required to be part of the learned model.
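Student model
[Figure: Bayesian network over the hidden variables HV1, HV2, the skills ACMI, ACIM, ACL, ACD, AD, SB, CMI, CIM, CL, CD, MT, CP, and the misconceptions MAD, MSB, MC, MMT1, MMT2, MMT3, MMT4.]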
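Evidence model for task T1
$T_1$: $\left(\frac{3}{4} \cdot \frac{5}{6}\right) - \frac{1}{8} = \frac{15}{24} - \frac{1}{8} = \frac{5}{8} - \frac{1}{8} = \frac{4}{8} = \frac{1}{2}$
$T_1 \Leftrightarrow MT \wedge CL \wedge ACL \wedge SB \wedge \neg MMT3 \wedge \neg MMT4 \wedge \neg MSB$
[Figure: node T1 with parents CL, ACL, MT, SB, MMT3, MSB, MMT4, and a child X1 (the observed answer) with conditional probability table P(X1 | T1).]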
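The evidence model combines a deterministic conjunction with a noisy observation. A minimal sketch, assuming that X1 departs from T1 only through hypothetical slip and guess probabilities (`SLIP`, `GUESS`); the slide does not give the values in P(X1 | T1), so the numbers below are placeholders.

```python
# Hypothetical noise parameters for P(X1 | T1); the slide does not give them.
SLIP = 0.1    # P(X1 = wrong   | T1 = true)
GUESS = 0.05  # P(X1 = correct | T1 = false)


def t1(mt, cl, acl, sb, mmt3, mmt4, msb):
    """Deterministic node T1: MT & CL & ACL & SB & not MMT3 & not MMT4 & not MSB."""
    return mt and cl and acl and sb and not mmt3 and not mmt4 and not msb


def p_x1_correct(**parents):
    """P(X1 = correct | parents of T1), with the deterministic T1 summed out."""
    return 1.0 - SLIP if t1(**parents) else GUESS


# A student with every required skill and none of the three misconceptions.
master = dict(mt=True, cl=True, acl=True, sb=True, mmt3=False, mmt4=False, msb=False)
print(p_x1_correct(**master))
# The same student, but holding the MSB misconception.
print(p_x1_correct(**{**master, "msb": True}))
```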
Student + Evidence model
[Figure: the student model network (HV1, HV2, skills, misconceptions) extended with the task node T1 and its observed answer X1.]
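Example of an adaptive test
[Figure: an adaptive test that asks X1: 1/5 < 2/5?; if X1 = yes, it asks X2: 1/5 < 1/4?; if X2 = yes, it asks X3: 1/4 < 2/5?; X1 = no or X2 = no ends the test.]
Entropy of a probability distribution $P(S_i)$:
$$H(P(S_i)) = -\sum_{s_i \in S_i} P(S_i = s_i) \cdot \log P(S_i = s_i).$$
Total entropy in a node $n$, where $e_n$ is the evidence collected on the path from the root to $n$ and $\mathcal{S}$ is the set of skills:
$$H(e_n) = \sum_{S_i \in \mathcal{S}} H(P(S_i \mid e_n)).$$
Expected entropy at the end of a test $t$, where $\mathcal{L}(t)$ is the set of leaves of $t$:
$$EH(t) = \sum_{\ell \in \mathcal{L}(t)} P(e_\ell) \cdot H(e_\ell).$$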
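The three definitions translate directly into code. A sketch, assuming the natural logarithm (the slide does not fix the base; another base only rescales every entropy) and a hypothetical leaf format of (P(e_ℓ), list of skill marginals P(S_i | e_ℓ)).

```python
import math


def entropy(dist):
    """H(P(S_i)) for a discrete distribution given as a list of probabilities (0 log 0 = 0)."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def total_entropy(skill_marginals):
    """H(e_n): sum of the entropies of the skill marginals P(S_i | e_n) at a node."""
    return sum(entropy(m) for m in skill_marginals)


def expected_entropy(leaves):
    """EH(t) = sum over leaves l of P(e_l) * H(e_l); each leaf is (P(e_l), marginals)."""
    return sum(p_leaf * total_entropy(marginals) for p_leaf, marginals in leaves)


# Hypothetical two-leaf test over two binary skills (numbers made up for illustration).
leaves = [
    (0.6, [[0.9, 0.1], [0.5, 0.5]]),
    (0.4, [[0.2, 0.8], [0.5, 0.5]]),
]
print(expected_entropy(leaves))
```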
Let $\mathcal{T}$ be the set of all possible tests. A test $t^\star$ is optimal iff
$$t^\star = \arg\min_{t \in \mathcal{T}} EH(t).$$
A myopically optimal test $t$ is a test where each question $X^\star$ of $t$ minimizes the expected value of entropy after the question is answered:
$$X^\star = \arg\min_{X \in \mathcal{X}} EH(t{\downarrow}X),$$
i.e., it works as if the test finished right after the selected question $X^\star$.
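Myopic selection can be sketched by brute-force enumeration over skill configurations. The model is an assumption for illustration: three independent uniform binary skills, answers that depend deterministically on the skills, and the hypothetical question names `X12` and `X1`. The function `expected_entropy_after` plays the role of EH(t↓X).

```python
import itertools
import math


def entropy(dist):
    """Entropy with natural log; 0 log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def total_entropy(joint, n_skills):
    """Sum over skills of H(P(S_i | e)); joint maps skill tuples to P(s | e)."""
    marginals = [sum(p for s, p in joint.items() if s[i] == 1) for i in range(n_skills)]
    return sum(entropy([1.0 - m, m]) for m in marginals)


def condition(joint, likelihood):
    """Multiply the joint by a likelihood; return (P(answer), posterior joint)."""
    unnorm = {s: p * likelihood(s) for s, p in joint.items()}
    z = sum(unnorm.values())
    return z, ({s: v / z for s, v in unnorm.items()} if z > 0 else {})


def expected_entropy_after(joint, p_correct, n_skills):
    """EH(t↓X): expected total skill entropy if the test stopped right after X."""
    eh = 0.0
    for answer in (0, 1):
        pz, post = condition(
            joint, lambda s: p_correct(s) if answer == 1 else 1.0 - p_correct(s)
        )
        if pz > 0:
            eh += pz * total_entropy(post, n_skills)
    return eh


def myopic_choice(joint, questions, n_skills):
    """X* = argmin over candidate questions of EH(t↓X)."""
    return min(questions, key=lambda name: expected_entropy_after(joint, questions[name], n_skills))


# Illustrative model (an assumption): three independent, uniform binary skills.
prior = {s: 1 / 8 for s in itertools.product((0, 1), repeat=3)}
questions = {
    "X12": lambda s: float(s[0] and s[1]),  # correct iff skills 1 and 2 are present
    "X1": lambda s: float(s[0]),            # correct iff skill 1 is present
}
print(myopic_choice(prior, questions, n_skills=3))  # X1
```

Asking X1 leaves an expected total entropy of 2·log 2, while X12 leaves more: its answer is wrong for three quarters of the configurations, and a wrong answer reveals little about either skill.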
Myopic construction of a fixed test
    e_list := [∅];
    test := [];
    for i := 1 to |𝒳| do counts[i] := 0;
    for position := 1 to test_length do
        new_e_list := [];
        for all e ∈ e_list do
            i := most_informative_X(e);
            counts[i] := counts[i] + P(e);
            for all x_i ∈ X_i do
                append(new_e_list, e ∪ {X_i = x_i});
        e_list := new_e_list;
        i⋆ := arg max_i counts[i];
        append(test, X_i⋆);
        counts[i⋆] := 0;
    return(test);
Example (after X2 has been selected first): e_list = {{X2 = 0}, {X2 = 1}}; the most informative question is X3 given X2 = 0 and X1 given X2 = 1, so counts[3] = P(X2 = 0) = 0.7 and counts[1] = P(X2 = 1) = 0.3, and X3 becomes the next question of the fixed test.
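The pseudocode above can be sketched in Python by simulating the myopic adaptive test branch by branch. Assumptions: a small enumerable joint over binary skills, answers independent given the skills, hypothetical function names, and one guard that is not in the slide's pseudocode (a question already placed in the fixed test is never selected again). `counts[q]` accumulates the probability mass P(e) of the branches in which the adaptive test would ask `q` next.

```python
import itertools
import math


def entropy(dist):
    """Entropy with natural log; 0 log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def posterior(joint, questions, evidence):
    """Return (P(e), P(skills | e)); answers are assumed independent given the skills."""
    unnorm = {}
    for s, p in joint.items():
        for q, answer in evidence.items():
            p_correct = questions[q](s)
            p *= p_correct if answer == 1 else 1.0 - p_correct
        unnorm[s] = p
    z = sum(unnorm.values())
    return z, ({s: v / z for s, v in unnorm.items()} if z > 0 else {})


def total_entropy(post, n_skills):
    """Sum of the entropies of the binary skill marginals of post."""
    marginals = [sum(p for s, p in post.items() if s[i] == 1) for i in range(n_skills)]
    return sum(entropy([1.0 - m, m]) for m in marginals)


def most_informative(joint, questions, evidence, n_skills):
    """Myopic choice of an unasked question; the weights P(e ∪ {X = x}) are not
    divided by P(e), which rescales all candidates equally and keeps the argmin."""
    best, best_eh = None, math.inf
    for q in questions:
        if q in evidence:
            continue
        eh = 0.0
        for answer in (0, 1):
            p_ex, post = posterior(joint, questions, {**evidence, q: answer})
            if p_ex > 0:
                eh += p_ex * total_entropy(post, n_skills)
        if eh < best_eh:
            best, best_eh = q, eh
    return best


def fixed_test(joint, questions, n_skills, length):
    e_list = [{}]  # evidence of the branches of the simulated adaptive test
    test = []
    counts = {q: 0.0 for q in questions}
    for position in range(min(length, len(questions))):
        new_e_list = []
        for e in e_list:
            p_e, _ = posterior(joint, questions, e)
            if p_e == 0:
                continue  # impossible branch
            q = most_informative(joint, questions, e, n_skills)
            if q is None:
                continue  # every question already asked in this branch
            counts[q] += p_e
            for answer in (0, 1):
                new_e_list.append({**e, q: answer})
        e_list = new_e_list
        # Guard not in the slide's pseudocode: never add a question twice.
        q_star = max((q for q in questions if q not in test), key=lambda q: counts[q])
        test.append(q_star)
        counts[q_star] = 0.0
    return test


# Illustrative model (an assumption): three independent, uniform binary skills.
prior = {s: 1 / 8 for s in itertools.product((0, 1), repeat=3)}
questions = {
    "X1": lambda s: float(s[0]),
    "X2": lambda s: float(s[1]),
    "X3": lambda s: float(s[2]),
    "X12": lambda s: float(s[0] and s[1]),
}
print(fixed_test(prior, questions, n_skills=3, length=3))
```

With these four questions the sketch returns X1, X2, X3: in every branch a single-skill question is at least as informative as X12, and ties are broken by dictionary order. The number of branches doubles with each position, so this brute-force version only suits toy models.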
Skill Prediction Quality
[Figure: quality of skill predictions vs. number of answered questions (0 to 20) for the adaptive, average, descending, and ascending tests.]
Total entropy of probability of skills
[Figure: entropy on skills vs. number of answered questions (0 to 20) for the adaptive, average, descending, and ascending tests.]
Question Prediction Quality
[Figure: quality of question predictions vs. number of answered questions (0 to 18) for the adaptive, average, descending, and ascending tests.]
An example of a simple diagnostic task
Diagnose the absence or presence of three skills $S_1, S_2, S_3$ using a bank of three questions $X_{1,2}, X_{1,3}, X_{2,3}$ such that
$$P(X_{i,j} = 1 \mid S_i = s_i, S_j = s_j) = \begin{cases} 1 & \text{if } (s_i, s_j) = (1, 1), \\ 0 & \text{otherwise.} \end{cases}$$
Assume that the answers to all questions in the item bank are wrong, i.e., $X_{1,2} = 0$, $X_{1,3} = 0$, $X_{2,3} = 0$.
Reasoning assuming skill independence
[Figure: each question $X_{i,j}$ has the two skills $S_i$ and $S_j$ as its parents.]
All skills are independent,
$$P(S_1, S_2, S_3) = P(S_1) \cdot P(S_2) \cdot P(S_3),$$
and $P(S_i)$, $i = 1, 2, 3$, are uniform. Only the four configurations with at most one skill present are consistent with three wrong answers, and $S_j$ is missing in three of them. Then for $j = 1, 2, 3$:
$$P(S_j = 0 \mid X_{1,2} = 0, X_{1,3} = 0, X_{2,3} = 0) = 0.75,$$
i.e., we cannot decide which skills are present and which are missing.
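The value 0.75 can be reproduced by enumerating the eight skill configurations. A minimal sketch with hypothetical names, using 0-based skill indices (S1, S2, S3 become 0, 1, 2).

```python
from itertools import product

# Questions X_{1,2}, X_{1,3}, X_{2,3} as pairs of 0-based skill indices.
PAIRS = [(0, 1), (0, 2), (1, 2)]


def p_correct(s_i, s_j):
    """P(X_{i,j} = 1 | S_i = s_i, S_j = s_j): correct iff both skills are present."""
    return 1.0 if (s_i, s_j) == (1, 1) else 0.0


def posterior_missing(prior, answers):
    """P(S_k = 0 | answers) for k = 0, 1, 2, by enumerating skill configurations."""
    unnorm = {}
    for s, p in prior.items():
        for (i, j), x in zip(PAIRS, answers):
            pc = p_correct(s[i], s[j])
            p *= pc if x == 1 else 1.0 - pc
        unnorm[s] = p
    z = sum(unnorm.values())
    return [sum(p for s, p in unnorm.items() if s[k] == 0) / z for k in range(3)]


# Independent, uniform skills: each of the 8 configurations has probability 1/8.
independent = {s: 1 / 8 for s in product((0, 1), repeat=3)}
print(posterior_missing(independent, answers=(0, 0, 0)))  # [0.75, 0.75, 0.75]
```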
Modeling dependence between skills with a deterministic hierarchy $S_1 \Rightarrow S_2$, $S_2 \Rightarrow S_3$
The hierarchy allows only the configurations $(S_1, S_2, S_3) \in \{(0,0,0), (0,0,1), (0,1,1), (1,1,1)\}$; three wrong answers rule out $(0,1,1)$ and $(1,1,1)$, and then:
$$P(S_1 = 0 \mid X_{1,2} = 0, X_{1,3} = 0, X_{2,3} = 0) = 1$$
$$P(S_2 = 0 \mid X_{1,2} = 0, X_{1,3} = 0, X_{2,3} = 0) = 1$$
$$P(S_3 = 0 \mid X_{1,2} = 0, X_{1,3} = 0, X_{2,3} = 0) = 0.5$$
Observe that for $i = 1, 2, 3$
$$P(S_i \mid X_{1,2} = 0, X_{1,3} = 0, X_{2,3} = 0) = P(S_i \mid X_{2,3} = 0),$$
i.e., $X_{2,3} = 0$ gives the same information as $X_{1,2} = 0, X_{1,3} = 0, X_{2,3} = 0$.
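The same enumeration works with the hierarchy. Assumption: the prior is uniform over the four configurations allowed by S1 ⇒ S2 and S2 ⇒ S3 (the slide does not state the prior, but this choice reproduces its numbers). Skill indices are 0-based and the question names `X12`, `X13`, `X23` are hypothetical.

```python
from itertools import product

# Questions as pairs of 0-based skill indices (S1, S2, S3 -> 0, 1, 2).
PAIRS = {"X12": (0, 1), "X13": (0, 2), "X23": (1, 2)}


def consistent(s):
    """Deterministic hierarchy: S1 => S2 and S2 => S3."""
    return (not s[0] or s[1]) and (not s[1] or s[2])


def posterior_missing(prior, evidence):
    """P(S_k = 0 | evidence) for k = 0, 1, 2; X_{i,j} = 1 iff S_i = S_j = 1."""
    unnorm = {}
    for s, p in prior.items():
        for name, x in evidence.items():
            i, j = PAIRS[name]
            answered_correctly = s[i] == 1 and s[j] == 1
            if answered_correctly != (x == 1):
                p = 0.0  # configuration contradicts the observed answer
        unnorm[s] = p
    z = sum(unnorm.values())
    return [sum(p for s, p in unnorm.items() if s[k] == 0) / z for k in range(3)]


# Assumed prior: uniform over the configurations allowed by the hierarchy.
allowed = [s for s in product((0, 1), repeat=3) if consistent(s)]
hierarchy = {s: 1 / len(allowed) for s in allowed}

all_wrong = posterior_missing(hierarchy, {"X12": 0, "X13": 0, "X23": 0})
only_x23 = posterior_missing(hierarchy, {"X23": 0})
print(all_wrong, only_x23)  # [1.0, 1.0, 0.5] [1.0, 1.0, 0.5]
```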
Conclusions
• Empirical evidence shows that educational testing can benefit from the application of Bayesian networks.
• Adaptive tests may substantially reduce the number of questions that need to be asked.
• The new method for designing a fixed test gave good results on the tested data. It may be regarded as a good, cheap alternative to computerized adaptive tests where those are not suitable.
• One theoretical problem related to applying Bayesian networks to educational testing is efficient inference that exploits the deterministic relations in the model. This problem was addressed in our UAI 2002 paper.
... and this is the END. It’s time to have a beer. ... or are there any questions?