Adaptive Quiz Generation Using Thompson Sampling
Fuhua (Oscar) Lin, PhD, Athabasca University, Canada. Third Workshop eliciting Adaptive Sequences for Learning (WASL 2020), Cyberspace, 6 July 2020. Co-located with AIED 2020.
Outline
▪ Classroom
▪ Online learning environments
▪ Educational Data Mining (EDM)
Thompson, William R. "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples". Biometrika, 25(3–4):285–294, 1933.
Domain Model: ∆ = {𝜀1, 𝜀2, …, 𝜀n}, where each 𝜀j is called a knowledge unit.
Assessment Model A
Quiz Model
Λ1 =
        u0    u1    u2    ⋯    un
  𝜀1    0.0   0.2   0.5   ⋯    1.0
  𝜀2    0.0   0.0   0.3   ⋯    1.0
  ⋮      ⋮     ⋮     ⋮    ⋱     ⋮
  𝜀n    0.0   0.0   0.0   ⋯    1.0

LO1 =
        u0    u1    u2    ⋯    un
  lo11  0.0   0.5   0.8   ⋯    1.0
  lo12  0.0   0.4   0.6   ⋯    1.0
  ⋮      ⋮     ⋮     ⋮    ⋱     ⋮
  lo1n  0.0   0.3   0.7   ⋯    1.0

A11 =
        u0    u1    u2    ⋯    un
  a111   1     1     1    ⋯     1
  a112   1     1     1    ⋯     1
  ⋮      ⋮     ⋮     ⋮    ⋱     ⋮
  a11n   1     1     1    ⋯     1
K actions: {1, …, K}; rewards: {0, 1}
▪ When played, an action k ∈ {1, …, K} produces a reward rt ∈ {0, 1}.
▪ θk is the success probability, or mean reward, of action k.
▪ (θ1, …, θK) are unknown to the agent and fixed over time.
▪ They can be learned by experimentation; their estimated values are denoted (θ̂1, θ̂2, …, θ̂K).
The objective is to maximize the cumulative reward Σ_{t=1}^{T} rt, where T ≫ K.
Each action yields a reward in {0, 1}, with p(rt = 1) = θk for action k.
R =
        u0   u1   u2   ⋯   un
  1      1    …    …
  2      1    …    …
  3      1    …    …
  ⋮      ⋮
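In Python, this stationary Bernoulli bandit environment can be sketched as follows; the number of actions, the success probabilities THETA, and the uniformly random policy are illustrative assumptions, not values from the slides:

```python
import random

K = 3                      # number of actions (arms); illustrative
THETA = [0.2, 0.5, 0.8]    # success probabilities θk: unknown to the agent, fixed over time

def pull(k, rng):
    """Play action k and return a Bernoulli(THETA[k]) reward in {0, 1}."""
    return 1 if rng.random() < THETA[k] else 0

rng = random.Random(0)     # fixed seed so the run is reproducible
T = 1000                   # horizon, T >> K

# A uniformly random baseline policy, just to exercise the environment;
# a bandit algorithm would instead try to maximize the cumulative reward
# sum of r_t over t = 1..T.
total = sum(pull(rng.randrange(K), rng) for _ in range(T))
print(total)
```

Random play earns roughly mean(THETA) · T here; the point of a bandit algorithm such as Thompson sampling is to concentrate plays on better arms and beat this baseline.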
LO = {lo1, lo2, …, loK}. At the t-th question of a quiz, the reward is rt ∈ {0, 1}. Take the priors to be beta-distributed with parameters α = (α1, …, αK) and β = (β1, …, βK).
▪ αk and βk correspond to the counts of successes and failures, respectively, observed so far on learning objective lok.
Each learning objective k corresponds to an unknown success probability μk:
▪ p(rt = 1 | t; lok) = μk, k ∈ {1, 2, …, K}.
The prior probability density function of μk is

p(μk) = [Γ(αk + βk) / (Γ(αk) Γ(βk))] μk^(αk−1) (1 − μk)^(βk−1),

where Γ denotes the gamma function.
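This density can be evaluated directly with the standard library's gamma function; the helper name beta_pdf and the parameter values below are illustrative:

```python
from math import gamma

def beta_pdf(mu, alpha, beta):
    """Beta(alpha, beta) density: Γ(α+β) / (Γ(α)Γ(β)) · μ^(α−1) (1−μ)^(β−1)."""
    coef = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return coef * mu ** (alpha - 1) * (1.0 - mu) ** (beta - 1)

# With alpha = beta = 1 (no observations yet) the prior is uniform on [0, 1]:
print(beta_pdf(0.3, 1, 1))   # 1.0
# More successes than failures shifts the mass toward high mu:
print(beta_pdf(0.8, 5, 2))   # ≈ 2.4576
```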
The optimal policy is to choose a question on the learning objective for which μk attains its smallest value, i.e., lo* = argmin_{k ∈ {1,…,K}} μk.
https://ecstep.com/beta-function/
The success probability estimate μ̂k is drawn from the posterior distribution, which is a beta distribution with parameters αk and βk, rather than taken to be the expectation αk/(αk + βk) used in the greedy algorithm.
▪ μ̂k represents a statistically plausible success probability.
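The contrast between the greedy point estimate and the posterior draw can be sketched as follows; the success/failure counts for the objective are hypothetical:

```python
import random

rng = random.Random(42)

# Hypothetical counts for one learning objective: 2 successes and 6 failures
# observed on top of a uniform Beta(1, 1) prior.
alpha_k, beta_k = 3, 7

# Greedy: always the same point estimate, the posterior mean alpha / (alpha + beta).
greedy_estimate = alpha_k / (alpha_k + beta_k)

# Thompson sampling: a fresh draw from the Beta(alpha_k, beta_k) posterior
# each time, i.e. a statistically plausible success probability that varies
# from question to question.
ts_estimate = rng.betavariate(alpha_k, beta_k)

print(greedy_estimate)   # 0.3, fixed
print(ts_estimate)       # random, but concentrated around 0.3
```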
TS-based adaptive quiz generation algorithm
▪ Bayesian approach
▪ Maximizing the accuracy of identifying lacking areas
▪ Prior knowledge
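One possible end-to-end sketch of such a TS-based quiz generator is shown below; the objective names, their hidden success probabilities, and the quiz length are all illustrative assumptions, and learner answers are simulated:

```python
import random

rng = random.Random(7)

# Hidden per-objective success probabilities (unknown to the algorithm).
TRUE_MU = {"lo1": 0.9, "lo2": 0.4, "lo3": 0.75}

# Beta(alpha_k, beta_k) posterior counts per learning objective,
# starting from a uniform Beta(1, 1) prior.
alpha = {k: 1 for k in TRUE_MU}
beta = {k: 1 for k in TRUE_MU}
asked = {k: 0 for k in TRUE_MU}

for _ in range(300):  # number of quiz questions
    # Thompson step: draw a plausible success probability from each posterior...
    sampled = {k: rng.betavariate(alpha[k], beta[k]) for k in TRUE_MU}
    # ...and quiz the objective with the smallest sampled value,
    # lo* = argmin_k of the draws: the most plausibly lacking area.
    target = min(sampled, key=sampled.get)
    asked[target] += 1
    # Observe the (simulated) answer and update the posterior counts.
    if rng.random() < TRUE_MU[target]:
        alpha[target] += 1   # one more success on lo*
    else:
        beta[target] += 1    # one more failure on lo*

print(asked)  # the weakest objective, lo2, should attract the most questions
```

Because the weakest objective keeps producing the lowest posterior draws, the quiz concentrates questions where the learner most plausibly lacks mastery, in line with the stated goal of maximizing the accuracy of identifying lacking areas.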
Data Structures and Algorithms as a testbed
▪ Initial stage
▪ Deploying and testing
Benchmarking
▪ Positive predictive value (PPV)