SLIDE 1
Predicting Performance on MOOC Assessments using Multi-Regression - - PowerPoint PPT Presentation
Predicting Performance on MOOC Assessments using Multi-Regression Models
Zhiyun Ren, Huzefa Rangwala, Aditya Johri
George Mason University, 4400 University Drive, Fairfax, Virginia 22030
Outline
- Background
- Personal Linear Multi-Regression
SLIDE 2
SLIDE 3
Background
SLIDE 4
Background
SLIDE 5
Overview
- Information we have: MOOC server logs
- What we want to do: predict students' performance
SLIDE 6
Challenge
- Various kinds of participants
- High attrition rate
- Flexible timetable
- Baselines we have tried: linear regression model, mean score
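The mean-score baseline can be read as predicting a student's next grade from the average of that student's earlier grades. The sketch below assumes that reading; the function name and the fallback to a global mean for students with no graded history are illustrative assumptions, not details given in the slides.

```python
from statistics import mean


def mean_score_prediction(previous_grades, global_mean):
    """Hypothetical mean-score baseline.

    Predicts the next assessment grade as the mean of the student's
    earlier grades; falls back to a global mean when there is no history.
    """
    if not previous_grades:
        return global_mean
    return mean(previous_grades)
```

For example, a student with earlier grades 0.5 and 1.0 would be predicted to score 0.75 on the next assessment.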
SLIDE 7
Personal Linear Multi-Regression Models
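$$\hat{g}_{s,a} = b_s + b_a + \mathbf{p}_s^{\top} W \mathbf{f}_{sa}$$

$$\hat{g}_{s,a} = b_s + b_a + \sum_{d=1}^{l} p_{s,d} \sum_{k=1}^{n_F} f_{sa,k}\, w_{d,k}$$

l -- Number of regression models
n_F -- Number of features

$$\min_{b_s,\, b_a,\, P,\, W} \; \frac{1}{2} \sum_{(s,a) \in S} \left( g_{s,a} - \hat{g}_{s,a} \right)^2 + \lambda \left( \lVert P \rVert_F^2 + \lVert W \rVert_F^2 \right) + \mu \left( \lVert P \rVert_1 + \lVert W \rVert_1 \right)$$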
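As a rough illustration of the personalized linear multi-regression model, the sketch below computes the prediction ĝ_{s,a} = b_s + b_a + Σ_d p_{s,d} Σ_k f_{sa,k} w_{d,k} (d over the l regression models, k over the n_F features) in plain Python and fits it with stochastic gradient descent. It is a minimal sketch under stated assumptions: the function names, the learning rate `lr`, and the regularization weight `lam` are hypothetical, and the objective it optimizes (squared error plus L2 penalties on p_s and W) is a simplified stand-in, not necessarily the exact objective or solver used by the authors.

```python
import random


def plmr_predict(b_s, b_a, p_s, W, f_sa):
    """Predicted grade: b_s + b_a + sum_d p_s[d] * sum_k f_sa[k] * W[d][k].

    p_s has length l (one weight per regression model), W is an l x n_F
    list of lists, and f_sa has length n_F.
    """
    total = b_s + b_a
    for d, p in enumerate(p_s):
        total += p * sum(f * w for f, w in zip(f_sa, W[d]))
    return total


def sgd_epoch(samples, b_stu, b_ass, P, W, lr=0.01, lam=0.01):
    """One stochastic-gradient pass over (s, a, f_sa, g) samples.

    Per-sample loss (simplified assumption, L2 penalties only):
        0.5 * (g - g_hat)^2 + 0.5 * lam * (||p_s||^2 + ||W||_F^2)
    All parameter lists are updated in place.
    """
    random.shuffle(samples)  # reorders the caller's list in place
    for s, a, f, g in samples:
        err = g - plmr_predict(b_stu[s], b_ass[a], P[s], W, f)
        # Score of each regression model, computed before W changes.
        proj = [sum(fk * wk for fk, wk in zip(f, W[d])) for d in range(len(W))]
        b_stu[s] += lr * err  # d(loss)/d(bias) = -err
        b_ass[a] += lr * err
        for d in range(len(W)):
            p_old = P[s][d]
            for k in range(len(f)):
                W[d][k] -= lr * (-err * p_old * f[k] + lam * W[d][k])
            P[s][d] -= lr * (-err * proj[d] + lam * p_old)
```

If P and W both start at zero, their gradients vanish and only the bias terms learn, so the sketch expects small nonzero initial values for P and W.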
SLIDE 8
Data structure
(a) Homework and quiz (b) Video (c) Study session
SLIDE 9
Feature selection
- Quiz-related features
- Time-related features
- Interval-based features
- Homework-related features
SLIDE 10
Feature selection
- Video-related features
- Session features
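Study sessions (panel (c) of the data-structure figure) are commonly derived from click-stream logs by splitting a student's events wherever the gap between consecutive events exceeds a threshold. The slides do not say how sessions were defined, so the sketch below is a hypothetical illustration: the 30-minute threshold, the function names, and the two features it returns are all assumptions.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, gap=timedelta(minutes=30)):
    """Group one student's event timestamps into study sessions.

    A new session starts whenever consecutive events are more than `gap`
    apart (the 30-minute default is an assumption, not from the slides).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_features(timestamps, gap=timedelta(minutes=30)):
    """Hypothetical session features: session count and mean duration (minutes)."""
    sessions = split_sessions(timestamps, gap)
    if not sessions:
        return {"num_sessions": 0, "mean_session_minutes": 0.0}
    durations = [(s[-1] - s[0]).total_seconds() / 60 for s in sessions]
    return {
        "num_sessions": len(sessions),
        "mean_session_minutes": sum(durations) / len(durations),
    }
```

With events at 9:00, 9:10, 9:20, 11:00, and 11:05 on the same day, this yields two sessions lasting 20 and 5 minutes.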
SLIDE 11
Experimental setup
- Students' different motivations split the data into two groups.
- Different models are applied to the different data types (continuous vs. binary grade values).
SLIDE 12
Experimental protocol
- PreviousHW-based prediction
- PreviousOneHW-based prediction

(Diagram: the homework sequence HW1, HW2, HW3, HW4, … shown once for each protocol.)
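One reading of the two protocols, based on their names: PreviousHW-based prediction uses every earlier homework as history when predicting the next one, while PreviousOneHW-based prediction uses only the immediately preceding homework. The sketch below enumerates the (history, target) pairs under that assumed reading; `build_prediction_tasks` is a hypothetical helper.

```python
def build_prediction_tasks(num_homeworks, protocol):
    """Return (history, target) pairs of 1-based homework indices.

    protocol == "PreviousHW": history is every earlier homework.
    protocol == "PreviousOneHW": history is only the preceding homework.
    """
    tasks = []
    for target in range(2, num_homeworks + 1):
        if protocol == "PreviousHW":
            history = list(range(1, target))
        elif protocol == "PreviousOneHW":
            history = [target - 1]
        else:
            raise ValueError(f"unknown protocol: {protocol}")
        tasks.append((history, target))
    return tasks
```

With four homework assignments, PreviousHW predicts HW4 from HW1 through HW3, while PreviousOneHW predicts it from HW3 alone.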
SLIDE 13
Experimental baseline: KT-IDEM
(Diagram: KT-IDEM network with knowledge nodes K, question nodes Q, and item nodes I; parameters shown: P(L0), P(T), P(G1), P(S1), P(G2), P(S2), P(G3), P(S3), ….)
Model parameters:
- P(L0) = initial knowledge
- P(T) = probability of learning
- P(G1…n) = probability of a guess, per question
- P(S1…n) = probability of a slip, per question

where n denotes the total number of questions.
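KT-IDEM extends Bayesian knowledge tracing with guess and slip probabilities specific to each question. A minimal sketch of the resulting forward filter follows, assuming the standard knowledge-tracing conventions of a single skill and no forgetting; the function and argument names are hypothetical, and parameter fitting (for example by expectation maximization) is not shown.

```python
def kt_idem_trace(responses, questions, p_l0, p_t, guess, slip):
    """Forward pass of a KT-IDEM-style knowledge-tracing filter.

    responses: 0/1 correctness values in chronological order.
    questions: question index for each response, selecting guess[q], slip[q].
    Returns the predicted P(correct) before each response.
    """
    p_know = p_l0  # P(L0): probability the skill is already known
    predictions = []
    for correct, q in zip(responses, questions):
        g, s = guess[q], slip[q]
        p_correct = p_know * (1 - s) + (1 - p_know) * g
        predictions.append(p_correct)
        # Bayesian update of P(known) given the observed answer.
        if correct:
            posterior = p_know * (1 - s) / p_correct
        else:
            posterior = p_know * s / (1 - p_correct)
        # Learning transition: an unknown skill becomes known with P(T).
        p_know = posterior + (1 - posterior) * p_t
    return predictions
```

The updates divide by P(correct) and 1 - P(correct), so guess and slip values of exactly 0 or 1 would need special handling.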
SLIDE 14
Comparative Performance
- Prediction results with varying numbers of regression models for the student group with continuous grade values
SLIDE 15
Comparative Performance
- Prediction results with varying numbers of regression models for the student group with binary grade values
SLIDE 16
Comparative Performance
- Comparison of accuracy and F1 scores against the baseline approaches
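For the binary-grade comparison, accuracy and F1 can be computed directly from confusion counts. A short sketch, assuming 0/1 labels with 1 treated as the positive class (the slides do not state which class F1 is computed on); the function name is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 (positive class = 1) for binary grade predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1
```

For true labels [1, 0, 1, 1] and predictions [1, 0, 0, 1], this gives accuracy 0.75 and F1 0.8 (precision 1, recall 2/3).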
SLIDE 17
Feature Importance
SLIDE 18
Feature Importance
SLIDE 19
Conclusion and future work
- Prediction algorithm: personalized multiple linear regression model.
- Experimental results: improved performance compared to the baseline methods.
- Other contribution: analysis of feature importance.
- Future work: set up an early warning system to help improve students' performance.
SLIDE 20