

SLIDE 1

Temporal Models for Predicting Student Dropout in Massive Open Online Courses

Fei Mi, Dit-Yan Yeung

Hong Kong University of Science and Technology (HKUST) fmi@ust.hk (fei.mi@epfl.ch)

November 14, 2015

Fei Mi, Dit-Yan Yeung (HKUST), ICDM ASSESS 2015, November 14, 2015, 1 / 17

SLIDE 2

Outline

1. Background and Motivation
2. Temporal Models
3. Experiments
4. Conclusion



SLIDE 9

Overview

1. What can we do?
   - Performance evaluation (Peer Grading)
   - Help students engage and perform better (Dropout Prediction)
   - Build a personalized platform (Recommendation)


SLIDE 11

Motivation of our work

1. High attrition rates are common on MOOC platforms (60%-80%)
2. Current methods: SVM, Logistic Regression
   - Activity features (lecture video, discussion forum)
   - Static models


SLIDE 14

Contribution of our work

1. A sequence labeling perspective

   [Figure: weekly sequence labeling. For Week 1 through Week t, per-week activity features z_1, ..., z_t are mapped to dropout labels y_1, ..., y_t.]

2. Compare different temporal machine learning models
   - Input-output Hidden Markov Model (IOHMM)
   - Recurrent Neural Network (RNN)
   - RNN with long short-term memory (LSTM) cells
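The sequence labeling framing can be sketched in code: each student becomes one sequence with a feature vector z_t and a binary label y_t per week. A minimal Python illustration, where the three activity counts and the DEF3-style next-week label are hypothetical choices, not the paper's exact features:

```python
# Minimal sketch of the sequence-labeling framing: one sequence per
# student, one feature vector z_t and one binary label y_t per week.
# Feature names and numbers here are hypothetical illustrations.

def build_sequence(weekly_activity, n_weeks):
    """weekly_activity: {week: [video_views, forum_posts, quiz_attempts]}"""
    zs, ys = [], []
    for week in range(1, n_weeks + 1):
        z = weekly_activity.get(week, [0, 0, 0])     # zeros if inactive
        zs.append(z)
        # DEF3-style label: 1 if the student has NO activity next week
        nxt = weekly_activity.get(week + 1, [0, 0, 0])
        ys.append(0 if any(nxt) else 1)
    return zs, ys

student = {1: [7, 3, 1], 2: [4, 0, 2], 4: [1, 1, 0]}  # inactive weeks 3, 5
z, y = build_sequence(student, n_weeks=5)
```

The temporal models in the next section consume these (z_1, ..., z_t) sequences one step per week.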



SLIDE 17

How to capture temporal information?

Sliding window structures (NLP tasks):

1. Features are aggregated using a sliding window structure
2. The temporal span is fixed by the sliding window

Temporal models:

1. Learn from the previous inputs and the current input
2. A temporal pathway allows a "memory" of the previous inputs to persist in the internal state
3. Flexible temporal span, learned from data


SLIDE 19

Input-output Hidden Markov Model (IOHMM)

Originated from the HMM; learns to map input sequences to output sequences:

h_t = A h_{t-1} + B x_t + N(0, Q)
y_t = C h_t + N(0, R)                         (1)

[Figure: IOHMM 1 graphical model, unrolled over time: hidden states h_t, dropout labels y_t, input features x_t.]
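Equation (1) describes a linear-Gaussian input-output state-space model. A minimal simulation of it, with illustrative dimensions and parameter values (not fitted to any MOOC data):

```python
import numpy as np

# Simulate the linear-Gaussian state-space form of Eq. (1):
#   h_t = A h_{t-1} + B x_t + noise(0, Q)
#   y_t = C h_t     + noise(0, R)
# All matrices and dimensions below are illustrative assumptions.

rng = np.random.default_rng(0)
d_h, d_x, d_y = 3, 4, 1                       # hidden, input, output dims
A = 0.9 * np.eye(d_h)                         # stable hidden dynamics
B = 0.1 * rng.normal(size=(d_h, d_x))
C = rng.normal(size=(d_y, d_h))
Q, R = 0.01 * np.eye(d_h), 0.01 * np.eye(d_y)

def simulate(xs):
    """xs: (T, d_x) weekly feature vectors -> (T, d_y) outputs."""
    h = np.zeros(d_h)
    ys = []
    for x in xs:
        h = A @ h + B @ x + rng.multivariate_normal(np.zeros(d_h), Q)
        y = C @ h + rng.multivariate_normal(np.zeros(d_y), R)
        ys.append(y)
    return np.stack(ys)

ys = simulate(rng.normal(size=(5, d_x)))      # five weeks of inputs
```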

SLIDE 20

Vanilla Recurrent Neural Network (Vanilla RNN)

An RNN allows the network connections to form cycles:

h_t = H(W_1 x_t + W_2 h_{t-1} + b_h)
y_t = F(W_3 h_t + b_y)                        (2)

[Figure. Left: Vanilla RNN structure; right: Vanilla RNN unfolded.]
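A minimal forward pass of Eq. (2), assuming tanh for H and a sigmoid for F (common choices; the slide does not pin down the nonlinearities), with small random weights:

```python
import numpy as np

# Forward pass of Eq. (2): h_t = tanh(W1 x_t + W2 h_{t-1} + b_h),
# y_t = sigmoid(W3 h_t + b_y). Weights are illustrative, untrained.
rng = np.random.default_rng(1)
d_x, d_h = 4, 8
W1 = rng.normal(scale=0.1, size=(d_h, d_x))
W2 = rng.normal(scale=0.1, size=(d_h, d_h))
W3 = rng.normal(scale=0.1, size=(1, d_h))
b_h, b_y = np.zeros(d_h), np.zeros(1)

def rnn_forward(xs):
    h = np.zeros(d_h)
    ys = []
    for x in xs:                                   # one step per week
        h = np.tanh(W1 @ x + W2 @ h + b_h)         # h_t
        y = 1.0 / (1.0 + np.exp(-(W3 @ h + b_y)))  # y_t: dropout prob.
        ys.append(y.item())
    return ys

probs = rnn_forward(rng.normal(size=(5, d_x)))
```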


SLIDE 23

Drawbacks of RNN

1. The influence of an input either decays or blows up as it cycles around the recurrent connections
2. Vanishing gradient problem
3. The range of temporal context that can be accessed in practice is usually quite limited
4. The dynamic state of a regular RNN is short-term memory
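Point 1 can be checked numerically: cycling a signal through the same recurrent weight matrix shrinks it when the spectral radius is below 1 and inflates it when above. A small sketch with an illustrative random matrix (not a trained RNN weight):

```python
import numpy as np

# Repeatedly cycling a vector through the same recurrent weight matrix
# makes its influence decay (spectral radius < 1) or blow up (> 1).
rng = np.random.default_rng(3)
W = rng.normal(size=(8, 8))
rho = np.max(np.abs(np.linalg.eigvals(W)))    # spectral radius of W

def influence(W_scaled, steps=50):
    """Norm of a signal after `steps` trips around the recurrent loop."""
    v = np.ones(8)
    for _ in range(steps):
        v = W_scaled @ v
    return float(np.linalg.norm(v))

decay = influence(W * (0.5 / rho))     # effective radius 0.5: vanishes
blowup = influence(W * (1.5 / rho))    # effective radius 1.5: explodes
```

The same multiplicative effect acts on backpropagated gradients, which is the vanishing-gradient problem in point 2.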

SLIDE 24

Long Short-Term Memory Cell (LSTM)

Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time:

1. Information gets into a cell whenever the "input" gate is on
2. Information stays in the cell as long as the "forget" gate is closed
3. Information can be read from the cell by turning the "output" gate on

SLIDE 25

Update Functions of LSTM

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_t + b_o)
h_t = o_t ⊙ tanh(c_t)                         (3)
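A single step of Eq. (3), written out in NumPy. The W_c* peephole terms are implemented as diagonal (elementwise) connections, a common convention that the equations alone do not fix; all sizes and weights are illustrative:

```python
import numpy as np

# One step of the LSTM cell of Eq. (3). Peephole weights W_ci, W_cf,
# W_co are taken as diagonal (elementwise vectors), a common choice.
rng = np.random.default_rng(2)
d_x, d_h = 4, 8
def mat(r, c): return rng.normal(scale=0.1, size=(r, c))
Wxi, Whi = mat(d_h, d_x), mat(d_h, d_h)
Wxf, Whf = mat(d_h, d_x), mat(d_h, d_h)
Wxc, Whc = mat(d_h, d_x), mat(d_h, d_h)
Wxo, Who = mat(d_h, d_x), mat(d_h, d_h)
Wci, Wcf, Wco = (rng.normal(scale=0.1, size=d_h) for _ in range(3))
bi = bf = bc = bo = np.zeros(d_h)

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    i = sigmoid(Wxi @ x + Whi @ h + Wci * c + bi)        # input gate
    f = sigmoid(Wxf @ x + Whf @ h + Wcf * c + bf)        # forget gate
    c_new = f * c + i * np.tanh(Wxc @ x + Whc @ h + bc)  # cell state
    o = sigmoid(Wxo @ x + Who @ h + Wco * c_new + bo)    # output gate
    return o * np.tanh(c_new), c_new                     # h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):                      # five weeks
    h, c = lstm_step(x, h, c)
```

Because the forget gate f_t multiplies c_{t-1} directly, the cell can carry information across many weeks without the decay shown on the previous slide.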

SLIDE 26

Hybrid of LSTM Memory Cells and RNN (LSTM Network)

[Figure. Left: hybrid of LSTM and RNN (LSTM network); right: LSTM network unfolded.]



SLIDE 29

Datasets for Dropout Prediction

1. "Science of Gastronomy", a six-week course (Coursera): 85,394 → 39,877
2. "Introduction to Java Programming", a ten-week course (edX): 46,972 → 27,629


SLIDE 31

Dropout Definitions

Three definitions capture different contexts of the student's status in a course:

DEF1 Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]

DEF2 Last week of engagement: whether the current week is the last week the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]

DEF3 Participation in the next week: whether a student has activities in the coming week

An illustrative example for DEF1-DEF3:

Time      Week 1             Week 2   Week 3             Week 4   Week 5
Features  [7,34,9,2,0,7,5]   Zeros    [6,3,12,4,1,8,3]   Zeros    Zeros
DEF1      1                  1        1                  1        1
DEF2      1 1 null
DEF3      1 1 1 null
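The three definitions can be written as simple predicates over a student's set of active weeks. One plausible reading in code, where label 1 marks a dropout under each definition and DEF3 is undefined (null) for the final week; the exact 0/1 convention for DEF2/DEF3 is only partly recoverable from the example table:

```python
# One plausible encoding of DEF1-DEF3 as predicates over the set of
# weeks in which a student was active. Label conventions here are an
# assumption, not the paper's exact encoding.

def dropout_labels(active, n_weeks):
    """active: set of 1-based weeks with any activity."""
    last = max(active) if active else 0
    # DEF1: student misses the final week (same label every week)
    def1 = [int(n_weeks not in active)] * n_weeks
    # DEF2: current week is the last week with activities
    def2 = [int(week == last) for week in range(1, n_weeks + 1)]
    # DEF3: no activity in the coming week (undefined at the last week)
    def3 = [int(week + 1 not in active) for week in range(1, n_weeks)] + [None]
    return def1, def2, def3

d1, d2, d3 = dropout_labels({1, 3}, n_weeks=5)   # active weeks 1 and 3
```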


SLIDE 35

Model Performance Comparison

[Figure: AUC scores of all models for the Coursera course (weeks 1-5) and the edX course (weeks 1-9), one panel per dropout definition (DEF1-DEF3). Models: LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression.]

1. The LSTM network performs consistently best
2. The IOHMMs perform worst
3. The baselines are roughly on par with the vanilla RNN, and their ranking is not consistent across the two datasets



SLIDE 38

Take-home Message

1. A temporal perspective on the dropout prediction problem
2. The effectiveness of the RNN and LSTM network
3. Try not to "drop out" of the MOOC courses you are taking