

SLIDE 1

Temporal Models for Predicting Student Dropout in Massive Open Online Courses

Fei Mi, Dit-Yan Yeung

Hong Kong University of Science and Technology (HKUST) fmi@ust.hk (fei.mi@epfl.ch)

November 14, 2015

Fei Mi, Dit-Yan Yeung (HKUST), ICDM ASSESS 2015, November 14, 2015, 1 / 17

SLIDE 2

Outline

1. Background and Motivation
2. Temporal Models
3. Experiments
4. Conclusion



SLIDE 9

Overview

1. What can we do?
   - Performance evaluation (Peer Grading)
   - Help students engage and perform better (Dropout Prediction)
   - Build a personalized platform (Recommendation)


SLIDE 11

Motivation of our work

1. High attrition rates are common on MOOC platforms (60%-80%)
2. Current methods: SVM, Logistic Regression
   - Activity features (lecture video, discussion forum)
   - Static models


SLIDE 14

Contribution of our work

1. A sequence labeling perspective

   [Figure: weekly sequence labeling. For Week 1 through Week t, per-week activity features z_1, ..., z_t are mapped to dropout labels y_1, ..., y_t.]

2. Compare different temporal machine learning models
   - Input-output Hidden Markov Model (IOHMM)
   - Recurrent Neural Network (RNN)
   - RNN with long short-term memory (LSTM) cells
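The sequence labeling framing can be sketched in code: each student becomes one sequence with a feature vector z_t and a binary label y_t per week. A minimal Python illustration, where the three activity counts and the DEF3-style next-week label are hypothetical choices, not the paper's exact features:

```python
# Minimal sketch of the sequence-labeling framing: one sequence per
# student, one feature vector z_t and one binary label y_t per week.
# Feature names and numbers here are hypothetical illustrations.

def build_sequence(weekly_activity, n_weeks):
    """weekly_activity: {week: [video_views, forum_posts, quiz_attempts]}"""
    zs, ys = [], []
    for week in range(1, n_weeks + 1):
        z = weekly_activity.get(week, [0, 0, 0])     # zeros if inactive
        zs.append(z)
        # DEF3-style label: 1 if the student has NO activity next week
        nxt = weekly_activity.get(week + 1, [0, 0, 0])
        ys.append(0 if any(nxt) else 1)
    return zs, ys

student = {1: [7, 3, 1], 2: [4, 0, 2], 4: [1, 1, 0]}  # inactive weeks 3, 5
z, y = build_sequence(student, n_weeks=5)
```

The temporal models in the next section consume these (z_1, ..., z_t) sequences one step per week.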



SLIDE 17

How to capture temporal information?

Sliding window structures (NLP tasks):

1. Features are aggregated using a sliding window structure
2. The temporal span is fixed by the sliding window

Temporal models:

1. Learn from the previous inputs and the current input
2. A temporal pathway allows a "memory" of the previous inputs to persist in the internal state
3. Flexible temporal span, learned from data


SLIDE 19

Input-output Hidden Markov Model (IOHMM)

Originated from the HMM; learns to map input sequences to output sequences:

h_t = A h_{t-1} + B x_t + N(0, Q)
y_t = C h_t + N(0, R)                         (1)

[Figure: IOHMM 1 graphical model, unrolled over time: hidden states h_t, dropout labels y_t, input features x_t.]
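Equation (1) describes a linear-Gaussian input-output state-space model. A minimal simulation of it, with illustrative dimensions and parameter values (not fitted to any MOOC data):

```python
import numpy as np

# Simulate the linear-Gaussian state-space form of Eq. (1):
#   h_t = A h_{t-1} + B x_t + noise(0, Q)
#   y_t = C h_t     + noise(0, R)
# All matrices and dimensions below are illustrative assumptions.

rng = np.random.default_rng(0)
d_h, d_x, d_y = 3, 4, 1                       # hidden, input, output dims
A = 0.9 * np.eye(d_h)                         # stable hidden dynamics
B = 0.1 * rng.normal(size=(d_h, d_x))
C = rng.normal(size=(d_y, d_h))
Q, R = 0.01 * np.eye(d_h), 0.01 * np.eye(d_y)

def simulate(xs):
    """xs: (T, d_x) weekly feature vectors -> (T, d_y) outputs."""
    h = np.zeros(d_h)
    ys = []
    for x in xs:
        h = A @ h + B @ x + rng.multivariate_normal(np.zeros(d_h), Q)
        y = C @ h + rng.multivariate_normal(np.zeros(d_y), R)
        ys.append(y)
    return np.stack(ys)

ys = simulate(rng.normal(size=(5, d_x)))      # five weeks of inputs
```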

SLIDE 20

Vanilla Recurrent Neural Network (Vanilla RNN)

An RNN allows the network connections to form cycles:

h_t = H(W_1 x_t + W_2 h_{t-1} + b_h)
y_t = F(W_3 h_t + b_y)                        (2)

[Figure. Left: Vanilla RNN structure; right: Vanilla RNN unfolded.]
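A minimal forward pass of Eq. (2), assuming tanh for H and a sigmoid for F (common choices; the slide does not pin down the nonlinearities), with small random weights:

```python
import numpy as np

# Forward pass of Eq. (2): h_t = tanh(W1 x_t + W2 h_{t-1} + b_h),
# y_t = sigmoid(W3 h_t + b_y). Weights are illustrative, untrained.
rng = np.random.default_rng(1)
d_x, d_h = 4, 8
W1 = rng.normal(scale=0.1, size=(d_h, d_x))
W2 = rng.normal(scale=0.1, size=(d_h, d_h))
W3 = rng.normal(scale=0.1, size=(1, d_h))
b_h, b_y = np.zeros(d_h), np.zeros(1)

def rnn_forward(xs):
    h = np.zeros(d_h)
    ys = []
    for x in xs:                                   # one step per week
        h = np.tanh(W1 @ x + W2 @ h + b_h)         # h_t
        y = 1.0 / (1.0 + np.exp(-(W3 @ h + b_y)))  # y_t: dropout prob.
        ys.append(y.item())
    return ys

probs = rnn_forward(rng.normal(size=(5, d_x)))
```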


SLIDE 23

Drawbacks of RNN

1. The influence of an input either decays or blows up as it cycles around the recurrent connections
2. Vanishing gradient problem
3. The range of temporal context that can be accessed in practice is usually quite limited
4. The dynamic state of a regular RNN is short-term memory
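Point 1 can be checked numerically: cycling a signal through the same recurrent weight matrix shrinks it when the spectral radius is below 1 and inflates it when above. A small sketch with an illustrative random matrix (not a trained RNN weight):

```python
import numpy as np

# Repeatedly cycling a vector through the same recurrent weight matrix
# makes its influence decay (spectral radius < 1) or blow up (> 1).
rng = np.random.default_rng(3)
W = rng.normal(size=(8, 8))
rho = np.max(np.abs(np.linalg.eigvals(W)))    # spectral radius of W

def influence(W_scaled, steps=50):
    """Norm of a signal after `steps` trips around the recurrent loop."""
    v = np.ones(8)
    for _ in range(steps):
        v = W_scaled @ v
    return float(np.linalg.norm(v))

decay = influence(W * (0.5 / rho))     # effective radius 0.5: vanishes
blowup = influence(W * (1.5 / rho))    # effective radius 1.5: explodes
```

The same multiplicative effect acts on backpropagated gradients, which is the vanishing-gradient problem in point 2.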

SLIDE 24

Long Short-Term Memory Cell (LSTM)

Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time:

1. Information gets into a cell whenever the "input" gate is on
2. Information stays in the cell as long as the "forget" gate is closed
3. Information can be read from the cell by turning the "output" gate on

SLIDE 25

Update Functions of LSTM

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_t + b_o)
h_t = o_t ⊙ tanh(c_t)                         (3)
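A single step of Eq. (3), written out in NumPy. The W_c* peephole terms are implemented as diagonal (elementwise) connections, a common convention that the equations alone do not fix; all sizes and weights are illustrative:

```python
import numpy as np

# One step of the LSTM cell of Eq. (3). Peephole weights W_ci, W_cf,
# W_co are taken as diagonal (elementwise vectors), a common choice.
rng = np.random.default_rng(2)
d_x, d_h = 4, 8
def mat(r, c): return rng.normal(scale=0.1, size=(r, c))
Wxi, Whi = mat(d_h, d_x), mat(d_h, d_h)
Wxf, Whf = mat(d_h, d_x), mat(d_h, d_h)
Wxc, Whc = mat(d_h, d_x), mat(d_h, d_h)
Wxo, Who = mat(d_h, d_x), mat(d_h, d_h)
Wci, Wcf, Wco = (rng.normal(scale=0.1, size=d_h) for _ in range(3))
bi = bf = bc = bo = np.zeros(d_h)

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    i = sigmoid(Wxi @ x + Whi @ h + Wci * c + bi)        # input gate
    f = sigmoid(Wxf @ x + Whf @ h + Wcf * c + bf)        # forget gate
    c_new = f * c + i * np.tanh(Wxc @ x + Whc @ h + bc)  # cell state
    o = sigmoid(Wxo @ x + Who @ h + Wco * c_new + bo)    # output gate
    return o * np.tanh(c_new), c_new                     # h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):                      # five weeks
    h, c = lstm_step(x, h, c)
```

Because the forget gate f_t multiplies c_{t-1} directly, the cell can carry information across many weeks without the decay shown on the previous slide.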

SLIDE 26

Hybrid of LSTM Memory Cells and RNN (LSTM Network)

[Figure. Left: hybrid of LSTM and RNN (LSTM network); right: LSTM network unfolded.]



SLIDE 29

Datasets for Dropout Prediction

1. "Science of Gastronomy", a six-week course (Coursera): 85,394 → 39,877
2. "Introduction to Java Programming", a ten-week course (edX): 46,972 → 27,629


SLIDE 31

Dropout Definitions

Three definitions capture different contexts of the student's status in a course:

DEF1 Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]

DEF2 Last week of engagement: whether the current week is the last week the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]

DEF3 Participation in the next week: whether a student has activities in the coming week

An illustrative example for DEF1-DEF3:

Time      Week 1             Week 2   Week 3             Week 4   Week 5
Features  [7,34,9,2,0,7,5]   Zeros    [6,3,12,4,1,8,3]   Zeros    Zeros
DEF1      1                  1        1                  1        1
DEF2      1 1 null
DEF3      1 1 1 null
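The three definitions can be written as simple predicates over a student's set of active weeks. One plausible reading in code, where label 1 marks a dropout under each definition and DEF3 is undefined (null) for the final week; the exact 0/1 convention for DEF2/DEF3 is only partly recoverable from the example table:

```python
# One plausible encoding of DEF1-DEF3 as predicates over the set of
# weeks in which a student was active. Label conventions here are an
# assumption, not the paper's exact encoding.

def dropout_labels(active, n_weeks):
    """active: set of 1-based weeks with any activity."""
    last = max(active) if active else 0
    # DEF1: student misses the final week (same label every week)
    def1 = [int(n_weeks not in active)] * n_weeks
    # DEF2: current week is the last week with activities
    def2 = [int(week == last) for week in range(1, n_weeks + 1)]
    # DEF3: no activity in the coming week (undefined at the last week)
    def3 = [int(week + 1 not in active) for week in range(1, n_weeks)] + [None]
    return def1, def2, def3

d1, d2, d3 = dropout_labels({1, 3}, n_weeks=5)   # active weeks 1 and 3
```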


SLIDE 35

Model Performance Comparison

[Figure: AUC scores of all models for the Coursera course (weeks 1-5) and the edX course (weeks 1-9), one panel per dropout definition (DEF1-DEF3). Models: LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression.]

1. The LSTM network performs consistently best
2. The IOHMMs perform worst
3. The baselines are roughly on par with the vanilla RNN, and their ranking is not consistent across the two datasets



SLIDE 38

Take-home Message

1. A temporal perspective on the dropout prediction problem
2. The effectiveness of the RNN and LSTM network
3. Try not to "drop out" of the MOOC courses you are taking