

SLIDE 1

Introduction Logistic Regression Factorization Machines Deep Learning Conclusion

Knowledge Tracing Machines: Families of models for predicting student performance

Jill-Jênn Vie
RIKEN Center for Advanced Intelligence Project, Tokyo
Optimizing Human Learning, June 12, 2018
Polytechnique Montréal, June 18, 2018

SLIDE 2

Predicting student performance

Data
A population of students answering questions
Events: "Student i answered question j correctly/incorrectly"

Goal
Learn the difficulty of questions automatically from data
Measure the knowledge of students
Potentially optimize their learning

Assumption
A good model for prediction → a good adaptive policy for teaching

SLIDE 3

Learning outcomes of this tutorial

Logistic regression is amazing

Unidimensional; takes IRT and PFA as special cases

Factorization machines are even more amazing

Multidimensional; takes MIRT as a special case

It makes sense to consider deep neural networks

What does deep knowledge tracing model exactly?

SLIDE 4

Families of models

Factorization Machines (Rendle 2012)
    Multidimensional Item Response Theory
    Logistic Regression
        Item Response Theory
        Performance Factor Analysis

Recurrent Neural Networks
    Deep Knowledge Tracing (Piech et al. 2015)

Steffen Rendle (2012). “Factorization Machines with libFM”. In: ACM Transactions on Intelligent Systems and Technology (TIST) 3.3, 57:1–57:22. doi: 10.1145/2168752.2168771 Chris Piech et al. (2015). “Deep knowledge tracing”. In: Advances in Neural Information Processing Systems (NIPS), pp. 505–513

SLIDE 5

Problems

Weak generalization (filling the blanks): some students did not attempt all questions
Strong generalization (cold-start): some new students are not in the train set

SLIDE 6

Dummy dataset

User 1 answered Item 1 correct
User 1 answered Item 2 incorrect
User 2 answered Item 1 incorrect
User 2 answered Item 1 correct
User 2 answered Item 2 ???

dummy.csv:

user item correct
1    1    1
1    2    0
2    1    0
2    1    1
2    2    ?

SLIDE 7

Task 1: Item Response Theory

Learn an ability θi for each user i and an easiness ej for each item j such that:

    Pr(User i Item j OK) = σ(θi + ej)
    logit Pr(User i Item j OK) = θi + ej

Logistic regression: learn w such that logit Pr(x) = ⟨w, x⟩
Usually with L2 regularization: λ‖w‖² (the ℓ2 penalty ↔ a Gaussian prior on w)
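As a toy sketch of the formula above (our own code; the θ and e values below are made up for illustration, not fitted by the logistic regression):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical parameter values for the dummy data (illustration only).
theta = {1: 0.5, 2: -0.2}   # user abilities theta_i
e = {1: 0.3, 2: -0.8}       # item easinesses e_j

def p_correct(i, j):
    # Pr(User i Item j OK) = sigma(theta_i + e_j)
    return sigmoid(theta[i] + e[j])

# The abler user on the easier item succeeds more often:
assert p_correct(1, 1) > p_correct(2, 2)
```

With fitted parameters, the same two lines give the predicted probability for any (user, item) pair, including pairs never seen in training.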

SLIDE 8

Graphically: IRT as logistic regression

Encoding of “User i answered Item j”:

[Diagram: the sparse feature vector x has a 1 in the user block (Ui) and a 1 in the item block (Ij); the weight vector w stores the θi for users and the ej for items.]

logit Pr(User i Item j OK) = ⟨w, x⟩ = θi + ej

SLIDE 9

Encoding

python encode.py --users --items

Users     Items
U0 U1 U2  I0 I1 I2
 0  1  0   0  1  0
 0  1  0   0  0  1
 0  0  1   0  1  0
 0  0  1   0  1  0
 0  0  1   0  0  1

data/dummy/X-ui.npz

Then logistic regression can be run on the sparse features:

python lr.py data/dummy/X-ui.npz

SLIDE 10

Oh, there’s a problem

python encode.py --users --items
python lr.py data/dummy/X-ui.npz

                   U0 U1 U2  I0 I1 I2  ypred     y
User 1 Item 1 OK    0  1  0   0  1  0  0.575135  1
User 1 Item 2 NOK   0  1  0   0  0  1  0.395036  0
User 2 Item 1 NOK   0  0  1   0  1  0  0.545417  0
User 2 Item 1 OK    0  0  1   0  1  0  0.545417  1
User 2 Item 2 NOK   0  0  1   0  0  1  0.366595  0

We predict the same thing when there are several attempts.

SLIDE 11

Count successes and failures

Keep track of what the student has done before:

user item skill correct wins fails
1    1    1     1       0    0
1    2    2     0       0    0
2    1    1     0       0    0
2    1    1     1       0    1
2    2    2     ?       0    0

data/dummy/data.csv
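A sketch of how these counters can be maintained (our own code, not the repository's): before each attempt, record the student's prior successes and failures on the attempted skill, then update.

```python
from collections import defaultdict

def add_counts(rows):
    """Append wins/fails columns counting prior outcomes per (user, skill)."""
    wins, fails = defaultdict(int), defaultdict(int)
    out = []
    for user, item, skill, correct in rows:
        # Counts reflect the history BEFORE this attempt.
        out.append((user, item, skill, correct,
                    wins[user, skill], fails[user, skill]))
        if correct:
            wins[user, skill] += 1
        else:
            fails[user, skill] += 1
    return out

dummy = [(1, 1, 1, 1), (1, 2, 2, 0), (2, 1, 1, 0), (2, 1, 1, 1)]
# User 2's second attempt on skill 1 comes after one failure:
assert add_counts(dummy)[3] == (2, 1, 1, 1, 0, 1)
```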

SLIDE 12

Task 2: Performance Factor Analysis

Wik: number of successes of user i over skill k (Fik: number of failures)
Learn βk, γk, δk for each skill k such that:

    logit Pr(User i Item j OK) = Σ_{skills k of Item j} (βk + Wik γk + Fik δk)

python encode.py --skills --wins --fails

Skills    Wins      Fails
S0 S1 S2  S0 S1 S2  S0 S1 S2
 0  1  0   0  0  0   0  0  0
 0  0  1   0  0  0   0  0  0
 0  1  0   0  0  0   0  0  0
 0  1  0   0  0  0   0  1  0
 0  0  1   0  0  0   0  0  0

data/dummy/X-swf.npz

SLIDE 13

Better!

python encode.py --skills --wins --fails
python lr.py data/dummy/X-swf.npz

                   Skills    Wins      Fails
                   S0 S1 S2  S0 S1 S2  S0 S1 S2  ypred  y
User 1 Item 1 OK    0  1  0   0  0  0   0  0  0  0.544  1
User 1 Item 2 NOK   0  0  1   0  0  0   0  0  0  0.381  0
User 2 Item 1 NOK   0  1  0   0  0  0   0  0  0  0.544  0
User 2 Item 1 OK    0  1  0   0  0  0   0  1  0  0.633  1
User 2 Item 2 NOK   0  0  1   0  0  0   0  0  0  0.381  0

SLIDE 14

Task 3: a new model (but still logistic regression)

python encode.py --items --skills --wins --fails python lr.py data/dummy/X-iswf.npz

SLIDE 15

Here comes a new challenger

How to model side information in, say, recommender systems?

Logistic regression: learn a bias for each feature (each user, item, etc.)
Factorization machines: learn a bias and an embedding for each feature

SLIDE 16

What can be done with embeddings?

SLIDE 17

Interpreting the components

SLIDE 18

Interpreting the components

SLIDE 19

Graphically: logistic regression

[Diagram: as before, x is one-hot over users (Ui) and items (Ij), and w stores the biases θi and ej.]

SLIDE 20

Graphically: factorization machines

[Diagram: x now has one-hot blocks for users (Ui), items (Ij) and skills (Sk); w stores the biases θi, ej, βk, and V stores the embeddings ui, vj, sk, summed over the active features.]

SLIDE 21

Formally: factorization machines

Learn a bias wk and an embedding vk for each feature k such that:

    logit p(x) = µ + Σ_{k=1}^{N} wk xk + Σ_{1≤k<l≤N} xk xl ⟨vk, vl⟩

The first sum is plain logistic regression; the second sum holds the pairwise interactions.

Particular cases
    Multidimensional item response theory: logit p = ⟨ui, vj⟩ + ej
    SPARFA: vj ≥ 0 and vj sparse
    GenMA: vj is constrained by the zeroes of a q-matrix (qij)i,j
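The formula above can be sketched in numpy, using Rendle's O(Nd) rewriting of the pairwise sum (our own code, not libFM):

```python
import numpy as np

def fm_logit(x, mu, w, V):
    """Factorization-machine logit for one feature vector x.

    w has one bias per feature; V has one d-dimensional embedding per
    feature. The pairwise term uses the identity
      sum_{k<l} x_k x_l <v_k, v_l>
        = (||V^T x||^2 - sum_k x_k^2 ||v_k||^2) / 2,
    which costs O(Nd) instead of O(N^2 d).
    """
    linear = w @ x
    xv = V.T @ x
    pairwise = 0.5 * (xv @ xv - (x ** 2) @ (V ** 2).sum(axis=1))
    return mu + linear + pairwise
```

With a sparse one-hot x, only the active rows of V contribute, which is what keeps FMs cheap on encodings like X-ui.npz.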

Andrew S Lan, Andrew E Waters, Christoph Studer, and Richard G Baraniuk (2014). “Sparse factor analysis for learning and content analytics”. In: The Journal of Machine Learning Research 15.1, pp. 1959–2008.
Jill-Jênn Vie, Fabrice Popineau, Yolaine Bourda, and Éric Bruillard (2016). “Adaptive Testing Using a General Diagnostic Model”. In: European Conference on Technology Enhanced Learning. Springer, pp. 331–339.
SLIDE 22

Tradeoff expressiveness/interpretability

Negative log-likelihood (accuracy) after 4, 7 and 10 questions:

Model  logit p         4 q          7 q          10 q
Rasch  θi + ej         0.469 (79%)  0.457 (79%)  0.446 (79%)
DINA   1 − sj or gj    0.441 (80%)  0.410 (82%)  0.406 (82%)
MIRT   ⟨ui, vj⟩ + ej   0.368 (83%)  0.325 (86%)  0.316 (86%)
GenMA  ⟨ui, q̃j⟩ + ej   0.459 (79%)  0.355 (85%)  0.294 (88%)

[Figure: comparison of adaptive-testing models on the Fraction dataset, log loss vs. number of questions asked: Rasch, MIRT K = 2, GenMA K = 8, DINA K = 8]
[Figure: the same comparison on TIMSS: Rasch, MIRT K = 2, GenMA K = 13, DINA K = 13]

SLIDE 23

Assistments 2009 dataset

278,608 attempts of 4,163 students over 196,457 items on 124 skills.
Download http://jiji.cat/weasel2018/data.csv and put it in data/assistments09

python fm.py data/assistments09/X-ui.npz etc., or make big

AUC (and training time):

     users + items    skills + w + f   items + skills + w + f
LR   0.734 (IRT) 2s   0.651 (PFA) 9s   0.737 23s
FM   0.730 2min9s     0.652 43s        0.739 2min30s

Results obtained with FM, d = 20.

SLIDE 24

Deep Factorization Machines

Learn layers W(ℓ) and b(ℓ) such that:

    a(0)(x) = (vuser, vitem, vskill, …)
    a(ℓ+1)(x) = ReLU(W(ℓ) a(ℓ)(x) + b(ℓ))   for ℓ = 0, …, L − 1
    yDNN(x) = ReLU(W(L) a(L)(x) + b(L))
    logit p(x) = yFM(x) + yDNN(x)

Jill-Jênn Vie (2018). “Deep Factorization Machines for Knowledge Tracing”. In: The 13th Workshop on Innovative Use of NLP for Building Educational Applications. url: https://arxiv.org/abs/1805.00356
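The forward pass above can be sketched as follows (our own numpy code with placeholder weights, not the trained model):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deepfm_logit(embeddings, layers, y_fm):
    """DeepFM logit: a ReLU MLP over the concatenated feature embeddings,
    whose scalar output y_DNN is added to the FM logit y_FM."""
    a = np.concatenate(embeddings)      # a^(0)(x) = (v_user, v_item, ...)
    for W, b in layers:                 # a^(l+1) = ReLU(W^(l) a^(l) + b^(l))
        a = relu(W @ a + b)
    return y_fm + a.item()              # logit p(x) = y_FM(x) + y_DNN(x)

# Tiny example with made-up weights: two 2-d embeddings, one output layer.
emb = [np.ones(2), np.ones(2)]
layers = [(np.ones((1, 4)), np.zeros(1))]
assert deepfm_logit(emb, layers, 0.5) == 4.5
```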

SLIDE 25

Comparison

FM: yFM, a factorization machine with λ = 0.01
Deep: yDNN, a multilayer perceptron
DeepFM: yDNN + yFM with a shared embedding
Bayesian FM: wk, vkf ∼ N(µf, 1/λf) with µf ∼ N(0, 1), λf ∼ Γ(1, 1), trained using Gibbs sampling

Various types of side information:
first: <discrete> (user, token, countries, etc.)
last: <discrete> + <continuous> (time + days)
pfa: <discrete> + wins + fails

SLIDE 26

Duolingo dataset

SLIDE 27

Results

Model        d   epoch     train  first  last   pfa
Bayesian FM  20  500/500   –      0.822  –      –
Bayesian FM  20  500/500   –      –      0.817  –
DeepFM       20  15/1000   0.872  0.814  –      –
Bayesian FM  20  100/100   –      –      0.813  –
FM           20  20/1000   0.874  0.811  –      –
Bayesian FM  20  500/500   –      –      –      0.806
FM           20  21/1000   0.884  –      –      0.805
FM           20  37/1000   0.885  –      0.800  –
DeepFM       20  77/1000   0.890  –      0.792  –
Deep         20  7/1000    0.826  0.791  –      –
Deep         20  321/1000  0.826  –      0.790  –
LR           –   50/50     –      –      –      0.789
LR           –   50/50     –      0.783  –      –
LR           –   50/50     –      –      0.783  –

SLIDE 28

Duolingo ranking

Rank     Team       Algo                 AUC
1        SanaLabs   RNN + GBDT           .857
2        singsound  RNN                  .854
2        NYU        GBDT                 .854
4        CECL       LR + L1 (13M feat.)  .843
5        TMU        RNN                  .839
7 (off)  JJV        Bayesian FM          .822
8 (off)  JJV        DeepFM               .814
10       JJV        DeepFM               .809
15       Duolingo   LR                   .771

Burr Settles, Chris Brust, Erin Gustafson, Masato Hagiwara, and Nitin Madnani (2018). “Second language acquisition modeling”. In: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 56–65. url: http://sharedtask.duolingo.com

SLIDE 29

What ’bout recurrent neural networks?

Deep Knowledge Tracing: model the problem as sequence prediction
At each step t, the student attempts skill qt with performance at
How to predict the outcomes y on every skill k?
Spoiler: by tracking the evolution of a latent state ht
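A toy sketch of this recurrence (ours, with a vanilla RNN cell and random untrained parameters; the actual DKT paper trains an RNN/LSTM):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(42)
M, d = 4, 3                       # number of skills, hidden-state size
Wh = rng.normal(scale=0.1, size=(d, d))       # state-to-state weights
Wx = rng.normal(scale=0.1, size=(d, 2 * M))   # input-to-state weights
V = rng.normal(scale=0.1, size=(M, d))        # per-skill output weights

def step(h, skill, outcome):
    """Update the latent state from one (q_t, a_t) observation."""
    x = np.zeros(2 * M)
    x[skill + M * outcome] = 1.0  # one-hot over (skill, outcome) pairs
    return np.tanh(Wh @ h + Wx @ x)

h = np.zeros(d)
for skill, outcome in [(0, 1), (2, 0), (0, 1)]:
    h = step(h, skill, outcome)
y = sigmoid(V @ h)                # one predicted probability per skill
assert y.shape == (M,)
```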

SLIDE 30

Graphically: deep knowledge tracing

[Diagram: RNN unrolled over time; each observation (qt, at) updates the hidden state ht, from which a full vector of predictions y = (y0, …, yM−1), one per skill, is output.]

SLIDE 31

Graphically: there is a MIRT in my DKT

[Diagram: the same unrolled RNN, but each prediction is an inner product of the hidden state with a skill embedding: yqt = σ(⟨ht, vqt⟩), just like MIRT.]

SLIDE 32

Drawback of Deep Knowledge Tracing

DKT does not model individual differences. In fact, Wilson even managed to beat DKT with (1-dimensional!) IRT. By estimating the student's learning ability on the fly, we managed to get a better model:

AUC               BKT   IRT   PFA   DKT   DKT-DSC
Cognitive Tutor   0.61  0.81  0.76  0.79  0.81
Assistments 2009  0.67  0.75  0.70  0.73  0.92
Assistments 2012  0.61  0.74  0.67  0.72  0.80
Assistments 2014  0.64  0.67  0.69  0.72  0.86

Sein Minn, Yi Yu, Michel Desmarais, Feida Zhu, and Jill-Jênn Vie (2018). “Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing”. Submitted at IEEE International Conference on Data Mining.

SLIDE 33

Take home message

Factorization machines are a strong baseline that takes many models as special cases
Recurrent neural networks are powerful because they track the evolution of the latent state (try simpler dynamic models?)
Deep factorization machines may require more data and tuning, but neural collaborative filtering offers promising directions

SLIDE 34

Any suggestions are welcome!

Feel free to chat: vie@jill-jenn.net
All code: github.com/jilljenn/ktm
Do you have any questions?

SLIDE 35

Lan, Andrew S, Andrew E Waters, Christoph Studer, and Richard G Baraniuk (2014). “Sparse factor analysis for learning and content analytics”. In: The Journal of Machine Learning Research 15.1, pp. 1959–2008.
Minn, Sein, Yi Yu, Michel Desmarais, Feida Zhu, and Jill-Jênn Vie (2018). “Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing”. Submitted at IEEE International Conference on Data Mining.
Piech, Chris, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J Guibas, and Jascha Sohl-Dickstein (2015). “Deep knowledge tracing”. In: Advances in Neural Information Processing Systems (NIPS), pp. 505–513.
Rendle, Steffen (2012). “Factorization Machines with libFM”. In: ACM Transactions on Intelligent Systems and Technology (TIST) 3.3, 57:1–57:22. doi: 10.1145/2168752.2168771.

SLIDE 36

Settles, Burr, Chris Brust, Erin Gustafson, Masato Hagiwara, and Nitin Madnani (2018). “Second language acquisition modeling”. In: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 56–65. url: http://sharedtask.duolingo.com.
Vie, Jill-Jênn (2018). “Deep Factorization Machines for Knowledge Tracing”. In: The 13th Workshop on Innovative Use of NLP for Building Educational Applications. url: https://arxiv.org/abs/1805.00356.
Vie, Jill-Jênn, Fabrice Popineau, Yolaine Bourda, and Éric Bruillard (2016). “Adaptive Testing Using a General Diagnostic Model”. In: European Conference on Technology Enhanced Learning. Springer, pp. 331–339.