SLIDE 1

Predicting Student Retention in STEM Majors

Andrew Sage Dan Nettleton Cinzia Cervato Craig Ogilvie

Iowa State University email: ajsage@iastate.edu

August 10, 2015

Iowa State University Predicting STEM Retention August 10, 2015 1 / 15

SLIDE 2

Project Overview

Objective: Use information available early in the first semester to identify first-year ISU undergraduate students most likely to leave STEM majors.

We seek to:
- Estimate the probability of each student leaving STEM
- Identify important predictors of a student leaving STEM

SLIDE 3

Data

We consider STEM students who stayed at ISU for at least 1 year
- 5,247 students from 2011-12 and 2012-13
- 537 (10.3%) left STEM during first year

36 explanatory variables
- High school academic performance (GPA, rank, courses, Regent Admission Index)
- Standardized test scores (ACT, SAT)
- MapWorks survey factors
- First semester ISU courses and learning community participation
- Gender

SLIDE 4

Random Forest Methodology

- Predictions based on ensembles of decision trees (Breiman, 2001)
- Useful in classification and regression problems
- Handles large numbers of explanatory variables
- Handles nonlinear relationships and complicated interactions
- Provides a measure of variable importance
- Implemented in the R packages randomForest (Liaw & Wiener), randomForestSRC (Ishwaran & Kogalur), and party (Hothorn et al.)

Conditional inference trees in party perform best when predictors vary in numbers of values/categories

SLIDE 5

A Decision Tree

ACT ≤ 27
    LC = N: n = 810, y = 0.19
    LC = Y
        Sex = F: n = 993, y = 0.13
        Sex = M
            GPA ≤ 3.44: n = 668, y = 0.13
            GPA > 3.44: n = 932, y = 0.08
ACT > 27
    GPA ≤ 3.97: n = 1116, y = 0.07
    GPA > 3.97: n = 728, y = 0.02

n = number of students in node; y = proportion leaving STEM
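As a minimal sketch, the example tree can be written as nested rules that route one student to a leaf. The branch nesting is inferred from the order in which the node labels appear on the slide, and the function name `leaf_for` and the dictionary keys are illustrative assumptions, not part of the original analysis.

```python
# Hypothetical encoding of the example tree. The nesting of branches is
# inferred from the slide's node order; all names are illustrative.

def leaf_for(student):
    """Return (n, y) for the leaf a student falls into.

    n = number of students in the leaf, y = proportion leaving STEM.
    """
    if student["act"] <= 27:
        if not student["lc"]:           # not a learning community member
            return 810, 0.19
        if student["sex"] == "F":
            return 993, 0.13
        if student["gpa"] <= 3.44:
            return 668, 0.13
        return 932, 0.08
    if student["gpa"] <= 3.97:
        return 1116, 0.07
    return 728, 0.02


# Hypothetical student, not taken from the data set.
example = {"act": 25, "lc": True, "sex": "M", "gpa": 3.6}
n, y = leaf_for(example)
print(f"leaf size n={n}, estimated proportion leaving STEM y={y}")
```

A single tree gives every student in a leaf the same estimate, which is one reason a forest of many trees is used for the actual predictions.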

SLIDE 6

Random Forests

Growing a Forest from Training Data
- Many trees (1,000 in our case)
- Each tree grown from a different bootstrap sample
- Random subset of predictor variables considered for each split
- Trees grown until nodes are homogeneous

Predicting New Cases
- Run new case through each tree in the forest
- Probability of a response class is estimated by the proportion of trees “voting” for that class
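The two stages above can be illustrated with a small pure-Python sketch. It is not the R code behind these slides: the function names, the Gini split criterion, the toy data, and the choice of 200 trees (rather than 1,000) are all assumptions made to keep the example short.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(rows, labels, n_try, rng):
    """Grow one tree until each node is homogeneous or cannot be split."""
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    # Only a random subset of predictors is considered at this split.
    for j in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[j] <= t]
            right = [i for i, r in enumerate(rows) if r[j] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # sampled predictors are constant: use majority class
        return Counter(labels).most_common(1)[0][0]
    _, j, t, left, right = best
    return (j, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left], n_try, rng),
            grow_tree([rows[i] for i in right], [labels[i] for i in right], n_try, rng))


def predict_tree(tree, row):
    """Follow splits until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def grow_forest(rows, labels, n_trees, n_try, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], n_try, rng))
    return forest


def vote_probability(forest, row, cls=1):
    """Proportion of trees voting for class `cls`."""
    return sum(predict_tree(t, row) == cls for t in forest) / len(forest)


# Hypothetical toy data: (ACT, HS GPA); label 1 = left STEM.
rows = [(20, 2.8), (22, 3.0), (24, 3.1), (25, 3.3),
        (29, 3.6), (31, 3.8), (33, 3.9), (35, 4.0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

forest = grow_forest(rows, labels, n_trees=200, n_try=1, seed=1)
p_low = vote_probability(forest, (21, 2.9))
p_high = vote_probability(forest, (34, 3.95))
print(p_low, p_high)
```

Because each tree sees a different bootstrap sample and a random subset of predictors, individual trees disagree, and the share of votes serves as the probability estimate.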

SLIDE 7

Identifying At-Risk Students

- Predict 2012-13 cases using 2011-12 as training data
- Consider the 500 students most likely to leave STEM to be at-risk
- 20.2% of at-risk students left STEM, compared to 8.1% of others

Figure: Predictive Performance. Proportion leaving STEM (y-axis, 0.00 to 0.20) by classification (At-Risk vs. Not At-Risk).
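The comparison on this slide amounts to ranking students by estimated probability, flagging the top k, and comparing observed leave rates in the two groups. The sketch below shows that calculation on hypothetical toy numbers; the function names are assumptions, and the slide's own analysis used k = 500.

```python
def flag_at_risk(probabilities, k):
    """Indices of the k students with the highest estimated probability."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return set(order[:k])


def leave_rates(probabilities, left_stem, k):
    """Observed proportion leaving STEM among at-risk and other students."""
    at_risk = flag_at_risk(probabilities, k)
    others = set(range(len(probabilities))) - at_risk

    def rate(indices):
        return sum(left_stem[i] for i in indices) / len(indices)

    return rate(at_risk), rate(others)


# Hypothetical toy data: estimated probabilities and observed outcomes
# (1 = left STEM). These are not values from the study.
probs = [0.30, 0.05, 0.22, 0.10, 0.02, 0.18]
left = [1, 0, 1, 0, 0, 0]

at_risk_rate, other_rate = leave_rates(probs, left, k=2)
print(at_risk_rate, other_rate)
```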

SLIDE 8

Visualizing Response Curve

Figure: Effect of Regent Admission Index and Sex. Probability of leaving STEM (y-axis, 0.00 to 0.20) against Regent Admission Index (x-axis, 200 to 350), shown separately by sex (Female, Male).

- Marginal distribution of estimated probabilities for 2012-13 students
- Forest grown using 2011-12 data
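The slides do not state exactly how these curves were computed. One common way to draw a response curve of this kind is a partial-dependence-style average: fix the variable of interest at each grid value for every student and average the model's predicted probabilities. The sketch below assumes that approach; `toy_predict` is a hypothetical stand-in for a fitted forest, and the student records are invented.

```python
import math


def response_curve(predict, students, variable, grid):
    """Average predicted probability with `variable` fixed at each grid value."""
    curve = []
    for value in grid:
        modified = [{**s, variable: value} for s in students]
        curve.append(sum(predict(s) for s in modified) / len(modified))
    return curve


def toy_predict(student):
    """Hypothetical stand-in for a fitted model's P(leave STEM)."""
    score = 0.02 * (student["rai"] - 275) + (0.5 if student["lc"] else 0.0)
    return 1.0 / (1.0 + math.exp(2.0 + score))


# Hypothetical students: Regent Admission Index and LC membership.
students = [{"rai": 250, "lc": True},
            {"rai": 300, "lc": False},
            {"rai": 320, "lc": True}]
grid = [200, 250, 300, 350]

curve = response_curve(toy_predict, students, "rai", grid)
print(list(zip(grid, curve)))
```

To get one curve per group, as in the plots, the same function can be run separately on the students in each group.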

SLIDE 9

Visualizing Response Curve II

Figure: Effect of Regent Admission Index and Learning Community. Probability of leaving STEM (y-axis, 0.00 to 0.20) against Regent Admission Index (x-axis, 200 to 350), shown separately by LC membership (Yes, No).

- Marginal distribution of estimated probabilities for 2012-13 students
- Forest grown using 2011-12 data

SLIDE 10

Visualizing Response Curve III

Figure: Effect of Analytical Skills Self-Assessment. Probability of leaving STEM (y-axis, 0.00 to 0.20) against the MapWorks Analytical Skills Self-Assessment score (x-axis, 2 to 6), shown separately by LC membership (Yes, No).

- Marginal distribution of estimated probabilities for 2012-13 students
- Forest grown using 2011-12 data

SLIDE 11

Assessing Variable Importance

Permutation Importance

1. Make predictions for out-of-bag cases for each tree
2. Compute misclassification rate
3. Randomly permute values for an explanatory variable and re-predict
4. Compute new misclassification rate
5. Large increase in misclassification rate indicates variable is important
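The steps above can be sketched in a few lines. This is a simplified illustration, not the procedure as implemented in the R packages: it permutes a column over a single evaluation set instead of working tree by tree on out-of-bag cases, and `toy_predict`, the function names, and the data are hypothetical.

```python
import random


def misclassification_rate(predict, rows, labels):
    return sum(predict(r) != y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean increase in misclassification rate after permuting one column."""
    rng = random.Random(seed)
    baseline = misclassification_rate(predict, rows, labels)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the permuted value in place of the original.
        permuted = [r[:column] + (v,) + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        increases.append(
            misclassification_rate(predict, permuted, labels) - baseline)
    return sum(increases) / n_repeats


def toy_predict(row):
    """Hypothetical classifier that uses only column 0 (ACT)."""
    return 1 if row[0] <= 27 else 0


# Hypothetical rows: (ACT, unrelated noise value); label 1 = left STEM.
rows = [(20, 5), (22, 1), (24, 4), (26, 2), (28, 3), (30, 6), (32, 0), (34, 7)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

act_importance = permutation_importance(toy_predict, rows, labels, column=0)
noise_importance = permutation_importance(toy_predict, rows, labels, column=1)
print(act_importance, noise_importance)
```

Permuting the column the classifier ignores leaves every prediction unchanged, so its importance is zero, while permuting the column it relies on raises the error rate.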

SLIDE 12

Important Variables

Figure: Ten Important Predictors. Importance (x-axis, 0e+00 to 9e-04) for each variable (y-axis), listed in the order the labels appear on the plot:

- MapWorks: Environment
- Chemistry Units
- HS Rank
- HS GPA
- Sex
- Biology Units
- Analytical Skills*
- Self-Efficacy*
- LC Member
- Regent Admissions Index

* variable from MapWorks self-assessment survey.

SLIDE 13

Comparison with Logistic Regression

Performance comparable to logistic regression model obtained using backwards selection

Figure: ROC Curves. Sensitivity (y-axis) against 1 - Specificity (x-axis), both from 0.0 to 1.0, for the random forest (RF) and logistic regression (LR).
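For readers unfamiliar with ROC curves, the sketch below shows how the plotted points are obtained from predicted probabilities and observed outcomes, and how the area under the curve summarizes them. The function names and the toy scores are assumptions; applying the same functions to each model's predictions would give the two curves being compared.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities and outcomes (1 = left STEM).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points)
print(auc(points))
```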

SLIDE 14

Future Work

- Incorporate data from 2013-14, 2014-15
- Additional explanatory variables (ALEKS, ACT inventory survey)
- Identify students likely to drop out of ISU altogether
- Ongoing adaptive predictions
- Examine differences between STEM majors

SLIDE 15

References

L Breiman. Random forests. Machine Learning, 45:5-32, 2001.
T Hothorn, P Bühlmann, S Dudoit, A Molinaro, and M Van Der Laan. Survival ensembles. Biostatistics, 7(3):355-373, 2006a.
T Hothorn, K Hornik, and A Zeileis. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3):651-674, 2006b.
H Ishwaran and U.B. Kogalur. Random Forests for Survival, Regression and Classification (RF-SRC). R package 1.5.4, 2014.
H Ishwaran and U.B. Kogalur. Random survival forests in R. R News, 7(2):25-31, 2007.
H Ishwaran, U.B. Kogalur, E Blackstone, and M Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841-860, 2008.
A Liaw and M Wiener. Classification and regression by randomForest. R News, 2(3):18-22, 2002.
