 
Predicting Student Retention in STEM Majors
Andrew Sage, Dan Nettleton, Cinzia Cervato, Craig Ogilvie
Iowa State University
email: ajsage@iastate.edu
August 10, 2015

Project Overview
Objective: Use information available early in the first semester to identify the first-year ISU undergraduate students most likely to leave STEM majors.
We seek to:
- Estimate the probability of each student leaving STEM
- Identify important predictors of a student leaving STEM

Data
We consider STEM students who stayed at ISU for at least 1 year.
- 5,247 students from 2011-12 and 2012-13
- 537 (10.3%) left STEM during their first year
- 36 explanatory variables:
  - High school academic performance (GPA, rank, courses, Regent Admission Index)
  - Standardized test scores (ACT®, SAT®)
  - MapWorks® survey factors
  - First-semester ISU courses and learning community participation
  - Gender

Random Forest Methodology
- Predictions are based on ensembles of decision trees (Breiman, 2001)
- Useful in classification and regression problems
- Handles a large number of explanatory variables
- Handles nonlinear relationships and complicated interactions
- Provides a measure of variable importance
- Implemented in the R packages randomForest (Liaw & Wiener), randomForestSRC (Ishwaran & Kogalur), and party (Hothorn et al.)
- Conditional inference trees in party perform best when predictors vary in their numbers of values/categories

A Decision Tree
[Figure: example decision tree. Splits shown: ACT (≤ 27 vs. > 27), LC (N vs. Y), GPA (≤ 3.97 vs. > 3.97), Sex (F vs. M), GPA (≤ 3.44 vs. > 3.44). Six terminal nodes, in the order shown:]
- n = 810, y = 0.19
- n = 993, y = 0.13
- n = 668, y = 0.13
- n = 932, y = 0.08
- n = 1116, y = 0.07
- n = 728, y = 0.02
n = number of students in node; y = proportion leaving STEM

Random Forests
Growing a Forest from Training Data
- Many trees (1,000 in our case)
- Each tree is grown from a different bootstrap sample
- A random subset of predictor variables is considered for each split
- Trees are grown until nodes are homogeneous
Predicting New Cases
- Run the new case through each tree in the forest
- The probability of a response class is estimated by the proportion of trees "voting" for that class

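The three ingredients on this slide (bootstrap samples, a random subset of predictors at each split, and vote proportions) can be sketched in a few lines of Python. This is an illustrative sketch only, not the R code used in the study: the function names, the toy "trees" (hand-written rules standing in for fitted trees), the example student, and the choice of 6 candidate predictors per split are all assumptions.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample per tree)."""
    return [rng.choice(rows) for _ in rows]

def candidate_features(n_features, m, rng):
    """Pick m distinct predictor indices to consider at a single split."""
    return rng.sample(range(n_features), m)

def vote_proportion(trees, case, label):
    """Estimate P(label) as the share of trees that predict `label` for `case`."""
    votes = sum(1 for tree in trees if tree(case) == label)
    return votes / len(trees)

# Hypothetical stand-ins for fitted trees: each maps a student to a class.
toy_trees = [
    lambda s: "leave" if s["act"] <= 27 else "stay",
    lambda s: "leave" if s["gpa"] <= 3.44 else "stay",
    lambda s: "leave" if not s["lc"] else "stay",
    lambda s: "stay",
]

# Hypothetical student record (not taken from the study data).
student = {"act": 25, "gpa": 3.6, "lc": False}
p_leave = vote_proportion(toy_trees, student, "leave")

# With 36 predictors, 6 candidates per split is an assumed, illustrative choice.
split_candidates = candidate_features(36, 6, random.Random(1))
```

With a real forest, `toy_trees` would be replaced by the 1,000 fitted trees, each grown on its own `bootstrap_sample` of the training data.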
Identifying At-Risk Students
- Predict 2012-13 cases using 2011-12 as training data
- Consider the 500 students most likely to leave STEM to be at-risk
- 20.2% of at-risk students left STEM, compared to 8.1% of others
[Figure: Predictive Performance. y-axis: Proportion Leaving STEM (0.00 to 0.20); x-axis: Classification (At-Risk, Not At-Risk).]

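The at-risk rule on this slide amounts to ranking students by predicted probability, flagging the top k, and comparing the observed leaving rates of the two groups. A minimal sketch under assumptions: the function names are hypothetical, and the six probabilities and outcomes below are made-up toy values, not study data (the study used k = 500).

```python
def flag_at_risk(probabilities, k):
    """Return the indices of the k cases with the highest predicted probability."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return set(order[:k])

def leaving_rates(probabilities, left_stem, k):
    """Observed proportion leaving STEM among at-risk and not-at-risk cases."""
    at_risk = flag_at_risk(probabilities, k)
    risk_outcomes = [left_stem[i] for i in at_risk]
    other_outcomes = [y for i, y in enumerate(left_stem) if i not in at_risk]
    return (sum(risk_outcomes) / len(risk_outcomes),
            sum(other_outcomes) / len(other_outcomes))

# Toy values for illustration only: predicted probabilities and 0/1 outcomes.
toy_probs = [0.9, 0.1, 0.8, 0.2, 0.3, 0.05]
toy_left = [1, 0, 0, 0, 1, 0]
at_risk_rate, other_rate = leaving_rates(toy_probs, toy_left, k=2)
```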
Visualizing Response Curve
[Figure: Effect of Regent Admission Index and Sex. x-axis: Regent Admission Index (200 to 350); y-axis: Probability of Leaving STEM (0.00 to 0.20); legend: Sex (Female, Male).]
- Marginal distribution of estimated probabilities for 2012-13 students
- Forest grown using 2011-12 data

Visualizing Response Curve II
[Figure: Effect of Regent Admission Index and Learning Community. x-axis: Regent Admission Index (200 to 350); y-axis: Probability of Leaving STEM (0.00 to 0.20); legend: LC Member (Yes, No).]
- Marginal distribution of estimated probabilities for 2012-13 students
- Forest grown using 2011-12 data

Visualizing Response Curve III
[Figure: Effect of Analytical Skills Self-Assessment. x-axis: MapWorks: Analytical Skills Self-Assessment (2 to 6); y-axis: Probability of Leaving STEM (0.00 to 0.20); legend: LC Member (Yes, No).]
- Marginal distribution of estimated probabilities for 2012-13 students
- Forest grown using 2011-12 data

Assessing Variable Importance
Permutation Importance
1. Make predictions for out-of-bag cases for each tree
2. Compute the misclassification rate
3. Randomly permute the values of an explanatory variable and re-predict
4. Compute the new misclassification rate
5. A large increase in misclassification rate indicates that the variable is important

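The five steps above can be sketched as follows. This is a simplified illustration, not the packages' implementation: it permutes one column for a single prediction function on one evaluation set, rather than working tree by tree on out-of-bag cases. The function names, the toy predictor, and the toy rows are assumptions.

```python
import random

def misclassification_rate(predict, rows, labels):
    """Share of cases whose predicted class differs from the true class."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)

def permutation_importance(predict, rows, labels, col, rng):
    """Increase in misclassification rate after shuffling column `col`."""
    baseline = misclassification_rate(predict, rows, labels)
    shuffled = [row[col] for row in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the permuted value; the original rows are untouched.
    permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(rows, shuffled)]
    return misclassification_rate(predict, permuted, labels) - baseline

# Toy predictor that uses only column 0 (an ACT-like score); column 1 is ignored.
toy_predict = lambda row: 1 if row[0] <= 27 else 0
toy_rows = [[25, 3.1], [30, 3.9], [22, 3.5], [29, 2.8], [26, 3.7], [31, 3.3]]
toy_labels = [toy_predict(row) for row in toy_rows]  # baseline error is 0 by construction

rng = random.Random(0)
importance_col0 = permutation_importance(toy_predict, toy_rows, toy_labels, 0, rng)
importance_col1 = permutation_importance(toy_predict, toy_rows, toy_labels, 1, rng)
```

Because the toy predictor never reads column 1, shuffling it cannot change any prediction, so its importance is exactly zero; shuffling column 0 can only raise the error rate from its baseline of zero.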
Important Variables
[Figure: Ten Important Predictors. x-axis: Importance (0e+00 to 9e-04); y-axis: Variable.]
Variables, in the order shown in the figure:
- Regent Admission Index
- LC Member
- Self-Efficacy*
- Analytical Skills*
- Biology Units
- Sex
- HS GPA
- HS Rank
- Chemistry Units
- MapWorks: Environment
* Variable from the MapWorks® self-assessment survey.

Comparison with Logistic Regression
- Performance is comparable to that of a logistic regression model obtained using backwards selection
[Figure: ROC Curves. x-axis: 1-Specificity (0.0 to 1.0); y-axis: Sensitivity (0.0 to 1.0); legend: RF, LR.]

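For readers unfamiliar with ROC curves, the sketch below shows how the plotted points (1 - specificity, sensitivity) can be computed from predicted probabilities and observed outcomes, along with the area under the curve by the trapezoid rule. The function names and the four toy scores are assumptions for illustration; they are not the study's predictions.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A case is classified as positive when its score meets the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy scores and 0/1 outcomes (1 = left STEM), for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_outcomes = [1, 0, 1, 0]
toy_curve = roc_points(toy_scores, toy_outcomes)
toy_auc = area_under_curve(toy_curve)
```

Computing the same curve for two models on the same cases, as the slide does for RF and LR, allows their sensitivity to be compared at each false-positive rate.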
Future Work
- Incorporate data from 2013-14 and 2014-15
- Add explanatory variables (ALEKS®, ACT® inventory survey)
- Identify students likely to drop out of ISU altogether
- Ongoing adaptive predictions
- Examine differences between STEM majors

References
- L Breiman. Random forests. Machine Learning, 45:5-32, 2001.
- T Hothorn, P Bühlmann, S Dudoit, A Molinaro, and M Van Der Laan. Survival ensembles. Biostatistics, 7(3):355-373, 2006a.
- T Hothorn, K Hornik, and A Zeileis. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3):651-674, 2006b.
- H Ishwaran and U.B. Kogalur. Random Forests for Survival, Regression and Classification (RF-SRC). R package 1.5.4, 2014.
- H Ishwaran and U.B. Kogalur. Random survival forests in R. R News, 7(2):25-31, 2007.
- H Ishwaran, U.B. Kogalur, E Blackstone, and M Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841-860, 2008.
- A Liaw and M Wiener. Classification and regression by randomForest. R News, 2(3):18-22, 2002.