Overview Introduction Problem Identification Conceptual Framework - PDF document

Predicting Six-Year Graduation Probabilities of First-Time Full-Time Freshmen. CSRDE NSSR 2016, 11/17/2016. Khoi D. To, PhD, Enterprise Analytics and Advanced Research, VCU Office of Planning and Decision Support.


  1. Predicting Six-Year Graduation Probabilities of First-Time Full-Time Freshmen
     CSRDE NSSR 2016, 11/17/2016
     Khoi D. To, PhD, Enterprise Analytics and Advanced Research, VCU Office of Planning and Decision Support

     Overview
     - Introduction
     - Problem Identification
     - Conceptual Framework
     - Data, Methods, and Software
     - Findings
     - Further Study

  2. Introduction
     Virginia Commonwealth University (VCU)
     - A major public research university located in Richmond, the state capital of Virginia.
     - Classified as a Research University-Very High Research Activity, the highest ranking by the Carnegie Foundation.
     - Total enrollment of 32,000; 222 degree and certificate programs, 67 of which are unique in the state of Virginia.
     - One of the largest academic health centers in the nation. The VCU Medical Center was named the No. 1 hospital in the state in 2013 by U.S. News & World Report.

     Key Metrics on Student Success

     | Institution | First-year retention (F2014 cohort) | Six-year graduation (F2009 cohort) | High-school GPA (F2015 cohort) | Student/Faculty ratio (F2015) |
     |---|---|---|---|---|
     | Virginia Commonwealth University | 86% (4) | 62% (4) | 3.64 (4) | 16:1 (1) |
     | University of Alabama, Birmingham | 79% (0) | 55% (0) | 3.66 (0) | 18:1 (0) |
     | University of Cincinnati | 88% (1) | 65% (0) | 3.48 (0) | 18:1 (0) |
     | University of Illinois, Chicago | 82% (0) | 60% (0) | | |
     | University of Louisville | 79% (0) | 53% (0) | 3.60 (0) | 16:1 (1) |
     | University of South Carolina | 88% (1) | 72% (1) | 4.07 (1) | 18:1 (0) |
     | University of South Florida, Tampa | 88% (1) | 68% (0) | 3.94 (0) | 24:1 (0) |

     (All numbers were obtained from the 2015-16 Common Data Set.)

  3. Problem Identification
     - VCU's six-year graduation rate is 62%, so there is still a lot of room for improvement.
     - Enhancing the six-year graduation rate lines up with VCU's commitment to student success.
     - Higher graduation rates mean:
       - Higher institutional reputation and ranking
       - Lower costs for students and their families
       - More meaningful achievement for both institution and students

     Problem Identification (Cont.)
     Being able to identify "at-risk" students at an early stage (after the first semester) and provide them with the assistance they need to graduate on time is crucial for all parties involved.

  4. Conceptual Framework
     - Predictor groups: Demographics; Financial Aid; Pre-College Academic; In-College
     - Models: Logistic Regression; Decision Tree; Neural Network
     - Outcome: Graduating within 6 years or not?
     - Result: "AT-RISK" students identified; Advising & Student Support called in for assistance
     The whole process can be run on an ongoing basis.

     Data
     - Data collected from Banner ODS modules (Admissions, Enrollment, Financial Aid) after the first semester (fall)
     - Fall 2009 full-time first-time freshman cohort (3,644 students)
     - Four groups of predictors: demographics, financial aid, pre-college, and in-college

  5. Data (Cont.)
     Demographics
     - Residency
     - Gender
     - Race/Ethnicity
     - First generation
     - Median income of zip code

     Financial Aid
     - Dependent/Independent
     - Applied for FAFSA or not
     - Amount of Pell grant received
     - Percent of need met

     Data (Cont.)
     Pre-College
     - High school GPA
     - SAT Combined/Math/Verbal
     - IB/AP credits recognized

     In-College (at the end of the first semester)
     - Transfer hours recognized
     - On-campus/Off-campus
     - STEM major
     - College/School enrolled
     - Student class (FR/SO/JR/SR)
     - Athlete/Honors

  6. Data (Cont.)
     In-College (at the end of the first semester)
     - Number of term hours attempted/earned
     - Math/Physics/Chemistry courses taken
     - Term quality points/GPA hours
     - Academic standing
     - Number of D/F/W grades
     - Applied for transcript or not

     Methods and Software
     - Data are extracted and prepared with SAS Base.
     - Imputation for missing values and modeling tasks are done with SAS Enterprise Miner.
     - Three techniques: logistic regression, decision tree, and neural network models. The best model is selected based on misclassification rates.
     - The original data set is split into two: 60% for training and 40% for validation.
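The partitioning and model-selection steps described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the presentation used SAS Enterprise Miner, not Python; the function names are hypothetical; and a simple random shuffle is assumed for the 60/40 split, since the slides do not say whether the partition was stratified.

```python
import random

def split_train_validation(rows, train_fraction=0.6, seed=0):
    """Shuffle rows and split them into training and validation partitions.

    Assumption: a plain random split; the slides do not describe
    how SAS Enterprise Miner partitioned the cohort.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def misclassification_rate(actual, predicted):
    """Share of cases whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

def pick_best_model(validation_rates):
    """Return the model name with the lowest validation misclassification rate."""
    return min(validation_rates, key=validation_rates.get)

# Validation misclassification rates as reported on the slides.
reported = {
    "logistic regression": 0.271,
    "decision tree": 0.291,
    "neural network": 0.284,
}

# Student IDs are stand-ins here; only the cohort size (3,644) comes from the slides.
train, valid = split_train_validation(range(3644))
print(len(train), len(valid))      # 2186 1458
print(pick_best_model(reported))   # logistic regression
```

On misclassification rate alone, logistic regression has the lowest validation error of the three reported models, which agrees with the choice stated in the findings.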

  7. 1 - Logistic Regression
     Misclassification Rates

     | Partition | Correct | Incorrect |
     |---|---|---|
     | Training | 73.3% | 26.7% |
     | Validation | 72.9% | 27.1% |

     Lift Chart (Validation): Lift (Model) vs. Lift (Random) by % of data set (10 to 100). Labeled model lift values: 2.35, 1.98, 1.77.

     2 - Decision Tree
     Misclassification Rates

     | Partition | Correct | Incorrect |
     |---|---|---|
     | Training | 73.2% | 26.8% |
     | Validation | 70.9% | 29.1% |

     Lift Chart (Validation): Lift (Model) vs. Lift (Random) by % of data set (10 to 100). Labeled model lift values: 2.24, 1.93, 1.69.

  8. 3 - Neural Network
     Misclassification Rates

     | Partition | Correct | Incorrect |
     |---|---|---|
     | Training | 73.9% | 26.1% |
     | Validation | 71.6% | 28.4% |

     Lift Chart (Validation): Lift (Model) vs. Lift (Random) by % of data set (10 to 100). Labeled model lift values: 2.37, 1.95, 1.75.

     Findings
     - The logistic regression model is chosen based on misclassification rate, lift chart, and ease of model interpretation.
     - Significant predictors: first generation, academic standing, college, cumulative hours earned/GPA, SAT Verbal, high school GPA, percent of unmet need, and applied for transcript or not.
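The lift values reported in the charts compare how concentrated the target class is among the highest-scored cases with how common it is in the whole data set. A minimal sketch of a cumulative lift calculation follows. The function name and the toy data are hypothetical, and the exact definition SAS Enterprise Miner uses for its lift chart is not given on the slides, so treat this as one common formulation rather than the presenter's computation.

```python
def lift_at_depth(actual, scores, depth):
    """Cumulative lift for the top `depth` share of cases ranked by score.

    actual: 1 for the target class (here, did not graduate in 6 years), else 0
    scores: predicted probability of the target class for each case
    depth:  share of the data set selected, e.g. 0.2 for the top 20%
    """
    if len(actual) != len(scores) or not actual:
        raise ValueError("actual and scores must be non-empty and equal in length")
    base_rate = sum(actual) / len(actual)
    if base_rate == 0:
        raise ValueError("lift is undefined when the target class never occurs")
    # Rank cases from highest to lowest predicted probability.
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_selected = max(1, round(len(ranked) * depth))
    rate_selected = sum(label for _, label in ranked[:n_selected]) / n_selected
    return rate_selected / base_rate

# Toy data for illustration only (not from the presentation).
toy_actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(lift_at_depth(toy_actual, toy_scores, 0.2))  # 2.5
print(lift_at_depth(toy_actual, toy_scores, 1.0))  # 1.0
```

Selecting the whole data set always gives a lift of 1.0, which is the "Lift (Random)" baseline in the charts.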

  9. Findings (Cont.)
     - The lift of logistic regression at 20% is 1.98. In other words, if the top 20% of the total cohort (sorted from highest to lowest predicted probability) were selected, the model would capture 1.98 times as many "at-risk" students as a random selection of 20% of the cohort.

     Findings (Cont.)
     Total cohort (3,644 students): graduated in 6 years (2,269 students, or 62%), did not graduate in 6 years (1,375 students, or 38%)

     | Step | No modeling (random) | Logistic regression |
     |---|---|---|
     | Selecting 20% of total cohort | 3,644 * 20% = 729 students | 3,644 * 20% = 729 students |
     | Number of "at-risk" students captured | 729 * 38% = 277 | 277 * 1.98 = 548 |
     | If 70% of those "at-risk" students were helped to graduate in 6 years | 277 * 70% = 194 | 548 * 70% = 384 |
     | Improved six-year graduation rate by targeting 20% of the total cohort | (2,269 + 194) / 3,644 = 68% | (2,269 + 384) / 3,644 = 73% |
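The projection on this slide can be reproduced step by step. The sketch below follows the slide's own rounding (whole students at each step, a 38% non-graduation share); the function name is hypothetical, and the 70% help rate is the slide's "what if" assumption, not a measured result.

```python
def projected_graduation_rate(cohort, graduates, depth, lift, help_share):
    """Projected six-year graduation rate after targeting the top `depth` of the cohort.

    Rounds to whole students at each step, mirroring the slide's arithmetic.
    A lift of 1.0 corresponds to selecting students at random.
    """
    selected = round(cohort * depth)                        # 3,644 * 20% = 729
    at_risk_share = round((cohort - graduates) / cohort, 2)  # 1,375 / 3,644 = 38%
    captured_random = round(selected * at_risk_share)       # 729 * 38% = 277
    captured = round(captured_random * lift)                # 277 * 1.98 = 548
    helped = round(captured * help_share)                   # 548 * 70% = 384
    return (graduates + helped) / cohort

random_rate = projected_graduation_rate(3644, 2269, 0.20, 1.0, 0.70)
model_rate = projected_graduation_rate(3644, 2269, 0.20, 1.98, 0.70)
print(f"{random_rate:.0%}")  # 68%
print(f"{model_rate:.0%}")   # 73%
```

Under these assumptions, contacting the same number of students (729) yields about five more percentage points when they are chosen by the model than when they are chosen at random.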

  10. Further Study
     - Other variables can be introduced to the models to improve accuracy: average time spent in library, intent to complete a degree program (from SAT/ACT record)…
     - Cluster analysis can be conducted on the predicted non-graduates to see whether they share any common characteristics.

     Other Applications
     - "High-risk" students can be monitored every semester and referred to Academic Advising and Student Support for help so that they can graduate on time.
     - A customized model can be developed for each school/college to help it track the progress of its own students.

  11. Other Applications (Cont.)
     - The same framework and methods can be adapted to predict students' probabilities of retention/attrition at various levels (university, school/college, or department).

     Thank You

