
11/17/2016 1

Predicting Six‐Year Graduation Probabilities of First‐Time Full‐Time Freshmen

CSRDE NSSR 2016 Khoi D. To, PhD

Enterprise Analytics and Advanced Research VCU Office of Planning and Decision Support

Overview

  • Introduction
  • Problem Identification
  • Conceptual Framework
  • Data, Methods, and Software
  • Findings
  • Further Study


Introduction

Virginia Commonwealth University (VCU)

  • A major public research university located in Richmond, the state capital of Virginia.
  • Classified as a Research University‐Very High Research Activity, the highest research‐activity category in the Carnegie Classification.
  • Total enrollment of 32,000; 222 degree and certificate programs, 67 of which are unique in the state of Virginia.
  • One of the largest academic health centers in the nation. The VCU Medical Center was named the No. 1 hospital in the state in 2013 by U.S. News & World Report.

Key Metrics on Student Success

| Institution | First‐year retention (F2014 cohort) | Six‐year graduation (F2009 cohort) | High‐school GPA (F2015 cohort) | Student/Faculty ratio (F2015) |
|---|---|---|---|---|
| Virginia Commonwealth University | 86% (4) | 62% (4) | 3.64 (4) | 16:1 (1) |
| University of Alabama, Birmingham | 79% (0) | 55% (0) | 3.66 (0) | 18:1 (0) |
| University of Cincinnati | 88% (1) | 65% (0) | 3.48 (0) | 18:1 (0) |
| University of Illinois, Chicago | 82% (0) | 60% (0) | | |
| University of Louisville | 79% (0) | 53% (0) | 3.60 (0) | 16:1 (1) |
| University of South Carolina | 88% (1) | 72% (1) | 4.07 (1) | 18:1 (0) |
| University of South Florida, Tampa | 88% (1) | 68% (0) | 3.94 (0) | 24:1 (0) |

(All numbers were obtained from the 2015‐16 Common Data Set.)


Problem Identification

  • VCU’s six‐year graduation rate is 62%, so there is still a lot of room for improvement.
  • Raising the six‐year graduation rate aligns with VCU’s commitment to student success.
  • Higher graduation rates mean:
    ◦ Higher institutional reputation and ranking
    ◦ Lower costs for students and their families
    ◦ More meaningful achievement for both the institution and its students

Problem Identification (Cont.)

Identifying “at‐risk” students at an early stage (after the first semester) and providing them with the assistance they need to graduate on time is crucial for all parties involved.


Conceptual Framework

  • Inputs (predictor groups): Demographics, Financial Aid, Pre‐College, In‐College
  • Models: Logistic Regression, Decision Tree, Neural Network
  • Outcome predicted: Graduating within 6 years or not?
  • “AT‐RISK” students identified → Academic Advising & Student Support called in for assistance
  • The whole process can be run on an ongoing basis.

Data

  • Data collected from Banner ODS modules (Admissions, Enrollment, Financial Aid) after the first (fall) semester
  • Fall 2009 full‐time first‐time freshman cohort (3,644 students)
  • Four groups of predictors: demographics, financial aid, pre‐college, and in‐college


Data (Cont.)

Demographics
  • Residency
  • Gender
  • Race/Ethnicity
  • First generation
  • Median income of zip code

Financial Aid
  • Dependent/Independent
  • Applied for FAFSA or not
  • Amount of Pell grant received
  • Percent of need met

Data (Cont.)

Pre‐College
  • High school GPA
  • SAT Combined/Math/Verbal
  • IB/AP credits recognized

In‐College (at the end of the first semester)
  • Transfer hours recognized
  • On‐campus/Off‐campus
  • STEM major
  • College/School enrolled
  • Student class (FR/SO/JR/SR)
  • Athlete/Honors


Data (Cont.)

In‐College (at the end of the first semester)

  • Term hours attempted/earned
  • Number of Math/Physics/Chemistry courses taken
  • Term quality points/GPA hours
  • Academic standing
  • Number of D/F/W grades
  • Applied for transcript or not
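Several of the in‐college predictors are simple aggregates of a student's first‐semester course records (for example, term GPA = term quality points / GPA hours). The sketch below shows one way they might be derived. The `Course` record, the subject codes `MATH`/`PHYS`/`CHEM`, the letter‐grade scale without +/- modifiers, and the rules for how withdrawals count are all hypothetical assumptions, not VCU's actual Banner logic.

```python
from dataclasses import dataclass

# Assumed 4-point scale without +/- modifiers; "W" (withdrawal) carries no grade points.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
MATH_SCIENCE_SUBJECTS = {"MATH", "PHYS", "CHEM"}  # hypothetical subject codes


@dataclass
class Course:  # hypothetical first-semester course record
    subject: str
    credits: float
    grade: str  # "A" through "F", or "W"


def first_term_features(courses):
    """Derive several in-college predictors from one student's fall courses."""
    graded = [c for c in courses if c.grade in GRADE_POINTS]
    gpa_hours = sum(c.credits for c in graded)
    quality_points = sum(c.credits * GRADE_POINTS[c.grade] for c in graded)
    return {
        # Assumption: withdrawn courses still count toward attempted hours.
        "term_hours_attempted": sum(c.credits for c in courses),
        # Assumption: any passing grade (A through D) earns credit.
        "term_hours_earned": sum(c.credits for c in graded if c.grade != "F"),
        "term_quality_points": quality_points,
        "term_gpa_hours": gpa_hours,
        "term_gpa": quality_points / gpa_hours if gpa_hours else None,
        "num_dfw": sum(c.grade in {"D", "F", "W"} for c in courses),
        "num_math_phys_chem": sum(c.subject in MATH_SCIENCE_SUBJECTS for c in courses),
    }


fall = [Course("MATH", 3, "B"), Course("CHEM", 4, "D"),
        Course("ENGL", 3, "A"), Course("HIST", 3, "W")]
print(first_term_features(fall)["term_gpa"])  # 2.5
```

Keeping the derivation in one function makes it easy to rerun each semester on fresh extracts, which fits the "ongoing basis" idea in the framework.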

Methods and Software

  • Data are extracted and prepared with Base SAS.
  • Imputation of missing values and all modeling tasks are done in SAS Enterprise Miner.
  • Three techniques are compared: logistic regression, decision tree, and neural network models. The best model is selected based on misclassification rates.
  • The original data set is split into two parts: 60% for training and 40% for validation.
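The partition and selection steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the SAS Enterprise Miner implementation: the split here is a simple random 60/40 shuffle (Enterprise Miner's Data Partition node can also stratify on the target), and the `seed` value is hypothetical. The rates in `reported` are the validation misclassification rates shown on the model slides that follow.

```python
import random


def train_validation_split(rows, train_share=0.6, seed=2016):
    """Simple random 60/40 partition (assumed; no stratification)."""
    shuffled = list(rows)  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def misclassification_rate(predicted, actual):
    """Share of students whose predicted class differs from the actual outcome."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(p != a for p, a in zip(predicted, actual))
    return errors / len(actual)


def pick_best(validation_rates):
    """Return the model name with the lowest validation misclassification rate."""
    return min(validation_rates, key=validation_rates.get)


# Validation misclassification rates reported on the model slides.
reported = {"logistic regression": 0.271, "decision tree": 0.291, "neural network": 0.284}
print(pick_best(reported))  # logistic regression
```

For the 3,644‐student cohort, a 60/40 split gives 2,186 training and 1,458 validation records.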


1 – Logistic Regression

Misclassification Rates
  • Training: 73.3% correct, 26.7% incorrect
  • Validation: 72.9% correct, 27.1% incorrect

Lift Chart (Validation): Lift (Model) vs. Lift (Random) across 10% to 100% of the data set; labeled lift values 2.35, 1.98, 1.77

2 – Decision Tree

Misclassification Rates
  • Training: 73.2% correct, 26.8% incorrect
  • Validation: 70.9% correct, 29.1% incorrect

Lift Chart (Validation): Lift (Model) vs. Lift (Random) across 10% to 100% of the data set; labeled lift values 2.24, 1.93, 1.69


3 – Neural Network

Misclassification Rates
  • Training: 73.9% correct, 26.1% incorrect
  • Validation: 71.6% correct, 28.4% incorrect

Lift Chart (Validation): Lift (Model) vs. Lift (Random) across 10% to 100% of the data set; labeled lift values 2.37, 1.95, 1.75

Findings

  • The logistic regression model is chosen based on misclassification rate (the lowest on the validation set, 27.1%), lift chart, and ease of interpretation.
  • Significant predictors: first generation, academic standing, college, cumulative hours earned/GPA, SAT Verbal, high school GPA, percent of unmet need, and applied for transcript or not.
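For reference, a binary logistic regression relates the log‐odds of the modeled outcome to a linear combination of predictors such as those listed above. The slides do not report the fitted coefficients, so the β's below are generic symbols, and which outcome is coded as the event (graduating, or not graduating, within six years) is an assumption:

```latex
\log\frac{p_i}{1-p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
p_i = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right)}
```

Here x_{ij} is student i's value on predictor j and p_i is the predicted probability of the event. Because each β_j is a change in log‐odds per unit of x_{ij}, the coefficients can be read directly, which is part of what makes this model easier to interpret than a decision tree ensemble or a neural network.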


Findings (Cont.)

  • The lift of the logistic regression model at 20% is 1.98: if the top 20% of the cohort (sorted from highest to lowest predicted probability of being “at‐risk”) were selected, the model would capture 1.98 times as many “at‐risk” students as a random selection of 20% of the cohort.
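The lift figure can be computed directly from predicted probabilities and actual outcomes. The sketch below assumes `scores` holds each student's predicted probability of not graduating within six years and `at_risk` holds 1 for students who actually did not; the function name and the ten‐student toy data are hypothetical.

```python
def lift_at_depth(scores, at_risk, depth=0.20):
    """Cumulative lift at a given depth of the ranked cohort.

    scores  -- predicted probability that each student does NOT graduate in 6 years
    at_risk -- 1 if the student actually did not graduate in 6 years, else 0
    """
    n = len(scores)
    k = max(1, round(n * depth))  # size of the targeted group
    ranked = sorted(zip(scores, at_risk), key=lambda pair: pair[0], reverse=True)
    captured = sum(flag for _, flag in ranked[:k])  # at-risk students in the top k
    base_rate = sum(at_risk) / n  # at-risk share of the whole cohort
    return (captured / k) / base_rate


# Toy data: 3 of 10 students are at risk; the model ranks two of them first.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
at_risk = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift_at_depth(scores, at_risk))
```

In the toy data the top 20% (two students) are both at risk, against a 30% base rate, so the lift is 1.0 / 0.3 ≈ 3.33; at 100% depth the lift is always 1.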

Findings (Cont.)

Total cohort (3,644 students): graduated in 6 years: 2,269 students (62%); did not graduate in 6 years: 1,375 students (38%).

Selecting 20% of total cohort: 3,644 × 20% = 729 students

| | No modeling (random) | Logistic regression |
|---|---|---|
| Number of “at‐risk” students captured | 729 × 38% = 277 | 277 × 1.98 = 548 |
| If 70% of those “at‐risk” students were helped to graduate in 6 years | 277 × 70% = 194 | 548 × 70% = 384 |
| Improved six‐year graduation rate by targeting 20% of the total cohort | (2,269 + 194)/3,644 = 68% | (2,269 + 384)/3,644 = 73% |
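The worked example above can be reproduced step by step. This minimal sketch uses only the figures from the slide and rounds to whole students at each step, as the slide does; the function name is hypothetical, and the 70% success rate for assisted students is the slide's own "if" scenario rather than an observed result.

```python
def projected_graduation_rates(cohort=3644, graduated=2269, at_risk_share=0.38,
                               target_share=0.20, lift=1.98, help_rate=0.70):
    """Six-year graduation rate if the targeted at-risk students are helped."""
    selected = round(cohort * target_share)             # 729 students targeted
    random_captured = round(selected * at_risk_share)   # 277 at-risk, random pick
    model_captured = round(random_captured * lift)      # 548 at-risk, model pick
    random_helped = round(random_captured * help_rate)  # 194 extra graduates
    model_helped = round(model_captured * help_rate)    # 384 extra graduates
    return {
        "random": (graduated + random_helped) / cohort,
        "model": (graduated + model_helped) / cohort,
    }


rates = projected_graduation_rates()
print({name: f"{rate:.0%}" for name, rate in rates.items()})
# {'random': '68%', 'model': '73%'}
```

Changing `target_share`, `lift`, or `help_rate` shows how sensitive the projected gain is to each assumption.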


Further Study

  • Other variables could be added to the models to improve accuracy: average time spent in the library, intent to complete a degree program (from the SAT/ACT record)…
  • Cluster analysis can be conducted on the predicted non‐graduates to see whether they share any common characteristics.
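The slides do not say which clustering method would be used. As one possibility, the sketch below runs plain k‐means (Lloyd's algorithm) on numeric feature vectors of predicted non‐graduates. The choice of k‐means, the deterministic first‐k initialization, and the toy vectors are assumptions; in practice, features on very different scales (such as high school GPA and percent of need met) would need to be standardized first.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on numeric feature tuples.

    Initial centers are the first k points (assumed here to keep the sketch
    deterministic; k-means++ or several random restarts are more common).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each student joins the nearest center (Euclidean).
        new_labels = [
            min(range(k), key=lambda j, p=p: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Toy, made-up two-feature vectors for four predicted non-graduates.
labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)  # [0, 0, 1, 1]
```

The cluster centers can then be compared against the whole cohort's averages to describe what each group of predicted non‐graduates has in common.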

Other Applications

  • “High‐risk” students can be monitored continuously, semester by semester, and referred to Academic Advising and Student Support for help so that they can graduate on time.
  • A customized model can be developed for each school/college to help it keep track of the progress of its own students.

Other Applications (Cont.)

  • The same framework and methods can be applied to predict students’ probabilities of retention/attrition at various levels (university, school/college, or department).

Thank You


References

  • Pyke, S. D., & Sheridan, P. M. Logistic regression analysis of graduate student retention. The Canadian Journal of Higher Education, Vol. XXIII‐2, 1993.
  • Bogard, M., Helbig, T., Huff, G., & James, C. A comparison of empirical models for predicting student retention. https://www.wku.edu/instres/documents/comparison_of_empirical_models.pdf

References (Cont.)

  • Karimi, A. Predictive modeling of student success. (Coffee Talk, 11/15/13) https://www.fullerton.edu/analyticalstudies/_resources/pdfs/irct5.pdf
  • Raju, D. Predicting student graduation in higher education using data mining models: A comparison. (Dissertation, 2012) http://acumen.lib.ua.edu/content/u0015/0000001/0000901/u0015_0000001_0000901.pdf

References (Cont.)