SLIDE 1 UI Predictive Modeling for Recruitment & Retention
Michael Hovland, Director of Enrollment Mgmt Data Analytics
Knute Carter, Assistant Professor, College of Public Health
BI Community Presentation
January 18, 2017
SLIDE 2 Two Primary Types of Predictive Enrollment Models at UI
- Prospect Models (Introduced 7/15)
– All junior and senior prospects
- Admit Model (Introduced 3/15)
– Begins when students are admitted
SLIDE 3 Most Important Factors in Enrollment Predictive Modeling
- 80-90 variables chosen from:
– Student academic ability
– Student enrollment preferences/intentions
– Length of time students are interested in UI
– Strength of interest in UI
– Student demographics/characteristics
– Institutional data (financial aid, housing, and
SLIDE 4 What Predictive Modeling Looks Like
- Scores are stored in MAUI on a scale of 1-99
– The higher the number, the more likely the student is to enroll
- Each probability of enrollment also carries a
corresponding percentile rank
- To get an enrollment projection for any group, you sum the scores and divide by 100 (that is, you sum the probabilities)
- Most of our enrollment comes from the top 3 deciles (the top 30%) of scores
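The projection rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the UI implementation: the function name and the sample scores are hypothetical, and it assumes a score of 75 is read as a 0.75 probability of enrolling.

```python
def projected_enrollment(scores):
    """Expected number of enrollees for a group of students.

    Each score is on the 1-99 scale, so score / 100 is treated as that
    student's probability of enrolling (an assumption of this sketch).
    """
    for s in scores:
        if not 1 <= s <= 99:
            raise ValueError(f"score out of range: {s}")
    return sum(scores) / 100


# Hypothetical scores for a five-student group (not real UI data).
group_scores = [92, 75, 40, 12, 5]
projection = projected_enrollment(group_scores)  # 224 / 100 = 2.24 students
```

The result is an expected headcount, so it is usually fractional for small groups and only becomes useful as the group gets larger.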
SLIDE 5 Predictive Modeling Scores Are the Beginning, not the End
- Updated scores are generated weekly
depending on student activity
– Some students will go up and others will go down
SLIDE 6 Benefits/Uses of Predictive Modeling
- Identify students more likely to enroll
- Predict aggregate enrollment of groups
– Admissions counselor territories
– High schools
– Ability bands
– Racial/ethnic groups
– UI Colleges and departments
– Scholarship program recipients
– Students likely to take specific courses
SLIDE 7 Applications of Predictive Modeling Data in Diverse Areas
- Marketing and communications
- Financial aid scholarships
- Housing
- Orientation
- Presidential scholarships
- Course and section planning
- Admissions waiting lists
SLIDE 8 Use of Predictive Modeling Data in Marketing and Communications
- Use the admit model and prospect model to
determine which prospective students will receive print publications
- Start with a target numeric goal
– Omit and protect certain groups of students based on characteristics
– Use predictive modeling scores to fill in the gap between the number of protected students and the target numeric goal
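One way to read the selection procedure on this slide is sketched below in Python. The function name, the record fields ("resident", "opted_out"), and the rule that an omitted student is never mailed even if otherwise protected are all assumptions made for illustration; the slide does not specify them.

```python
def select_recipients(students, target, is_protected, is_omitted):
    """Pick up to `target` students for a print mailing.

    students: list of dicts with at least "id" and "score" keys.
    Protected students are always included, omitted students never are,
    and the remaining slots go to the highest-scoring other students.
    """
    protected = [s for s in students if is_protected(s) and not is_omitted(s)]
    pool = [s for s in students if not is_protected(s) and not is_omitted(s)]
    gap = max(target - len(protected), 0)
    # Highest predictive-modeling scores fill the gap up to the target.
    fill = sorted(pool, key=lambda s: s["score"], reverse=True)[:gap]
    return protected + fill


# Hypothetical records; "resident" and "opted_out" are made-up criteria.
students = [
    {"id": "A", "score": 35, "resident": True, "opted_out": False},
    {"id": "B", "score": 90, "resident": False, "opted_out": False},
    {"id": "C", "score": 80, "resident": False, "opted_out": True},
    {"id": "D", "score": 60, "resident": False, "opted_out": False},
    {"id": "E", "score": 20, "resident": False, "opted_out": False},
]
chosen = select_recipients(
    students,
    target=3,
    is_protected=lambda s: s["resident"],
    is_omitted=lambda s: s["opted_out"],
)
chosen_ids = [s["id"] for s in chosen]  # ["A", "B", "D"]
```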
SLIDE 9 Use of Predictive Modeling Data in Financial Aid Scholarships
- The overall yield rate for incoming freshmen is
under 30%
– This means that UI does not spend roughly 70 cents of every scholarship dollar offered
- To project total scholarship costs, staff multiply the cost of each scholarship offered by that student’s probability of enrollment and sum the results
- FA staff also use projected scholarship
headcounts to do 6-year cost projections
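The cost projection described on this slide is an expected value: each award is weighted by the recipient's probability of enrolling. A minimal Python sketch follows; the function name and the dollar amounts are hypothetical.

```python
def projected_scholarship_cost(offers):
    """Expected spend for a set of scholarship offers.

    offers: iterable of (award_amount, enrollment_probability) pairs.
    """
    return sum(amount * prob for amount, prob in offers)


# Hypothetical offers (not real UI data): amount in dollars, probability 0-1.
offers = [(10000, 0.80), (5000, 0.25), (2000, 0.10)]
total_offered = sum(amount for amount, _ in offers)
projected_cost = projected_scholarship_cost(offers)
share_spent = projected_cost / total_offered
```

Here $17,000 is offered but only about $9,450 is expected to be spent, because most of the offered money is attached to students who are unlikely to enroll.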
SLIDE 10 Use of Predictive Modeling Data in Housing
- Enrollment probabilities can be summed to
project:
– Likely occupancy for each residence hall
– Likely size of each living/learning community
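Summing probabilities within groups, as this slide describes, might look like the following Python sketch. The record layout ("hall", "prob") and the hall names are assumptions for illustration only.

```python
from collections import defaultdict


def projected_by_group(records, group_key):
    """Sum enrollment probabilities within each group.

    records: list of dicts holding a "prob" value (0-1) and a grouping field.
    Returns a dict mapping each group to its expected headcount.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec[group_key]] += rec["prob"]
    return dict(totals)


# Hypothetical housing preferences (not real UI data).
records = [
    {"hall": "Hall A", "prob": 0.75},
    {"hall": "Hall A", "prob": 0.75},
    {"hall": "Hall B", "prob": 0.5},
    {"hall": "Hall B", "prob": 0.25},
    {"hall": "Hall B", "prob": 0.25},
]
occupancy = projected_by_group(records, "hall")
```

The same function would work for living/learning communities or majors by changing `group_key`.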
SLIDE 11 Use of Predictive Modeling Data in Orientation
- Incoming freshmen attend one of a series of on-campus orientation programs throughout the summer
- Students are scheduled (and advisors
assigned) based on program of study
- Orientation staff can use probabilities to
determine the likely number of slots needed for each major
SLIDE 12 Use of Predictive Modeling Data with Presidential Scholarships
- Every year the UI awards 20 Presidential Scholarships to incoming freshmen
- Several hundred scholarship applications
are pared down to a group of 30-40 finalists
- Probabilities are used to determine how
many finalists are to be offered presidential scholarships
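The slide does not say how the number of offers is derived from the probabilities. One plausible rule, shown here purely as an assumption, is to keep making offers until the expected number of acceptances reaches the number of scholarships available. The function name and the sample probabilities are hypothetical.

```python
def offers_needed(probabilities, slots):
    """Smallest number of offers whose expected acceptances reach `slots`.

    probabilities: enrollment probabilities of the finalists, in the order
    offers would be made (for example, the committee's ranking).
    Returns None if offering to every finalist still falls short.
    """
    expected = 0.0
    for count, prob in enumerate(probabilities, start=1):
        expected += prob
        if expected >= slots:
            return count
    return None


# Hypothetical finalists, each with a 0.5 probability of enrolling.
finalists = [0.5] * 8
needed = offers_needed(finalists, slots=3)  # 6 offers for 3 expected enrollees
```

A real process would also have to weigh the risk of overshooting the budget, which this sketch ignores.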
SLIDE 13 Use of Predictive Modeling Data in Course and Section Scheduling
- Academic departments can use probabilities
to determine the number of adjunct instructors to hire and the number of sections of courses to offer
SLIDE 14 Use of Predictive Modeling Data with Admissions Waiting Lists
- Last year the UI Admissions Office instituted
a waiting list for students applying after May 1
- This year the waiting list will come much
earlier
- Probabilities are used, along with student
profile data, to determine how many students to admit from the waiting list
SLIDE 15 Establishing Multiple Types of Enrollment Predictors
- Predictive modeling scores
- Longitudinal trends for applications, admits,
and admissions acceptances
- Housing applications
- Admissions deposits
- Orientation reservations
- FAFSAs received
- ACT and SAT scores received
SLIDE 16 What Have We Learned and Where Do We Go From Here
- Technical issues
- Challenges of aggregate group predictions
early in the admissions cycle
- Model changes necessitated by external
circumstances
- New retention and success models
SLIDE 17 Individual vs. Aggregate Predictions
- Individual predictions are easy; most admissions uses of PM data simply rely on ranking probabilities from top to bottom. It doesn’t matter if a student is a 0.89 or a 0.86.
- But when you’re summing probabilities to make an aggregate enrollment prediction, it matters a great deal whether a student is a 0.89 or a 0.86.
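The contrast on this slide can be checked numerically. In the Python sketch below (hypothetical probabilities, not UI data), lowering every probability by 0.03 leaves the ranking untouched but changes the summed projection.

```python
def rank_order(probs):
    """Student indices sorted from most to least likely to enroll."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


probs_a = [0.89, 0.60, 0.35, 0.10]
# Same students, with each probability 0.03 lower (a small calibration shift).
probs_b = [p - 0.03 for p in probs_a]

# Ranking-based uses see no difference between the two sets of scores.
same_ranking = rank_order(probs_a) == rank_order(probs_b)

# Aggregate uses do: the projected total drops by 0.03 per student.
projection_gap = sum(probs_a) - sum(probs_b)
```

Across four students the gap is only 0.12 of a student, but the same 0.03 shift across 10,000 admits would move the projection by about 300 students.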
SLIDE 18 The Arc of Aggregate Predictions Over the Admissions Cycle
- Weekly patterns of apps, admits, deposits,
and predicted enrollments can vary greatly by time of year – even when the end result is the same
- Examples of when predictive modeling is
most accurate and when it isn’t
SLIDE 19 Weekly Patterns of Admissions Acceptances Vary a Great Deal During Year
Week Number   2014   Pct 15 Dep Change 1 Yr   Pct 16 Dep Change 1 Yr   Pct 17 Dep Change 1 Yr
44                   29.61%                   6.68%                    4.44%
45                   27.26%                   6.32%                    7.24%
46                   25.15%                   7.72%                    8.03%
47                   25.27%                   8.14%                    10.25%
48                   24.05%                   8.78%                    13.63%
Census %             12.83%                   12.89%
Census #      4061   4582                     5173
SLIDE 20 But Weekly Deposits Compared to Final Census Numbers Very Similar
Week Number   Pct Dep of 14 Census   Pct Dep of 15 Census   Pct Dep of 16 Census   Diff 15 and 16
44            17.6%                  20.3%                  19.1%                  1.1%
45            19.6%                  22.1%                  20.8%                  1.3%
46            21.2%                  23.5%                  22.4%                  1.1%
47            22.7%                  25.2%                  24.1%                  1.1%
48            24.0%                  26.3%                  25.4%                  1.0%
SLIDE 21 In Fall and Early Winter, PM Data Varies Considerably YOY
Week Number   Pct Diff 15 PM from Census   Pct Diff 16 PM from Census   Diff 15 and 16
44            65.1%                        58.3%                        6.8%
45            70.1%                        63.4%                        6.7%
46            74.3%                        66.8%                        7.5%
47            77.6%                        71.6%                        6.0%
48            80.2%                        74.1%                        6.1%
SLIDE 22 Weeks 9-15, PM Projections More Accurate than Deposit Projections
Week Number   Pct Diff 15 PM from Census   Pct Diff 16 PM from Census   Diff 15 and 16
9             95.1%                        94.8%                        0.3%
10            96.0%                        96.4%
11            98.1%                        98.0%                        0.2%
12            98.2%                        98.7%
13            99.2%                        99.6%
14            100.3%                       99.4%                        0.9%
15            100.8%                       100.6%                       0.2%
SLIDE 23 Problem of Interest: Enrollment
SLIDE 24 What Kind of Data Processing Do We Need?
- Identifiable population of potential enrollees
- One record per person, containing all
explanatory information
- Assurance that all included variables have the same interpretation for past and present data (e.g. date-related fields)
SLIDE 25 Variables of Interest
We use a wide variety of potentially informative variables:
- Location data: State of residence, distance to
UI/ISU/UNI, raw latitude/longitude
- Preference data: ACT/SAT data on college choice, size
preference, max tuition, college type etc.
- Interest data: campus visits, orientation attendance,
self-initiated inquiries, intended major, time since first contact
- Demographic data: Parents’ education level, financial
aid status
SLIDE 26 Model Output and Potential Applications
- We produce an enrollment probability estimate for each
member of the active admit population
- Application 1: Evaluate individual outcomes
– Targeted messaging
– Early picture of likely melt
– Indications of factors driving enrollment, and opportunities to intervene
- Application 2: Estimate the total number of likely attendees
– Difficult problem due to shifting population (stealth applicants)
– Can inform financial aid spending estimates, enrollment rates by demographic/location factors
SLIDE 27 Desirable features
We want statistical techniques with good:
- 1. Predictive ability
- 2. Robustness to inclusion of extraneous
information
- 3. Capability to explore complex relationships
between explanatory variables and outcome measures
SLIDE 28 Trees: Basic Concepts
- Pro: Easy to interpret
- Pro: Capture complex
relationships within, and interactions between, variables of many types
- Con: Highly variable
- Con: Difficult to find ‘best’
tree, generally grown with greedy algorithm
- Con: Can’t easily capture
linear relationships
SLIDE 29 Gradient Boosted Trees
- Instead of using single trees, or averages of many trees
(random forests), we grow many trees sequentially.
- Each new tree contributes a small amount to the
classification (enroll/not enroll)
- This procedure is performed under cross-validation to
prevent overfitting
- Implemented via the gbm package in R.
- Details require arguments from numerical optimization
(see Hastie, Tibshirani, and Friedman 2009, Chapter 10)
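To make the sequential idea concrete, here is a deliberately simplified Python sketch of gradient boosting for an enroll/not-enroll outcome. It is not the gbm package and not the presenters' model: it uses one feature, one-split trees (stumps), a fixed learning rate, and no cross-validation, and the data (campus visits versus enrollment) are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares regression stump (one split) on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so no split is possible
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Grow stumps one after another, each fit to the current residuals."""
    base_rate = sum(ys) / len(ys)  # needs both outcomes present in ys
    f0 = math.log(base_rate / (1.0 - base_rate))
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of the log-loss: observed minus predicted.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each new tree contributes only a small amount.
        scores = [s + learning_rate * stump(x) for s, x in zip(scores, xs)]

    def predict(x):
        return sigmoid(f0 + learning_rate * sum(t(x) for t in stumps))

    return predict


def log_loss(ys, ps):
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(ys, ps)) / len(ys)


# Hypothetical data: number of campus visits and whether the student enrolled.
visits = [0, 0, 1, 1, 2, 2, 3, 3]
enrolled = [0, 0, 0, 1, 0, 1, 1, 1]
predict = boost(visits, enrolled)
base_loss = log_loss(enrolled, [0.5] * len(enrolled))
model_loss = log_loss(enrolled, [predict(x) for x in visits])
```

In practice the number of trees and the learning rate would be chosen by cross-validation, as the slide notes, rather than fixed in advance as they are here.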
SLIDE 30 Parameters to Fine Tune Model Accuracy
- Tree depth
- Number of features
– Available
– Maximum included
- Minimum leaf size
- Number of trees
SLIDE 31 Performance Measures
                                Prediction
Actual Outcome    Enroll            Don’t Enroll
Did Enroll        True Positive     False Negative
Didn’t Enroll     False Positive    True Negative
Sensitivity = # True Positive / # Did Enroll
Sensitivity = True Positive Rate
Specificity = # True Negative / # Didn’t Enroll
1 - Specificity = False Positive Rate
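The definitions above translate directly into code. The Python sketch below uses made-up outcomes and probabilities, and the 0.5 cutoff for turning a probability into an enroll/don't-enroll prediction is an assumption, not something the slides specify.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FN, FP, TN) for paired boolean outcomes and predictions."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """True positive rate: share of actual enrollees predicted to enroll."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-enrollees correctly predicted not to enroll."""
    return tn / (tn + fp)


# Hypothetical students: did they enroll, and their predicted probability.
actual = [True, True, True, False, False, False, False, False]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
predicted = [p >= 0.5 for p in probs]  # assumed cutoff of 0.5

tp, fn, fp, tn = confusion_counts(actual, predicted)
sens = sensitivity(tp, fn)            # 2 of 3 enrollees caught
spec = specificity(tn, fp)            # 4 of 5 non-enrollees caught
false_positive_rate = 1 - spec
```

Raising the cutoff trades sensitivity for specificity, which is why both rates are usually reported together.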
SLIDE 32 Accuracy Over Time
SLIDE 33 Accuracy Over Time
SLIDE 34 Accuracy Over Time
SLIDE 35 Accuracy Over Time
SLIDE 36 Summary Reports of Admissions Index
[show examples]
SLIDE 37 Issues: Model Changes Due to External Circumstances
Examples to look out for:
- Housing
- Financial Aid
- Snapshot Alignment
- Shifts in Enrollment Patterns
SLIDE 38 Key Behaviors for Admit Model: When Patterns Change, the Model Is Affected
- Setting up a Hawk-ID after admission
- Visiting campus
- Filing a FAFSA
- Accepting admission
- Applying for/completing housing app
- Registering for orientation
SLIDE 39 Predictive Modeling Enhancements
- Predicted first-year GPA for entering
freshmen
– First year to second year
– Second year to third year
– Third year to fourth year
– Four-year and six-year graduation likelihood
SLIDE 40 Challenges of Retention Models and How They Differ from Recruit Models
- Student academic profile and course-taking
behaviors don’t change frequently in high school
- At UI we have a great deal more data that
changes frequently:
– Changes in program of study and college
– Mid-term, term, and cumulative grades
– Course drops and adds
SLIDE 41 Discussion
- Who is using predictive modeling?
- If so, what for?
- Do you have applications in mind, and are you interested in learning more?