UI Predictive Modeling for Recruitment & Retention



slide-1
SLIDE 1

UI Predictive Modeling for Recruitment & Retention

Michael Hovland, Director of Enrollment Mgmt Data Analytics Knute Carter, Assistant Professor, College of Public Health

BI Community Presentation

January 18, 2017

slide-2
SLIDE 2

Two Primary Types of Predictive Enrollment Models at UI

  • Prospect Models (Introduced 7/15)

– All junior and senior prospects

  • Admit Model (Introduced 3/15)

– Begins when students are admitted

slide-3
SLIDE 3

Most Important Factors in Enrollment Predictive Modeling

  • 80-90 variables chosen from:

– Student academic ability
– Student enrollment preferences/intentions
– Length of time students have been interested in UI
– Strength of interest in UI
– Student demographics/characteristics
– Institutional data (financial aid, housing, and orientation)

slide-4
SLIDE 4

What Predictive Modeling Looks Like

  • Scores are stored in MAUI on a scale of 1-99

– The higher the number, the more likely the student is to enroll

  • Each probability of enrollment also carries a corresponding percentile rank
  • To get an enrollment projection for any group, you sum the probabilities and divide by 100
  • Most of our enrollment comes from the top 3 deciles or top 30%
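
The projection rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the sample scores are hypothetical, and the scores are assumed to be the 1-99 values stored in MAUI.

```python
def project_enrollment(scores):
    """Expected number of enrollees for a group.

    `scores` are 1-99 enrollment scores (probabilities expressed in
    percentage points), so the sum is divided by 100.
    """
    return sum(scores) / 100


# Hypothetical group of six students.
group_scores = [92, 85, 40, 12, 5, 66]
projection = project_enrollment(group_scores)
print(projection)  # 3.0
```

Six students with these scores are projected to yield three enrollees, even though no individual outcome is known.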

slide-5
SLIDE 5

Predictive Modeling Scores Are the Beginning, not the End

  • Updated scores are generated weekly depending on student activity

– Some students will go up and others will go down

slide-6
SLIDE 6

Benefits/Uses of Predictive Modeling

  • Identify students more likely to enroll
  • Predict aggregate enrollment of groups

– Admissions counselor territories
– High schools
– Ability bands
– Racial/ethnic groups
– UI Colleges and departments
– Scholarship program recipients
– Students likely to take specific courses

slide-7
SLIDE 7

Applications of Predictive Modeling Data in Diverse Areas

  • Marketing and communications
  • Financial aid scholarships
  • Housing
  • Orientation
  • Presidential scholarships
  • Course and section planning
  • Admissions waiting lists
slide-8
SLIDE 8

Use of Predictive Modeling Data in Marketing and Communications

  • Use the admit model and prospect model to determine which prospective students will receive print publications
  • Start with a target numeric goal

– Omit and protect certain groups of students based on characteristics
– Use predictive modeling scores to fill in the gap between the number of protected students and the target numeric goal
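
One way to read the "fill in the gap" step is sketched below. It assumes that "protected" students always receive the mailing, "omitted" students never do, and the remaining slots go to the highest-scoring students. The field names, the function name, and the sample records are hypothetical and are not taken from the UI process.

```python
def build_mailing_list(students, target_size):
    """Return the students who receive a print publication.

    Assumed reading of the slide: protected students are always included,
    omitted students are always excluded, and any remaining slots are
    filled by descending predictive modeling score.
    """
    protected = [s for s in students if s["protected"] and not s["omitted"]]
    candidates = [s for s in students if not s["protected"] and not s["omitted"]]
    candidates.sort(key=lambda s: s["score"], reverse=True)
    gap = max(target_size - len(protected), 0)
    return protected + candidates[:gap]


# Hypothetical records.
students = [
    {"id": "A", "score": 10, "protected": True, "omitted": False},
    {"id": "B", "score": 95, "protected": False, "omitted": True},
    {"id": "C", "score": 80, "protected": False, "omitted": False},
    {"id": "D", "score": 60, "protected": False, "omitted": False},
    {"id": "E", "score": 30, "protected": False, "omitted": False},
]
mailing_ids = [s["id"] for s in build_mailing_list(students, target_size=3)]
print(mailing_ids)  # ['A', 'C', 'D']
```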

slide-9
SLIDE 9

Use of Predictive Modeling Data in Financial Aid Scholarships

  • The overall yield rate for incoming freshmen is under 30%

– This means that roughly 70 cents of every scholarship dollar offered goes unspent

  • To project total scholarship costs, staff multiply the cost of each scholarship offered by that student's probability of enrollment and sum the results
  • FA staff also use projected scholarship headcounts to do 6-year cost projections
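
The scholarship cost projection is an expected-value calculation: each offer is weighted by the recipient's probability of enrolling. A minimal sketch, assuming scores on the 1-99 scale and using hypothetical award amounts:

```python
def projected_scholarship_cost(offers):
    """Expected scholarship spending.

    `offers` is an iterable of (award_amount, enrollment_score) pairs,
    with the score on the 1-99 scale.
    """
    return sum(amount * score / 100 for amount, score in offers)


# Hypothetical offers: (dollars offered, enrollment score).
offers = [(5000, 80), (2000, 25), (10000, 10)]
total_offered = sum(amount for amount, _ in offers)
expected_cost = projected_scholarship_cost(offers)
print(total_offered, expected_cost)  # 17000 5500.0
```

In this made-up example $17,000 is offered but only $5,500 is expected to be spent, because the largest award goes to a student who is unlikely to enroll.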

slide-10
SLIDE 10

Use of Predictive Modeling Data in Housing

  • Enrollment probabilities can be summed to project:

– Likely occupancy for each residence hall
– Likely size of each living/learning community
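
Summing probabilities within groups is the same projection applied once per group. The sketch below assumes each record pairs a group label (a residence hall or a living/learning community) with a 1-99 score; the hall names and scores are hypothetical.

```python
from collections import defaultdict


def expected_headcount_by_group(records):
    """Sum 1-99 scores per group and convert each total to expected students."""
    point_totals = defaultdict(int)
    for group, score in records:
        point_totals[group] += score
    return {group: points / 100 for group, points in point_totals.items()}


# Hypothetical (residence hall, enrollment score) records.
records = [
    ("Hall A", 90),
    ("Hall A", 60),
    ("Hall B", 30),
    ("Hall B", 20),
    ("Hall A", 50),
]
projected = expected_headcount_by_group(records)
print(projected)  # {'Hall A': 2.0, 'Hall B': 0.5}
```

The same grouping would work with any other label, such as intended major or counselor territory.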

slide-11
SLIDE 11

Use of Predictive Modeling Data in Orientation

  • Incoming freshmen attend one of a series of on-campus orientation programs throughout the summer
  • Students are scheduled (and advisors assigned) based on program of study
  • Orientation staff can use probabilities to determine the likely number of slots needed for each major

slide-12
SLIDE 12

Use of Predictive Modeling Data with Presidential Scholarships

  • Every year the UI awards 20 Presidential Scholarships to incoming freshmen
  • Several hundred scholarship applications are pared down to a group of 30-40 finalists
  • Probabilities are used to determine how many finalists are to be offered Presidential Scholarships
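
The slide does not say how the number of offers is derived from the probabilities. One plausible approach, shown here purely as an assumption, is to walk down the finalists in the committee's rank order and stop once the summed probabilities reach the number of scholarships to be filled. The function name and sample scores are hypothetical.

```python
def offers_needed(scores_in_rank_order, target_enrollees):
    """Smallest number of offers whose expected enrollment meets the target.

    `scores_in_rank_order` are 1-99 enrollment scores, already ordered by
    the selection committee's ranking (an assumption). Returns None when
    even offering to every finalist falls short of the target.
    """
    points = 0
    for count, score in enumerate(scores_in_rank_order, start=1):
        points += score
        if points >= target_enrollees * 100:
            return count
    return None


# Hypothetical finalist scores; two scholarships to fill.
finalist_scores = [60, 50, 40, 70, 30]
print(offers_needed(finalist_scores, target_enrollees=2))  # 4
```

Four offers are needed here because the first three finalists carry only 1.5 expected enrollees between them.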

slide-13
SLIDE 13

Use of Predictive Modeling Data in Course and Section Scheduling

  • Academic departments can use probabilities to determine the number of adjunct instructors to hire and the number of sections of courses to offer

slide-14
SLIDE 14

Use of Predictive Modeling Data with Admissions Waiting Lists

  • Last year the UI Admissions Office instituted a waiting list for students applying after May 1
  • This year the waiting list will come much earlier
  • Probabilities are used, along with student profile data, to determine how many students to admit from the waiting list

slide-15
SLIDE 15

Establishing Multiple Types of Enrollment Predictors

  • Predictive modeling scores
  • Longitudinal trends for applications, admits, and admissions acceptances

  • Housing applications
  • Admissions deposits
  • Orientation reservations
  • FAFSAs received
  • ACT and SAT scores received
slide-16
SLIDE 16

What Have We Learned and Where Do We Go From Here

  • Technical issues
  • Challenges of aggregate group predictions early in the admissions cycle
  • Model changes necessitated by external circumstances

  • New retention and success models
slide-17
SLIDE 17

Individual vs. Aggregate Predictions

  • Individual is easy; most admissions uses of PM data simply rely on ranking probabilities from top to bottom. It doesn’t matter if a student is a 0.89 or 0.86
  • But when you’re summing probabilities to make an aggregate enrollment prediction, it matters a great deal whether a student is 0.89 or 0.86
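
A small illustration of why the same 3-point difference is harmless for ranking but significant for totals. The two score lists and the admit count of 10,000 are hypothetical.

```python
def rank_order(scores):
    """Indices of students sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def project_enrollment(scores):
    """Sum of 1-99 scores converted to expected enrollees."""
    return sum(scores) / 100


# Two hypothetical score sets that differ by 3 points per student.
model_a = [89, 50, 20]
model_b = [86, 47, 17]

same_ranking = rank_order(model_a) == rank_order(model_b)
point_gap = sum(model_a) - sum(model_b)
print(same_ranking, point_gap)  # True 9

# The same 3-point shift applied to 10,000 admits moves the
# aggregate projection by 300 students.
admits = 10_000
seat_gap = admits * 3 / 100
print(seat_gap)  # 300.0
```

Any use that only needs the ordering is unaffected, while the aggregate projection inherits the full bias.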

slide-18
SLIDE 18

The Arc of Aggregate Predictions over the Admissions Cycle

  • Weekly patterns of apps, admits, deposits, and predicted enrollments can vary greatly by time of year – even when the end result is the same
  • Examples of when predictive modeling is most accurate and when it isn’t

slide-19
SLIDE 19

Weekly Patterns of Admissions Acceptances Vary a Great Deal During Year

Week Number   2014   Pct 15 Dep Change 1 Yr   Pct 16 Dep Change 1 Yr   Pct 17 Dep Change 1 Yr
44                   29.61%                   6.68%                    4.44%
45                   27.26%                   6.32%                    7.24%
46                   25.15%                   7.72%                    8.03%
47                   25.27%                   8.14%                    10.25%
48                   24.05%                   8.78%                    13.63%
Census %             12.83%                   12.89%
Census #      4061   4582                     5173

slide-20
SLIDE 20

But Weekly Deposits Compared to Final Census Numbers Very Similar

Week Number   Pct Dep of 14 Census   Pct Dep of 15 Census   Pct Dep of 16 Census   Diff 15 and 16
44            17.6%                  20.3%                  19.1%                  1.1%
45            19.6%                  22.1%                  20.8%                  1.3%
46            21.2%                  23.5%                  22.4%                  1.1%
47            22.7%                  25.2%                  24.1%                  1.1%
48            24.0%                  26.3%                  25.4%                  1.0%

slide-21
SLIDE 21

In Fall and Early Winter, PM Data Varies Considerably YOY

Week Number   Pct Diff 15 PM from Census   Pct Diff 16 PM from Census   Diff 15 and 16
44            65.1%                        58.3%                        6.8%
45            70.1%                        63.4%                        6.7%
46            74.3%                        66.8%                        7.5%
47            77.6%                        71.6%                        6.0%
48            80.2%                        74.1%                        6.1%

slide-22
SLIDE 22

Weeks 9-15, PM Projections More Accurate than Deposit Projections

Week Number   Pct Diff 15 PM from Census   Pct Diff 16 PM from Census   Diff Col F and G
9             95.1%                        94.8%                        0.3%
10            96.0%                        96.4%                        -0.4%
11            98.1%                        98.0%                        0.2%
12            98.2%                        98.7%                        -0.5%
13            99.2%                        99.6%                        -0.4%
14            100.3%                       99.4%                        0.9%
15            100.8%                       100.6%                       0.2%

slide-23
SLIDE 23

Problem of Interest: Enrollment

slide-24
SLIDE 24

What Kind of Data Processing Do We Need?

  • Identifiable population of potential enrollees
  • One record per person, containing all explanatory information
  • Assurance that all included variables have the same interpretation for past and present data (e.g., date-related fields)

slide-25
SLIDE 25

Variables of Interest

We use a wide variety of potentially informative variables:

  • Location data: State of residence, distance to UI/ISU/UNI, raw latitude/longitude
  • Preference data: ACT/SAT data on college choice, size preference, max tuition, college type etc.
  • Interest data: campus visits, orientation attendance, self-initiated inquiries, intended major, time since first contact
  • Demographic data: Parents’ education level, financial aid status
  • Many more
slide-26
SLIDE 26

Model Output and Potential Applications

  • We produce an enrollment probability estimate for each member of the active admit population
  • Application 1: Evaluate individual outcomes

– Targeted messaging
– Early picture of likely melt
– Indications of factors driving enrollment, and opportunities to intervene

  • Application 2: Estimate the total number of likely attendees

– Difficult problem due to shifting population (stealth applicants)
– Can inform financial aid spending estimates, enrollment rates by demographic/location factors

slide-27
SLIDE 27

Desirable features

We want statistical techniques with good:

  1. Predictive ability
  2. Robustness to inclusion of extraneous information
  3. Capability to explore complex relationships between explanatory variables and outcome measures

slide-28
SLIDE 28

Trees: Basic Concepts

  • Pro: Easy to interpret
  • Pro: Capture complex relationships within, and interactions between, variables of many types
  • Con: Highly variable
  • Con: Difficult to find ‘best’ tree; generally grown with a greedy algorithm
  • Con: Can’t easily capture linear relationships

slide-29
SLIDE 29

Gradient Boosted Trees

  • Instead of using single trees, or averages of many trees (random forests), we grow many trees sequentially.
  • Each new tree contributes a small amount to the classification (enroll/not enroll)
  • This procedure is performed under cross-validation to prevent overfitting
  • Implemented via the gbm package in R.
  • Details require arguments from numerical optimization (see Hastie, Tibshirani, and Friedman 2009, Chapter 10)
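
To make "many small trees grown sequentially" concrete, here is a minimal pure-Python sketch of gradient boosting for a yes/no outcome, using one-split trees (stumps) and logistic loss. It is not the gbm package and not the presenters' model: the feature names, toy data, tree count, and learning rate are all hypothetical, and the cross-validation step mentioned on the slide is omitted. It assumes the training data contain both outcomes and at least one feature with two distinct values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, weights):
    """Pick the split that minimises squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            sse = 0.0
            for side in (left, right):
                mean = sum(residuals[i] for i in side) / len(side)
                sse += sum((residuals[i] - mean) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left, right)
    _, j, threshold, left, right = best

    def leaf_value(side):
        # Newton-style step for logistic loss: sum(residual) / sum(p * (1 - p)).
        return sum(residuals[i] for i in side) / max(sum(weights[i] for i in side), 1e-9)

    return j, threshold, leaf_value(left), leaf_value(right)


def fit_boosted_stumps(X, y, n_trees=50, learning_rate=0.1):
    base_rate = sum(y) / len(y)
    intercept = math.log(base_rate / (1 - base_rate))
    scores = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        probs = [sigmoid(s) for s in scores]
        residuals = [yi - pi for yi, pi in zip(y, probs)]
        weights = [pi * (1 - pi) for pi in probs]
        j, t, left_value, right_value = fit_stump(X, residuals, weights)
        stumps.append((j, t, left_value, right_value))
        # Each tree nudges the running score by a small, shrunken amount.
        for i, row in enumerate(X):
            scores[i] += learning_rate * (left_value if row[j] <= t else right_value)
    return intercept, stumps, learning_rate


def predict_proba(model, row):
    intercept, stumps, learning_rate = model
    score = intercept
    for j, t, left_value, right_value in stumps:
        score += learning_rate * (left_value if row[j] <= t else right_value)
    return sigmoid(score)


# Hypothetical toy data: [campus_visits, filed_fafsa] -> enrolled (1) or not (0).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [3, 1], [0, 0], [2, 0]]
y = [0, 0, 0, 1, 1, 1, 0, 1]
model = fit_boosted_stumps(X, y)
p_high = predict_proba(model, [3, 1])
p_low = predict_proba(model, [0, 0])
print(p_high > p_low)  # True
```

Each stump is fitted to what the current ensemble still gets wrong, and the learning rate keeps any single tree's contribution small. In practice the number of trees would be chosen by cross-validation, as the slide notes.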

slide-30
SLIDE 30

Parameters to Fine Tune Model Accuracy

  • Tree depth
  • Number of features

– Available
– Maximum included

  • Minimum leaf size
  • Number of trees
slide-31
SLIDE 31

Performance Measures

                                Prediction: Enroll   Prediction: Don’t Enroll
Actual Outcome: Did Enroll      True Positive        False Negative
Actual Outcome: Didn’t Enroll   False Positive       True Negative

Sensitivity = # True Positive / # Did Enroll = True Positive Rate
Specificity = # True Negative / # Didn’t Enroll
1 - Specificity = False Positive Rate
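
The two measures defined on this slide can be computed directly from predicted and actual outcomes. The sketch below assumes a student is predicted to enroll when the score is at or above a cutoff; the cutoff of 50, the scores, and the outcomes are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (true_pos, false_neg, false_pos, true_neg) counts."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    return true_pos, false_neg, false_pos, true_neg


def sensitivity(true_pos, false_neg):
    """True positives divided by everyone who actually enrolled."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """True negatives divided by everyone who actually did not enroll."""
    return true_neg / (true_neg + false_pos)


# Hypothetical scores and outcomes (1 = did enroll, 0 = didn't enroll).
scores = [90, 75, 60, 40, 20, 10]
actual = [1, 1, 0, 1, 0, 0]
cutoff = 50  # hypothetical decision threshold
predicted = [score >= cutoff for score in scores]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)  # 2 1 1 2
print(round(sensitivity(tp, fn), 3), round(specificity(tn, fp), 3))  # 0.667 0.667
```

Moving the cutoff trades one measure against the other: a lower cutoff raises sensitivity and lowers specificity.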

slide-32
SLIDE 32

Accuracy Over Time

slide-33
SLIDE 33

Accuracy Over Time

slide-34
SLIDE 34

Accuracy Over Time

slide-35
SLIDE 35

Accuracy Over Time

slide-36
SLIDE 36

Summary Reports of Admissions Index

[show examples]

slide-37
SLIDE 37

Issues: Model Changes Due to External Circumstances

Examples to look out for:

  • Housing
  • Financial Aid
  • Snapshot Alignment
  • Shifts in Enrollment Patterns
slide-38
SLIDE 38

Key Behaviors for Admit Model: When Patterns Change, the Model Is Affected

  • Setting up a Hawk-ID after admission
  • Visiting campus
  • Filing a FAFSA
  • Accepting admission
  • Applying for/completing housing app
  • Registering for orientation
slide-39
SLIDE 39

Predictive Modeling Enhancements

  • Predicted first-year GPA for entering freshmen
  • Retention predictions

– First year to second year
– Second year to third year
– Third year to fourth year

  • Graduation predictions

– Four-year and six-year graduation likelihood

slide-40
SLIDE 40

Challenges of Retention Models and How They Differ from Recruit Models

  • Student academic profile and course-taking behaviors don’t change frequently in high school
  • At UI we have a great deal more data that changes frequently:

– Changes in program of study and college
– Mid-term, term, and cumulative grades
– Course drops and adds

slide-41
SLIDE 41

Discussion

  • Who is using predictive modeling?
  • If you are, what for?
  • Do you have applications in mind, and are you interested in learning more?