
UI Predictive Modeling for Recruitment & Retention



  1. BI Community Presentation, January 18, 2017 • UI Predictive Modeling for Recruitment & Retention • Michael Hovland, Director of Enrollment Mgmt Data Analytics • Knute Carter, Assistant Professor, College of Public Health

  2. Two Primary Types of Predictive Enrollment Models at UI • Prospect Models (Introduced 7/15) – All junior and senior prospects • Admit Model (Introduced 3/15) – Begins when students are admitted

  3. Most Important Factors in Enrollment Predictive Modeling • 80-90 variables chosen from: – Student academic ability – Student enrollment preferences/intentions – Length of time students have been interested in UI – Strength of interest in UI – Student demographics/characteristics – Institutional data (financial aid, housing, and orientation)

  4. What Predictive Modeling Looks Like • Scores are stored in MAUI on a scale of 1-99 – The higher the number, the more likely the student is to enroll • Each probability of enrollment also carries a corresponding percentile rank • To get an enrollment projection for any group, you sum the probabilities and divide by 100 • Most of our enrollment comes from the top 3 deciles or top 30%
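
      A small R sketch of the projection arithmetic above; the scores are invented for illustration and do not come from MAUI:

        # Hypothetical MAUI-style scores on the 1-99 scale; score/100 approximates
        # each student's probability of enrolling.
        scores <- c(92, 87, 64, 41, 23, 9)

        # Aggregate enrollment projection for the group: sum of probabilities.
        projected_enrollment <- sum(scores) / 100   # 3.16 expected enrollees

        # Percentile rank corresponding to each score within this group.
        percentile <- round(100 * rank(scores) / length(scores))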

  5. Predictive Modeling Scores Are the Beginning, not the End • Updated scores are generated weekly depending on student activity – Some students will go up and others will go down

  6. Benefits/Uses of Predictive Modeling • Identify students more likely to enroll • Predict aggregate enrollment of groups – Admissions counselor territories – High schools – Ability bands – Racial/ethnic groups – UI Colleges and departments – Scholarship program recipients – Students likely to take specific courses

  7. Applications of Predictive Modeling Data in Diverse Areas • Marketing and communications • Financial aid scholarships • Housing • Orientation • Presidential scholarships • Course and section planning • Admissions waiting lists

  8. Use of Predictive Modeling Data in Marketing and Communications • Use the admit model and prospect model to determine which prospective students will receive print publications • Start with a target numeric goal – Omit and protect certain groups of students based on characteristics – Use predictive modeling scores to fill the gap between the number of protected students and the target numeric goal

  9. Use of Predictive Modeling Data in Financial Aid Scholarships • The overall yield rate for incoming freshmen is under 30% – This means that roughly 70 cents of every scholarship dollar UI offers is never actually spent • To project total scholarship costs, staff multiply the cost of every scholarship offered by the probability of enrollment for each student • FA staff also use projected scholarship headcounts to do 6-year cost projections
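
      A small R sketch of the cost arithmetic above; the amounts and probabilities are invented for illustration:

        # Hypothetical scholarship offers and modeled enrollment probabilities
        # (MAUI score / 100) for four admitted students.
        offer_amount <- c(4000, 2500, 6000, 1000)
        enroll_prob  <- c(0.82, 0.35, 0.61, 0.12)

        # Expected spend counts each award only in proportion to the chance
        # that the student actually enrolls.
        projected_cost <- sum(offer_amount * enroll_prob)   # 7935, vs. 13500 if every offer were claimed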

  10. Use of Predictive Modeling Data in Housing • Enrollment probabilities can be summed to project: – Likely occupancy for each residence hall – Likely size of each living/learning community

  11. Use of Predictive Modeling Data in Orientation • Incoming freshmen attend one of a series of on-campus orientation programs throughout the summer • Students are scheduled (and advisors assigned) based on program of study • Orientation staff can use probabilities to determine the likely number of slots needed for each major

  12. Use of Predictive Modeling Data with Presidential Scholarships • Every year the UI awards 20 Presidential Scholarships to incoming freshmen • Several hundred scholarship applications are pared down to a group of 30-40 finalists • Probabilities are used to determine how many finalists should be offered Presidential Scholarships

  13. Use of Predictive Modeling Data in Course and Section Scheduling • Academic departments can use probabilities to determine the number of adjunct instructors to hire and the number of course sections to offer

  14. Use of Predictive Modeling Data with Admissions Waiting Lists • Last year the UI Admissions Office instituted a waiting list for students applying after May 1 • This year the waiting list will begin much earlier • Probabilities are used, along with student profile data, to determine how many students to admit from the waiting list

  15. Establishing Multiple Types of Enrollment Predictors • Predictive modeling scores • Longitudinal trends for applications, admits, and admissions acceptances • Housing applications • Admissions deposits • Orientation reservations • FAFSAs received • ACT and SAT scores received

  16. What Have We Learned and Where Do We Go From Here • Technical issues • Challenges of aggregate group predictions early in the admissions cycle • Model changes necessitated by external circumstances • New retention and success models

  17. Individual vs. Aggregate Predictions • Individual prediction is easy; most admissions uses of PM data simply rely on ranking probabilities from top to bottom, so it doesn’t matter much whether a student is a 0.89 or a 0.86 • But when you’re summing probabilities to make an aggregate enrollment prediction, it matters a great deal whether a student is a 0.89 or a 0.86

  18. The Arc of Aggregate Predictions over the Admissions Cycle • Weekly patterns of apps, admits, deposits, and predicted enrollments can vary greatly by time of year – even when the end result is the same • Examples of when predictive modeling is most accurate and when it isn’t

  19. Weekly Patterns of Admissions Acceptances Vary a Great Deal During the Year

      Week Number   Pct 15 Dep Change 1 Yr   Pct 16 Dep Change 1 Yr   Pct 17 Dep Change 1 Yr
      44            29.61%                   6.68%                    4.44%
      45            27.26%                   6.32%                    7.24%
      46            25.15%                   7.72%                    8.03%
      47            25.27%                   8.14%                    10.25%
      48            24.05%                   8.78%                    13.63%

      Final census #: 4061 (2014), 4582 (2015), 5173 (2016)
      Census % change, 1 yr: 12.83% (2015 vs. 2014), 12.89% (2016 vs. 2015)

  20. But Weekly Deposits Compared to Final Census Numbers Very Similar

      Week Number   Pct Dep of 14 Census   Pct Dep of 15 Census   Pct Dep of 16 Census   Diff 15 and 16
      44            17.6%                  20.3%                  19.1%                  1.1%
      45            19.6%                  22.1%                  20.8%                  1.3%
      46            21.2%                  23.5%                  22.4%                  1.1%
      47            22.7%                  25.2%                  24.1%                  1.1%
      48            24.0%                  26.3%                  25.4%                  1.0%

  21. In Fall and Early Winter, PM Data Varies Considerably YOY

      Week Number   Pct Diff 15 PM from Census   Pct Diff 16 PM from Census   Diff 15 and 16
      44            65.1%                        58.3%                        6.8%
      45            70.1%                        63.4%                        6.7%
      46            74.3%                        66.8%                        7.5%
      47            77.6%                        71.6%                        6.0%
      48            80.2%                        74.1%                        6.1%

  22. Weeks 9-15, PM Projections More Accurate than Deposit Projections

      Week Number   Pct Diff 15 PM from Census   Pct Diff 16 PM from Census   Diff 15 and 16
      9             95.1%                        94.8%                        0.3%
      10            96.0%                        96.4%                        -0.4%
      11            98.1%                        98.0%                        0.2%
      12            98.2%                        98.7%                        -0.5%
      13            99.2%                        99.6%                        -0.4%
      14            100.3%                       99.4%                        0.9%
      15            100.8%                       100.6%                       0.2%

  23. Problem of Interest: Enrollment

  24. What Kind of Data Processing Do We Need? • Identifiable population of potential enrollees • One record per person, containing all explanatory information • Assurance that all included variables have the same interpretation for past and present data (e.g., date-related fields)

  25. Variables of Interest • We use a wide variety of potentially informative variables: • Location data: state of residence, distance to UI/ISU/UNI, raw latitude/longitude • Preference data: ACT/SAT data on college choice, size preference, max tuition, college type, etc. • Interest data: campus visits, orientation attendance, self-initiated inquiries, intended major, time since first contact • Demographic data: parents’ education level, financial aid status • Many more

  26. Model Output and Potential Applications • We produce an enrollment probability estimate for each member of the active admit population • Application 1: Evaluate individual outcomes – Targeted messaging – Early picture of likely melt – Indications of factors driving enrollment, and opportunities to intervene • Application 2: Estimate the total number of likely attendees – Difficult problem due to shifting population (stealth applicants) – Can inform financial aid spending estimates, enrollment rates by demographic/location factors

  27. Desirable Features • We want statistical techniques with good: 1. Predictive ability 2. Robustness to inclusion of extraneous information 3. Capability to explore complex relationships between explanatory variables and outcome measures

  28. Trees: Basic Concepts • Pro: Easy to interpret • Pro: Capture complex relationships within, and interactions between, variables of many types • Con: Highly variable • Con: Difficult to find ‘best’ tree, generally grown with greedy algorithm • Con: Can’t easily capture linear relationships

  29. Gradient Boosted Trees • Instead of using single trees, or averages of many trees (random forests), we grow many trees sequentially. • Each new tree contributes a small amount to the classification (enroll/not enroll) • This procedure is performed under cross-validation to prevent overfitting • Implemented via the gbm package in R. • Details require arguments from numerical optimization (see Hastie, Tibshirani, and Friedman 2009, Chapter 10)
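
      A minimal R sketch of a fit along these lines with the gbm package; the data frame admits, the outcome enrolled, and the predictor names are placeholders rather than the actual UI model inputs (the real model uses 80-90 variables):

        library(gbm)

        # admits: one row per admitted student; enrolled is a 0/1 outcome.
        set.seed(2017)
        fit <- gbm(enrolled ~ act_composite + distance_to_ui + campus_visits +
                     weeks_since_first_contact + fafsa_received,
                   data              = admits,
                   distribution      = "bernoulli",   # classification: enroll / not enroll
                   n.trees           = 3000,          # many small trees grown sequentially
                   interaction.depth = 4,             # depth of each tree
                   n.minobsinnode    = 20,            # minimum leaf size
                   shrinkage         = 0.01,          # each tree contributes a small amount
                   cv.folds          = 5)             # cross-validation to limit overfitting

        # Tree count chosen by cross-validation, then per-student probabilities.
        best_iter   <- gbm.perf(fit, method = "cv")
        enroll_prob <- predict(fit, newdata = admits, n.trees = best_iter, type = "response")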

  30. Parameters to Fine Tune Model Accuracy • Tree depth • Number of features – Available – Maximum included • Minimum leaf size • Number of trees
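
      One way to compare these settings, sketched against the hypothetical admits data from the previous example: refit under cross-validation at several tree depths and keep the depth with the lowest CV error. (In gbm the available features are simply the columns included in the formula; unlike a random forest there is no per-split cap on the number of features.)

        # Compare tree depths by minimum cross-validated deviance.
        depths   <- c(2, 4, 6)
        cv_error <- sapply(depths, function(d) {
          m <- gbm(enrolled ~ ., data = admits, distribution = "bernoulli",
                   n.trees = 3000, interaction.depth = d,
                   n.minobsinnode = 20, shrinkage = 0.01, cv.folds = 5)
          min(m$cv.error)   # best CV deviance along the tree sequence
        })
        best_depth <- depths[which.min(cv_error)]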
