It’s Not Magic: Understanding Data Science with Applications in Enrollment Management


SLIDE 1

It’s Not Magic

Understanding Data Science with Applications in Enrollment Management

North Carolina Association for Institutional Research Conference 2019

SLIDE 2

Beyond the hype

SLIDE 3

Beyond the hype

  • The hype…
  • Buzz about big data, artificial intelligence, machine learning, predictive analytics
  • The reality…
  • Like any new technology, has its benefits and limitations
  • Can be a powerful tool when combined with organizational buy-in, knowledge and training

SLIDE 4

Data science or data analytics?

  • DESCRIBE: What happened?
  • DIAGNOSE: Why did it happen?
  • MONITOR: What’s happening now?
  • PREDICT: What might happen?

Axes: complexity, business value

Define, measure, report. Explore, explain, act. Model, analyze, predict.

SLIDE 5

Why data science?

  • Predict some future state or some current state that is unmeasurable
  • Predictive models can also be used to understand the “why” behind the “what”:
  • The model inputs are as important as the model outcome: are there hidden patterns that are visible when we control for other factors?
  • Example: What are the common denominators behind students who have dropped out?

SLIDE 6

So you want to build a model

SLIDE 7

Data science project flow

Define Questions → Data Assembly → Exploration → Predictive Modeling → Model Competition → Testing & Validation → Distribute Results

  • Example questions:
  • How many new and returning students do we expect next term by academic program?
  • Which students are the most at risk for not returning next term?
  • How are financial aid and need related to yield at our institution?
  • Data areas: Admissions, Enrollment, Financial Aid, Retention, Advancement, Financials
  • Algorithms: Random Forest, Logistic Regression, K-Means Clustering

HelioCampus Proprietary and Confidential

SLIDE 8

Ask the right question

SLIDE 9

What is next year’s enrollment going to be?

SLIDE 10

What is next year’s enrollment going to be?

  • How many new students are enrolling next year?
  • How many students who are currently enrolled are going to come back?

SLIDE 11

What is next year’s enrollment going to be?

How many new students are enrolling next year?

  • Questions:
  • How many applications are we expecting?
  • If a given student applies, what is the likelihood that they will enroll?

How many students who are currently enrolled are going to come back?

  • Questions:
  • Who is likely to graduate?
  • Who is likely to persist or drop out?

SLIDE 12

What is next year’s enrollment going to be?

How many new students are enrolling next year?

  • Questions:
  • How many applications are we expecting?
  • If a given student applies, what is the likelihood that they will enroll?
  • Universe:
  • First-time freshmen
  • Transfers
  • Certain majors/colleges

How many students who are currently enrolled are going to come back?

  • Questions:
  • Who is likely to graduate?
  • Who is likely to persist or drop out?
  • Universe:
  • Segmented by credit hours
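The decomposition above can be read as simple arithmetic: expected enrollment is the sum of each applicant's likelihood of enrolling plus the sum of each current student's likelihood of returning, added up by segment. The sketch below is only an illustration of that idea. The segment names, the probabilities, and the function name `expected_headcount` are made-up assumptions, not figures or code from the presentation.

```python
def expected_headcount(segments):
    """Sum per-student probabilities within each segment, then across segments."""
    by_segment = {name: sum(probs) for name, probs in segments.items()}
    return by_segment, sum(by_segment.values())


# Hypothetical likelihoods that each applicant enrolls, by "universe".
new_students = {
    "first_time_freshmen": [0.25, 0.5, 0.75],
    "transfers": [0.5, 0.5],
}

# Hypothetical likelihoods that each current student comes back,
# segmented by credit hours (students about to graduate score near 0).
returning_students = {
    "under_30_credits": [0.75, 0.5],
    "30_plus_credits": [0.75, 0.25],
}

new_by_segment, expected_new = expected_headcount(new_students)
ret_by_segment, expected_returning = expected_headcount(returning_students)
next_year_enrollment = expected_new + expected_returning
```

With these toy numbers the expected new headcount is 2.5 and the expected returning headcount is 2.25. The point is only that each sub-question feeds one term of the total.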

SLIDE 13

Garbage in, garbage out

SLIDE 14

Data: the foundation of the model

How many new students are enrolling next year?

  • Daily applications entered into the system
  • Applicant-level data including HS academics, test scores, demographics

How many students who are currently enrolled are going to come back?

  • Student-level data: credits, grades, demographics
  • Historical datasets of previous students who were enrolled and did / did not re-enroll
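As a rough illustration of how a historical "did / did not re-enroll" dataset might be assembled, the sketch below labels each student enrolled in one term by whether they appear in the next term. The record layout (student ID, term) and the function name `label_reenrollment` are assumptions made for this example. A real dataset would also need to handle graduates, who do not return for a different reason.

```python
def label_reenrollment(enrollments, term, next_term):
    """Return {student_id: 1 if enrolled again in next_term, else 0}."""
    enrolled_now = {sid for sid, t in enrollments if t == term}
    enrolled_next = {sid for sid, t in enrollments if t == next_term}
    return {sid: int(sid in enrolled_next) for sid in sorted(enrolled_now)}


# Hypothetical enrollment records: (student_id, term).
records = [
    ("S001", "2018FA"), ("S002", "2018FA"), ("S003", "2018FA"),
    ("S001", "2019SP"), ("S003", "2019SP"), ("S004", "2019SP"),
]

labels = label_reenrollment(records, "2018FA", "2019SP")
```

Here `S004` is ignored because they were not enrolled in the starting term, and `S002` is labeled 0 because they did not come back.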

SLIDE 15

Show me the magic

SLIDE 16

What is a model?

A model is a set of rules used to turn a set of inputs into an output. An algorithm is how we come up with those rules.

SLIDE 17

What is a model?

Train the model: algorithm(inputs) → rules
Apply the model: rules(inputs) → output
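The train and apply steps can be pictured with a deliberately tiny Python sketch. The "algorithm" here just searches for the GPA cutoff that best separates the training labels, and the "rules" it returns are a function that can be applied to new inputs. The cutoff search and the toy data are illustrative assumptions, not the method used in the presentation.

```python
def algorithm(inputs, labels):
    """Train: find the cutoff that classifies the most training rows correctly.

    Returns the learned "rules" as a function mapping one input to 0 or 1.
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(inputs)):
        correct = sum((x >= cutoff) == bool(y) for x, y in zip(inputs, labels))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct

    def rules(x):
        return 1 if x >= best_cutoff else 0

    return rules


# Hypothetical training data: cumulative GPA and whether the student re-enrolled.
gpas = [1.8, 2.1, 2.4, 2.9, 3.2, 3.6]
re_enrolled = [0, 0, 0, 1, 1, 1]

rules = algorithm(gpas, re_enrolled)          # train: algorithm(inputs) -> rules
predictions = [rules(x) for x in [2.0, 3.4]]  # apply: rules(inputs) -> output
```

Swapping in a different algorithm changes how the rules are found, but the two-step shape stays the same.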

SLIDE 18

Algorithms ahoy!

CLASSIFICATION

  • Enrollment Prediction: identifying admitted students who are most likely to enroll
  • Algorithms: K-Nearest Neighbors, Random Forest

CLUSTERING

  • Student Segmentation: finding related sub-populations of students
  • Algorithms: K-Means, Hierarchical Clustering

REGRESSION

  • Attribute Importance / Influence on Retention: understanding top predictors that correlate with retention
  • Algorithms: Logistic Regression, Linear Regression

DIMENSIONALITY REDUCTION

  • Simplifying and Combining Attributes: discovering correlated attributes and streamlining analyses
  • Algorithms: Randomized PCA, Kernel Approximation
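To make one of these families concrete, here is a minimal pure-Python sketch of K-Means used for student segmentation. The two features (credits attempted per term, GPA), the data points, and the naive "first k points" initialization are all assumptions chosen to keep the example short. In practice the features would be put on a common scale first and a library implementation would normally be used.

```python
def kmeans(points, k, iterations=20):
    """Plain K-Means: assign each point to its nearest centroid, then re-average."""
    centroids = list(points[:k])  # naive initialization (an assumption of this sketch)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical students: (credits attempted this term, cumulative GPA).
students = [(15, 3.5), (6, 2.5), (14, 3.7), (16, 3.3), (5, 2.8), (7, 2.2)]

centroids, clusters = kmeans(students, k=2)
```

On this toy data the two segments separate full-load students from light-load students. No outcome label is needed, which is what distinguishes clustering from classification.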

SLIDE 19

Modeling re-enrollment likelihood

Inputs:

  • Independent variables: student’s cumulative GPA, cumulative credits, total dropped classes, full or part time, financial aid status, number of previous terms enrolled
  • Dependent variable: whether the student re-enrolled

Algorithm:

  • Elastic net regression

Output:

  • 0 to 1 “score”
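The sketch below shows one way an elastic net model can produce a 0 to 1 score: a logistic regression whose weights are penalized by a mix of L1 (lasso) and L2 (ridge) terms, fit here with simple proximal gradient descent. The presentation does not say which implementation or settings were used, so the function names, the hyperparameters (`alpha`, `l1_ratio`, `lr`, `epochs`), the three features, and the data are all assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_elastic_net_logistic(X, y, alpha=0.1, l1_ratio=0.5, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        preds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        errors = [pi - yi for pi, yi in zip(preds, y)]
        for j in range(p):
            grad = sum(e * row[j] for e, row in zip(errors, X)) / n
            grad += alpha * (1 - l1_ratio) * w[j]  # ridge (L2) part of the penalty
            # lasso (L1) part of the penalty, applied as a shrinkage step
            w[j] = soft_threshold(w[j] - lr * grad, lr * alpha * l1_ratio)
        b -= lr * sum(errors) / n  # the intercept is not penalized
    return w, b


def score(w, b, row):
    """Re-enrollment likelihood between 0 and 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


# Hypothetical rows: [cumulative GPA minus 2.5, full time (1) or part time (0), dropped classes]
X = [[1.0, 1, 0], [0.5, 1, 0], [0.8, 0, 1], [-0.5, 1, 2], [-1.0, 0, 3], [-0.7, 0, 2]]
y = [1, 1, 1, 0, 0, 0]  # whether the student re-enrolled

w, b = fit_elastic_net_logistic(X, y)
scores = [score(w, b, row) for row in X]
```

The L1 part can push the weights of unhelpful inputs to exactly zero, while the L2 part keeps correlated inputs (for example GPA and dropped classes) from producing unstable weights.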

SLIDE 20

Measure twice, cut once

SLIDE 21

How do we know it works?

  • Evaluate the model:

algorithm(test inputs) → output
model output ~ actual output
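The evaluation idea can be sketched in a few lines of Python: run the model on test inputs it has not seen, then measure how closely the model output matches the actual output. The fixed GPA rule, the held-out data, and the choice of plain accuracy as the measure are assumptions made for this example. The presentation does not state which metrics were used.

```python
def evaluate(rules, test_inputs, actual_outputs):
    """Share of test cases where the model output equals the actual output."""
    model_outputs = [rules(x) for x in test_inputs]
    matches = sum(m == a for m, a in zip(model_outputs, actual_outputs))
    return matches / len(actual_outputs)


def rules(gpa):
    """Hypothetical already-trained rule: predict re-enrollment at GPA 2.5 or higher."""
    return 1 if gpa >= 2.5 else 0


# Hypothetical held-out students that were not used to train the rule.
test_gpas = [1.9, 2.7, 3.1, 2.2, 3.8]
actually_re_enrolled = [0, 1, 1, 1, 1]

accuracy = evaluate(rules, test_gpas, actually_re_enrolled)
```

Here the rule gets four of the five held-out students right. The student with a 2.2 GPA who re-enrolled anyway is the kind of miss that evaluation is meant to surface.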

SLIDE 22

How do we know it works?

SLIDE 23

How do we know it works?

SLIDE 24

Showtime

SLIDE 25

How are we going to use it?

  • Build out infrastructure
  • Table inside a SQL database
  • Script that runs regularly to refresh the model
  • Train and deploy to end users
  • Dashboard or other front-end tool
  • Documentation and training materials
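A minimal sketch of the "table plus refresh script" idea, using Python's built-in sqlite3 module with an in-memory database. The table name `reenrollment_scores`, its columns, the student IDs, and the 0.5 cutoff are hypothetical. A production setup would point at the institution's own database and be run on a schedule.

```python
import sqlite3


def refresh_scores(conn, scored_students):
    """Replace the contents of the score table with the latest model run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reenrollment_scores ("
        "student_id TEXT PRIMARY KEY, score REAL NOT NULL, scored_on TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back if anything fails
        conn.execute("DELETE FROM reenrollment_scores")
        conn.executemany(
            "INSERT INTO reenrollment_scores VALUES (?, ?, date('now'))",
            scored_students,
        )


conn = sqlite3.connect(":memory:")

# First run, then a later refresh with updated scores and one new student.
refresh_scores(conn, [("S001", 0.91), ("S002", 0.34)])
refresh_scores(conn, [("S001", 0.88), ("S002", 0.41), ("S003", 0.67)])

# A dashboard or other front-end tool could then query the table directly.
at_risk = conn.execute(
    "SELECT student_id FROM reenrollment_scores WHERE score < 0.5 ORDER BY student_id"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM reenrollment_scores").fetchone()[0]
```

Keeping the scores in a plain table means the front-end tool only needs read access and does not have to know anything about the model.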

SLIDE 26

Questions