Week 1, video 2: Regressors

Prediction
- Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables)
  - Sometimes used to predict the future
  - Sometimes used to make inferences about the present

Prediction: Examples
- A student is watching a video in a MOOC right now.
  - Is he bored or frustrated?
- A student has used educational software for the last half hour.
  - How likely is it that she knows the skill in the next problem?
- A student has completed three years of high school.
  - What will be her score on the college entrance exam?

What can we use this for?
- Improved educational design
  - If we know when students get bored, we can improve that content
- Automated decisions by software
  - If we know that a student is frustrated, let's offer the student some online help
- Informing teachers, instructors, and other stakeholders
  - If we know that a student is frustrated, let's tell their teacher

Regression in Prediction
- There is something you want to predict ("the label")
- The thing you want to predict is numerical, for example:
  - Number of hints a student requests
  - How long a student takes to answer
  - How much of the video the student will watch
  - What the student's test score will be

Regression in Prediction
- A model that predicts a number is called a regressor in data mining
- The overall task is called regression

Regression
- To build a regression model, you obtain a data set where you already know the answer, called the training label
- For example, if you want to predict the number of hints the student requests, each value of numhints is a training label

Skill          pknow  time  totalactions  numhints
ENTERINGGIVEN  0.704     9             1         0
ENTERINGGIVEN  0.502    10             2         0
USEDIFFNUM     0.049     6             1         3
ENTERINGGIVEN  0.967     7             3         0
REMOVECOEFF    0.792    16             1         1

Regression
- Associated with each label is a set of "features": other variables, which you will try to use to predict the label

Skill          pknow  time  totalactions  numhints
ENTERINGGIVEN  0.704     9             1         0
ENTERINGGIVEN  0.502    10             2         0
USEDIFFNUM     0.049     6             1         3
ENTERINGGIVEN  0.967     7             3         0
REMOVECOEFF    0.792    16             1         1
REMOVECOEFF    0.792    13             2

Regression
- The basic idea of regression is to determine which features, in which combination, can predict the label's value
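One common way to "determine which features, in which combination" is ordinary least squares. The sketch below assumes NumPy is available and fits an intercept plus one weight per numeric feature to the five example rows that have a numhints value; the Skill column is left out because it is categorical and would need encoding first. With only five rows and four parameters, treat the resulting weights as an illustration of the mechanics, not as a meaningful model.

```python
import numpy as np

# The five example rows that have a numhints label.
# Columns: pknow, time, totalactions (Skill is omitted: it is categorical).
features = np.array([
    [0.704,  9, 1],
    [0.502, 10, 2],
    [0.049,  6, 1],
    [0.967,  7, 3],
    [0.792, 16, 1],
])
numhints = np.array([0, 0, 3, 0, 1], dtype=float)

# Prepend a column of ones so the model can learn an intercept.
design = np.column_stack([np.ones(len(features)), features])

# Ordinary least squares: choose the weights that minimise the sum of squared errors.
weights, _, rank, _ = np.linalg.lstsq(design, numhints, rcond=None)

intercept, w_pknow, w_time, w_totalactions = weights
print(f"numhints ~ {intercept:.3f} + {w_pknow:.3f}*pknow "
      f"+ {w_time:.3f}*time + {w_totalactions:.3f}*totalactions")
```

Adding the intercept column is a choice made for this sketch; the example model on the next slide has no intercept term.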

Linear Regression
- The most classic form of regression is linear regression
  - Numhints = 0.12*Pknow + 0.932*Time - 0.11*Totalactions

Skill          pknow  time  totalactions  numhints
COMPUTESLOPE   0.544     9             1         ?
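Applying a fitted linear model is just multiplication and addition. This sketch plugs the COMPUTESLOPE row (pknow = 0.544, time = 9, totalactions = 1) into the example weights; the function name is illustrative only.

```python
def predict_numhints(pknow: float, time: float, totalactions: float) -> float:
    """Apply the example linear model: 0.12*Pknow + 0.932*Time - 0.11*Totalactions."""
    return 0.12 * pknow + 0.932 * time - 0.11 * totalactions

# COMPUTESLOPE row: 0.12*0.544 + 0.932*9 - 0.11*1 = 0.06528 + 8.388 - 0.11
prediction = predict_numhints(0.544, 9, 1)
print(round(prediction, 2))  # 8.34
```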

Quiz

Skill          pknow  time  totalactions  numhints
COMPUTESLOPE   0.322    15             4         ?

- Numhints = 0.12*Pknow + 0.932*Time - 0.11*Totalactions
- What is the value of numhints?
  A) 8.34
  B) 13.58
  C) 3.67
  D) 9.21
  E) FNORD

Quiz
- Numhints = 0.12*Pknow + 0.932*Time - 0.11*Totalactions
- Which of the variables has the largest impact on numhints? (Assume they are scaled the same.)
  A) Pknow
  B) Time
  C) Totalactions
  D) Numhints
  E) They are equal

However…
- These variables are unlikely to be scaled the same!
- If Pknow is a probability
  - From 0 to 1
  - We'll discuss this variable later in the class
- And time is a number of seconds to respond
  - From 0 to infinity
- Then you can't interpret the weights in a straightforward fashion
  - You need to transform them first

Transform
- A transform makes a new variable by applying some mathematical function to an existing variable, for example:
  - $X_t = X^2$

Transform: Unitization
- Increases interpretability of relative strength of features
- Reduces interpretability of individual features
- $X_t = \frac{X - M(X)}{SD(X)}$, where $M(X)$ is the mean of $X$ and $SD(X)$ is its standard deviation
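Unitization subtracts the mean and divides by the standard deviation, so every transformed feature ends up with mean 0 and standard deviation 1. This sketch applies it to the time values from the five complete example rows using Python's standard `statistics` module. The slide does not say whether SD means the sample or the population standard deviation; this sketch assumes the sample version (`stdev`).

```python
from statistics import mean, stdev

def unitize(values):
    """Rescale values to mean 0 and standard deviation 1: (x - M(X)) / SD(X)."""
    m = mean(values)
    sd = stdev(values)  # sample SD; statistics.pstdev would give the population SD
    return [(x - m) / sd for x in values]

# time column (seconds) from the five complete example rows
time = [9, 10, 6, 7, 16]
time_t = unitize(time)
print([round(v, 3) for v in time_t])
```

Once every feature is unitized this way, their weights in a linear model can be compared with each other, at the cost of no longer being expressed in the original units (seconds, probability, and so on).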

Linear Regression
- Linear regression only fits linear functions…
  - Except when you apply transforms to the input variables
  - Which most statistics and data mining packages can do for you
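The model stays linear in its weights even when its inputs are transformed, so a transformed feature lets linear regression fit a curve. This sketch uses hypothetical data generated from $y = 1 + 0.5X^2$ (an assumption for illustration) and compares a fit on $X$ alone with a fit that also includes $X_t = X^2$, using NumPy's least-squares solver.

```python
import numpy as np

# Hypothetical data that follows a curve, not a straight line: y = 1 + 0.5 * x^2
x = np.arange(-5, 6, dtype=float)
y = 1 + 0.5 * x**2

# Plain linear regression on x alone: y ~ b0 + b1*x
plain = np.column_stack([np.ones_like(x), x])
plain_w, *_ = np.linalg.lstsq(plain, y, rcond=None)

# Same linear regression, with the transformed input x^2 added as a feature
transformed = np.column_stack([np.ones_like(x), x, x**2])
trans_w, *_ = np.linalg.lstsq(transformed, y, rcond=None)

def sse(design, w):
    """Sum of squared errors of a fitted linear model on y."""
    return float(np.sum((y - design @ w) ** 2))

print("x only:    SSE =", round(sse(plain, plain_w), 3))
print("x and x^2: SSE =", round(sse(transformed, trans_w), 3))
print("weights with x^2:", np.round(trans_w, 3))  # recovers roughly [1, 0, 0.5]
```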

[Figures, one per slide: the transforms Ln(X), Sqrt(X), X², X³, 1/X and Sin(X), each plotted as Xt against X for X from -15 to 15]

Linear Regression
- Surprisingly flexible…
- But even without that
  - It is blazing fast
  - It is often more accurate than more complex models, particularly once you cross-validate
    - Caruana & Niculescu-Mizil (2006)
  - It is feasible to understand your model (with the caveat that the second feature in your model is in the context of the first feature, and so on)

Example of Caveat
- Let's graph the relationship between number of graduate students and number of papers per year

Data
[Figure: scatterplot of papers per year (y-axis, 0 to 16) against number of graduate students (x-axis, 0 to 16)]

Data
[Figure: the same scatterplot, annotated "Too much time spent filling out personnel action forms?"]

Model
- $\text{Number of papers} = 4 + 2 \times (\#\text{ of grad students}) - 0.1 \times (\#\text{ of grad students})^2$
- But does that actually mean that (# of grad students)² is associated with less publication?
  - No!

Example of Caveat
[Figure: scatterplot of papers per year against number of graduate students, both axes 0 to 16]
- (# of grad students)² is actually positively correlated with publications!
  - r = 0.46

Example of Caveat
[Figure: scatterplot of papers per year against number of graduate students, both axes 0 to 16]
- The relationship is only in the negative direction when the number of graduate students is already in the model…
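The same sign flip can be reproduced on synthetic data. This sketch generates noise-free points from the model above for 0 to 15 graduate students (an assumption; it is not the data behind the scatterplot, so its correlation will not match the r = 0.46 reported for the real data). On its own, the squared term correlates positively with papers; fitted alongside the linear term, its weight comes out negative.

```python
import numpy as np

# Synthetic, noise-free data generated from the slide's model for 0..15 grad students.
grads = np.arange(0, 16, dtype=float)
papers = 4 + 2 * grads - 0.1 * grads**2

# On its own, the squared term is positively correlated with papers...
r = np.corrcoef(grads**2, papers)[0, 1]
print("r(grads^2, papers) =", round(r, 2))

# ...but once grads is already in the model, the squared term's weight is negative.
design = np.column_stack([np.ones_like(grads), grads, grads**2])
weights, *_ = np.linalg.lstsq(design, papers, rcond=None)
print("weights [intercept, grads, grads^2] =", np.round(weights, 3))
```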

Example of Caveat
- So be careful when interpreting linear regression models (or almost any other type of model)

Regression Trees

Regression Trees (non-linear; RepTree)
If X > 3
  Y = 2
else if X < -7
  Y = 4
else
  Y = 3
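Once learned, a regression tree is just nested conditions with a constant prediction in each leaf. This sketch hand-codes the example tree above; a learner such as RepTree would choose the split points and leaf values from training data, which this sketch does not attempt. The function name is hypothetical.

```python
def reptree_example(x: float) -> float:
    """Evaluate the example regression tree: each leaf predicts a constant."""
    if x > 3:
        return 2.0
    elif x < -7:
        return 4.0
    else:
        return 3.0

for x in (5, -10, 0):
    print(x, "->", reptree_example(x))
```

Note the boundaries: X = 3 and X = -7 both fall into the final leaf, because the splits use strict inequalities.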

Linear Regression Trees (linear; M5’)
If X > 3
  Y = 2A + 3B
else if X < -7
  Y = 2A - 3B
else
  Y = 2A + 0.5B + C
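A linear regression tree splits the data like a regression tree but puts a linear model, rather than a constant, in each leaf. This sketch hand-codes the example tree above, assuming X, A, B and C are all input features (the slide splits on X and uses A, B and C in the leaf models); the function name is hypothetical and no learning is performed.

```python
def m5_style_example(x: float, a: float, b: float, c: float) -> float:
    """Evaluate the example linear regression tree: split on X, linear model per leaf."""
    if x > 3:
        return 2 * a + 3 * b
    elif x < -7:
        return 2 * a - 3 * b
    else:
        return 2 * a + 0.5 * b + c

print(m5_style_example(5, a=1, b=2, c=0))    # X > 3 leaf: 2*1 + 3*2
print(m5_style_example(-10, a=1, b=2, c=0))  # X < -7 leaf: 2*1 - 3*2
print(m5_style_example(0, a=1, b=2, c=5))    # middle leaf: 2*1 + 0.5*2 + 5
```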

Linear Regression Tree
[Figure: papers per year against number of graduate students, both axes 0 to 16]

Later Lectures
- Other regressors
- Goodness metrics for comparing regressors
- Validating regressors

Next Lecture
- Classifiers: another type of prediction model
