SLIDE 1

Multiple Regression and Logistic Regression I

Dajiang Liu @PHS 525 Apr-14-2016

SLIDE 2

Multiple Regression

  • Extends simple linear regression to the scenario where multiple predictors are available
  • Multiple regression often results in better models of the outcome, as
  • Very few outcomes are determined by a single predictor
  • Typically, the outcome is jointly determined by multiple predictors
  • Example: video game auctions
  • What factors may predict the auction price for a video game?
  • The Mario_Kart dataset
SLIDE 3

Mario_Kart Dataset
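
The dataset itself is not reproduced in this transcript. As a minimal sketch of one way to obtain it: the openintro R package ships a mariokart data frame of eBay auctions for this game (an assumption about how the class loaded the data; note that recent openintro releases use snake_case names such as total_pr and stock_photo, while these slides use totalPr and stockPhoto).

# install.packages("openintro")   # if not already installed
library(openintro)                # provides the mariokart eBay auction data
data(mariokart)
head(mariokart)                   # price, condition, stock photo, duration, wheels, ...
str(mariokart)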

SLIDE 4

Simple Linear Regression Revisited

  • Examine the relationship between price and cond_new

$$\text{price} = \beta_0 + \beta_1 \times \text{cond\_new} + \epsilon$$

  • What is the estimated value of $\beta_1$?
  • Is it significantly different from 0?
  • Can you make a plot of the data? (see the sketch below)
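
A minimal R sketch of this exercise, assuming the data sit in a data frame named mariokart with the columns used in the console session later in the deck (totalPr for the final price, cond for new/used):

fit <- lm(totalPr ~ cond, data = mariokart)
summary(fit)    # slope estimate for cond and its p-value

# scatter plot of price by condition with the fitted line
plot(as.numeric(mariokart$cond), mariokart$totalPr,
     xlab = "Condition (numeric coding)", ylab = "Total price (USD)")
abline(lm(totalPr ~ as.numeric(cond), data = mariokart))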
SLIDE 5

Regression line

SLIDE 6

Multiple Regression

  • In many cases, the price is determined by multiple predictors
  • In order to achieve a better model for the price, we may want to include multiple predictors in the same model
  • In the Mario_Kart data, we may consider a model like

$$\text{price} = \beta_0 + \beta_1 \times \text{cond\_new} + \beta_2 \times \text{stockPhoto} + \beta_3 \times \text{duration} + \beta_4 \times \text{wheels} + \epsilon$$
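
A sketch of fitting this model in R, with predictor names taken from the console session on slide 13 and mariokart as the assumed data frame name:

fit_multi <- lm(totalPr ~ cond + stockPhoto + duration + wheels, data = mariokart)
summary(fit_multi)   # one estimated slope per predictor, plus the intercept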

SLIDE 7

Estimating Parameters

  • The parameters are estimated so that the sum of squared residuals is minimized, i.e.

$$Q = \sum_i \left( y_i - \hat{y}_i \right)^2 = \sum_i \left( y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} - \cdots \right)^2$$

where $\hat{y}_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots$ is the predicted outcome based upon the predictors.

  • The model parameters are estimated such that the observed outcome and predicted outcome “agree” the best.

  • Can you please estimate the parameters for the Mario_Kart Example?
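
As a sketch of what “minimizing Q” means in practice: lm() returns the coefficients with the smallest possible sum of squared residuals, so perturbing any estimate can only make Q larger (again assuming the mariokart data frame):

fit <- lm(totalPr ~ cond + stockPhoto + duration + wheels, data = mariokart)
Q_min <- sum(residuals(fit)^2)   # Q at the least-squares estimates

# perturb the intercept by +1 and recompute Q; it must increase
b <- coef(fit)
y_hat <- model.matrix(fit) %*% (b + c(1, rep(0, length(b) - 1)))
Q_pert <- sum((mariokart$totalPr - y_hat)^2)
c(Q_min, Q_pert)                 # Q_pert > Q_min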
SLIDE 8

Why is the Estimate Different from Simple Linear Regression?

  • How to interpret the estimates from multiple linear regression?
SLIDE 9

Why is the Estimate Different from Simple Linear Regression?

  • How to interpret the estimates from multiple linear regression?

Answer: Holding everything else constant, a new game costs 10.90 USD more than a used game.

SLIDE 10

How to Measure How Well the Model Fits: Adjusted R²

  • Estimate the amount of variability that can be explained by the model:

$$R^2 = 1 - \frac{\operatorname{Var}(e_i)}{\operatorname{Var}(y_i)}$$

  • $R^2$ is biased
  • Adjusted $R^2$:

$$R^2_{\mathrm{adj}} = 1 - \frac{\operatorname{Var}(e_i)\,/\,(N - K - 1)}{\operatorname{Var}(y_i)\,/\,(N - 1)}$$

  • K is the number of predictors
  • N is the number of sampled individuals
  • $R^2_{\mathrm{adj}}$ is always smaller than $R^2$ (why??)
  • Note: $\operatorname{Var}(e_i)$ is the residual variance, the smaller the better; for $R^2$, the bigger the better
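
A sketch of computing both quantities by hand and checking them against summary(), again assuming the mariokart data frame and the four-predictor model:

fit <- lm(totalPr ~ cond + stockPhoto + duration + wheels, data = mariokart)
e <- residuals(fit)
y <- mariokart$totalPr
N <- length(y)   # number of individuals
K <- 4           # number of predictors
r2     <- 1 - var(e) / var(y)
r2_adj <- 1 - (var(e) / var(y)) * (N - 1) / (N - K - 1)
c(r2, r2_adj)                                           # hand-computed
c(summary(fit)$r.squared, summary(fit)$adj.r.squared)   # should agree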

SLIDE 11

How to calculate $R^2$ from R?

summary(lm(formula = totalPr ~ as.numeric(cond), data = data))

Residuals:
    Min      1Q  Median      3Q     Max
-18.168  -7.771  -3.148   1.857 279.362

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        60.393      7.219   8.366 5.24e-14 ***
as.numeric(cond)   -6.623      4.343  -1.525     0.13
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 25.57 on 141 degrees of freedom
Multiple R-squared: 0.01622, Adjusted R-squared: 0.009244
F-statistic: 2.325 on 1 and 141 DF, p-value: 0.1296

SLIDE 12
SLIDE 13

# R console history from the in-class demo (cleaned of echoes and typos)
y.new  = data$totalPr[data$cond == 'new']
y.used = data$totalPr[data$cond == 'used']
t.test(y.new, y.used)     # two-sample t-test: price for new vs. used games

names(data)               # inspect the available predictors
# multiple regression with four predictors on the Mario_Kart data
summary(lm(totalPr ~ duration + stockPhoto + wheels + cond, data = data))

# second example: the babies dataset
baby = read.table('babies.csv', header = T, sep = ',')
names(baby)
summary(lm(bwt ~ gestation + parity + age + height + weight + smoke, data = baby))

SLIDE 14

Two P-values

  • P-value for the overall model fit:

    $H_0: \beta_1 = \cdots = \beta_K = 0$
    $H_a: \beta_1 \neq 0 \text{ or } \beta_2 \neq 0 \text{ or } \ldots \text{ or } \beta_K \neq 0$

  • P-values for testing the statistical significance of each predictor:

    $H_0: \beta_j = 0$
    $H_a: \beta_j \neq 0$
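
A sketch of where each kind of p-value lives in R output, using the same assumed model and mariokart data frame:

fit <- lm(totalPr ~ cond + stockPhoto + duration + wheels, data = mariokart)
s <- summary(fit)

# per-predictor p-values, testing H0: beta_j = 0
s$coefficients[, "Pr(>|t|)"]

# overall model-fit p-value, testing H0: all slopes = 0
f <- s$fstatistic
pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)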

SLIDE 15

A Warmup Exercise

SLIDE 16

Questions of Interest

  • Not all predictors are useful
  • Including “not useful” predictors in the model will reduce the accuracy of predictions
  • The full model is the model that contains all predictors
  • Question: determine the useful predictors from the full model
SLIDE 17

Approach I

  • Fit the full model that contains the full set of predictors
  • Determine which predictors are important by looking at the p-values for testing $H_0: \beta_j = 0$
  • Predictor $j$ is important if the p-value for testing $H_0: \beta_j = 0$ is significant (see the sketch below)
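
A minimal sketch of Approach I on the Mario_Kart model; the 0.05 threshold and the mariokart data frame name are illustrative assumptions:

full <- lm(totalPr ~ cond + stockPhoto + duration + wheels, data = mariokart)
ct <- summary(full)$coefficients
ct                                              # full coefficient table
rownames(ct)[-1][ct[-1, "Pr(>|t|)"] < 0.05]     # predictors significant at 0.05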