Workshop 7.2a: Introduction to Linear models
Murray Logan 19 Jul 2017
Aims of statistical modelling
Use samples to:
- Describe relationships
- Inference testing (relationships/effects)
- Predictive models
Mathematical models
[Figure: y plotted against x, a deterministic straight line]

y = β0 + β1x
Statistical models
[Figure: y plotted against x, points scattered around the line]

y = β0 + β1x + ε,  ε ~ N(0, σ²)
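A statistical model like this can be simulated directly in R; a minimal sketch, where the intercept, slope and σ values are illustrative (not taken from the slides):

```r
# Simulate y = b0 + b1*x + e with e ~ N(0, sigma^2)
# b0, b1 and sigma are illustrative values, not from the slides
set.seed(1)
b0 <- 2; b1 <- 1.5; sigma <- 1
x <- 0:6
y <- b0 + b1 * x + rnorm(length(x), mean = 0, sd = sigma)
cbind(x, y)   # each y scatters around the deterministic line b0 + b1*x
```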
Linear models
[Figure: straight-line fit of y against x]

y = β0 + β1x + ε
Linear models
[Figure: curvilinear fit of y against x]

y = β0 + β1x + β2x²
Non-linear models
[Figure: exponential curve of y against x]

y = αβ^x
Linear models
yi = β0 + β1 × xi + εi

response variable = population intercept + population slope × predictor variable + error

The error term is the stochastic component.
Linear models
yi = β0 + β1 × xi + εi

response vector = intercept (single value) + slope (single value) × predictor vector + error

The error term is the stochastic component.
Vectors and Matrices
Vector (has length ONLY):
(3.0, 2.5, 6.0, 5.5, 9.0, 8.6, 12.0)

Matrix (has length AND width), here the 7 × 2 design matrix:
1 0
1 1
1 2
1 3
1 4
1 5
1 6
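In R terms, the distinction can be sketched with the vector and design matrix shown above:

```r
# A vector has length only; a matrix has length AND width
v <- c(3.0, 2.5, 6.0, 5.5, 9.0, 8.6, 12.0)
length(v)            # 7

X <- cbind(1, 0:6)   # design matrix: a column of 1s plus the predictor
dim(X)               # 7 rows, 2 columns
```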
Estimation
[Figure: scatterplot of y against x with fitted line]

y = β0 + β1x + ε
Ordinary Least Squares
Estimation
Y     X
3.0   0
2.5   1
6.0   2
5.5   3
9.0   4
8.6   5
12.0  6

3.0 = β0 × 1 + β1 × 0 + ε1
2.5 = β0 × 1 + β1 × 1 + ε2
6.0 = β0 × 1 + β1 × 2 + ε3
5.5 = β0 × 1 + β1 × 3 + ε4
Estimation
3.0 = β0 × 1 + β1 × 0 + ε1
2.5 = β0 × 1 + β1 × 1 + ε2
6.0 = β0 × 1 + β1 × 2 + ε3
5.5 = β0 × 1 + β1 × 3 + ε4
9.0 = β0 × 1 + β1 × 4 + ε5
8.6 = β0 × 1 + β1 × 5 + ε6
12.0 = β0 × 1 + β1 × 6 + ε7

In matrix form, Y = X × β + ε:

(3.0, 2.5, 6.0, 5.5, 9.0, 8.6, 12.0)' = X × (β0, β1)' + (ε1, ε2, ε3, ε4, ε5, ε6, ε7)'

where X is the 7 × 2 design matrix with a column of 1s and the predictor values 0–6, and (β0, β1)' is the parameter vector.
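The ordinary least squares estimates can be computed directly from these matrices as β̂ = (X'X)⁻¹X'Y; a sketch using the data above:

```r
# Response vector and design matrix from the slides
Y <- c(3.0, 2.5, 6.0, 5.5, 9.0, 8.6, 12.0)
X <- cbind(1, 0:6)   # column of 1s, predictor values 0..6

# Ordinary least squares: beta-hat = (X'X)^-1 X'Y
beta <- solve(t(X) %*% X) %*% t(X) %*% Y
round(beta, 4)       # intercept 2.1357, slope 1.5071
```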
Inference testing
H0: β1 = 0 (slope equals zero)

The t-statistic: t = parameter estimate / SE of the parameter, i.e. t = β1 / SE(β1)
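This ratio is exactly the 't value' column that summary(lm(...)) reports; a sketch verifying it for the slope of the example data:

```r
DATA <- data.frame(Y = c(3, 2.5, 6.0, 5.5, 9.0, 8.6, 12), X = 0:6)
DATA.lm <- lm(Y ~ X, data = DATA)

ctab <- coef(summary(DATA.lm))   # coefficient table
tval <- ctab["X", "Estimate"] / ctab["X", "Std. Error"]
pval <- 2 * pt(-abs(tval), df = df.residual(DATA.lm))   # two-sided p-value
c(t = tval, p = pval)            # t about 6.92, p about 0.00096
```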
Inference testing
H0: β1 = 0 (slope equals zero)

The t-statistic and the t distribution

[Figure: t distribution over −4 to 4]

Assumptions
Assumptions
Normality
Assumptions
Homogeneity of variance

[Figures: residuals plotted against predicted values]
Assumptions
Linearity

Trendline
[Figure: scatterplot with linear trendline; residual plot]
Assumptions
Linearity

Loess (lowess) smoother
[Figure: scatterplot with loess smoother; residual plot]
Assumptions
Linearity

Spline smoother
[Figure: scatterplot with spline smoother; residual plot]
Assumptions
yi = β0 + β1 × xi + εi
εi ~ N(0, σ²)
Example
Make these data and call the data frame DATA:

Y     X
3.0   0
2.5   1
6.0   2
5.5   3
9.0   4
8.6   5
12.0  6
> DATA <- data.frame(Y=c(3, 2.5, 6.0, 5.5, 9.0, 8.6, 12), X=0:6)
Worked Examples
> fert <- read.csv('../data/fertilizer.csv', strip.white=T)
> fert
   FERTILIZER YIELD
1          25    84
2          50    80
3          75    90
4         100   154
5         125   148
6         150   169
7         175   206
8         200   244
9         225   212
10        250   248
> head(fert)
  FERTILIZER YIELD
1         25    84
2         50    80
3         75    90
4        100   154
5        125   148
6        150   169
> summary(fert)
   FERTILIZER         YIELD
 Min.   : 25.00   Min.   : 80.0
 1st Qu.: 81.25   1st Qu.:104.5
 Median :137.50   Median :161.5
 Mean   :137.50   Mean   :163.5
 3rd Qu.:193.75   3rd Qu.:210.5
 Max.   :250.00   Max.   :248.0
> str(fert)
'data.frame':	10 obs. of  2 variables:
 $ FERTILIZER: int  25 50 75 100 125 150 175 200 225 250
 $ YIELD     : int  84 80 90 154 148 169 206 244 212 248
> library(INLA)
> fert.inla <- inla(YIELD ~ FERTILIZER, data=fert)
> summary(fert.inla)
Call:
inla(formula = YIELD ~ FERTILIZER, data = fert)

Time used:
 Pre-processing  Running inla  Post-processing   Total
         0.3043        0.0715           0.0217  0.3974

Fixed effects:
               mean      sd 0.025quant 0.5quant 0.975quant    mode kld
(Intercept) 51.9341 12.9747    25.9582  51.9335    77.8990 51.9339   0
FERTILIZER   0.8114  0.0836     0.6439   0.8114     0.9788  0.8114   0

The model has no random effects

Model hyperparameters:
                                          mean     sd 0.025quant 0.5quant 0.975quant   mode
Precision for the Gaussian observations 0.0035 0.0015     0.0012   0.0032      0.007 0.0028

Expected number of effective parameters(std dev): 2.00(0.00)
Number of equivalent replicates : 5.00
Marginal log-Likelihood:  -61.65
Worked Examples
Question: is there a relationship between fertilizer concentration and grass yield?

Linear model: Yi = β0 + β1Fi + εi,  ε ~ N(0, σ²)
Example
Exploratory data analysis
> library(car)
> scatterplot(Y~X, data=DATA)
[Figure: scatterplot of Y against X]

Example
Exploratory data analysis
> library(car)
> peake <- read.csv('../data/peake.csv')
> scatterplot(SPECIES ~ AREA, data=peake)
Example
Exploratory data analysis
> scatterplot(SPECIES ~ AREA, data=peake,
+   smoother=gamLine)
Example
Exploratory data analysis
> library(ggplot2)
> library(gridExtra)
> ggplot(peake, aes(y=SPECIES, x=AREA)) + geom_point()
> ggplot(peake, aes(y=SPECIES, x=AREA)) + geom_point() +
+   geom_smooth()
> p1 <- ggplot(peake, aes(y=SPECIES, x=AREA)) + geom_point()
> p2 <- ggplot(peake, aes(y=SPECIES, x=1)) + geom_boxplot()
> p3 <- ggplot(peake, aes(y=AREA, x=1)) + geom_boxplot()
> grid.arrange(p1, p2, p3, ncol=3)
Linear models in R
> lm(formula, data= DATAFRAME)
Model            R formula      Description
yi = β0 + β1xi   y~1+x (y~x)    Full model
yi = β0          y~1            Null model
yi = β1xi        y~-1+x         Through origin
Example
Fit linear model

yi = β0 + β1xi + εi,  εi ~ N(0, σ²)
> DATA.lm <- lm(Y~X, data=DATA)
Worked Example
TIME TO FIT A MODEL
Linear models in R
Model diagnostics
Residuals

[Figure: fitted line with residuals shown as vertical segments]
Model diagnostics
Leverage

[Figure: scatterplot illustrating a high-leverage observation]
Model diagnostics
Cook's D

[Figure: scatterplot illustrating an influential observation]
Example
Model evaluation
Extractor Description
residuals()
Extracts residuals from model
> residuals(DATA.lm)
         1          2          3          4          5          6          7
 0.8642857 -1.1428571  0.8500000 -1.1571429  0.8357143 -1.0714286  0.8214286
Example
Model evaluation
Extractor Description
residuals()
Extracts residuals from model
fitted()
Extracts the predicted values
> fitted(DATA.lm)
        1         2         3         4         5         6         7
 2.135714  3.642857  5.150000  6.657143  8.164286  9.671429 11.178571
Example
Model evaluation
Extractor Description
residuals()
Extracts residuals from model
fitted()
Extracts the predicted values
plot()
Series of diagnostic plots
> plot(DATA.lm)
[Figure: residuals vs fitted values diagnostic plot]

Example
Model evaluation
Extractor Description
residuals()
Residuals
fitted()
Predicted values
plot()
Diagnostic plots
influence.measures()
Leverage (hat) and Cook's D
Example
Model evaluation
> influence.measures(DATA.lm)
Influence measures of
	 lm(formula = Y ~ X, data = DATA) :

   dfb.1_     dfb.X  dffit cov.r cook.d   hat inf
1  0.9603 -7.99e-01  0.960  1.82 0.4553 0.464
2 -0.7650  5.52e-01 -0.780  1.15 0.2756 0.286
3  0.3165 -1.63e-01  0.365  1.43 0.0720 0.179
4 -0.2513 -7.39e-17 -0.453  1.07 0.0981 0.143
5  0.0443  1.60e-01  0.357  1.45 0.0696 0.179
6  0.1402 -5.06e-01 -0.715  1.26 0.2422 0.286
7 -0.3466  7.50e-01  0.901  1.91 0.4113 0.464
Example
Model evaluation
Extractor Description
residuals()
Residuals
fitted()
Predicted values
plot()
Diagnostic plots
influence.measures()
Leverage, Cook's D
summary()
Summarizes important output from model
Example
Model evaluation
> summary(DATA.lm)

Call:
lm(formula = Y ~ X, data = DATA)

Residuals:
      1       2       3       4       5       6       7
 0.8643 -1.1429  0.8500 -1.1571  0.8357 -1.0714  0.8214

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.1357     0.7850   2.721 0.041737 *
X             1.5071     0.2177   6.923 0.000965 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.152 on 5 degrees of freedom
Multiple R-squared:  0.9055,	Adjusted R-squared:  0.8866
F-statistic: 47.92 on 1 and 5 DF,  p-value: 0.0009648
Example
Model evaluation
Extractor Description
residuals()
Residuals
fitted()
Predicted values
plot()
Diagnostic plots
influence.measures()
Leverage, Cook's D
summary()
Model output
confint()
Confidence intervals of parameters
Example
Model evaluation
> confint(DATA.lm)
                2.5 %   97.5 %
(Intercept) 0.1178919 4.153537
X           0.9474996 2.066786
Example
Model evaluation
Extractor Description
residuals()
Residuals
fitted()
Predicted values
plot()
Diagnostic plots
influence.measures()
Leverage, Cook's D
summary()
Model output
confint()
Confidence intervals
predict()
Predict responses to new levels of predictors
Example
Model evaluation
> predict(DATA.lm, newdata=data.frame(X=c(2.5, 4.1)),
+   se=TRUE)
$fit
       1        2
5.903571 8.315000

$se.fit
        1         2
0.4488222 0.4969340

$df
[1] 5

$residual.scale
[1] 1.152017

> predict(DATA.lm, newdata=data.frame(X=c(2.5, 4.1)),
+   interval='confidence')
       fit      lwr      upr
1 5.903571 4.749837 7.057306
2 8.315000 7.037591 9.592409
Example
Model evaluation
> predict(DATA.lm, newdata=data.frame(X=c(2.5, 4.1)),
+   interval='prediction')
       fit       lwr       upr
1 5.903571  2.725409  9.081734
2 8.315000  5.089881 11.540119
Prediction
In matrix form, Y = X × β + ε:

(3.0, 2.5, 6.0, 5.5, 9.0, 8.6, 12.0)' = X × (β0, β1)' + (ε1, ε2, ε3, ε4, ε5, ε6, ε7)'

where X is the 7 × 2 design matrix with a column of 1s and the predictor values 0–6, and (β0, β1)' is the parameter vector.

Substituting the estimated parameters gives the predicted values:

X × (2.136, 1.507)' = (2.136, 3.643, 5.150, 6.657, 8.164, 9.671, 11.179)'
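The same multiplication in R, confirming that the fitted values equal the design matrix times the estimated parameter vector:

```r
DATA <- data.frame(Y = c(3, 2.5, 6.0, 5.5, 9.0, 8.6, 12), X = 0:6)
DATA.lm <- lm(Y ~ X, data = DATA)

Xmat <- model.matrix(DATA.lm)    # 7 x 2: intercept column and X
pred <- Xmat %*% coef(DATA.lm)   # the matrix multiplication by hand
all.equal(as.vector(pred), unname(fitted(DATA.lm)))   # TRUE
```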
Example
Model evaluation
Extractor Description
residuals()
Residuals
fitted()
Predicted values
plot()
Diagnostic plots
influence.measures()
Leverage, Cook's D
summary()
Model output
confint()
Confidence intervals
predict()
Predict new responses
plot(allEffects())
Effects plots
Example
Model evaluation
> library(effects)
> plot(allEffects(DATA.lm))
[Figure: X effect plot of Y against X]
Worked Examples
> fert <- read.csv('../data/fertilizer.csv', strip.white=T)
> fert
   FERTILIZER YIELD
1          25    84
2          50    80
3          75    90
4         100   154
5         125   148
6         150   169
7         175   206
8         200   244
9         225   212
10        250   248
> head(fert)
  FERTILIZER YIELD
1         25    84
2         50    80
3         75    90
4        100   154
5        125   148
6        150   169
> summary(fert)
   FERTILIZER         YIELD
 Min.   : 25.00   Min.   : 80.0
 1st Qu.: 81.25   1st Qu.:104.5
 Median :137.50   Median :161.5
 Mean   :137.50   Mean   :163.5
 3rd Qu.:193.75   3rd Qu.:210.5
 Max.   :250.00   Max.   :248.0
> str(fert)
'data.frame':	10 obs. of  2 variables:
 $ FERTILIZER: int  25 50 75 100 125 150 175 200 225 250
 $ YIELD     : int  84 80 90 154 148 169 206 244 212 248
Worked Examples
Question: is there a relationship between fertilizer concentration and grass yield?

Linear model: Yi = β0 + β1Fi + εi,  ε ~ N(0, σ²)
Worked Examples
> peake <- read.csv('../data/peakquinn.csv', strip.white=T)
> head(peake)
     AREA INDIV
1  516.00    18
2  469.06    60
3  462.25    57
4  938.60   100
5 1357.15    48
6 1773.66   118
> summary(peake)
      AREA            INDIV
 Min.   :  462.2   Min.   :  18.0
 1st Qu.: 1773.7   1st Qu.: 148.0
 Median : 4451.7   Median : 338.0
 Mean   : 7802.0   Mean   : 446.9
 3rd Qu.: 9287.7   3rd Qu.: 632.0
 Max.   :27144.0   Max.   :1402.0
Worked Examples
Question: is there a relationship between mussel clump area and the number of individuals?

Linear model: Indivi = β0 + β1Areai + εi,  ε ~ N(0, σ²)

Log-log model: ln(Indivi) = β0 + β1 ln(Areai) + εi,  ε ~ N(0, σ²)
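The log-log model is fitted by transforming both variables inside the formula. A sketch; only the six rows shown by head(peake) are hard-coded here so the snippet stands alone (with the real data, use the read.csv() call above):

```r
# First six rows of the peake data, as shown by head(peake)
peake <- data.frame(
  AREA  = c(516.00, 469.06, 462.25, 938.60, 1357.15, 1773.66),
  INDIV = c(18, 60, 57, 100, 48, 118)
)

# ln(Indiv) = b0 + b1*ln(Area) + e, via transformation in the formula
peake.lm <- lm(log(INDIV) ~ log(AREA), data = peake)
coef(peake.lm)   # b1 estimates the scaling exponent
```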