SLIDE 1
Marcel Dettling, Zurich University of Applied Sciences

Applied Statistical Regression

HS 2011 – Week 03

Marcel Dettling

Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften

marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling

ETH Zürich, October 10, 2011

SLIDE 2

Simple Linear Regression

Example: In India, it was observed that alkaline soil hampers plant growth. This gave rise to a search for tree species which show high tolerance against these conditions. An outdoor trial was performed, where 120 trees of a particular species were planted on a big field with considerable soil pH-value variation. After 3 years of growth, every tree's height was measured. Additionally, the pH-value of the soil in the vicinity of each tree was determined and recorded.

SLIDE 3

Scatterplot: Tree Height vs. pH-value

[Figure: scatterplot "Tree Height vs. pH-Value"]

SLIDE 4

Systematic Relation

What is a good description of the systematic relation between pH-value and tree height?

1) a line connecting all the data points?

[Figure: "Tree Height vs. pH-Value"]

SLIDE 5

Systematic Relation

What is a good description of the systematic relation between pH-value and tree height?

1) a line connecting all the data points?
2) a smooth line that tries to follow the data?

[Figure: "Tree Height vs. pH-Value"]

SLIDE 6

Systematic Relation

What is a good description of the systematic relation between pH-value and tree height?

1) a line connecting all the data points?
2) a smooth line that tries to follow the data?
3) a straight line?

[Figure: "Tree Height vs. pH-Value"]

SLIDE 7

Simple Linear Regression

The higher the pH-value, the smaller the trees tend to be. The relation seems to be linear, which is of course also the mathematically most simple way of describing the relation:

f(x) = β₀ + β₁·x,  resp.  height = β₀ + β₁·(pH-value)

Name/meaning of the two parameters in the equation: β₀ is the "Intercept", β₁ is the "Slope". Fitting a straight line into a 2-dimensional scatterplot is known as simple linear regression. This is because:

  • there is just one single predictor variable ("simple").
  • the relation is linear in the parameters ("linear").

SLIDE 8

Model, Data & Random Errors

Now we are bringing the data into play. The regression line will not run through all the data points. Thus, there are random errors:

y_i = β₀ + β₁·x_i + E_i,  for all i = 1, ..., n

Meaning of the variables/parameters:

  • y_i is the response variable (height) of observation i.
  • x_i is the predictor variable (pH-value) of observation i.
  • β₀, β₁ are the regression coefficients. They are unknown previously, and need to be estimated from the data.
  • E_i is the residual or error, i.e. the random difference between observation and regression line.
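As a sketch of this data-generating mechanism (Python here rather than the course's R; the parameter values β₀ = 28.7, β₁ = −3.0, σ_E = 1.0 are made up for illustration), one could simulate from the model:

```python
import random

# Hypothetical "true" parameters -- illustrative, not estimated from the data
beta0, beta1, sigma_E = 28.7, -3.0, 1.0
n = 120                                   # one observation per tree

random.seed(1)                            # reproducible example
x = [7.0 + 1.5 * random.random() for _ in range(n)]   # pH-values in [7.0, 8.5]
# y_i = beta0 + beta1 * x_i + E_i,  with E_i ~ N(0, sigma_E^2)
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma_E) for xi in x]
```

Each simulated y_i scatters around the line β₀ + β₁·x_i with standard deviation σ_E.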

SLIDE 9

Least Squares Fitting

Interactive demo: http://hspm.sph.sc.edu/courses/J716/demos/LeastSquares/LeastSquaresDemo.html

We need a straight line that fits the data well. Many possible solutions exist, some are good, some are worse. Our paradigm is to fit the line such that the squared errors are minimal.

SLIDE 10

Least Squares: Mathematics

The paradigm in formulas: given a set of data points (x_i, y_i), i = 1, ..., n, the goal is to fit the regression line such that the sum of squared differences between observed value and regression line is minimal:

Q(β₀, β₁) = Σ_{i=1}^{n} r_i² = Σ_{i=1}^{n} (y_i − ŷ_i)² = Σ_{i=1}^{n} (y_i − (β₀ + β₁·x_i))² → min!

The function Q measures how well the regression line, defined by (β₀, β₁), fits the data. The goal is to minimize this function. Solution: see next slide...

SLIDE 11

Solution Idea: Partial Derivatives

  • We take partial derivatives of the function Q(β₀, β₁) with respect to both arguments β₀ and β₁. As we are after the minimum of the function, we set them to zero:

    ∂Q/∂β₀ = 0  and  ∂Q/∂β₁ = 0

  • This results in a linear equation system, which (here) has two unknowns β₀, β₁, but also two equations. These are also known under the name normal equations.

  • The solution for β₀, β₁ can be written explicitly as a function of the data pairs (x_i, y_i), i = 1, ..., n, see next slide...

SLIDE 12

Least Squares: Solution

According to the least squares paradigm, the best fitting regression line is the one with the optimal coefficients:

β̂₁ = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²  and  β̂₀ = ȳ − β̂₁·x̄

  • For a given set of data points (x_i, y_i), i = 1, ..., n, we can determine the solution with a pocket calculator (...or better, with R).
  • The solution for our example "Tree Height":

    > lm(height ~ phvalue, data=treeheight)

    β̂₀ = 28.723,  β̂₁ = −3.003
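The closed-form solution can be sketched in a few lines (Python here instead of R; the six data points are made up for illustration, not the tree-height data):

```python
# Least-squares estimates from the closed-form solution above,
# on a small made-up data set.
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# beta1_hat = sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
# beta0_hat = y_bar - beta1_hat * x_bar, so the line runs through (x_bar, y_bar)
beta0_hat = y_bar - beta1_hat * x_bar
```

In R, `lm(y ~ x)` computes exactly these two quantities (plus much more).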

SLIDE 13

Least Squares Regression Line

[Figure: "Tree Height vs. pH-Value" with the least squares regression line]

SLIDE 14

Is This a Good Model for Predicting the Tree Height from the Soil pH-Value?

a) Beyond the range of observed data: unknown, but most likely not...
b) Within the range of observed data: yes, under the following conditions:

  • the relation is in truth a straight line, i.e. E[E_i] = 0
  • the scatter of the errors is constant, i.e. Var(E_i) = σ_E²
  • the data are uncorrelated (from a representative sample)
  • the errors are approximately normally distributed

Food for thought: irrigation, shaded corners...?

SLIDE 15

Model Diagnostics

For assessing the quality of the regression line, we need to (at least roughly) check whether the assumptions are met. E[E_i] = 0 and Var(E_i) = σ_E² can be reviewed by residual plots:

[Figures: "Residuals vs. pH-Value" and "Residuals vs. Fitted Values"]

SLIDE 16

Model Diagnostics

For assessing the quality of the regression line, we need to (at least roughly) check whether the assumptions are met. The Gaussian distribution of the errors can be reviewed by a normal plot:

[Figure: "Normal Plot" — residuals vs. quantiles of the Gaussian distribution]

We will revisit model diagnostics later in this course, where they will be discussed more deeply. "Residuals vs. Fitted" and the "Normal Plot" will always stay at the heart of model diagnostics.

SLIDE 17

Why Least Squares?

History... Within a few years (1801, 1805), the method was developed independently by Gauss and Legendre. Both were after solving applied problems in astronomy...

Source: http://de.wikipedia.org/wiki/Methode_der_kleinsten_Quadrate

Carl Friedrich Gauss


Adrien-Marie Legendre

SLIDE 18

Why Least Squares?

Mathematics...

  • Least squares is simple in the sense that the solution β̂₀, β̂₁ is known in closed form as a function of the data (x_i, y_i), i = 1, ..., n.
  • The line runs through the center of gravity (x̄, ȳ).
  • The sum of residuals adds up to zero: Σ_{i=1}^{n} r_i = 0.
  • Some deeper mathematical optimality can be shown when analyzing the large sample properties of the estimates β̂₀, β̂₁. This is especially true under the assumption of normally distributed errors E_i.

SLIDE 19

Gauss-Markov-Theorem

Mathematical optimality result for the least squares line. It only holds if the following conditions are met:

  • the relation is in truth a straight line, i.e. E[E_i] = 0
  • the scatter of the errors is constant, i.e. Var(E_i) = σ_E²
  • the errors are uncorrelated, i.e. Cov(E_i, E_j) = 0 for i ≠ j

Not yet required:

  • the errors are normally distributed: E_i ~ N(0, σ_E²)

Gauss-Markov-Theorem:

  • Least squares yields the best linear unbiased estimates.

SLIDE 20

Properties of the Least Square Estimates

Under the conditions above, the estimates are unbiased:

E[β̂₀] = β₀  and  E[β̂₁] = β₁

The variances of the estimates are as follows:

Var(β̂₀) = σ_E² · (1/n + x̄² / Σ_{i=1}^{n} (x_i − x̄)²)  and  Var(β̂₁) = σ_E² / Σ_{i=1}^{n} (x_i − x̄)²

Precise estimates are obtained with:

  • a large number of observations n
  • a good scatter in the predictor x_i
  • an informative/useful predictor, making σ_E² small

SLIDE 21

R²: How Useful is the Regression Line?

Intuitively: the smaller the yellow range is compared to the blue one, the more useful the model is.

[Figure: "Tree Height vs. pH-Value"]

SLIDE 22

Coefficient of Determination

The "predictivity" of a regression line can be measured with R², named coefficient of determination. It is the ratio between the yellow and blue range, and tells the portion of variation that is explained by the regression line:

R² = Σ_{i=1}^{n} (ŷ_i − ȳ)² / Σ_{i=1}^{n} (y_i − ȳ)² ∈ [0, 1]

What is a good value for R²? In observational studies, a value of 0.6 can mostly be considered as good. There are no formal criteria for judging this, however...
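Since for a least squares fit with intercept the total variation splits into explained plus residual variation, R² can equivalently be computed as explained/total or as 1 − RSS/TSS. A small Python sketch with made-up data:

```python
# R^2 computed both as "explained / total variation" and as 1 - RSS/TSS;
# the two must agree for a least squares fit with intercept. Data are made up.
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
r2_explained = sum((yh - y_bar) ** 2 for yh in y_hat) / tss  # explained / total
r2_one_minus = 1 - sum((yi - yh) ** 2
                       for yi, yh in zip(y, y_hat)) / tss    # 1 - RSS/TSS
```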

SLIDE 23

Inference on the Slope

Aim: is the relation between response and predictor significant? Here, we require E_i ~ N(0, σ_E²), i.i.d.

We are then testing the following null hypothesis:

H₀: β₁ = 0  vs.  H_A: β₁ ≠ 0

Test statistic:

T = β̂₁ / se(β̂₁),  where se(β̂₁) = sqrt(σ̂_E² / Σ_{i=1}^{n} (x_i − x̄)²),  and T ~ t_{n−2} under H₀.

SLIDE 24

Estimating the Error Variance

Besides the regression coefficients, we also need to estimate the error variance. We require it for doing inference on the estimated parameters. The estimate is based on the residual sum of squares (abbreviation: RSS), in particular:

σ̂_E² = (1 / (n − 2)) · Σ_{i=1}^{n} (y_i − ŷ_i)²

  • this is (almost) the "usual" variance estimator!
  • be careful, the R output shows σ̂_E, and not σ̂_E²
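A Python sketch of the estimator on made-up data: note the n − 2 in the denominator, and how the slope's t-statistic from the previous slide builds directly on σ̂_E²:

```python
# Error-variance estimate with the n-2 denominator, and the resulting
# t-statistic for the slope (small made-up data set, not the tree data).
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = rss / (n - 2)             # estimate of sigma_E^2 (denominator n-2!)
se_b1 = (sigma2_hat / sxx) ** 0.5      # standard error of the slope
T = b1 / se_b1                         # ~ t_{n-2} under H0: beta1 = 0
```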

SLIDE 25

Output of Statistical Software Packages

> summary(fit)

Call:
lm(formula = height ~ phvalue, data = treeheight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  28.7227     2.2395   12.82   <2e-16 ***
phvalue      -3.0034     0.2844  -10.56   <2e-16 ***

Residual standard error: 1.008 on 121 degrees of freedom
Multiple R-squared: 0.4797, Adjusted R-squared: 0.4754
F-statistic: 111.5 on 1 and 121 DF, p-value: < 2.2e-16

SLIDE 26

Prediction

The regression line can now be used for predicting the target value at an arbitrary (new) predictor value x*. We simply plug in:

ŷ* = β̂₀ + β̂₁·x*

Example: for a pH-value of 8.0, we expect a tree height of

28.7227 + (−3.0034 · 8.0) = 4.6955

A word of caution: doing interpolation is usually fine, but extrapolation (i.e. giving the tree height for pH-value 5.0) is generally "dangerous".
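Checking the arithmetic with the coefficients from the R output (a small Python sketch):

```python
# Plugging the fitted coefficients into the prediction equation
# y* = beta0_hat + beta1_hat * x*:
beta0_hat, beta1_hat = 28.7227, -3.0034
x_star = 8.0
y_star = beta0_hat + beta1_hat * x_star   # expected tree height at pH 8.0
```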

SLIDE 27

Confidence and Prediction Intervals

95% confidence interval (this is for the fitted value!):

β̂₀ + β̂₁·x* ± t_{0.975, n−2} · σ̂_E · sqrt(1/n + (x* − x̄)² / Σ_{i=1}^{n} (x_i − x̄)²)

95% prediction interval (this is for future observations!):

β̂₀ + β̂₁·x* ± t_{0.975, n−2} · σ̂_E · sqrt(1 + 1/n + (x* − x̄)² / Σ_{i=1}^{n} (x_i − x̄)²)

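A Python sketch of both interval half-widths at x* = 8.0 on made-up data; to stay dependency-free, the quantile t_{0.975,4} ≈ 2.776 is hard-coded (an assumption for n − 2 = 4 df). The prediction interval is wider because of the extra "1 +" under the square root:

```python
# Half-widths of the 95% confidence and prediction intervals at x*.
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sigma_hat = (sum((yi - (b0 + b1 * xi)) ** 2
                 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5

t_q = 2.776                                 # t_{0.975, 4}, hard-coded
x_star = 8.0
leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
ci_half = t_q * sigma_hat * leverage ** 0.5         # for the fitted value
pi_half = t_q * sigma_hat * (1 + leverage) ** 0.5   # for a new observation
```

In R, `predict(fit, newdata, interval="confidence")` and `interval="prediction"` produce the corresponding intervals directly.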
SLIDE 28

Confidence and Prediction Intervals

[Figure: "Tree Height vs. pH-Value" with confidence and prediction bands]

SLIDE 29

Regression: More Than Straight Lines

[Figure: "Curvilinear Relation", yy vs. xx]

A straight line is not a good fit at all in this problem. Simple linear regression is still appropriate, though!

SLIDE 30

Regression: More Than Straight Lines

[Figure: "Model Diagnostics: Residuals vs. Fitted"]

Constant scatter, i.e. Var(E_i) = σ_E², but non-zero expectation, i.e. E[E_i] ≠ 0.

SLIDE 31

Regression: More Than Straight Lines

[Figure: "Curvilinear Relation", yy vs. xx]

Y_i = β₀ + β₁·ln(x_i) + E_i

This is a simple linear regression model, just use the transformed predictor x_i' = ln(x_i).

SLIDE 32

Curvilinear Fitting

All models such as:

Y_i = β₀ + β₁·ln(x_i) + E_i
Y_i = β₀ + β₁·√x_i + E_i
Y_i = β₀ + β₁·x_i⁻¹ + E_i

are simple linear regression models. There is only one single predictor, and the relation is linear in the parameters. None of these models fits a straight line in the scatterplot; these are all curvilinear relations – linear regression is very versatile!

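A sketch of the transformation trick (Python; the data are made up, generated roughly as y ≈ 4 + 2·ln(x)): transform the predictor, then fit the ordinary least squares line:

```python
import math

# A curvilinear model Y = beta0 + beta1*ln(x) + E is still *simple linear*
# regression: transform the predictor, then fit a straight line as usual.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [4.1, 5.3, 6.9, 8.1, 9.6]            # roughly 4 + 2*ln(x), made up

u = [math.log(xi) for xi in x]           # x' = ln(x)
n = len(u)
u_bar, y_bar = sum(u) / n, sum(y) / n
b1 = (sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
      / sum((ui - u_bar) ** 2 for ui in u))
b0 = y_bar - b1 * u_bar                  # fit in the transformed coordinates
```

The same recipe works for √x or x⁻¹: only the predictor changes, the least squares machinery stays identical.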
SLIDE 33

[Figure: "Infant vs. Income"]

Infant Mortality Rate vs. Income

Does a curvilinear fit such as Y_i = β₀ + β₁·x_i⁻¹ + E_i solve the regression problem here?

SLIDE 34

Infant Mortality Rate vs. Income

[Figure: "Infant vs. Income"]

SLIDE 35

Infant Mortality Rate vs. Income

The problem with the previous fit is that the power −1 is not correct. Adjusting it by hand is very laborious. And if we try a model such as

Y_i = β₀ + β₁·x_i^β₂ + E_i

we no longer have a problem that is linear in the parameters, and so least squares fitting cannot be applied. Yet there is a simple but very powerful trick that often helps:

Y_i = β₀ · x_i^β₁ · E_i

can be linearized, see the blackboard...

SLIDE 36

Log-Transformation Helps!

[Figure: "log(infant) vs. log(income)"]

SLIDE 37

Infant Mortality Rate vs. Income


SLIDE 38

Daily Cost in Rehabilitation

[Figures: "Daily Cost in Rehab vs. ADL" and "Residuals vs. Fitted Values"]

SLIDE 39

Logged Response Model

We transform the response variable and try to explain it using a linear model with our previous predictors:

Y' = log(Y) = β₀ + β₁·x + E

In the original scale, we can write the logged response model using the same predictors:

Y = exp(β₀ + β₁·x) · exp(E)

  • Multiplicative model!
  • E ~ N(0, σ_E²), and thus, exp(E) has a lognormal distribution.
slide-40
SLIDE 40

40

Marcel Dettling, Zurich University of Applied Sciences

Applied Statistical Regression

HS 2011 – Week 03

Also This Transformation Works!

[Figures: "Residuals vs Fitted" and "Normal Q-Q" after the log-transformation]

SLIDE 41

Dealing with Zero Response

  • The logged response model is only applicable when the response is strictly positive...
  • What if there are some cases with Y_i = 0?
    • never omit these
    • additive shifting is possible, i.e. use log(Y_i + c)
  • How to additively shift?
    • the usual choice is c = 1
    • this is not good, because the effect is scale-dependent
    • shift with the value of the smallest positive observation instead!

SLIDE 42

Back Transforming the Fitted Values

  • In principle, we can "simply back transform": ŷ = exp(ŷ')
  • This is an estimate for the median, but not the mean!
  • If unbiased estimation is required, then use: ŷ = exp(ŷ' + σ̂_E²/2)
  • Confidence/prediction intervals are not problematic: [l, u] → [exp(l), exp(u)]
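A small Python illustration with made-up numbers for ŷ' and σ̂_E² (illustrative, not from the rehabilitation data):

```python
import math

# Back-transforming a logged-response fit: exp(y_hat') estimates the
# *median*; the unbiased mean estimate carries the exp(sigma^2/2) factor.
y_hat_log = 6.2          # fitted value on the log scale (made up)
sigma2_hat = 0.25        # estimated error variance on the log scale (made up)

median_est = math.exp(y_hat_log)
mean_est = math.exp(y_hat_log + sigma2_hat / 2)

# Interval endpoints transform directly: [l, u] -> [exp(l), exp(u)]
l, u = 5.8, 6.6
interval = (math.exp(l), math.exp(u))
```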

SLIDE 43

Back Transforming: Example

[Figure: "Daily Cost in Rehabilitation vs. ADL-Score"]

SLIDE 44

Interpretation of the Coefficients

Important: there is no back transformation for the coefficients to the original scale, but still a good interpretation:

log(ŷ) = β̂₀ + β̂₁·x₁ + ... + β̂_p·x_p
ŷ = exp(β̂₀) · exp(β̂₁·x₁) · ... · exp(β̂_p·x_p)

An increase by one unit in x₁ would multiply the fitted value in the original scale with exp(β̂₁). Coefficients are interpreted multiplicatively!
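A quick numerical check of the multiplicative interpretation (Python; the coefficients are illustrative):

```python
import math

# In the logged-response model, raising x1 by one unit multiplies the
# fitted value on the original scale by exp(beta1_hat). Illustrative numbers.
b0, b1 = 2.0, 0.3        # made-up coefficients on the log scale
x1 = 5.0
fit_a = math.exp(b0 + b1 * x1)          # fitted value at x1
fit_b = math.exp(b0 + b1 * (x1 + 1))    # fitted value at x1 + 1
factor = fit_b / fit_a                  # equals exp(b1), regardless of x1
```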

SLIDE 45

First-Aid Transformations

These are intended to stabilize the variance.

First-Aid Transformations: do always apply these (if no practical reasons speak against it), to both response and predictors.

  • Absolute values and concentrations – log-transformation: y' = log(y)
  • Count data – square-root transformation: y' = √y
  • Proportions – arcsine transformation: y' = arcsin(√y)
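The three transformations applied element-wise, as a Python sketch on toy values:

```python
import math

# First-aid transformations on toy data (illustrative values only):
conc   = [0.5, 1.0, 4.0]     # concentrations -> log-transformation
counts = [0, 1, 4, 9]        # counts -> square-root transformation
props  = [0.0, 0.25, 1.0]    # proportions -> arcsine transformation

log_t    = [math.log(v) for v in conc]
sqrt_t   = [math.sqrt(v) for v in counts]
arcsin_t = [math.asin(math.sqrt(v)) for v in props]
```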