SLIDE 1

Bus 701: Advanced Statistics

Harald Schmidbauer

© Harald Schmidbauer & Angi Rösch, 2007
SLIDE 2

13.1 Simple Linear Regression: Goals

Goals of Simple Linear Regression. Once again, we are given points $(x_i, y_i)$ from a bivariate metric variable (X, Y). How can we establish a functional relationship between X and Y? Most importantly:

  • Which straight line is “good”? — What does “good” mean?
  • How can the parameters of a “good” line be computed?

SLIDE 3

13.1 Simple Linear Regression: Goals

Goals of Simple Linear Regression. Why would we want to fit a line to a cloud of points?

  • In order to quantify the relationship between X and Y, using a simple model.
  • In order to forecast Y for a given value of X.

SLIDE 4

13.2 The Regression Line

Finding a “good” line...

[Figure: scatterplot of points, axes x and y]

  • ... and how can we find a “good” line? — A criterion is needed!

SLIDE 5

13.2 The Regression Line

A very simple scatterplot.

[Figure: three observed points and a line; x-axis marks $x_1, x_2, x_3$; y-axis marks $y_1, y_2, y_3$ and the fitted values $\hat{y}_1, \hat{y}_2, \hat{y}_3$]

  • observed points: $(x_i, y_i)$
  • points on the line: $(x_i, \hat{y}_i)$

SLIDE 6

13.2 The Regression Line

Definition.

Define $\hat{y}_i = a + b x_i$ and $e_i = y_i - \hat{y}_i$. The regression line of Y with respect to X is the line $y = a + bx$ with parameters $a$ and $b$ such that

$$Q(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

attains its minimum. The parameter $b$ thus obtained is called the regression coefficient. This way to find $a$ and $b$ is called the method of least squares.
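To see the criterion in action: a minimal R sketch, with made-up data, that minimizes $Q(a, b)$ numerically with optim() and compares the result with R's built-in least-squares fit lm().

    ## Minimal sketch: the least-squares criterion, with made-up data
    x <- c(1, 2, 3, 4, 5)
    y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

    ## Q(a, b): sum of squared vertical distances from the line y = a + b*x
    Q <- function(par) sum((y - par[1] - par[2] * x)^2)

    optim(c(0, 0), Q)$par   # numerical minimum: approximately (a, b)
    coef(lm(y ~ x))         # the exact least-squares solution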

SLIDE 7

13.2 The Regression Line

Regression: some first comments.

  • “Good” means: the sum of squared distances, parallel to the y-axis, is minimized.
  • This procedure is asymmetric!
  • It conforms to the idea: given X, what is Y?
  • X: “independent variable”; Y: “dependent variable”

SLIDE 8

13.2 The Regression Line

Regression is asymmetric.

[Figure: scatterplot, axes x and y, showing two different regression lines]

  • The regression lines of Y w.r.t. X and of X w.r.t. Y are usually different.

SLIDE 9

13.2 The Regression Line

Y w.r.t. X, or rather X w.r.t. Y?

Example: X = body-height of a person; Y = body-weight of a person.

Here, a regression of Y w.r.t. X looks quite natural, while a regression of X w.r.t. Y would be strange.

SLIDE 10

13.2 The Regression Line

Y w.r.t. X, or rather X w.r.t. Y?

Example: Consider the percent change of price indices relative to the corresponding month of the previous year: X = change of the housing price index; Y = change of the clothing price index.

Here, neither of the regressions — Y w.r.t. X nor X w.r.t. Y — looks very meaningful, because it is neither convincing to say that X influences (or even causes) Y, nor vice versa. In this example, a symmetric procedure is more appropriate than regression.

SLIDE 11

13.2 The Regression Line

Computing the regression line. Minimizing $Q$ leads to the following equations for the slope $b$ and the intercept $a$:

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}, \qquad a = \bar{y} - b\bar{x}.$$
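A short R sketch of the closed-form solution, using the data of the toy example on the next slide; cov() and var() share the same denominator, which cancels in the ratio.

    ## Sketch: closed-form regression line (toy data from the next slide)
    x <- c(5, 10, 15, 20)
    y <- c(15, 8, 12, 5)

    b <- cov(x, y) / var(x)      # slope = cov(X, Y) / var(X)
    a <- mean(y) - b * mean(x)   # intercept = y-bar - b * x-bar
    c(a = a, b = b)              # 16.5 and -0.52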

SLIDE 12

13.2 The Regression Line

Example: (This is a toy example...)

    i    x_i   y_i   x_i^2   y_i^2   x_i*y_i   ŷ_i     e_i
    1      5    15      25     225        75   13.9     1.1
    2     10     8     100      64        80   11.3    −3.3
    3     15    12     225     144       180    8.7     3.3
    4     20     5     400      25       100    6.1    −1.1
    Σ     50    40     750     458       435   40.0     0.0

Then,

$$b = \frac{4 \cdot 435 - 50 \cdot 40}{4 \cdot 750 - 50^2} = -0.52, \qquad a = \frac{40}{4} - (-0.52) \cdot \frac{50}{4} = 16.5$$

The regression line is: $y = 16.5 - 0.52x$. Using this regression line, the $\hat{y}_i$ and the $e_i$ can be computed. We observe: $\bar{\hat{y}} = \bar{y}$, $\bar{e} = 0$. (This is always the case.)
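The toy example can be checked with R's lm(); fitted() and residuals() reproduce the $\hat{y}_i$ and $e_i$ columns of the table.

    ## Sketch: verifying the toy example with lm()
    x <- c(5, 10, 15, 20)
    y <- c(15, 8, 12, 5)
    fit <- lm(y ~ x)

    coef(fit)        # intercept 16.5, slope -0.52
    fitted(fit)      # 13.9, 11.3, 8.7, 6.1
    residuals(fit)   # 1.1, -3.3, 3.3, -1.1 (they sum to 0)
    all.equal(mean(fitted(fit)), mean(y))   # TRUE: y-hat-bar equals y-bar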

SLIDE 13

13.2 The Regression Line

A plot of the toy example.

[Figure: scatterplot of the four points with the regression line; x-axis from 5 to 25, y-axis from 5 to 20]

SLIDE 14

13.3 Explanatory Power of the Model

Next, we look at the explanatory power of the regression model.

[Figure: the simple scatterplot from Slide 5 again, with observed $y_1, y_2, y_3$ and fitted values $\hat{y}_1, \hat{y}_2, \hat{y}_3$ at $x_1, x_2, x_3$]

SLIDE 15

13.3 Explanatory Power of the Model

The explanatory power of the regression model... We observe:

  • There is (in general) less variability in the $\hat{y}_i$ than in the $y_i$! — That is, the regression line cannot explain the entire variability in the observed $y_i$.
  • The regression could provide a complete explanation if all points $(x_i, y_i)$ were on the regression line.

SLIDE 16

13.3 Explanatory Power of the Model

Decomposition of variance.

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
$$\text{SST} = \text{SSR} + \text{SSE}$$

Here, SST: total sum of squares; SSR: regression sum of squares; SSE: error sum of squares.
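A sketch of the decomposition in R, again with the toy data; the identity SST = SSR + SSE holds up to floating-point error.

    ## Sketch: decomposition of variance for the toy example
    x <- c(5, 10, 15, 20)
    y <- c(15, 8, 12, 5)
    fit <- lm(y ~ x)

    SST <- sum((y - mean(y))^2)             # total sum of squares
    SSR <- sum((fitted(fit) - mean(y))^2)   # regression sum of squares
    SSE <- sum(residuals(fit)^2)            # error sum of squares
    all.equal(SST, SSR + SSE)               # TRUE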

SLIDE 17

13.3 Explanatory Power of the Model

The coefficient of determination. It is defined as: $\text{SSR}/\text{SST}$.

  • The coefficient of determination is the share of variability in the data which is explained by the regression.
  • It holds that $\text{SSR}/\text{SST} = r^2 = \mathrm{cor}^2(X, Y)$ (see the sketch below).
  • $r^2 = 100\%$ if and only if all observed points are on the regression line.
  • $r^2 = 0\%$ if and only if X and Y are uncorrelated.
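A quick R check that the three quantities coincide, once more with the toy data:

    ## Sketch: SSR/SST equals the squared correlation
    x <- c(5, 10, 15, 20)
    y <- c(15, 8, 12, 5)
    fit <- lm(y ~ x)

    SST <- sum((y - mean(y))^2)
    SSR <- sum((fitted(fit) - mean(y))^2)
    SSR / SST                # share of explained variability
    cor(x, y)^2              # r^2
    summary(fit)$r.squared   # lm() reports the same value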

SLIDE 18

13.3 Explanatory Power of the Model

Example: Overseas Shipholding Group, Inc. (“OSG”), is a marine transportation company whose stock is listed on the New York Stock Exchange (NYSE). Let monthly returns in percent be defined as

  • osg.ret = return on OSG stock (black in the figure below);
  • nyse.ret = return on the NYSE Composite Index (red).

[Figure: time series plot “ret on osg / nyse”, 2001–2005, returns between −20 and 20 percent]

SLIDE 19

13.3 Explanatory Power of the Model

Scatterplot and regression results.

[Figure: scatterplot of return on nyse (−10 to 10) against return on osg (−20 to 20), with the regression line]

  • regression line: osg.ret = 1.50 + 1.47 · nyse.ret
  • coefficient of determination: $r^2 = 29\%$

SLIDE 20

13.3 Explanatory Power of the Model

An interpretation of our results. Why are there fluctuations in OSG stock price?

  • It is not by pure chance that OSG stock price fluctuates.
  • It is because the market index NYSE Composite fluctuates!
  • Is this the only reason? — No, but fluctuations in the NYSE Composite explain about 29% of the variability in OSG stock price.
  • So what might be other reasons? This is not investigated here... (a guess: import/export quantities, decisions of the CEO, condition of competitors, ...)

SLIDE 21

13.4 A Stochastic SLR Model

SLR in descriptive and inductive statistics.

  • So far, we have seen SLR from a purely descriptive point of view. (There were no probabilities, no stochastic models.)
  • Advantage of this approach: simplicity.
  • Disadvantage: we obtain no insight into the mechanism which created the data — for this purpose, we need a stochastic model and the methods of inductive statistics!

SLIDE 22

13.4 A Stochastic SLR Model

A stochastic simple linear regression model.

$$Y_i = \alpha + \beta x_i + \epsilon_i, \qquad i = 1, \ldots, n$$

  • The random variable $Y_i$ represents the observation belonging to $x_i$.
  • α and β are unknown parameters (to be estimated).
  • $x_i$ is the observation of the independent variable X.
  • $\epsilon_i$ is a random variable; it contains everything not accounted for in the equation $y = \alpha + \beta x$.

SLIDE 23

13.4 A Stochastic SLR Model

Assumptions about ε. We shall assume that the $\epsilon_i$ in

$$Y_i = \alpha + \beta x_i + \epsilon_i, \qquad i = 1, \ldots, n$$

are a sequence of independent and identically distributed random variables:

$$\epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma_\epsilon^2)$$

The “normality assumption” is very strong.
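A sketch simulating data from this model in R; the parameter values α = 2, β = 0.5, σ = 1 are made up for illustration.

    ## Sketch: simulating from Y_i = alpha + beta * x_i + eps_i
    set.seed(1)
    n <- 50
    alpha <- 2; beta <- 0.5; sigma <- 1
    x   <- runif(n, 0, 10)
    eps <- rnorm(n, mean = 0, sd = sigma)   # iid N(0, sigma^2) errors
    Y   <- alpha + beta * x + eps
    coef(lm(Y ~ x))   # estimates should come out close to (2, 0.5)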

SLIDE 24

13.4 A Stochastic SLR Model

Computing estimators. The method of least squares leads to the following estimators for β and α:

$$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}.$$

These are the same formulas as before — but what they mean is completely different!

SLIDE 25

13.4 A Stochastic SLR Model

The estimators $\hat{\alpha}$ and $\hat{\beta}$.

  • $\hat{\alpha}$ and $\hat{\beta}$ are functions of the sample data $(x_i, Y_i)$.
  • A function of sample data is called a statistic.
  • Just as a random variable (representing an observation), a statistic has a probability distribution.
  • These distributional properties can help us to learn about the unknown parameters α and β.

SLIDE 26

13.4 A Stochastic SLR Model

The estimators $\hat{\alpha}$ and $\hat{\beta}$. We shall now:

  • look at some distributional properties of the estimators $\hat{\alpha}$ and $\hat{\beta}$;
  • find out under what circumstances β can be estimated reliably;
  • look at examples, with a focus on understanding computer output.

SLIDE 27

13.4 A Stochastic SLR Model

The estimator $\hat{\beta}$. (Often more important than $\hat{\alpha}$.) Statistical inference about β is based on the following property:

$$\frac{\hat{\beta} - \beta}{s_\beta} \sim t_{n-2},$$

where $s_\beta$ is the standard error of $\hat{\beta}$:

$$s_\beta^2 = \frac{s_\epsilon^2}{\sum (x_i - \bar{x})^2} \quad \text{with} \quad s_\epsilon^2 = \frac{\text{SSE}}{n-2}$$

(The latter estimates $\sigma_\epsilon^2$.)
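A sketch computing $s_\beta$ by hand and comparing it with summary(lm()), using simulated stand-in data as above:

    ## Sketch: standard error of beta-hat, by hand vs. summary(lm())
    set.seed(1)
    x <- runif(50, 0, 10)
    Y <- 2 + 0.5 * x + rnorm(50)
    fit <- lm(Y ~ x)

    s2e <- sum(residuals(fit)^2) / (length(x) - 2)   # estimates sigma^2
    s_b <- sqrt(s2e / sum((x - mean(x))^2))          # standard error of beta-hat
    c(by_hand = s_b,
      from_summary = summary(fit)$coefficients["x", "Std. Error"])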

SLIDE 28

13.4 A Stochastic SLR Model

The estimator $\hat{\beta}$. In other words:

$$\hat{\beta} \sim N(\beta, s_\beta^2) \quad \text{approximately.}$$

Under what circumstances can β be estimated reliably? — This is the case when

$$s_\beta^2 = \frac{s_\epsilon^2}{\sum (x_i - \bar{x})^2}$$

is small! Therefore, it is desirable that $\sum (x_i - \bar{x})^2$ should be large.
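A small simulated illustration (made-up settings): the same model fitted once with bunched and once with spread-out x values; the spread-out design gives the smaller standard error.

    ## Sketch: spread in x vs. reliability of the slope estimate
    set.seed(2)
    se_beta <- function(x) {
      Y <- 2 + 0.5 * x + rnorm(length(x))
      summary(lm(Y ~ x))$coefficients["x", "Std. Error"]
    }
    se_beta(runif(30, 4.5, 5.5))   # x bunched together: large standard error
    se_beta(runif(30, 0, 10))      # x spread out: small standard error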

SLIDE 29

13.4 A Stochastic SLR Model

The estimator $\hat{\beta}$.

[Figures: two scatterplots, axes x and y, contrasting designs for which β can be estimated more or less reliably]

SLIDE 30

13.4 A Stochastic SLR Model

The estimator $\hat{\alpha}$. (Often less important than $\hat{\beta}$.) Statistical inference about α is based on the following property:

$$\frac{\hat{\alpha} - \alpha}{s_\alpha} \sim t_{n-2},$$

where $s_\alpha$ is the standard error of $\hat{\alpha}$:

$$s_\alpha^2 = s_\epsilon^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right) \quad \text{with} \quad s_\epsilon^2 = \frac{\text{SSE}}{n-2}$$
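The analogous by-hand check for $s_\alpha$, again with simulated stand-in data:

    ## Sketch: standard error of alpha-hat, by hand vs. summary(lm())
    set.seed(1)
    x <- runif(50, 0, 10)
    Y <- 2 + 0.5 * x + rnorm(50)
    fit <- lm(Y ~ x)

    s2e <- sum(residuals(fit)^2) / (length(x) - 2)
    s_a <- sqrt(s2e * (1 / length(x) + mean(x)^2 / sum((x - mean(x))^2)))
    c(by_hand = s_a,
      from_summary = summary(fit)$coefficients["(Intercept)", "Std. Error"])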

SLIDE 31

13.4 A Stochastic SLR Model

Example: Overseas Shipholding Group, Inc. (“OSG”), and the NYSE Composite Index.

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    1.4989     1.1801   1.270    0.209
nyse.ret       1.4737     0.3067   4.805  1.2e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.962 on 56 degrees of freedom
Multiple R-Squared: 0.2919, Adjusted R-squared: 0.2793
F-statistic: 23.09 on 1 and 56 DF, p-value: 1.200e-05

  • The estimated regression model is: osg.ret = 1.50 + 1.47 · nyse.ret + random error
  • Approximate 95% confidence bounds for β are given by $1.47 \pm 2 \cdot 0.31$; the corresponding 95% confidence interval is [0.86, 2.08] (see the sketch below).
  • The slope β is significantly different from 0. (The null hypothesis $H_0: \beta = 0$ is rejected against $H_1: \beta \neq 0$.)
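Given the fitted model object, R computes the exact interval with confint(); since the OSG data are not reproduced here, the sketch below also checks the bounds by hand from the printed output.

    ## Sketch: 95% confidence interval for the slope
    ## confint(fit, level = 0.95)   # exact, given fit <- lm(osg.ret ~ nyse.ret)
    ## By hand from the printed output, with df = 56:
    1.4737 + c(-1, 1) * qt(0.975, df = 56) * 0.3067   # about [0.86, 2.09]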

SLIDE 32

13.5 Prediction Based on SLR

Point prediction vs. interval prediction. Let $x$ be given. The outcome of the random variable $Y = \alpha + \beta x + \epsilon$ can be predicted in terms of...

  • a single point: $\hat{Y} = \hat{\alpha} + \hat{\beta} x$ – this has disadvantages similar to those of a point estimate.
  • a prediction interval. It has to cope with two sources of uncertainty:
    – The parameters α and β are unknown.
    – There is a random error ε, which has an unknown variance $\sigma_\epsilon^2$.

SLIDE 33

13.5 Prediction Based on SLR

Prediction intervals. Given $x_{n+1}$ (an out-of-sample value), a 95% prediction interval for the corresponding $Y_{n+1}$ has bounds

$$\hat{Y}_{n+1} \pm t_{n-2,\,0.975} \cdot s_\epsilon \cdot \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

These are the bounds of an interval which will contain the random variable $Y_{n+1} = \alpha + \beta x_{n+1} + \epsilon$ with probability 95%. Here, $\hat{Y}_{n+1}$ is a point prediction, obtained as $\hat{Y}_{n+1} = \hat{\alpha} + \hat{\beta} x_{n+1}$.
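In R, predict() computes these bounds directly; a sketch with simulated stand-in data (the original sample is not reproduced here):

    ## Sketch: a 95% prediction interval via predict()
    set.seed(1)
    x <- runif(25, 160, 200)                 # stand-in predictor values
    Y <- -37 + 0.6 * x + rnorm(25, sd = 6)   # stand-in responses
    fit <- lm(Y ~ x)

    predict(fit, newdata = data.frame(x = 180),
            interval = "prediction", level = 0.95)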

SLIDE 34

13.5 Prediction Based on SLR

Example: Body-height and body-weight again; here: males. Our model estimation was based on a sample of size $n = 25$. Now let the body height of a 26th person be given as $x_{26} = 180$ cm. A point prediction of this person's body-weight is:

$$\hat{Y}_{26} = -37.1 + 0.60 \cdot 180 = 70.9$$

(Don't forget this was a sample of young students.) An approximate 95% prediction interval has bounds

$$70.9 \pm 2 \cdot 5.85 \cdot \sqrt{1 + \frac{1}{25} + \frac{(180 - 182.04)^2}{1260.96}}$$

The corresponding prediction interval is: [58.9, 82.9].
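The arithmetic of the approximate interval can be checked in two lines of R:

    ## Sketch: checking the slide's arithmetic
    half <- 2 * 5.85 * sqrt(1 + 1/25 + (180 - 182.04)^2 / 1260.96)
    70.9 + c(-1, 1) * half   # about 58.9 and 82.9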

SLIDE 35

13.5 Prediction Based on SLR

The length of prediction intervals. Prediction intervals become longer as $x_{n+1}$, for which $Y_{n+1}$ is to be forecast, moves away from $\bar{x}$. This is illustrated in the following figures.

[Figures: two scatterplots with regression line and prediction bounds, marking $x_{n+1}$ and $\hat{y}_{n+1}$; axes x and y]
