BUS41100 Applied Regression Analysis
Week 2: Inference for SLR
Inference: sampling distributions, testing, confidence intervals, and prediction intervals
Max H. Farrell, The University of Chicago Booth School of Business


SLIDE 1

BUS41100 Applied Regression Analysis

Week 2: Inference for SLR

Inference: sampling distributions, testing, confidence intervals, and prediction intervals
Max H. Farrell, The University of Chicago Booth School of Business

SLIDE 2

Back to House Prices

Understand the relationship between price and size. How? Last week we fit a line through a bunch of points: price = 39 + 35 × size.
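Reading a prediction off that line is just arithmetic. A minimal Python sketch (the deck's own code is R, and the 2.2 input here is a made-up example):

```python
# Fitted line from the slide: price = 39 + 35 * size
# (price in $1000s, size in 1000s of sq ft; the units are an assumption).
b0, b1 = 39, 35

def predict_price(size):
    """Fitted value b0 + b1*size for a house of the given size."""
    return b0 + b1 * size

price_hat = predict_price(2.2)  # a hypothetical 2,200 sq ft house
```

A 2,200 sq ft house is predicted at 39 + 35 × 2.2 = 116, i.e. $116,000 under the assumed units.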

[Figure: scatterplot of price (60 to 160) vs. size (1.0 to 3.5) with the fitted line]

SLIDE 3

CAPM

Another example of conditional distributions: individual returns given the market return. The Capital Asset Pricing Model (CAPM) for asset A relates the return R_At = (V_At − V_A,t−1) / V_A,t−1 to the "market" return, R_Mt. In particular, the relationship is given by the regression model R_At = α + β R_Mt + ε, with observations at times t = 1, . . . , T (and where [α, β] ≡ [β0, β1]). When asset A is a mutual fund, this CAPM regression can be used as a performance benchmark for fund managers.
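The return definition is easy to compute directly. A small Python sketch with a made-up price series (the V_At values are hypothetical, not real fund data; the deck's own code is R):

```python
# Returns from asset values: R_At = (V_At - V_A,t-1) / V_A,t-1.
prices = [100.0, 110.0, 99.0]           # hypothetical asset values V_At
returns = [(prices[t] - prices[t - 1]) / prices[t - 1]
           for t in range(1, len(prices))]
# With these made-up prices: +10% in the first period, -10% in the second.
```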


SLIDE 4

> mfund <- read.csv("mfunds.csv")
> mu <- apply(mfund, 2, mean)
> mu
     drefus       fidel     keystne    Putnminc     scudinc
0.006767000 0.004696739 0.006542550 0.005517072 0.004432333
    windsor     valmrkt       tbill
0.010021906 0.006812983 0.005978333
> stdev <- apply(mfund, 2, sd)
> stdev
     drefus       fidel     keystne    Putnminc     scudinc
0.047237111 0.056587091 0.084236450 0.030079074 0.035969261
    windsor     valmrkt       tbill
0.048639473 0.048000146 0.002522863


SLIDE 5

> plot(mu, stdev, col=0)
> text(x=mu, y=stdev, labels=names(mfund), col=4)

[Figure: stdev vs. mu for each fund, labeled drefus, fidel, keystne, Putnminc, scudinc, windsor, valmrkt, tbill]

SLIDE 6

Let's look at just windsor (which dominates the market).

> windsor.reg <- lm(mfund$windsor ~ mfund$valmrkt)
> plot(mfund$valmrkt, mfund$windsor, pch=20)
> abline(windsor.reg, col="green")

[Figure: mfund$windsor vs. mfund$valmrkt with the LS line; b_0 = 0.0036, b_1 = 0.9357]

SLIDE 7

Modeling goals

Why are we running regressions anyway?

Model: Y = β0 + β1X + ε
Fitted: Y = b0 + b1X + e
Prediction: Ŷ = b0 + b1X

1. Properties of βk
◮ Sign: Does Y go up when X goes up?
◮ Magnitude: By how much?

2. Predicting Y
◮ Best guess for Y given X.

Key question today: how uncertain are our answers?
◮ First we must formalize our model.

SLIDE 8

Simple linear regression (SLR) model

Y = β0 + β1X + ε, ε ∼ N(0, σ2) What’s important? ◮ It is a model, so we are assuming this relationship holds for some fixed but unknown values of β0, β1. ◮ It is linear. ◮ The error ε is independent & mean zero

  • 1. E[ε] = 0 ⇔ E[Y |X] = β0 + β1X
  • 2. Fixed but unknown variance σ2; constant over X
  • 3. Most things are approx. Normal (Central Limit Theorem)
  • 4. ε represents anything left over, not captured by the linear function of X

◮ It just works! This is a very robust model for the world.
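The model is generative, so we can draw data from it. A Python sketch (the deck's code is R); the parameter values β0 = 39, β1 = 35, σ = 14 echo the house example but are assumptions here:

```python
import random

# Draw one data set from the SLR model Y = beta0 + beta1*X + eps,
# with eps ~ N(0, sigma^2). All parameter values are made up.
random.seed(0)
beta0, beta1, sigma = 39.0, 35.0, 14.0
n = 100
X = [random.uniform(1.0, 3.5) for _ in range(n)]                  # covariates
Y = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in X]     # responses
```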


SLIDE 9

Before looking at any data, the model specifies ◮ how Y varies with X on average: E[Y |X] = β0 + β1X;

i.e. what’s the trend?

◮ and the influence of factors other than X, ε ∼ N(0, σ2) independently of X.

[Figure: Y vs. X scattered around the line E[Y|X] = β0 + β1X; ε is the vertical deviation from the line]


SLIDE 10

The variance σ2 controls the dispersion of Y around β0 + β1X ◮ think signal-to-noise

[Figure: two panels of Y vs. X around the same line, one labeled "small dispersion" and one labeled "large dispersion"]

SLIDE 11

IMPORTANT! β0 is not b0, β1 is not b1, and εi is not ei

[Figure: true line E[Y|X] = β0 + β1X vs. fitted line Ŷ = b0 + b1X, with the error εi and the residual ei as vertical distances]

(We use Greek letters to remind us.)


SLIDE 12

Context from the house data example

E[Y |X] is the average price of houses with size X, and σ2 is the spread around that average. When we specify the SLR model we say that ◮ the average house price is linear in its size, but we don’t know the coefficients. ◮ Some houses could have a higher than expected value, some lower, but the amount by which they differ from average is unknown and

◮ is independent of the size, ◮ and is Normal.

Question: At an open house: is this house priced fairly?


SLIDE 13

Context from the CAPM example

E[Y |X] is the average return of the asset when the market return is X, and σ2 is the spread around that average. When we specify the SLR model we say that ◮ the average asset return is linear in the market return, but we don’t know the coefficients. ◮ Some days could have a higher than expected value, some lower, but the amount by which they differ from average is unknown and

◮ is independent of the market return, ◮ and is Normal.

Question: Does this asset follow the market? (Is β = 1?)


SLIDE 14

Detour / example: Oracle v. SAP Uncertainty Matters!


SLIDE 15

> sap <- read.csv("sap.csv")
> m.sap <- mean(sap$ROE)
> m.I <- mean(sap$IndustryROE)
> m.sap / m.I
[1] 0.8049701

That's the mean, what about the spread?

> summary(sap[,4:5])
      ROE          IndustryROE
 Min.   :-91.80   Min.   : 2.6
 1st Qu.:  6.20   1st Qu.:10.2
 Median : 13.40   Median :14.0
 Mean   : 12.64   Mean   :15.7
 3rd Qu.: 22.80   3rd Qu.:19.5
 Max.   :116.40   Max.   :48.8


SLIDE 16

What's going on here?
◮ SAP ROE is more variable than average Industry ROE.
→ Makes sense: averages are less variable than atoms.
◮ What about large values (positive and negative)?

[Figure: histogram and boxplots of ROE (−100 to 100) for SAP vs. the industry average]

SLIDE 17

Uncertainty matters! Do we even think that SAP use is correlated with lower ROE?
◮ Probably not, given the above results.

But even beyond statistical uncertainty:
◮ Does SAP use cause ROE to fall?
◮ Were the SAP ROEs selected at random in the industry?

Statistical uncertainty is the only kind we can quantify. In any analysis there is a lot we aren't sure about:
◮ Do we have the right data?
◮ Do we have the "right" (useful?) model?
◮ What assumptions are we making?


SLIDE 18

Sampling distribution of LS estimates

We think of the data as being one possible realization of data that could have been generated from the model Y |X ∼ N(β0 + β1X, σ2). ◮ How much do our estimates depend on the particular random sample that we happen to observe?

◮ Different data ⇒ different b0 and b1 ◮ Always the same β0 and β1.

If the estimates don’t vary much from sample to sample, then it doesn’t matter which sample you happen to observe. If the estimates do vary a lot, then it matters which sample you happen to observe.


SLIDE 19

How do we know what would happen with other realizations? We pretend!

  • 1. Randomly draw new data
  • 2. Compute the estimates b0 and b1
  • 3. Repeat

Or we use statistics to tell us: ◮ What the sampling distribution is . . . ◮ . . . and how to use it to measure uncertainty.

◮ Testing, confidence intervals, etc.

But first let’s see it!
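Steps 1 to 3 above can be sketched directly. A Python simulation (the deck's code is R) with made-up true values β1 = 1, σ2 = 2, n = 50, refitting the LS slope on each draw:

```python
import random

# "Pretend": repeatedly draw data from a known model, refit by least
# squares, and watch how b1 varies. All parameter values are made up.
random.seed(1)
beta0, beta1, sigma, n = 0.0, 1.0, 2.0 ** 0.5, 50

def ls_slope(X, Y):
    """Least-squares slope: sum (Xi - Xbar)(Yi - Ybar) / sum (Xi - Xbar)^2."""
    xbar = sum(X) / len(X)
    ybar = sum(Y) / len(Y)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
    sxx = sum((x - xbar) ** 2 for x in X)
    return sxy / sxx

b1_draws = []
for _ in range(500):                        # 500 "other realizations"
    X = [random.gauss(0, 1) for _ in range(n)]
    Y = [beta0 + beta1 * x + random.gauss(0, sigma) for x in X]
    b1_draws.append(ls_slope(X, Y))

b1_mean = sum(b1_draws) / len(b1_draws)     # centers near the true beta1
```

Across draws the b1 values center near the true β1, previewing the unbiasedness result.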


SLIDE 20

[Figure: four simulated samples of n = 5 points with var = 2, each panel showing the true model and the fitted LS line]

SLIDE 21

[Figure: four simulated samples of n = 50 points with var = 2, each panel showing the true model and the fitted LS line]

SLIDE 22

Sampling distribution of LS estimates

What did we just do? ◮ We “imagined” through simulation the sampling distribution of a LS line. What did we learn? ◮ Looked pretty Normal! ◮ When n = 5, some lines are close, others aren’t: we need to get lucky. ◮ The lines are much closer to the truth when n = 50. ◮ The variance σ2 matters a lot!


SLIDE 23

What happens in real life?
◮ We get just one data set, and we don't know the true generating model.
◮ But we can still imagine . . . and use statistics!
◮ Quantify how n and σ2 matter
◮ Quantify uncertainty . . . only within our model.

SLIDE 24

Normal Distribution – Quick Review

Why do we like the Normal distribution? ◮ Symmetric ◮ Concentration around the mean!

→ 95% of the data within 2 s.d.

[Figure: Normal density with the mean and ±1, 2, 3 sd marked; 95% of the mass lies between z0.025 and z0.975, with 2.5% in each tail]

SLIDE 25

Sampling distribution of b1

It turns out that b1 is Normally distributed: b1 ∼ N(β1, σ2_b1).
◮ b1 is unbiased: E[b1] = β1.
◮ The sampling sd σ_b1 determines the precision of b1:

σ2_b1 = var(b1) = σ2 / Σi (Xi − X̄)2 = σ2 / ((n − 1) s2_x)

It depends on three factors:
1. sample size (n),
2. error variance (σ2 = σ2_ε), and
3. X-spread (sx).

(We don't have time to do detailed proofs, but there is an extensive handout on my website; see also the Sheather book.)
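The variance formula can be checked numerically. A Python sketch (the deck's code is R) with a made-up σ and X sample:

```python
# Check sigma_b1^2 = sigma^2 / ((n-1) * s_x^2), with made-up inputs.
sigma = 2.0
X = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical covariates
n = len(X)
xbar = sum(X) / n
sx2 = sum((x - xbar) ** 2 for x in X) / (n - 1)   # sample variance of X
var_b1 = sigma ** 2 / ((n - 1) * sx2)             # here: 4 / 10 = 0.4
```

Note that (n − 1) s2_x is just Σi (Xi − X̄)2, so the two forms of the formula agree.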


SLIDE 26

Sampling distribution of b0

The intercept is also Normal and unbiased: b0 ∼ N(β0, σ2_b0), where

σ2_b0 = var(b0) = σ2 ( 1/n + X̄2 / ((n − 1) s2_x) )

What is the intuition here?

var(Ȳ − X̄ b1) = var(Ȳ) + X̄2 var(b1) − 2 X̄ cov(Ȳ, b1)

◮ Ȳ and b1 are uncorrelated because the slope (b1) is invariant if you shift the data up or down (Ȳ).

SLIDE 27

Joint distribution of b0 and b1

We know that b0 and b1 can be dependent, i.e., E[(b0 − β0)(b1 − β1)] ≠ 0. This means that estimation error in the slope is correlated with estimation error in the intercept.

cov(b0, b1) = −σ2 X̄ / ((n − 1) s2_x)

◮ Usually, if the slope estimate is too high, the intercept estimate is too low (negative correlation).
◮ The correlation decreases with more X spread (s2_x).
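A numeric check of the covariance formula, in Python (the deck's code is R), with made-up values for σ and the X sample:

```python
# Check cov(b0, b1) = -sigma^2 * xbar / ((n-1) * s_x^2), made-up inputs.
sigma = 2.0
X = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(X)
xbar = sum(X) / n
sxx = sum((x - xbar) ** 2 for x in X)    # equals (n-1) * s_x^2
cov_b0_b1 = -sigma ** 2 * xbar / sxx     # here: -4 * 3 / 10 = -1.2
```

Since X̄ > 0 here, the covariance is negative, matching the "slope too high, intercept too low" bullet.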

SLIDE 28

Estimation of error variance

The formulas above can't be used in practice since they involve the unknown quantity σ = σε. Replace it with

σ̂2 = (1/n) Σi e2_i    or    s2 = (1/(n − p)) Σi e2_i = SSE / (n − p)

(p is the number of regression coefficients, i.e. 2 for β0 + β1X). It is often convenient to report σ̂ or s, which are in the same units as Y.

Plug in for σ in any formula, e.g.

σ2_b1 = σ2 / ((n − 1) s2_x)  ⇒  s2_b1 = s2 / ((n − 1) s2_x)

◮ Small s2_bj values mean high info/precision/accuracy.
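Computing s2 from residuals is mechanical. A Python sketch (the deck's code is R) with hypothetical residuals:

```python
# s^2 = SSE / (n - p), with p = 2 coefficients; residuals are made up.
e = [1.0, -2.0, 0.5, -0.5, 1.0, 0.0]     # hypothetical residuals e_i
n, p = len(e), 2
sse = sum(ei ** 2 for ei in e)           # SSE = sum of squared residuals
s2 = sse / (n - p)                       # here: 6.5 / 4 = 1.625
s = s2 ** 0.5                            # same units as Y
```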

SLIDE 29

Example: revisit the house price/size data

> summary(house.reg)

Call:
lm(formula = price ~ size)

Residuals:
    Min      1Q  Median      3Q     Max
-30.425  -8.618   0.575  10.766  18.498

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   38.885      9.094   4.276 0.000903 ***
size          35.386      4.494   7.874 2.66e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.14 on 13 degrees of freedom
Multiple R-squared: 0.8267, Adjusted R-squared: 0.8133
F-statistic: 62 on 1 and 13 DF, p-value: 2.66e-06

SLIDE 30

Testing

Suppose we think that the true βj is equal to some value β0_j (often 0). Does the data support that guess? We can rephrase this in terms of competing hypotheses.

(Null) H0 : βj = β0_j
(Alternative) H1 : βj ≠ β0_j

Our hypothesis test will either reject or fail to reject the null hypothesis.
◮ If the hypothesis test rejects the null hypothesis, we have statistical support for our claim.
◮ Gives only a "yes" or "no" answer!
◮ You choose the "probability" of false rejection: α

SLIDE 31

We use bj for our test about βj.
◮ Reject H0 when bj is "far" from β0_j; assume H0 when close.
◮ What we really care about is: how many standard errors is bj away from β0_j?

The t-statistic for this test is

z_bj = (bj − β0_j) / s_bj ∼ N(0, 1) under H0.

"Big" |z_bj| makes our guess β0_j look silly ⇒ reject.
◮ If H0 is true, then P[|z_bj| > 2] < 0.05 = α
But: |z_bj| > 2 ⇔ β0_j ∉ (bj ± 2 s_bj)
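The t-statistic and its two-sided p-value can be computed by hand. A Python sketch (the deck's code is R), using the Windsor slope and standard error that appear later in the deck, with null value β0_1 = 1:

```python
from math import erf, sqrt

# Test statistic z = (bj - bj0) / s_bj and its two-sided p-value.
bj, s_bj, bj0 = 0.935717, 0.029150, 1.0
z = (bj - bj0) / s_bj                    # about -2.205

def norm_cdf(x):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

p_value = 2.0 * norm_cdf(-abs(z))        # two tails, about 0.027
```

At α = .05 this rejects H0; at α = .01 it does not, which is exactly the "you choose α" point above.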

SLIDE 32

Confidence intervals

Since bj ∼ N(βj, σ2_bj),

1 − α = P[ −zα/2 < (bj − βj)/s_bj < zα/2 ] = P[ βj ∈ (bj ± zα/2 s_bj) ]

Why should we care about confidence intervals?
◮ The confidence interval completely captures the information in the data about the parameter.
◮ Center is your estimate.
◮ Length is how sure you are about your estimate.
◮ Any value outside would be rejected by a test!
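As a concrete sketch in Python (the deck's code is R): a 95% interval built from the size coefficient in the house regression output. R's summary would use the t distribution with 13 df, so the Normal cutoff here is a large-sample approximation:

```python
# 95% interval b_j +/- z_{alpha/2} * s_bj, with the size coefficient
# and standard error from the house regression output.
bj, s_bj = 35.386, 4.494
z = 1.959964                             # z_{0.975}, so alpha = 0.05
ci = (bj - z * s_bj, bj + z * s_bj)      # roughly (26.6, 44.2)
```

Zero is far outside this interval, consistent with the tiny p-value on the size coefficient.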

SLIDE 33

Real life or pretend?

P[ β1 ∈ (b1 ± 2σ_b1) ] = 95%   or   P[ β1 ∈ (b1 ± 2σ_b1) ] = 0 or 1 ?

[Figure: many realized confidence intervals from different samples around the fixed true β1; any one realized interval either contains β1 or it does not]

SLIDE 34

Level, Size, and p-values

The p-value is P[|Z| > |z_bj|].
◮ A test with size/level α = p-value almost rejects.
◮ A CI of level 1 − (p-value) just excludes β0_j.

[Figure: Normal density with the level-α critical values and the observed ±|z_bj|; the central region has mass 1 − α and each tail beyond |z_bj| has mass p/2]

SLIDE 35

Example: revisit the CAPM regression for the Windsor fund. Does Windsor have a non-zero intercept? (i.e., does it make/lose money independent of the market?)

H0 : β0 = 0    H1 : β0 ≠ 0

◮ Recall: the intercept estimate b0 is the stock's "alpha".

> summary(windsor.reg) ## output abbreviated
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.003647   0.001409   2.588   0.0105 *
mfund$valmrkt 0.935717   0.029150  32.100   <2e-16 ***
> 2*pnorm(-abs(0.003647/.001409))
[1] 0.009643399

We reject the null at α = .05: Windsor does have an "alpha" over the market.
◮ Why set α = .05? What about α = 0.01?

SLIDE 36

Now let's ask whether or not Windsor moves in a different way than the market (e.g., is it more conservative?).
◮ Recall that the estimate of the slope b1 is the "beta" of the stock.

This is a rare case where the null hypothesis is not zero:
H0 : β1 = 1, Windsor is just the market (+ alpha).
H1 : β1 ≠ 1, Windsor softens or exaggerates market moves.

This time, R's output t/p values are not what we want (why?).

> summary(windsor.reg) ## output abbreviated
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.003647   0.001409   2.588   0.0105 *
mfund$valmrkt 0.935717   0.029150  32.100   <2e-16 ***

SLIDE 37

But we can get the appropriate values easily:
◮ Test and p-value:

> b1 <- 0.935717; sb1 <- 0.029150
> zb1 <- (b1 - 1)/sb1
> zb1
[1] -2.205249
> 2*pnorm(-abs(zb1))
[1] 0.02743665

◮ Confidence Interval:

> confint(windsor.reg, level=0.95)
                    2.5 %      97.5 %
(Intercept)   0.000865657 0.006428105
mfund$valmrkt 0.878193149 0.993240873

Reject at α = .05, so Windsor softens market moves.
◮ What about other values of α?

confint(windsor.reg, level=0.99)
confint(windsor.reg, level=(1-2*pt(-abs(zb1), df=178)))

SLIDE 38

Forecasting & Prediction Intervals

The conditional forecasting problem:
◮ Given a covariate Xf and sample data {Xi, Yi}, i = 1, . . . , n, predict the "future" observation Yf.

The solution is to use our LS fitted value: Ŷf = b0 + b1Xf.
◮ That's the easy bit.

The hard (and very important!) part of forecasting is assessing uncertainty about our predictions. One method is to specify a prediction interval:
◮ a range of Y values that are likely, given an X value.

SLIDE 39

The least squares line is a prediction rule: Read ˆ Y off the line for a new X. ◮ It’s not a perfect prediction: ˆ Y is what we expect.

[Figure: price vs. size with the LS line; Ŷ is read off the line at a new X]

SLIDE 40

If we use Ŷf, our prediction error has two pieces:

ef = Yf − Ŷf = Yf − b0 − b1Xf

[Figure: at Xf, the true line E[Yf|Xf] = β0 + β1Xf vs. the fitted value Ŷf = b0 + b1Xf; the gap between Yf and the true line is ε, and the gap between the two lines is the fit error]

SLIDE 41

We can decompose ef into two sources of error:
◮ Inherent idiosyncratic randomness (due to ε).
◮ Estimation error in the intercept and slope (i.e., discrepancy between our line and "the truth").

ef = Yf − Ŷf
   = (Yf − E[Yf|Xf]) + (E[Yf|Xf] − Ŷf)
   = εf + (β0 − b0) + (β1 − b1)Xf

The variance of our prediction error is thus

var(ef) = var(εf) + var(E[Yf|Xf] − Ŷf) = σ2 + var(Ŷf)

SLIDE 42

From the sampling distributions derived earlier, var(Ŷf) is

var(b0 + b1Xf) = var(b0) + Xf2 var(b1) + 2Xf cov(b0, b1)
              = σ2 ( 1/n + (Xf − X̄)2 / ((n − 1) s2_x) )

Replacing σ2 with s2 gives the standard error for Ŷf. And hence the variance of our predictive error is

var(ef) = σ2 ( 1 + 1/n + (Xf − X̄)2 / ((n − 1) s2_x) )

SLIDE 43

Putting it all together, we have that

Ŷf ∼ N( Yf, σ2 ( 1 + 1/n + (Xf − X̄)2 / ((n − 1) s2_x) ) )

A (1 − α)100% confidence/prediction interval for Yf is thus

b0 + b1Xf ± zα/2 · s · sqrt( 1 + 1/n + (Xf − X̄)2 / ((n − 1) s2_x) )

SLIDE 44

Looking closer at what we'll call

s_pred = s · sqrt( 1 + 1/n + (Xf − X̄)2 / ((n − 1) s2_x) ) = sqrt( s2 + s2_fit )

A large predictive error variance (high uncertainty) comes from:
◮ Large s (i.e., large ε's).
◮ Small n (not enough data).
◮ Small sx (not enough observed spread in covariates).
◮ Large (Xf − X̄).

The first three are familiar . . . what about the last one?

SLIDE 45

For Xf far from our ¯ X, the space between lines is magnified ...

[Figure: estimated line vs. true line, diverging away from the point of means (X̄, Ȳ); the gap between the lines is small when (Xf − X̄) is small and magnified when (Xf − X̄) is large]

SLIDE 46

⇒ The prediction (conf.) interval needs to widen away from ¯ X


SLIDE 47

Returning to our housing data for an example ...

> Xf <- data.frame(size=c(mean(size), 2.5, max(size)))
> cbind(Xf, predict(reg, newdata=Xf, interval="prediction"))
      size      fit       lwr      upr
1 1.853333 104.4667  72.92080 136.0125
2 2.500000 127.3496  95.18501 159.5142
3 3.500000 162.7356 127.36982 198.1013

◮ interval="prediction" gives lwr and upr; otherwise we just get fit.
◮ spred is not shown in this output.

SLIDE 48

We can get spred from the predict output.

> p <- predict(reg, newdata=Xf, se.fit=TRUE)
> s <- p$residual.scale
> sfit <- p$se.fit
> spred <- sqrt(s^2 + sfit^2)
> b <- reg$coef
> b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qnorm(.975)*spred[1]
         [,1]     [,2]     [,3]
[1,] 104.4667 75.84713 133.0862
> b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qt(.975, df=n-2)*spred[1]
[1,] 104.4667 72.92080 136.0125

◮ Or, we can calculate it by hand [see R code].

Notice that spred = sqrt(s2 + s2_fit); you need to square before summing (spred ≠ s + s_fit).

SLIDE 49

Summary

Uncertainty matters! Captured by the sampling distribution.
◮ Quantifies uncertainty from the data . . .
◮ . . . only within the model, assumed before we see data.
◮ Which factors matter for signal-to-noise?

Reporting:
◮ Confidence interval: completely captures the information in the data about the parameter.
◮ Testing/p-value: only a yes/no answer. (Don't abuse p-values.)


SLIDE 50

Glossary and Equations

◮ LS estimators: b1 = r_xy · (sy/sx) = s_xy / s2_x and b0 = Ȳ − b1X̄.
◮ Ŷi = b0 + b1Xi is the ith fitted value.
◮ ei = Yi − Ŷi is the ith residual.
◮ σ̂, s: standard error of the regression residuals (≈ σ = σε).
◮ s_bj: standard error of the regression coefficients:

s_b1 = sqrt( s2 / ((n − 1) s2_x) )    s_b0 = s · sqrt( 1/n + X̄2 / ((n − 1) s2_x) )

SLIDE 51

◮ α is the significance level (probability of a type 1 error).
◮ zα/2 is the value such that, for Z ∼ N(0, 1), P[Z > zα/2] = P[Z < −zα/2] = α/2.
◮ z_bj is the standardized coefficient: z_bj = (bj − β0_j) / s_bj ∼ N(0, 1) under H0.
◮ The (1 − α) · 100% confidence interval for βj is bj ± zα/2 · s_bj.

SLIDE 52

◮ Ŷf = b0 + b1Xf is a forecast prediction, with standard error

se(Ŷf) = s_fit = s · sqrt( 1/n + (Xf − X̄)2 / ((n − 1) s2_x) )

◮ The forecast residual is ef = Yf − Ŷf, and var(ef) = s2 + s2_fit. That is, the predictive standard error is

s_pred = s · sqrt( 1 + 1/n + (Xf − X̄)2 / ((n − 1) s2_x) )

and Ŷf ± zα/2 · s_pred is the (1 − α)100% prediction interval at Xf.