Statistics, inference and ordinary least squares Frank Venmans - - PowerPoint PPT Presentation

SLIDE 1

Statistics, inference and ordinary least squares

Frank Venmans

SLIDE 2

Statistics

SLIDE 3

Conditional probability

  • Consider 2 events:
  • A: die shows 1,3 or 5 => P(A)=3/6
  • B: die shows 3 or 6 =>P(B)=2/6
  • A∩B : A and B occur: die shows 3 =>P(A&B)=1/6
  • AUB : A or B occur: die shows 1,3, 5 or 6 =>P(AorB)=4/6
  • Addition rule: P(AorB)=P(A)+P(B)-P(A&B) (~ venn diagram)
  • 𝑄 𝐡 𝐢 =

𝑄 𝐡& 𝐢 𝑄 𝐢

(~ venn diagram)

  • P(A|B): prob of event A given that B occurs=1/2
  • P(B|A): prob of event B given that A occurs=1/3
  • Bayes’ Law: 𝑄 𝐡&𝐢 = 𝑄(𝐡|𝐢)P(B)=P(B|A)P(A)
  • Event can be any set of outcomes. Example
  • A: Random draw from belgian population with income >30,000
  • B: Random draw from Belgian population with education >12 years
  • P(A|B)β‰ P(A)

[Venn diagrams: die faces 1–6 with A = {1,3,5} and B = {3,6}; second diagram: income >30,000 overlapping education >12 years]
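These counting rules can be checked directly. A minimal Python sketch (not from the slides; the sets A and B follow the die example above):

```python
# Minimal sketch of the die example: events are sets of outcomes,
# probabilities are counts over the 6 equally likely faces.
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}    # die shows 1, 3 or 5
B = {3, 6}       # die shows 3 or 6

def P(event):
    return Fraction(len(event & outcomes), len(outcomes))

# addition rule: P(A or B) = P(A) + P(B) - P(A & B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# conditional probabilities from the definition P(A|B) = P(A&B)/P(B)
P_A_given_B = P(A & B) / P(B)   # = 1/2
P_B_given_A = P(A & B) / P(A)   # = 1/3

# Bayes' law: P(A&B) = P(A|B)P(B) = P(B|A)P(A)
assert P_A_given_B * P(B) == P_B_given_A * P(A) == P(A & B)
```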

SLIDE 4

Independence

  • 2 events A and B:

𝑄 𝐡 𝐢 = 𝑄 𝐡 ⇔ 𝑄 𝐢 𝐡 = 𝑄 𝐢 ⇔ 𝐡 π‘π‘œπ‘’ 𝐢 𝑏𝑠𝑓 π‘—π‘œπ‘’π‘“π‘žπ‘“π‘œπ‘’π‘“π‘œπ‘’

  • Two variables X and Y

𝑔 𝑦|𝑧 = 𝑔 𝑦 ⇔ 𝑔 𝑧 𝑦 = 𝑔 𝑧 ⇔ x and y are independent

  • X and Y are independent if the conditional distribution of X given Y is the

same as the unconditional distribution of X.

  • Independent variables do not necessarily have a zero correlation.
  • Example: height of my sun and Indian GDP are correlated (both affected by time)
  • Dependent variables may have a zero correlation in exceptional cases.
  • Example: selection bias may compensate a causal effect (see further)
SLIDE 5

Cumulative Distribution Function (CDF), Probability Density Function (PDF)

  • Notation:
  • Random variables X,Y: ex. Yearly earnings and level of eduction
  • Discrete if earnings are multiples of 100€ and eduction in years
  • ~Continuous if earnings are expressed un eurocent and education in seconds
  • Specific values of random variables:
  • a,b or x,y
  • Cumulative Distrubtion Function:
  • probability that X is smaller than or equal to a
  • 𝐺 𝑏 = 𝑄 π‘Œ ≀ 𝑏
  • Probability Density Function
  • For discrete variables: f(a)=P(X=a)
  • For continuous variables
  • 𝑔 𝑏 = 𝑒𝐺 𝑏

𝑒𝑏

⇔ 𝐺 𝑏 = 𝑔 π‘Œ π‘’π‘Œ

𝑏 βˆ’βˆž

  • Area under the pdf =1 because 𝐺 ∞ = 1
SLIDE 6

Joint Cumulative Distribution Function

  • Assume Y Yearly earnings and X level of education
  • 𝐺 𝑦, 𝑧 = 𝑄 π‘Œ < 𝑦 &𝑍 < 𝑧
SLIDE 7

Density function

  • Joint Density Function
  • For discrete variables:𝑔 𝑦, 𝑧 = 𝑄 π‘Œ = 𝑦&𝑍 = 𝑧
  • Continuous variables: 𝑔 𝑦, 𝑧 = πœ–2𝐺 𝑦,𝑧

πœ–π‘¦πœ–π‘§

  • Marginal Denstity Function
  • Discrete variables 𝑔 𝑦 = 𝑄 π‘Œ = 𝑦 disregarding y
  • Continuous variables 𝑔 𝑦 =

𝑔 𝑦, 𝑧 𝑒𝑧

𝑧=∞ 𝑧=βˆ’βˆž

  • (red and blue line)
  • Conditional Density Function
  • Discrete variables 𝑔 𝑦|𝑧 = 𝑄 π‘Œ = 𝑦 |𝑍 = 𝑧
  • 𝑔 𝑦|𝑧 = 𝑔 𝑦,𝑧

𝑔 𝑧

  • (intersections through the joint density function)
SLIDE 8

Regression as a conditional density function

SLIDE 9

Expected value

  • Unconditional expected value
  • For a discrete random variable : 𝐹 π‘Œ = βˆ‘π‘¦π‘—π‘„ 𝑦𝑗 = 𝜈
  • For a continuous random variable : 𝐹 π‘Œ =

𝑦𝑔 𝑦 𝑒𝑦

∞ βˆ’βˆž

= 𝜈

  • Conditional expected value (in finance many expectations are conditional on the

information set at time t)

  • 𝐹 π‘Œ 𝑍 = 𝐹𝑍[π‘Œ]=βˆ‘π‘¦π‘—π‘„ 𝑦𝑗|𝑍
  • 𝐹 π‘Œ|𝑍 =

𝑦𝑔 𝑦|𝑧 𝑒𝑦

∞ βˆ’βˆž

  • Variance= 𝜏2 = 𝐹[ π‘Œ βˆ’ 𝜈 2]
  • Covariance between X and Y= πœπ‘Œ,𝑍 = 𝐹

π‘Œ βˆ’ πœˆπ‘Œ Y βˆ’ πœˆπ‘

  • Skewness= 𝐹

π‘Œβˆ’πœˆ 𝜏 3

  • Kurtosis= 𝐹

π‘Œβˆ’πœˆ 𝜏 4

SLIDE 10

Normal distribution

  • 𝑔 𝑦 =

1 𝜏 2𝜌 exp βˆ’ 1 2 π‘¦βˆ’πœˆ 𝜏 2

  • Notation π‘Œ~𝑂(𝜈, 𝜏2)
  • Skewness=0
  • Kurtosis=3
  • Jacques-Berra test for normality:

tests if skewness and kurtosis are close to 0 and 3.

  • Any linear combination of normally distributed variables

(correlated or not) is normally distributed

  • Central limit theorem: the probability distribution of a variable that is the sum of an

infinite number of independent random variables with any distribution will be normally distributed.
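A simulation sketch of the central limit theorem, summing uniform variables (the counts 30 and 20,000 are arbitrary choices):

```python
# CLT sketch: sums of independent uniform variables (a clearly non-normal
# distribution) are approximately normally distributed.
import random
from statistics import NormalDist, mean, stdev

random.seed(42)
n_terms, n_samples = 30, 20_000
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]

# theory: the sum of n U(0,1) variables has mean n/2 and variance n/12
m, s = mean(sums), stdev(sums)
assert abs(m - n_terms / 2) < 0.05
assert abs(s - (n_terms / 12) ** 0.5) < 0.05

# compare the empirical CDF at m with the fitted normal CDF (should be ~0.5)
share_below = sum(x <= m for x in sums) / n_samples
assert abs(share_below - NormalDist(m, s).cdf(m)) < 0.02
```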

SLIDE 11

Chi square distribution

  • 𝑍 = βˆ‘

π‘Œπ‘—

2π‘₯π‘—π‘’β„Ž π‘Œπ‘—~𝑂 0,1 π‘π‘œπ‘’ π‘π‘šπ‘š π‘Œπ‘—π‘—π‘œπ‘’π‘“π‘žπ‘“π‘œπ‘’π‘“π‘œπ‘’ π‘œ 𝑗=1

follows a πœ“2distribution with n degrees of freedom.

  • 𝑍~πœ“π‘œ

2

SLIDE 12

Student t distribution

  • π‘Ž =

π‘Œ

𝑍 π‘œ

π‘₯π‘—π‘’β„Ž π‘Œ~𝑂 0,1 π‘π‘œπ‘’ 𝑍~πœ“π‘œ

2 π‘π‘œπ‘’ π‘Œ π‘—π‘œπ‘’π‘“π‘žπ‘“π‘œπ‘’π‘“π‘œπ‘’ 𝑔𝑠𝑝𝑛 𝑍

follows a student or t-distribution with n degrees of freedom

  • π‘Ž~π‘’π‘œ
  • Higher variance and kurtosis than

the standardized normal distribution

  • Converges to the normal distribution

for large n: π‘’βˆž = 𝑂 0,1

SLIDE 13

F distribution

  • Z= X/n

Y/m with X~Ο‡π‘œ

2 π‘π‘œπ‘’ 𝑍~πœ“π‘› 2 π‘π‘œπ‘’ π‘Œ π‘—π‘œπ‘’π‘“π‘žπ‘“π‘œπ‘’π‘“π‘œπ‘’ 𝑔𝑠𝑝𝑛 𝑍 follows

an F distribution with n and m degrees of freedom.

  • π‘Ž~𝐺

π‘œ,𝑛

SLIDE 14

Inference

SLIDE 15

Statistical inference

  • Try to say something about the real distribution of a random variable based
  • n a sample.
  • The real distribution corresponds to an infinitely repeated event (ex dice), the

entire population, entire set of possible β€˜states of the world’ in a future period etc.

SLIDE 16

3 types of inference

  • Point estimator:
  • Ex: sample mean, sample variance, marginal effect in a linear regression (beta), correlation…
  • Concept of repeated sampling: every sample gives another estimator πœ„

=> πœ„ will follow a prob distribution

  • Unbiased: Expected value of estimator corresponds to the real parameter 𝐹 πœ„

= πœ„

  • Consistent: The estimator can get arbitrarily close to the real parameter by increasing the sample size

plim

π‘œβ†’βˆž

πœ„ = πœ„

  • Ex: sample variance estimator 𝑑 Β² =

1 π‘œ βˆ‘ 𝑧𝑗 βˆ’ 𝑧

2

𝑗

is a biased but consistent estimator of the variance

  • Efficient estimator: 𝑀𝑏𝑠(πœ„

) is small

  • Interval estimation:
  • Ex: given the observed sample, the real mean lays between 1 and 3 with 95% probability
  • Hypothesis testing:
  • Ex: if the null hypothesis is true (𝜈 = 2), what is the probability of a random sample to have a more

extreme (less likely) outcome than the observed sample mean of 4 and sample variance of 2.
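The bias and consistency of the 1/n variance estimator mentioned above can be seen in a quick simulation (a Python sketch; the N(0, 2Β²) population is a made-up example):

```python
# The 1/n sample variance is biased (its expectation is (n-1)/n * sigma^2)
# but consistent: the bias vanishes as n grows.
import random

random.seed(0)
sigma2 = 4.0  # true variance of the N(0, 2^2) population

def s2_biased(sample):
    n = len(sample)
    ybar = sum(sample) / n
    return sum((y - ybar) ** 2 for y in sample) / n   # divides by n, not n-1

def avg_estimate(n, reps=4000):
    # average of the estimator over many repeated samples ~ its expectation
    return sum(s2_biased([random.gauss(0, 2) for _ in range(n)])
               for _ in range(reps)) / reps

small, large = avg_estimate(5), avg_estimate(200)
# n=5: expectation is (4/5)*sigma^2 = 3.2, clearly below 4
assert abs(small - 0.8 * sigma2) < 0.2
# n=200: expectation is (199/200)*sigma^2 ~ 3.98, the bias is nearly gone
assert abs(large - sigma2) < 0.1
```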

SLIDE 17

Example: Sample mean

  • Income of Belgian households: a random variable following a distribution

with mean 𝜈 and variance 𝜏² (distribution is skewed, not normal)

  • You have a sample of n individuals. You want to say something about 𝜈 and

𝜏²

  • Estimator of 𝜈: sample mean y

=

𝑧1+𝑧2+𝑧3β€¦π‘§π‘œ π‘œ

  • Estimator will be different each time you draw a different sample=>sample

mean will follow a distribution, which is different from the distribution of y.

  • Central limit theorem =>the sample mean converges to a normal

distribution even if y does not follow a normal distribution.

SLIDE 18

Sample mean: variance known

  • 𝑧

π‘π‘‘π‘‘π‘§π‘›π‘žπ‘’π‘π‘’π‘—π‘‘ ~N 𝜈, 𝜏2 π‘œ

β‡’

𝑧 βˆ’πœˆ 𝜏 π‘œ π‘π‘‘π‘‘π‘§π‘›π‘žπ‘’π‘π‘’π‘—π‘‘ ~N(0,1)

  • This allows to determine a 95%confidence interval

𝑄 βˆ’1,96 <

𝑧 βˆ’πœˆ 𝜏/ π‘œ < 1,96 = 0,95 ⇔ 𝑄 𝑧

βˆ’ 1,96

𝜏 π‘œ < 𝜈 < y

+ 1,96

𝜏 π‘œ =0,95

  • When interval includes zero we say that the sample mean is not

significantly different from zero at the 5% confidence level.
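A sketch of the known-variance interval with made-up numbers (Θ³ = 2.1, Οƒ = 1.5 and n = 100 are hypothetical, chosen only for illustration):

```python
# 95% confidence interval for the mean with known sigma, as on the slide:
# ybar +/- 1.96 * sigma / sqrt(n). The numbers are made up.
import math

ybar = 2.1       # observed sample mean (hypothetical)
sigma = 1.5      # known population standard deviation (hypothetical)
n = 100

half_width = 1.96 * sigma / math.sqrt(n)
ci = (ybar - half_width, ybar + half_width)
print(ci)  # interval that covers the true mean with 95% probability

# zero is outside the interval, so the mean is significantly
# different from zero at the 5% level
significant = not (ci[0] <= 0 <= ci[1])
assert significant
```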

SLIDE 19

Sample mean: variance unknown and y normally distributed

  • Both mean and variance will need to be estimated.
  • Estimator for variance: 𝑑 2 =

1 π‘œβˆ’1 βˆ‘

𝑧𝑗 βˆ’ 𝑧 2

π‘œ

  • If Y follows a normal distribution⇔ (π‘œβˆ’1)𝑑 2

𝜏2

= βˆ‘

π‘§π‘—βˆ’π‘§ 𝜏 2 π‘œ

~πœ“π‘œβˆ’1

2

(no proof but intuitive)

  • 𝑧

βˆ’πœˆ 𝑑 / π‘œ =

𝑧 βˆ’πœˆ 𝜏/ π‘œ 𝑑 𝜏

=

𝑧 βˆ’πœˆ 𝜏/ π‘œ π‘œβˆ’1 𝑑 2 (π‘œβˆ’1)𝜏2

~ 𝑂 0,1

πœ“π‘œβˆ’1 2 π‘œβˆ’1

= π‘’π‘œβˆ’1

  • This allows to determine a 95% confidence interval (ex. n=21)
  • 𝑄 βˆ’2,086 < 𝑧

βˆ’πœˆ

𝑑 π‘œ

< 2,086 = 0,95 ⇔ 𝑄 𝑧 βˆ’ 2,086 𝑑

π‘œ < 𝜈 < y

+ 2,086 𝑑

π‘œ =0,95

  • For large n, the t distribution converges to the normal distribution
SLIDE 20

Hypothesis testing

  • Null hypothesis 𝐼0: πœ„ = πœ„0

ex: 𝐼0: πœ„ = 0

  • One sided test 𝐼𝐡: πœ„ > πœ„0 (𝑝𝑠 πœ„ < πœ„0)

ex: 𝐼𝐡: πœ„ > 0

  • Two sided test 𝐼𝐡: πœ„ β‰  πœ„0

ex: 𝐼

𝐡: πœ„ β‰  0

  • 2 regions:
  • If observed data (test statistic) falls in rejection region =>reject H0
  • If observed data (test statistic) falls in acceptence region =>accept H0
  • Imagine you have 10 months of data and you observe a mean monthly return
  • f the stock of Apple of 0,8% and you want to test if this mean is different

from a zero return.

  • Assume the standard error of the return is observed to be 1,58%, so the standard error
  • f the mean is 1,58%

10 = 0,5%

SLIDE 21

One sided test vs 2-sided test

  • One sided test: if the real mean was zero, what would

be the probability to observe an estimator larger than 0,8%(1,58%)?

  • Standardize your outcome 𝐼0: 𝜈 = 0 β‡’

𝑧 βˆ’0 𝑑 / π‘œ ~π‘’π‘œ

  • β‡’ 𝑄 𝑍 >

𝑧 βˆ’0

𝑑 π‘œ

= 𝑄 𝑍 > 1,6 =0,05 (Pvalue given by Stata)

  • Two sided test: if the real mean was zero, what would

be the probability that the sample mean was outside the interval of [βˆ’

𝑧 𝑑 / π‘œ, 𝑧 𝑑 / π‘œ]

  • 𝐼0: 𝜈 = 0 β‡’ 1 βˆ’ 𝑄(βˆ’

𝑧 𝑑 / π‘œ < π‘Œ < 𝑧 𝑑 / π‘œ)

= 1- P(-1,6<X<1,6)=0,10 (Pvalue given by Stata)

  • Remark: n is small so assumption of normally distributed

returns is needed
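The Apple example in code, using the normal distribution as an approximation because the Python standard library has no Student t (with n = 10 the exact t p-values are somewhat larger than the rounded 0.05 and 0.10 on the slide):

```python
# Apple-return example: test statistic and p-values, with a normal
# approximation standing in for the t distribution.
import math
from statistics import NormalDist

ybar = 0.8                   # mean monthly return, in %
se = 1.58 / math.sqrt(10)    # standard error of the mean, ~0.5%
z = (ybar - 0) / se          # test statistic under H0: mu = 0

Z = NormalDist()
p_one_sided = 1 - Z.cdf(z)              # P(Z > 1.6)
p_two_sided = 2 * (1 - Z.cdf(abs(z)))   # P(|Z| > 1.6)
print(round(z, 2), round(p_one_sided, 3), round(p_two_sided, 3))
```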

SLIDE 22

Type I and type II errors

             | Do not reject Hβ‚€  | Reject Hβ‚€
Hβ‚€ true     | correct           | Type I error, Ξ± (ex 5%)
Hβ‚€ false    | Type II error, Ξ²  | correct; 1βˆ’Ξ² = power of test

  • Level of significance: probability to reject the null if the null is true
  • Power of the test: probability to reject the null if the nulle is false
  • Reduce probability of type I error =>increase probability of Type II error
  • More efficient estimator=>reduce probability of Type II error=increase power of the

test

  • Increase sample size=>reduce probability of Type II error=increase power of the test
  • General rule: go for a large sample, in small samples you may only see phenomenons

big as an elephant, that you knew allready before doing the test, all the rest has an insignificant effect.

SLIDE 23

Ordinary least squares

SLIDE 24

Regression

  • Assume we want to know the relationship between sales and advertising expenditure
  • OLS: minimize squared distance between points and regresssion line

Y=Sales X=advertising expenditure πœ—π‘— 𝑍 = 𝛽 + π›Ύπ‘Œ Slope= Ξ² Ξ± 𝛽 + π›Ύπ‘Œπ‘— 𝑍

𝑗

SLIDE 25

Population regression line vs sampling regression line

  • Estimators and regression line (orange) will be different for each sample.
  • Would you use a one-sided or two-sided t-test for beta?

[Figure: population regression line Y = Ξ± + Ξ²X (slope Ξ²) vs estimated sample line Y = Ξ±Μ‚ + Ξ²Μ‚X (slope Ξ²Μ‚); X = advertising expenditures, Y = sales]

SLIDE 26

Assumptions of OLS

  • 5 Gauss-Markov assumptions:
  • The true model is 𝑍 = 𝛽0 + 𝛾1π‘Œ1 + 𝛾2π‘Œ2 + β‹― + πœ— with E πœ— = 0 (linearity)
  • No perfect collinearity (you cannot write X1 as a linear combination of the other Xj’s)
  • Homoscedastic errors 𝐹 πœ—π‘—

2 = 𝜏²

  • Uncorrelated errors 𝐹 πœ—π‘—πœ—π‘˜ = 0
  • 𝐹 πœ—|π‘Œ1, π‘Œ2 = 0 (exogenous explainatory variables, no endogeneΓ―ty)
  • If 5 assumptions are met, OLS is Best Linear Unbiased Estimator (BLUE)
  • If the errors are normally distributed, OLS is Best Unbiased Estimator (BUE)
  • OLS with non-normal errors is still unbiased and consistent!
  • 𝛾

follows a t-distribution only if errors are normal => be prudent with interpreting confidence intervals in small samples

𝑛𝑏𝑒𝑠𝑗𝑦 π‘œπ‘π‘’π‘π‘’π‘—π‘π‘œ 𝐹 πœ—πœ—β€² = 𝜏𝐽

SLIDE 27

The math behind OLS (optional, only for those who like it)

  • Consider Matrix notation of model: 𝑍 = π‘Œπ›Ύ + πœ—
  • If there is an intercept, X contains a column of one’s
  • Minimum distance estimator

min

𝛾 βˆ‘ πœ—π‘— 2 π‘œ

= min

𝛾 πœ—β€²πœ— = min 𝛾

𝑍 βˆ’ π‘Œπ›Ύ β€² 𝑍 βˆ’ π‘Œπ›Ύ = min

𝛾

𝑍′𝑍 βˆ’ 2𝛾′ π‘Œβ€²π‘ + 𝛾′ π‘Œβ€²π‘Œπ›Ύ 𝑔𝑗𝑠𝑑𝑒 𝑒𝑓𝑠𝑗𝑀𝑏𝑒𝑗𝑀𝑓: βˆ’2π‘Œβ€²π‘ + 2π‘Œβ€²π‘Œπ›Ύ = 0 ⇔ 𝛾 = π‘Œβ€²π‘Œ βˆ’1π‘Œβ€²π‘

  • Method of moments: errors must be uncorrelated with the regressors

Xβ€²πœ— = 0 ⇔ π‘Œβ€² 𝑍 βˆ’ π‘Œπ›Ύ = 0 ⇔ 𝛾 = π‘Œβ€²π‘Œ βˆ’1π‘Œβ€²π‘

  • Maximum likelihood under normal distribution of error term
  • π‘šπ‘—π‘™π‘“π‘šπ‘—β„Žπ‘π‘π‘’ =

1 2𝜌𝜏2 exp βˆ’ πœ—π‘—

2

2𝜏2 𝑗

  • π‘€π‘π‘•π‘šπ‘—π‘™π‘“π‘šπ‘—β„Žπ‘π‘π‘’ = βˆ’

n 2 log 2𝜌𝜏2 βˆ’ 1 2𝜏2 βˆ‘ πœ—π‘—Β² π‘œ

  • Minimising the loglikelihood boils down to minimum distance estimator=>OLS is BUE
SLIDE 28

OLS inference

  • If errors are normally distributed, the estimate 𝛾

follows a student t distribution (only assymptotically the case if errors are not normally distributed)

  • If Errors are correlated or heteroscedastic, the variance of beta 𝜏

𝛾 2 can

be increased to take that into account (option β€˜robust’ in stata)

  • Stata command: regress Y X1 X2, robust
  • Default includes a constant, you can add option noconstant
  • Exogeneity of X’s cannot be tested. The problem of endogeneΓ―ty is

most important condition for causal interpretation of beta’s (see next week)

SLIDE 29

Avoid endogeneity

  • Conditions 1 and 5 imply 𝐹 𝑍 π‘Œ1, π‘Œ2 = 𝛽0 + 𝛾1π‘Œ1 + 𝛾2π‘Œ2 .
  • This allows a causal interpretation of beta’s: an increase of X1 by one

unit, all other relevant factors being equal, will have an effect 𝛾1 on Y.

  • Intuition: β€˜all other factors being equal’ implies that all factors that

drive the error term (and thus Y), are uncorrelated to the variables of interest X.

SLIDE 30

The effect of marketing Β« all else being equal Β»

[Diagram: sales is driven by marketing expenditures and by the error = all other factors: innovative company, competitors, quality of product, delivery time, business cycle]

E[Ξ΅|X] β‰  0 β‡’ cov(Ξ΅, X) β‰  0 β‡’ Ξ΅ and X are driven by common factors.

SLIDE 31

Fixed effect panel regression to avoid endogeneity

[Diagram: sales is driven by marketing expenditures, a fixed effect = all factors that are constant over time (innovative company, competitors, quality of product, delivery time), and an idiosyncratic error = all other factors that change over time (business cycle)]

SLIDE 32

Fixed effects - random effect - pooled panel

  • 3 ways of writing the same fixed effect model:
  • Y= π‘Œπ›Ύ

+ βˆ‘ 𝛿𝑗 𝐸𝑗

𝑗

+ πœ— with Di a dummy variable for company i.

  • 𝑍

𝑗𝑒 = π‘Œπ‘—π‘’π›Ύ

+ 𝛿𝑗 + πœ—it

  • 𝑍

𝑗𝑒 βˆ’ 𝑍 𝑗 = (π‘Œπ‘—π‘’βˆ’π‘Œ

𝑗)𝛾 + πœ—π‘—π‘’ βˆ’ πœ— 𝑗 =β€˜within estimator’ (obtained by subtracting the sum of eq 2 over time periods)

  • Beta measures the effect of a deviation from mean marketing expenditures on the deviation of mean sales β€˜within’ a

company i. =>The difference in mean sales between a company with high and low average marketing expenditures does not drive the estimation of beta => be careful with measurement errors and lagged effects because part of variability is filtered out.

  • If theory indicates that you may avoid a source of endogeneity and you have enough data to find significant effects =>

use fixed effects!

  • Random Effect model and Pooled panel regression assume that none of the factors that drives a company specific

effect drives any of the X’s as well: fixed effects are uncorrelated with X’s

  • Pooled panel regression is an OLS as if there was no panel structure: every observation has equal weight
  • Random effect model is a Generalized Least Squares (GLS) estimator: the observed heterogeneity and serial

correlation is used to make estimator more efficient compared to the pooled panel regression
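A sketch of the within estimator: demean per company, then apply OLS to the demeaned data. The panel below is simulated, with x deliberately correlated with the fixed effects so that pooled OLS would be biased:

```python
# 'Within' (fixed effect) estimator: subtract company means from x and y,
# then slope = Sxy / Sxx on the demeaned data. Made-up 3-company panel.
import random

random.seed(7)
beta_true = 0.5
panel = []  # rows of (company, x, y)
for i, fe in enumerate([10.0, -5.0, 3.0]):     # company fixed effects c_i
    for t in range(200):
        x = random.uniform(0, 10) + 2 * fe     # x correlated with c_i!
        y = fe + beta_true * x + random.gauss(0, 1)
        panel.append((i, x, y))

def within_beta(data):
    comps = {i for i, _, _ in data}
    xd, yd = [], []
    for c in comps:
        rows = [(x, y) for i, x, y in data if i == c]
        mx = sum(x for x, _ in rows) / len(rows)
        my = sum(y for _, y in rows) / len(rows)
        xd += [x - mx for x, _ in rows]
        yd += [y - my for _, y in rows]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(round(within_beta(panel), 2))  # close to the true 0.5
```

Pooled OLS on the same data would give a slope well above 0.5, because the fixed effects drive both x and y.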

SLIDE 33

Alternative functional forms

  • If X increases by 1, Y will increase by Ξ²
  • 𝑍 = 𝛽 + π›Ύπ‘Œ + πœ—
  • 𝑒𝑍

π‘’π‘Œ = 𝛾= marginal effect

  • If X increases by 1%, Y will increase by Ξ²%
  • π‘šπ‘œπ‘ = 𝛽 + π›Ύπ‘šπ‘œπ‘Œ + πœ— ⇔ 𝑍 = π‘“π›½π‘Œπ›Ύπ‘“πœ—
  • π‘’π‘šπ‘œπ‘

π‘’π‘šπ‘œπ‘Œ = 𝛾 = 𝑒𝑍/𝑍 π‘’π‘Œ/π‘Œ = π‘“π‘šπ‘π‘‘π‘’π‘—π‘‘π‘—π‘’π‘§

  • If X increases by 1, Y will increase by Ξ²%
  • π‘šπ‘œπ‘ = 𝛽 + π›Ύπ‘Œ + πœ— ⇔ 𝑍 = π‘“π›½π‘“π›Ύπ‘Œπ‘“πœ—
  • π‘’π‘šπ‘œπ‘

π‘’π‘Œ = 𝛾 = 𝑒𝑍/𝑍 π‘’π‘Œ = 𝑕𝑝π‘₯π‘’β„Ž 𝑠𝑏𝑒𝑓 (think of X as time)

  • If X increases by 1%, Y will increase by Ξ²
  • 𝑍 = 𝛽 + π›Ύπ‘šπ‘œπ‘Œ + πœ— ⇔ 𝑓𝑍 = π‘“π›½π‘Œπ›Ύπ‘“πœ—
  • 𝑒𝑍

π‘šπ‘œπ‘Œ = 𝛾 = 𝑒𝑍

π‘’π‘Œ π‘Œ

  • Any transformation (lnX, 1/X, XΒ², XΒ³, expX) is in principle allowed
  • Transformation can be justified by theory (in most cases) or by the data (see graphs)

[Graphs: three example plots of Y against X with different curvatures, suggesting which transformation fits the data]
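A quick numeric check that Ξ² is the elasticity in the log-log form (Ξ± = 1.0 and Ξ² = 0.7 are made-up values):

```python
# Log-log sketch: if Y = exp(alpha) * X^beta, then beta is the elasticity:
# a 1% increase in X gives roughly a beta% increase in Y.
import math

alpha, beta = 1.0, 0.7          # hypothetical parameters
def Y(x):
    return math.exp(alpha) * x ** beta

x0 = 5.0
x1 = x0 * 1.01                  # X increases by 1%
pct_change_y = (Y(x1) - Y(x0)) / Y(x0) * 100
print(round(pct_change_y, 3))   # close to beta = 0.7 (in %)

# exact version: dlnY/dlnX = beta, for any x0 and x1
dln = (math.log(Y(x1)) - math.log(Y(x0))) / (math.log(x1) - math.log(x0))
assert abs(dln - beta) < 1e-9
```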

SLIDE 34

Some useful commands in Stata

  • Type help … in the command windows for the following commands:
  • summarize: summarize information about a variable or dataset
  • tabulate var1 var2 : tabulation table to explore data
  • destring : define a variable as numerical if it would be imported as β€˜string’ (text)
  • generate var1=var2+var3 : generates a new variable (also for log transformation)
  • replace var1=0 if var2==. & var3>36 :
  • logical expressions with == ;
  • dot= missing (or ∞) => if var3 is missing, it satisfies var3>36 !
  • replace var1=1 if l.var2==25 | var3==var4 : lag operator only if dataset defined as

time series or panel.

  • regress y x1 x2 : regression with many options
  • tset year : define dataset to be a time series with year name of time variable
SLIDE 35

Commands in Stata:

  • xtset i t : define a database to be a panel with i name of person or company
  • xtreg : fixed effect or random effect panel regression
  • Create an id per companyname:
  • egen id= group(companyname)
  • egen stands for β€˜extensions to generate’, operates on groups of observations.
  • generate only applies to one observation at a time
  • Calculate mean (over time) of variable income per company id:
  • by id: egen meanincome = mean(income)
  • Eliminate extreme values beyond 99th percentile of variable income:
  • Summarize income, detail
  • Replace income=. if income>`r(p99)’
  • Most commands create different macro’s (mentioned at the end of help document). You can

use them with `…’ . Since they are local macro’s, they are erased at some point (in this case until the next command that uses r to store results.