Statistics, inference and ordinary least squares Frank Venmans - PowerPoint PPT Presentation


  1. Statistics, inference and ordinary least squares Frank Venmans

  2. Statistics

  3. Conditional probability
     • Consider 2 events:
       • A: die shows 1, 3 or 5 => $P(A) = 3/6$
       • B: die shows 3 or 6 => $P(B) = 2/6$
     • $A \cap B$: A and B occur: die shows 3 => $P(A \& B) = 1/6$
     • $A \cup B$: A or B occurs: die shows 1, 3, 5 or 6 => $P(A \text{ or } B) = 4/6$
     • Addition rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \& B)$ (~ Venn diagram)
     • Conditional probability: $P(A \mid B) = \frac{P(A \& B)}{P(B)}$ (~ Venn diagram)
       • $P(A \mid B)$: probability of event A given that B occurs = 1/2
       • $P(B \mid A)$: probability of event B given that A occurs = 1/3
     • Bayes' law: $P(A \& B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
     • An event can be any set of outcomes. Example (Venn diagram of income > 30,000 and education > 12):
       • A: random draw from the Belgian population with income > 30,000
       • B: random draw from the Belgian population with education > 12 years
       • $P(A \mid B) \neq P(A)$
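
The die example above can be checked by enumeration. The following is a minimal sketch using exact fractions; the helper names `prob` and `cond_prob` are illustrative and not part of the slides.

```python
from fractions import Fraction

# Sample space of one fair die; every outcome has probability 1/6.
outcomes = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}  # event A from the slide
B = {3, 6}     # event B from the slide

def prob(event):
    """Probability of an event (a set of outcomes) under a fair die."""
    return Fraction(len(event & outcomes), len(outcomes))

def cond_prob(event, given):
    """P(event | given) = P(event & given) / P(given)."""
    return prob(event & given) / prob(given)

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)
p_a_or_b = prob(A | B)
p_a_given_b = cond_prob(A, B)
p_b_given_a = cond_prob(B, A)

# Addition rule: P(A or B) = P(A) + P(B) - P(A & B)
print(p_a_or_b == p_a + p_b - p_a_and_b)
# Conditional probabilities from the slide
print(p_a_given_b, p_b_given_a)  # 1/2 1/3
# Bayes' law: P(A & B) = P(A|B) P(B) = P(B|A) P(A)
print(p_a_and_b == p_a_given_b * p_b == p_b_given_a * p_a)
```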

  4. Independence
     • 2 events A and B: $P(A \mid B) = P(A) \Leftrightarrow P(B \mid A) = P(B) \Leftrightarrow$ A and B are independent
     • Two variables X and Y: $f(x \mid y) = f(x) \Leftrightarrow f(y \mid x) = f(y) \Leftrightarrow$ X and Y are independent
     • X and Y are independent if the conditional distribution of X given Y is the same as the unconditional distribution of X.
     • Causally unrelated variables do not necessarily have a zero correlation.
       • Example: the height of my son and Indian GDP are correlated (both are affected by time)
     • Dependent variables may have a zero correlation in exceptional cases.
       • Example: selection bias may compensate for a causal effect (see further)
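
The claim that dependent variables can have zero correlation is easy to verify on a toy case. The sketch below is an assumption-laden illustration, not the selection-bias example from the slides: X takes the values -1, 0, 1 with equal probability and Y = X², so Y is fully determined by X, yet the covariance is exactly zero.

```python
from fractions import Fraction

# Hypothetical distribution: X takes -1, 0, 1 with equal probability, Y = X**2.
support = [(x, x * x) for x in (-1, 0, 1)]
p = Fraction(1, 3)

mean_x = sum(p * x for x, _ in support)
mean_y = sum(p * y for _, y in support)
covariance = sum(p * (x - mean_x) * (y - mean_y) for x, y in support)

# Unconditional P(Y = 1) versus conditional P(Y = 1 | X = 0).
p_y1 = sum(p for _, y in support if y == 1)
p_x0 = sum(p for x, _ in support if x == 0)
p_y1_given_x0 = sum(p for x, y in support if x == 0 and y == 1) / p_x0

print("covariance:", covariance)          # zero, so the correlation is zero
print("P(Y=1):", p_y1)                    # unconditional probability
print("P(Y=1 | X=0):", p_y1_given_x0)     # differs, so X and Y are dependent
```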

  5. Cumulative Distribution Function (CDF) and Probability Density Function (PDF)
     • Notation:
       • Random variables X, Y: e.g. yearly earnings and level of education
         • Discrete if earnings are multiples of 100 € and education is measured in years
         • ~Continuous if earnings are expressed in eurocents and education in seconds
       • Specific values of random variables: a, b or x, y
     • Cumulative Distribution Function:
       • Probability that X is smaller than or equal to a
       • $F(a) = P(X \le a)$
     • Probability Density Function:
       • For discrete variables: $f(a) = P(X = a)$
       • For continuous variables: $f(a) = \frac{dF(a)}{da} \Leftrightarrow F(a) = \int_{-\infty}^{a} f(X)\,dX$
     • Area under the pdf = 1 because $F(\infty) = 1$
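
The link between the pdf and the CDF can be checked numerically. This is a rough sketch that assumes a standard normal variable (the slide does not fix a distribution) and uses -10 and 10 as stand-ins for minus and plus infinity.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution

def integrate_pdf(lower, upper, steps=10_000):
    """Trapezoidal approximation of the integral of the pdf over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for k in range(1, steps):
        total += dist.pdf(lower + k * width)
    return total * width

a = 1.0
cdf_by_integration = integrate_pdf(-10.0, a)  # approximates F(a)
total_area = integrate_pdf(-10.0, 10.0)       # approximates F(infinity)

print(cdf_by_integration, dist.cdf(a))  # the two values should nearly coincide
print(total_area)                       # close to 1
```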

  6. Joint Cumulative Distribution Function
     • Assume Y is yearly earnings and X is the level of education
     • $F(x, y) = P(X \le x \,\&\, Y \le y)$

  7. Density function
     • Joint Density Function
       • For discrete variables: $f(x, y) = P(X = x \,\&\, Y = y)$
       • For continuous variables: $f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}$
     • Marginal Density Function
       • Discrete variables: $f(x) = P(X = x)$, disregarding y
       • Continuous variables: $f(x) = \int_{y=-\infty}^{y=\infty} f(x, y)\,dy$
       • (red and blue line)
     • Conditional Density Function
       • Discrete variables: $f(x \mid y) = P(X = x \mid Y = y)$
       • $f(x \mid y) = \frac{f(x, y)}{f(y)}$
       • (intersections through the joint density function)
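
For discrete variables, the marginal and conditional densities can be read off a joint table. The numbers below are invented purely for illustration (x is years of education, y is earnings in thousands of euros); only the formulas come from the slide.

```python
from fractions import Fraction

# Hypothetical joint probabilities f(x, y); they sum to 1.
joint = {
    (12, 20): Fraction(4, 10),
    (12, 40): Fraction(1, 10),
    (16, 20): Fraction(2, 10),
    (16, 40): Fraction(3, 10),
}

def marginal_x(x):
    """f(x): sum the joint density over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """f(y): sum the joint density over all x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_x_given_y(x, y):
    """f(x | y) = f(x, y) / f(y)."""
    return joint[(x, y)] / marginal_y(y)

print(marginal_x(16))                 # 1/2
print(conditional_x_given_y(16, 40))  # 3/4
```

Because f(x | y) differs from f(x) here, education and earnings are dependent in this made-up table.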

  8. Regression as a conditional density function

  9. Expected value
     • Unconditional expected value
       • For a discrete random variable: $E[X] = \sum_i x_i P(x_i) = \mu$
       • For a continuous random variable: $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \mu$
     • Conditional expected value (in finance many expectations are conditional on the information set at time t)
       • $E[X \mid Y] = E_Y[X] = \sum_i x_i P(x_i \mid Y)$
       • $E[X \mid Y] = \int_{-\infty}^{\infty} x f(x \mid y)\,dx$
     • Variance $= \sigma^2 = E[(X - \mu)^2]$
     • Covariance between X and Y $= \sigma_{X,Y} = E[(X - \mu_X)(Y - \mu_Y)]$
     • Skewness $= E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$
     • Kurtosis $= E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$
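
As a worked instance of these definitions, the moments of a fair die can be computed exactly. The die is an assumed example here; the helper `expect` is illustrative.

```python
from fractions import Fraction

values = range(1, 7)  # faces of a fair die (assumed example)
p = Fraction(1, 6)    # probability of each face

def expect(g):
    """E[g(X)] for the discrete variable X: sum of g(x_i) * P(x_i)."""
    return sum(p * g(x) for x in values)

mu = expect(lambda x: x)                      # expected value
variance = expect(lambda x: (x - mu) ** 2)    # E[(X - mu)^2]
third_moment = expect(lambda x: (x - mu) ** 3)
fourth_moment = expect(lambda x: (x - mu) ** 4)

# Skewness divides by sigma^3, which is irrational here, so use floats.
skewness = float(third_moment) / float(variance) ** 1.5
# Kurtosis divides by sigma^4 = variance^2, so it stays an exact fraction.
kurtosis = fourth_moment / variance ** 2

print(mu, variance)        # 7/2 35/12
print(skewness, kurtosis)  # 0.0 303/175
```

The kurtosis of about 1.73 is well below 3, which reflects the flat shape of the uniform distribution.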

  10. Normal distribution
      • $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$
      • Notation: $X \sim N(\mu, \sigma^2)$
      • Skewness = 0
      • Kurtosis = 3
      • Jarque-Bera test for normality: tests if skewness and kurtosis are close to 0 and 3.
      • Any linear combination of jointly normally distributed variables (correlated or not) is normally distributed
      • Central limit theorem: the suitably standardized sum of a large number of independent random variables with finite variance, whatever their distribution, converges to a normal distribution.
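
The Jarque-Bera idea can be sketched with the standard library alone. The slide gives no formula, so the statistic used below, JB = n/6 · (S² + (K − 3)²/4), is the usual textbook form and is stated here as an assumption; under normality it is asymptotically chi-square with 2 degrees of freedom, whose upper tail probability is exp(−JB/2). The samples are simulated, not real data.

```python
import math
import random

def jarque_bera(sample):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # upper tail of a chi-square with 2 d.f.
    return skew, kurt, jb, p_value

rng = random.Random(0)  # fixed seed so the run is reproducible
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

print("normal sample:", jarque_bera(normal_sample))
print("skewed sample:", jarque_bera(skewed_sample))
```

For the exponential sample the statistic is very large and the p-value is essentially zero, so normality is rejected.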

  11. Chi square distribution
      • $Y = \sum_{i=1}^{n} X_i^2$ with $X_i \sim N(0, 1)$ and all $X_i$ independent follows a $\chi^2$ distribution with n degrees of freedom.
      • $Y \sim \chi^2_n$
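
The construction can be simulated directly. The sketch below uses the known facts that a chi-square variable with n degrees of freedom has mean n and variance 2n (these are not on the slide) and an arbitrary choice of n = 5 and 20,000 draws.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility
n = 5                    # degrees of freedom (arbitrary choice)
draws = 20_000

def chi_square_draw(df):
    """Sum of df squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

sample = [chi_square_draw(n) for _ in range(draws)]
sample_mean = statistics.fmean(sample)
sample_var = statistics.variance(sample)

print(sample_mean)  # should be close to n
print(sample_var)   # should be close to 2 * n
```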

  12. Student t distribution
      • $Z = \frac{X}{\sqrt{Y/n}}$ with $X \sim N(0, 1)$ and $Y \sim \chi^2_n$ and X independent from Y follows a Student or t-distribution with n degrees of freedom
      • $Z \sim t_n$
      • Higher variance and kurtosis than the standardized normal distribution
      • Converges to the normal distribution for large n: $t_\infty = N(0, 1)$

  13. F distribution
      • $Z = \frac{X/n}{Y/m}$ with $X \sim \chi^2_n$ and $Y \sim \chi^2_m$ and X independent from Y follows an F distribution with n and m degrees of freedom.
      • $Z \sim F_{n,m}$
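
Both the t and the F distribution are built from normal and chi-square draws, which makes them easy to simulate. The sketch below checks two standard results that are not on the slides and are used here as assumptions: the variance of a t variable with n degrees of freedom is n/(n − 2), and the mean of an F variable with n and m degrees of freedom is m/(m − 2). The degrees of freedom are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(123)  # fixed seed for reproducibility

def chi_square(df):
    """Sum of df squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def student_t(df):
    """Z = X / sqrt(Y / df) with X ~ N(0,1) and Y ~ chi-square(df)."""
    return rng.gauss(0.0, 1.0) / math.sqrt(chi_square(df) / df)

def f_ratio(df_num, df_den):
    """Z = (X / df_num) / (Y / df_den) with independent chi-square X and Y."""
    return (chi_square(df_num) / df_num) / (chi_square(df_den) / df_den)

draws = 20_000
t_sample = [student_t(10) for _ in range(draws)]
f_sample = [f_ratio(4, 12) for _ in range(draws)]

t_variance = statistics.variance(t_sample)
f_mean = statistics.fmean(f_sample)

print(t_variance)  # close to 10 / 8 = 1.25, above the normal variance of 1
print(f_mean)      # close to 12 / 10 = 1.2
```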

  14. Inference

  15. Statistical inference
      • Try to say something about the real distribution of a random variable based on a sample.
      • The real distribution corresponds to an infinitely repeated event (e.g. rolling a die), the entire population, the entire set of possible ‘states of the world’ in a future period, etc.

  16. 3 types of inference
      • Point estimator:
        • Ex: sample mean, sample variance, marginal effect in a linear regression (beta), correlation…
        • Concept of repeated sampling: every sample gives another estimate $\hat{\theta}$ => $\hat{\theta}$ will follow a probability distribution
        • Unbiased: the expected value of the estimator corresponds to the real parameter: $E[\hat{\theta}] = \theta$
        • Consistent: the estimator can get arbitrarily close to the real parameter by increasing the sample size: $\operatorname{plim}_{n \to \infty} \hat{\theta} = \theta$
        • Ex: the sample variance estimator $s^2 = \frac{1}{n} \sum_i (y_i - \bar{y})^2$ is a biased but consistent estimator of the variance
        • Efficient estimator: $\operatorname{var}(\hat{\theta})$ is small
      • Interval estimation:
        • Ex: given the observed sample, the real mean lies between 1 and 3 with 95% probability
      • Hypothesis testing:
        • Ex: if the null hypothesis is true ($\mu = 2$), what is the probability that a random sample has a more extreme (less likely) outcome than the observed sample mean of 4 and sample variance of 2?
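
The bias of the 1/n variance estimator can be shown exactly by enumerating every possible sample, which is the idea of repeated sampling taken to its limit. The sketch assumes the population is a fair die (true variance 35/12) and compares the divisors n and n − 1; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # population: a fair die (assumed example)
pop_mean = Fraction(sum(faces), 6)
true_var = sum((y - pop_mean) ** 2 for y in faces) / 6  # 35/12

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((y - mean) ** 2 for y in sample)

def expected_estimate(n, divisor):
    """Exact expected value of sum_sq_dev / divisor over all equally likely samples of size n."""
    samples = list(product(faces, repeat=n))
    return sum(sum_sq_dev(s) / divisor for s in samples) / len(samples)

for n in (2, 3, 4):
    biased = expected_estimate(n, n)        # divisor n
    unbiased = expected_estimate(n, n - 1)  # divisor n - 1
    print(n, biased / true_var, unbiased / true_var)
```

The ratio for the divisor n equals (n − 1)/n, so the bias shrinks as n grows, which is consistent with the estimator being biased but consistent.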

  17. Example: Sample mean
      • Income of Belgian households: a random variable following a distribution with mean $\mu$ and variance $\sigma^2$ (the distribution is skewed, not normal)
      • You have a sample of n individuals. You want to say something about $\mu$ and $\sigma^2$
      • Estimator of $\mu$: sample mean $\bar{y} = \frac{y_1 + y_2 + y_3 + \dots + y_n}{n}$
      • The estimator will be different each time you draw a different sample => the sample mean will follow a distribution, which is different from the distribution of y.
      • Central limit theorem => the sample mean converges to a normal distribution even if y does not follow a normal distribution.
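
Repeated sampling from a skewed distribution illustrates this. The sketch below does not use real income data: it assumes an exponential distribution with mean 30 (so standard deviation 30 as well) as a stand-in for a skewed income distribution, and draws 5,000 samples of size 100.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility
MU = 30.0               # assumed population mean (and standard deviation)
N = 100                 # sample size
REPS = 5_000            # number of repeated samples

def skewness(data):
    """Sample analogue of E[((X - mu) / sigma)^3]."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in data])

raw_draws = [rng.expovariate(1.0 / MU) for _ in range(REPS)]
sample_means = [
    statistics.fmean([rng.expovariate(1.0 / MU) for _ in range(N)])
    for _ in range(REPS)
]

print("skewness of y:        ", skewness(raw_draws))
print("skewness of the mean: ", skewness(sample_means))
print("mean of the means:    ", statistics.fmean(sample_means))
print("std dev of the means: ", statistics.pstdev(sample_means))
```

The individual draws are strongly skewed, while the sample means are nearly symmetric, centred on 30 and have a standard deviation close to 30/√100 = 3.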

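  18. Sample mean: variance known
      • $\bar{y} \overset{\text{asymptotic}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \Rightarrow \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} \overset{\text{asymptotic}}{\sim} N(0, 1)$
      • This allows us to determine a 95% confidence interval: $P\left(-1.96 < \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95 \Leftrightarrow P\left(\bar{y} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{y} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
      • When the interval includes zero we say that the sample mean is not significantly different from zero at the 5% significance level.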
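
The interval formula translates into a few lines of code. The inputs below (sample mean 2, known σ = 5, n = 100) are hypothetical numbers chosen for illustration, and `mean_ci_known_sigma` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(sample_mean, sigma, n, level=0.95):
    """Confidence interval for the mean when sigma is known (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

lower, upper = mean_ci_known_sigma(sample_mean=2.0, sigma=5.0, n=100)
print(lower, upper)

# The interval excludes zero, so this hypothetical mean is significantly
# different from zero at the 5% significance level.
print(not (lower < 0.0 < upper))
```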

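  19. Sample mean: variance unknown and y normally distributed
      • Both mean and variance will need to be estimated.
      • Estimator for the variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
      • If Y follows a normal distribution $\Rightarrow \frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n} \left(\frac{y_i - \bar{y}}{\sigma}\right)^2 \sim \chi^2_{n-1}$ (no proof but intuitive)
      • $\frac{\bar{y} - \mu}{s/\sqrt{n}} = \frac{(\bar{y} - \mu)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}} = \frac{(\bar{y} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)s^2}{(n-1)\sigma^2}}} \sim \frac{N(0, 1)}{\sqrt{\chi^2_{n-1}/(n-1)}} = t_{n-1}$
      • This allows us to determine a 95% confidence interval (ex. n = 21):
        • $P\left(-2.086 < \frac{\bar{y} - \mu}{s/\sqrt{n}} < 2.086\right) = 0.95 \Leftrightarrow P\left(\bar{y} - 2.086\frac{s}{\sqrt{n}} < \mu < \bar{y} + 2.086\frac{s}{\sqrt{n}}\right) = 0.95$
      • For large n, the t distribution converges to the normal distribution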
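
What the 95% means can be illustrated by simulation: across many samples of size 21, roughly 95% of the intervals built with the critical value 2.086 should contain the true mean. The population parameters below (mean 10, standard deviation 3) are arbitrary assumptions; only n = 21 and 2.086 come from the slide.

```python
import random
import statistics
from math import sqrt

rng = random.Random(7)          # fixed seed for reproducibility
MU, SIGMA, N = 10.0, 3.0, 21    # hypothetical population and the slide's n
T_CRIT = 2.086                  # critical value from the slide (20 d.f.)
REPS = 4_000

def t_interval(sample):
    """Interval y_bar +/- T_CRIT * s / sqrt(n), with s using the n - 1 divisor."""
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)
    half_width = T_CRIT * s / sqrt(len(sample))
    return mean - half_width, mean + half_width

hits = 0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    low, high = t_interval(sample)
    hits += low < MU < high  # True counts as 1

coverage = hits / REPS
print(coverage)  # should be close to 0.95
```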

  20. Hypothesis testing
      • Null hypothesis $H_0: \theta = \theta_0$, ex: $H_0: \theta = 0$
      • One-sided test $H_A: \theta > \theta_0$ (or $\theta < \theta_0$), ex: $H_A: \theta > 0$
      • Two-sided test $H_A: \theta \neq \theta_0$, ex: $H_A: \theta \neq 0$
      • 2 regions:
        • If the observed data (test statistic) fall in the rejection region => reject $H_0$
        • If the observed data (test statistic) fall in the acceptance region => accept (do not reject) $H_0$
      • Imagine you have 10 months of data, you observe a mean monthly return on Apple stock of 0.8%, and you want to test whether this mean is different from a zero return.
      • Assume the standard deviation of the return is observed to be 1.58%, so the standard error of the mean is $1.58\% / \sqrt{10} = 0.5\%$
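
The numbers in the Apple example are enough to sketch a two-sided t-test. The slide stops at the standard error, so the continuation below is an assumption about how the example proceeds: it forms the t-statistic with n − 1 = 9 degrees of freedom and obtains the p-value by numerically integrating the Student t density (the standard library has no t distribution, so `t_pdf` and `two_sided_p_value` are hand-written helpers).

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=20_000):
    """P(|T| > |t_stat|), using a trapezoidal integral of the density on [0, |t_stat|]."""
    upper = abs(t_stat)
    width = upper / steps
    area = 0.5 * (t_pdf(0.0, df) + t_pdf(upper, df))
    for k in range(1, steps):
        area += t_pdf(k * width, df)
    area *= width              # P(0 < T < |t_stat|)
    return 1 - 2 * area        # both tails, by symmetry

n = 10               # months of data (from the slide)
mean_return = 0.8    # mean monthly return in percent (from the slide)
sd_return = 1.58     # standard deviation of the return in percent (from the slide)

se_mean = sd_return / math.sqrt(n)        # about 0.5
t_stat = (mean_return - 0.0) / se_mean    # about 1.6 under H0: mean = 0
p_value = two_sided_p_value(t_stat, df=n - 1)

print(se_mean, t_stat, p_value)
```

With a t-statistic of about 1.6 and a p-value above 0.05, the mean return is not significantly different from zero at the 5% level in a two-sided test.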
