Statistics, inference and ordinary least squares
Frank Venmans
Statistics
Conditional probability
Consider 2 events:
A: die shows 1, 3 or 5 => P(A) = 3/6
B: die shows 3 or 6 => P(B) = 2/6
$P(A\mid B) = \frac{P(A \,\&\, B)}{P(B)}$ (~ Venn diagram)
Figure: Venn diagram of the events Income > 30,000 and Education > 12
$P(A\mid B) = P(A) \Leftrightarrow P(B\mid A) = P(B) \Leftrightarrow$ A and B are independent
$f(x\mid y) = f(x) \Leftrightarrow f(y\mid x) = f(y) \Leftrightarrow$ x and y are independent
=> The conditional distribution of X given Y is the same as the unconditional distribution of X.
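The die example above can be checked by enumerating outcomes. A minimal sketch, assuming a fair die with equally likely faces; the helper names `prob` and `cond_prob` are illustrative, not from the slides. It shows that A and B happen to be independent: P(A|B) = P(A) = 1/2.

```python
from fractions import Fraction

# Sample space of a fair die and the two events from the slide
OMEGA = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {3, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(event, given):
    """P(event | given) = P(event & given) / P(given)."""
    return prob(event & given) / prob(given)

independent = prob(A & B) == prob(A) * prob(B)

print(prob(A), prob(B), prob(A & B))      # 1/2 1/3 1/6
print(cond_prob(A, B), cond_prob(B, A))   # 1/2 1/3
print(independent)                        # True
```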
$f(x) = \frac{dF(x)}{dx} \Leftrightarrow F(x) = \int_{-\infty}^{x} f(t)\,dt$
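$f(x,y) = \frac{\partial^2 F(x,y)}{\partial x\,\partial y}$, $f(x) = \int_{y=-\infty}^{y=\infty} f(x,y)\,dy$
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \mu$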
Conditional expectation (e.g. conditional on the information set at time t): $E(X\mid y) = \int_{-\infty}^{\infty} x f(x\mid y)\,dx$
Skewness $= E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right]$, kurtosis $= E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right]$
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$
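To make the density and expectation formulas concrete, here is a small numerical sketch (pure Python, midpoint rule). The parameter values mu = 2.0 and sigma = 1.5 are arbitrary, and the infinite integration limits are truncated at mu ± 10 sigma, where the remaining tail mass is negligible. It checks that the normal density integrates to 1 and that the integral of x f(x) dx equals mu.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

mu, sigma = 2.0, 1.5
lo, hi = mu - 10 * sigma, mu + 10 * sigma       # stand-ins for -inf and +inf
total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)      # should be ~1
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)   # E(X), should be ~mu
print(total, mean)
```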
A normality test checks whether skewness and kurtosis are close to 0 and 3, their values for a normal distribution.
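A sketch of the sample skewness and kurtosis (moments divided by n), plus the Jarque-Bera statistic, one common test built on exactly these two numbers. This is an illustration only; it is not claimed to be the test that Stata runs, and the two data sets are made up.

```python
def skewness_kurtosis(data):
    """Sample skewness E[((y-mean)/sd)^3] and kurtosis E[((y-mean)/sd)^4], moments divided by n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((y - mean) ** 2 for y in data) / n
    m3 = sum((y - mean) ** 3 for y in data) / n
    m4 = sum((y - mean) ** 4 for y in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(data):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4): close to 0 when S is near 0 and K is near 3."""
    s, k = skewness_kurtosis(data)
    return len(data) / 6 * (s ** 2 + (k - 3) ** 2 / 4)

symmetric = [1, 2, 3, 4, 5]      # skewness 0, kurtosis 1.7 (flatter than normal)
right_skewed = [1, 1, 1, 10]     # one large value gives positive skewness
print(skewness_kurtosis(symmetric), skewness_kurtosis(right_skewed), jarque_bera(symmetric))
```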
A linear combination of jointly normal random variables (correlated or not) is normally distributed.
Central limit theorem: the (standardized) sum of an infinite number of independent random variables with any distribution (with finite variance) will be normally distributed.
$\sum_{i=1}^{n} Z_i^2$ with $Z_i \sim N(0,1)$ and all $Z_i$ independent follows a $\chi^2$ distribution with n degrees of freedom.
$\frac{X}{\sqrt{Y/n}}$ with $X \sim N(0,1)$, $Y \sim \chi^2_n$ and X independent of Y follows a Student or t-distribution with n degrees of freedom.
For large n: $t_n \approx N(0,1)$, the standardized normal distribution.
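The t-distribution definition can be checked by simulation: build X/sqrt(Y/n) from standard normals and compare its variance with the known value n/(n-2), which exceeds the variance 1 of N(0,1). A sketch with an arbitrary seed and n = 10; the result is approximate.

```python
import random
import statistics

def draw_t(n, rng):
    """One draw of X / sqrt(Y/n): X ~ N(0,1), Y ~ chi2(n) built as a sum of n squared N(0,1)."""
    x = rng.gauss(0.0, 1.0)
    y = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return x / (y / n) ** 0.5

rng = random.Random(42)      # fixed seed for reproducibility
n = 10
draws = [draw_t(n, rng) for _ in range(100_000)]

mean_t = statistics.fmean(draws)
var_t = statistics.fmean((d - mean_t) ** 2 for d in draws)
# Theory: mean 0, Var(t_n) = n / (n - 2) = 1.25 for n = 10 (fatter tails than N(0,1))
print(mean_t, var_t)
```

For larger n the variance n/(n-2) tends to 1, in line with $t_n \approx N(0,1)$.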
$\frac{X/n}{Y/m}$ with $X \sim \chi^2_n$, $Y \sim \chi^2_m$ and X independent of Y follows an F distribution with n and m degrees of freedom, $F_{n,m}$.
entire population, entire set of possible “states of the world” in a future period, etc.
=> the estimator $\hat{\theta}$ will follow a probability distribution
Unbiased: $E(\hat{\theta}) = \theta$
Consistent: $\operatorname{plim}_{n\to\infty} \hat{\theta} = \theta$
$\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$ is a biased but consistent estimator of the variance
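A sketch comparing the divide-by-n estimator with the divide-by-(n-1) estimator $s^2$, first on a small illustrative data set, then by Monte Carlo with tiny samples (n = 5, true variance 1) to make the bias visible: on average the first gives about (n-1)/n = 0.8, the second about 1. The seed and sample sizes are arbitrary.

```python
import random
import statistics

def var_biased(data):
    """(1/n) * sum((y_i - ybar)^2): biased but consistent."""
    n = len(data)
    ybar = sum(data) / n
    return sum((y - ybar) ** 2 for y in data) / n

def var_unbiased(data):
    """s^2 = (1/(n-1)) * sum((y_i - ybar)^2): unbiased."""
    n = len(data)
    return var_biased(data) * n / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(var_biased(data), var_unbiased(data))   # 4.0 and 32/7, about 4.571

# Monte Carlo: E[var_biased] = (n - 1)/n * sigma^2 = 0.8 for n = 5, sigma^2 = 1
rng = random.Random(7)
samples = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20_000)]
avg_biased = statistics.fmean(var_biased(s) for s in samples)
avg_unbiased = statistics.fmean(var_unbiased(s) for s in samples)
print(avg_biased, avg_unbiased)
```

The bias factor (n-1)/n goes to 1 as n grows, which is why the first estimator is still consistent.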
Efficient: $Var(\hat{\theta})$ is small
extreme (less likely) outcome than the observed sample mean of 4 and sample variance of 2.
with mean $\mu$ and variance $\sigma^2$ (distribution is skewed, not normal)
$\bar{y} = \frac{y_1 + y_2 + y_3 + \dots + y_n}{n}$
The sample mean will follow a distribution that is different from the distribution of y.
By the central limit theorem, $\bar{y}$ approximately follows a normal distribution even if y does not follow a normal distribution.
$\bar{y} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately
=> $\frac{\bar{y}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$ approximately
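A simulation sketch of the central limit theorem: y is drawn from an exponential distribution (mean 1, variance 1, strongly skewed, an assumption chosen for illustration), yet the standardized sample mean with n = 50 has mean near 0, standard deviation near 1, and falls inside ±1,96 about 95% of the time.

```python
import random
import statistics

rng = random.Random(1)
mu = sigma = 1.0          # exponential with rate 1: mean 1, standard deviation 1, skewed
n, reps = 50, 20_000

z = []
for _ in range(reps):
    ybar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    z.append((ybar - mu) / (sigma / n ** 0.5))   # (ybar - mu) / (sigma / sqrt(n))

share_inside = sum(-1.96 < v < 1.96 for v in z) / reps
print(statistics.fmean(z), statistics.pstdev(z), share_inside)   # roughly 0, 1 and 0.95
```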
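$P\left(-1{,}96 < \frac{\bar{y}-\mu}{\sigma/\sqrt{n}} < 1{,}96\right) = 0{,}95 \Leftrightarrow P\left(\bar{y} - 1{,}96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{y} + 1{,}96\,\frac{\sigma}{\sqrt{n}}\right) = 0{,}95$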
significantly different from zero at the 5% significance level (95% confidence level).
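A sketch of the 95% interval $\bar{y} \pm 1{,}96\,\sigma/\sqrt{n}$ when σ is known. It reuses the sample mean 4 and variance 2 from the example, treating that variance as the known σ²; n = 25 is a hypothetical sample size. If 0 lies outside the interval, μ = 0 is rejected at 5%.

```python
from statistics import NormalDist

def z_interval(ybar, sigma, n, level=0.95):
    """Confidence interval for mu with known sigma: ybar -/+ z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    half_width = z * sigma / n ** 0.5
    return ybar - half_width, ybar + half_width

# Mean 4 and variance 2 from the example; n = 25 is hypothetical
lo, hi = z_interval(ybar=4.0, sigma=2 ** 0.5, n=25)
significant = not (lo <= 0.0 <= hi)   # 0 outside the interval => mean differs from 0 at 5%
print(lo, hi, significant)
```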
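$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$
$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{y_i - \bar{y}}{\sigma}\right)^2 \sim \chi^2_{n-1}$ (if y is normally distributed)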
(no proof but intuitive)
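$\frac{\bar{y}-\mu}{s/\sqrt{n}} = \frac{(\bar{y}-\mu)/(\sigma/\sqrt{n})}{s/\sigma} = \frac{(\bar{y}-\mu)/(\sigma/\sqrt{n})}{\sqrt{\frac{(n-1)s^2}{(n-1)\sigma^2}}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} = t_{n-1}$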
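$P\left(-2{,}086 < \frac{\bar{y}-\mu}{s/\sqrt{n}} < 2{,}086\right) = 0{,}95 \Leftrightarrow P\left(\bar{y} - 2{,}086\,\frac{s}{\sqrt{n}} < \mu < \bar{y} + 2{,}086\,\frac{s}{\sqrt{n}}\right) = 0{,}95$ (2,086 is the critical value of $t_{20}$, i.e. n = 21)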
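A sketch of the t-based interval with σ replaced by s. The critical value 2.086 is hard-coded because it is only valid for n - 1 = 20 degrees of freedom (the standard library has no t quantile function); the data 0, 1, ..., 20 are purely illustrative. The interval comes out wider than the 1,96 version would be.

```python
import math
import statistics

T_CRIT_DF20 = 2.086   # 97.5% quantile of t with 20 df, valid only for n = 21

def t_interval(data, t_crit=T_CRIT_DF20):
    """95% CI for mu with unknown sigma: ybar -/+ t_crit * s / sqrt(n); t_crit must match n - 1 df."""
    n = len(data)
    ybar = statistics.fmean(data)
    s = statistics.stdev(data)            # sample standard deviation, divides by n - 1
    half_width = t_crit * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width

data = [float(k) for k in range(21)]      # illustrative data: n = 21, mean 10
lo, hi = t_interval(data)
print(lo, hi)
```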
ex: $H_0: \mu = 0$
ex: $H_A: \mu > 0$
ex: $H_A: \mu \neq 0$
from a zero return.
10 = 0,5%
be the probability of observing an estimate larger than 0,8% (1,58%)?
$\frac{\bar{y}-0}{s/\sqrt{n}} \sim t_{n-1}$
$P\left(\frac{\bar{y}-0}{s/\sqrt{n}} > \frac{0{,}8\%}{0{,}5\%}\right) = P(X > 1{,}6) \approx 0{,}05$ (p-value given by Stata)
be the probability that the sample mean was outside the interval $[-\bar{y}, \bar{y}]$?
$1 - P\left(-\frac{\bar{y}}{s/\sqrt{n}} < X < \frac{\bar{y}}{s/\sqrt{n}}\right) = 1 - P(-1{,}6 < X < 1{,}6) \approx 0{,}10$ (p-value given by Stata)
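The two p-values can be reproduced from the numbers in the example (sample mean 0,8%, standard error 0,5%). This sketch uses the large-n approximation $t \approx N(0,1)$ because the standard library has no t distribution; with that approximation the values are about 0,055 and 0,11, and an exact t-distribution with finite n gives slightly larger values.

```python
from statistics import NormalDist

def p_values(ybar, se, mu0=0.0):
    """t-statistic and p-values for H0: mu = mu0, using the large-n approximation t ~ N(0,1)."""
    t = (ybar - mu0) / se
    std_normal = NormalDist()
    one_sided = 1.0 - std_normal.cdf(t)                # HA: mu > mu0
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(t)))   # HA: mu != mu0
    return t, one_sided, two_sided

# Returns in %: sample mean 0.8, standard error s/sqrt(n) = 0.5
t, p_one, p_two = p_values(ybar=0.8, se=0.5)
print(round(t, 2), round(p_one, 3), round(p_two, 3))   # 1.6 0.055 0.11
```

The two-sided p-value is twice the one-sided one because the normal (and t) distribution is symmetric around 0.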
returns is needed
             Do not reject H0      Reject H0
H0 true      correct decision      Type I error, α (e.g. 5%)
H0 false     Type II error, β      correct decision; 1 - β = power of the test
test
big as an elephant, that you already knew before doing the test; everything else has an insignificant effect.
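The error table can be illustrated by simulation: repeat a two-sided 5% z-test many times. When H0 (μ = 0) is true, the rejection share estimates the Type I error α; when it is false, it estimates the power 1 - β. A sketch assuming normal data with known σ = 1, n = 30, and a hypothetical true effect of 0,5.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mu, n=30, sigma=1.0, alpha=0.05, reps=20_000, seed=0):
    """Share of simulated samples in which a two-sided z-test of H0: mu = 0 rejects.
    true_mu = 0 estimates the Type I error; true_mu != 0 estimates the power 1 - beta."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(reps):
        ybar = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        if abs(ybar / se) > crit:
            rejections += 1
    return rejections / reps

size = rejection_rate(true_mu=0.0)    # close to alpha = 0.05
power = rejection_rate(true_mu=0.5)   # hypothetical effect; about 0.78 in theory
print(size, power)
```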
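Figure: sales (Y) against advertising expenditure (X) with the regression line $E(Y\mid X) = \alpha + \beta X$ (intercept α, slope β); each observation $Y_i$ deviates from $\alpha + \beta X_i$.
Figure: the same regression line $\alpha + \beta X$ (slope β) plotted against advertising expenditures.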
$E(\varepsilon_i^2) = \sigma^2$
follows a t-distribution only if the errors are normal => be prudent when interpreting confidence intervals in small samples
$E(Y\mid X) = X\beta$
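$\min_{\beta} \sum_{i} e_i^2 = \min_{\beta} e'e = \min_{\beta} (Y - X\beta)'(Y - X\beta) = \min_{\beta} \left(Y'Y - 2\beta'X'Y + \beta'X'X\beta\right)$
First-order condition: $-2X'Y + 2X'X\beta = 0 \Rightarrow \hat{\beta} = (X'X)^{-1}X'Y$
Equivalently: $X'e = 0 \Leftrightarrow X'(Y - X\hat{\beta}) = 0 \Leftrightarrow \hat{\beta} = (X'X)^{-1}X'Y$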
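A NumPy sketch of the OLS formula $\hat{\beta} = (X'X)^{-1}X'Y$, computed by solving the normal equations $X'X\beta = X'Y$ rather than inverting $X'X$ explicitly (numerically safer, same result). The data are simulated with a hypothetical true α = 2 and β = 3. The residuals satisfy $X'e = 0$, the first-order condition.

```python
import numpy as np

def ols(X, Y):
    """beta_hat = (X'X)^{-1} X'Y, obtained by solving the normal equations X'X b = X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)                   # e.g. advertising expenditure
X = np.column_stack([np.ones(n), x])        # column of ones gives the intercept alpha
Y = 2.0 + 3.0 * x + rng.normal(0, 1, n)     # hypothetical alpha = 2, beta = 3
beta_hat = ols(X, Y)
e = Y - X @ beta_hat                        # residuals
print(beta_hat, X.T @ e)                    # X'e is numerically zero
```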
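$L = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{e_i^2}{2\sigma^2}\right)$
$\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} e_i^2$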
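For a fixed σ², the only term of log L that depends on β is $-\frac{1}{2\sigma^2}\sum e_i^2$, so maximizing the likelihood over β gives the OLS estimator. A NumPy sketch on simulated data (true coefficients 1 and 0,5, error standard deviation 2, all hypothetical) checks this, and also that the likelihood peaks at σ² = SSR/n, the biased variance estimate.

```python
import numpy as np

def log_likelihood(beta, sigma2, X, Y):
    """log L = -(n/2) log(2 pi sigma^2) - sum(e_i^2) / (2 sigma^2), assuming normal errors."""
    e = Y - X @ beta
    n = len(Y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - (e @ e) / (2 * sigma2)

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(scale=2.0, size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
sigma2_ml = np.sum((Y - X @ beta_ols) ** 2) / n     # ML estimate of sigma^2 divides by n
ll_at_ols = log_likelihood(beta_ols, sigma2_ml, X, Y)
ll_perturbed = log_likelihood(beta_ols + np.array([0.1, -0.1]), sigma2_ml, X, Y)
print(ll_at_ols, ll_perturbed)                       # moving away from OLS lowers log L
```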
follows a Student t-distribution (only asymptotically the case if errors are not normally distributed)
The variance of $\hat{\beta}$ can be increased to take that into account (option “robust” in Stata)
most important condition for a causal interpretation of the betas (see next week)
unit, all other relevant factors being equal, will have an effect $\beta_1$ on Y.
drive the error term (and thus Y), are uncorrelated with the variables of interest X.
Figure: diagram of sales and marketing expenditures; error = all other factors (innovative company, competitors, quality of product, delivery time, business cycle)
$E(\varepsilon\mid X) \neq 0$, e.g. $Cov(\varepsilon, X) \neq 0$: ε and X are driven by common factors.
Figure: diagram of sales and marketing expenditures; fixed effect = all factors that are constant over time (innovative company, competitors, quality of product, delivery time); idiosyncratic error = all other factors that change over time (business cycle)
$Y_{it} = X_{it}\beta + \sum_{i} \delta_i D_i + \varepsilon_{it}$ with $D_i$ a dummy variable for company i.
$Y_{it} = X_{it}\beta + \delta_i + \varepsilon_{it}$ (eq. 2)
$Y_{it} - \bar{Y}_i = (X_{it} - \bar{X}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$ = “within estimator” (obtained by subtracting the average of eq. 2 over the time periods)
company i. => The difference in mean sales between a company with high and low average marketing expenditures does not drive the estimation of β => be careful with measurement errors and lagged effects, because part of the variability is filtered out.
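A NumPy sketch of the within estimator on a simulated panel (200 hypothetical companies, 6 periods, true β = 1,5) where the fixed effect δ_i is correlated with X. Demeaning per company and running OLS gives the same β as the dummy-variable regression, while pooled OLS that ignores δ_i is biased upward (here toward roughly 2,4).

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_periods, beta = 200, 6, 1.5

firm = np.repeat(np.arange(n_firms), n_periods)     # company index i for each observation
delta = rng.normal(0, 3, n_firms)                    # fixed effects delta_i
x = delta[firm] + rng.normal(0, 1, firm.size)        # X correlated with the fixed effect
y = beta * x + delta[firm] + rng.normal(0, 1, firm.size)

def demean_by_group(v, g):
    """Subtract the company mean over time: v_it - vbar_i."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

# Within estimator: OLS on demeaned data (no constant needed)
xd, yd = demean_by_group(x, firm), demean_by_group(y, firm)
beta_within = (xd @ yd) / (xd @ xd)

# Dummy-variable regression: y on x and one dummy D_i per company
D = (firm[:, None] == np.arange(n_firms)).astype(float)
coef_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0]

# Pooled OLS ignoring the fixed effects: biased because Cov(x, delta) != 0
Xp = np.column_stack([np.ones(firm.size), x])
beta_pooled = np.linalg.lstsq(Xp, y, rcond=None)[0][1]
print(beta_within, coef_lsdv[0], beta_pooled)
```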
use fixed effects!
effect drives any of the Xs as well: fixed effects are uncorrelated with the Xs
correlation is used to make the estimator more efficient than the pooled panel regression
$\frac{dY}{dX} = \beta$ = marginal effect
$\frac{d\ln Y}{d\ln X} = \beta = \frac{dY/Y}{dX/X}$ = elasticity
$\frac{d\ln Y}{dX} = \beta = \frac{dY/Y}{dX}$ = growth rate (think of X as time)
$\frac{dY}{d\ln X} = \beta = \frac{dY}{dX/X}$
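The log interpretations can be checked numerically with central differences. A sketch assuming two hypothetical functional forms: Y = 5X^0,7, where d ln Y / d ln X = 0,7 (an elasticity), and Y = 100e^{0,03t}, where d ln Y / dt = 0,03 (a 3% growth rate per period).

```python
import math

def log_log_slope(f, x, h=1e-5):
    """Central-difference estimate of d ln Y / d ln X at x (an elasticity)."""
    return (math.log(f(x * math.exp(h))) - math.log(f(x * math.exp(-h)))) / (2 * h)

def log_level_slope(f, x, h=1e-5):
    """Central-difference estimate of d ln Y / dX at x (a growth rate when X is time)."""
    return (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)

def power_law(x):
    return 5.0 * x ** 0.7                 # hypothetical: ln Y = ln 5 + 0.7 ln X

def exponential_growth(t):
    return 100.0 * math.exp(0.03 * t)     # hypothetical: ln Y = ln 100 + 0.03 t

elasticity = log_log_slope(power_law, 2.0)                 # about 0.7: +1% in X, +0.7% in Y
growth_rate = log_level_slope(exponential_growth, 10.0)    # about 0.03: 3% growth per period
print(elasticity, growth_rate)
```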
time series or panel.
use them with `…'. Since they are local macros, they are erased at some point (in this case, as soon as the next command that uses r() to store results is run).