Slide Set 2 Tools

Pietro Coretto pcoretto@unisa.it

Econometrics

Master in Economics and Finance (MEF) Università degli Studi di Napoli “Federico II”

Version: Monday 13th January, 2020 (h15:38)


Vector spaces

A vector space (linear space) is a collection of objects, called vectors, that can be added together or multiplied by a scalar (scaling). These operations need to satisfy certain axioms.

Many objects in mathematics can be organized into vector spaces, and this has the advantage that we can define concepts like distance, length/size or angles in a general way. The latter is useful to extend the notions of parallelism, direction, magnitude, etc., to a broad class of objects. For now we focus on the Euclidean space $\mathbb{R}^K$, where $x = (x_1, x_2, \dots, x_K)'$ is a $K$-dimensional vector of $\mathbb{R}^K$.


Distances and metrics

A metric space is a space where we can measure the distance between its objects.

A distance is an abstract function $d(\cdot, \cdot)$ that measures how far apart two objects are. A distance needs to fulfill axioms (positive-definiteness, symmetry, triangle inequality).

The Euclidean vector space $\mathbb{R}^K$ is also a metric space; examples of distances are

$$d_1(x, y) = \sum_{i=1}^{K} |x_i - y_i|$$

$$d_2(x, y) = \sqrt{\sum_{i=1}^{K} (x_i - y_i)^2}$$

$$d_\infty(x, y) = \max\{|x_i - y_i|,\ i = 1, 2, \dots, K\}$$

$$d_p(x, y) = \left(\sum_{i=1}^{K} |x_i - y_i|^p\right)^{1/p}, \quad \text{where } p \ge 1$$
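As a quick numerical illustration (not part of the original slides), here is a minimal NumPy sketch of the four distances above; the vectors x and y are arbitrary made-up examples.

```python
import numpy as np

# Two made-up vectors in R^K (K = 3 here); the values are illustrative only
x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 1.0, -1.0])

d1 = np.sum(np.abs(x - y))             # d_1: sum of absolute coordinate differences
d2 = np.sqrt(np.sum((x - y) ** 2))     # d_2: Euclidean distance
d_inf = np.max(np.abs(x - y))          # d_infinity: largest coordinate difference

def d_p(x, y, p):
    """General p-distance, p >= 1; d_p(., ., 2) coincides with d2."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

print(d1, d2, d_inf, d_p(x, y, 3))
```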


Every distance defines a different way of metricizing the space. Take $\mathbb{R}^2$, and consider the point $0 = (0, 0)'$. Fix $\varepsilon > 0$, and think about "open balls" (neighbourhoods) of the point 0:

$$N_1 = \{x \in \mathbb{R}^2 : d_1(x, 0) \le \varepsilon\}$$

$$N_2 = \{x \in \mathbb{R}^2 : d_2(x, 0) \le \varepsilon\}$$

$$N_\infty = \{x \in \mathbb{R}^2 : d_\infty(x, 0) \le \varepsilon\}$$


Take points $x = (x_1, x_2)' \in [-2, 2] \times [-2, 2]$, fix $\varepsilon = 1$, and color in red those points in the open balls.

[Figure: three panels over the $(x_1, x_2)$ plane, one for each of $d_1(\cdot,\cdot)$, $d_2(\cdot,\cdot)$ and $d_\infty(\cdot,\cdot)$; the corresponding balls are a diamond, a disk, and a square.]


Length, size, magnitude

A norm of a vector is a non-negative function $\|\cdot\|$ that measures the "length/size/magnitude" of a vector. The norm needs to fulfill a set of axioms. A vector space with a norm is called a normed vector space.

Various norms for the usual Euclidean space:

$\ell_1$-norm: $\|x\|_1 = \sum_{i=1}^{K} |x_i|$

$\ell_2$-norm: $\|x\|_2 = \sqrt{\sum_{i=1}^{K} x_i^2}$

$\ell_\infty$-norm: $\|x\|_\infty = \max\{|x_i|,\ i = 1, 2, \dots, K\}$

$\ell_p$-norm: $\|x\|_p = \left(\sum_{i=1}^{K} |x_i|^p\right)^{1/p}$, where $p \ge 1$


There is an important connection between metric spaces and normed vector spaces. For a given norm $\|\cdot\|$ we can define a distance: $d(x, y) = \|x - y\|$. In the Euclidean space $\mathbb{R}^K$:

$$d_1(x, y) = \|x - y\|_1 = \sum_{i=1}^{K} |x_i - y_i|$$

$$d_2(x, y) = \|x - y\|_2 = \sqrt{\sum_{i=1}^{K} (x_i - y_i)^2}$$

$$d_\infty(x, y) = \|x - y\|_\infty = \max\{|x_i - y_i|,\ i = 1, 2, \dots, K\}$$

$$d_p(x, y) = \|x - y\|_p = \left(\sum_{i=1}^{K} |x_i - y_i|^p\right)^{1/p}, \quad p \ge 1$$

Also note that this clarifies the meaning of a norm as a measure of length/size/magnitude. Let 0 be the origin, a fixed reference point of the vector space; then $\|x\| = \|x - 0\| = d(x, 0)$.


Angles

An inner product is another function $\langle \cdot, \cdot \rangle$ useful for measuring angles between vectors. It also needs to fulfill axioms. The inner product (aka dot product) defined on the Euclidean space $\mathbb{R}^K$ is

$$\langle x, y \rangle = x \cdot y = x'y = \sum_{i=1}^{K} x_i y_i$$

If we have an inner product for a vector space, we also have a norm. Take the Euclidean space:

$$\|x\|_2 = \sqrt{\langle x, x \rangle} = \sqrt{\sum_{i=1}^{K} x_i^2}$$

Norm + inner product define angles:

$$\langle x, y \rangle = \cos(\theta)\, \|x\|_2 \, \|y\|_2$$

where $\theta$ is the angle between x and y.


The Cauchy–Schwarz inequality is a direct consequence of this equation; in fact

$$\cos(\theta) = \frac{\langle x, y \rangle}{\|x\|_2 \, \|y\|_2} \in [-1, 1]$$

Orthogonality: x and y are orthogonal/perpendicular when $\theta = 90°$ or $270°$, that is $\cos(\theta) = 0$. x and y are orthogonal if and only if $x'y = 0$.

Collinearity: x and y are collinear if they either lie along the same line or are parallel to each other. In this case $\theta = 0°$ or $180°$, which means $\cos(\theta) = 1$ or $\cos(\theta) = -1$.
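A minimal sketch (not from the slides) of these two cases, using arbitrary made-up vectors:

```python
import numpy as np

# Arbitrary example vectors
x = np.array([2.0, 0.0, 1.0])
y = np.array([0.0, 3.0, 0.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_theta)                # 0.0 -> x'y = 0, so x and y are orthogonal

z = -3.0 * x                    # z is collinear with x
cos_theta_zx = (z @ x) / (np.linalg.norm(z) * np.linalg.norm(x))
print(cos_theta_zx)             # -1.0 -> theta = 180 degrees
```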


Linear dependence

The vectors in a subset $X = \{x_1, x_2, \dots, x_k\}$ of a vector space are linearly dependent (LD) if there exist scalars $\{a_1, a_2, \dots, a_k\}$, not all of them null, such that

$$a_1 x_1 + a_2 x_2 + \dots + a_k x_k = 0$$

Therefore, assuming (wlog) $a_1 \neq 0$,

$$x_1 = -\frac{a_2}{a_1} x_2 - \frac{a_3}{a_1} x_3 - \dots - \frac{a_k}{a_1} x_k$$

The set of vectors in X is linearly independent (LI) if the equation $a_1 x_1 + a_2 x_2 + \dots + a_k x_k = 0$ can only hold for $a_1 = a_2 = \dots = a_k = 0$.

A set S is LI if it is not redundant, in the sense that you cannot express any vector of S as a linear combination of the other elements of S.
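A hedged numerical sketch (not on the slide): one standard way to check linear dependence is to stack the vectors as columns and compare the matrix rank (introduced later in this slide set) with the number of vectors; the vectors below are made up.

```python
import numpy as np

# Made-up vectors in R^3: x3 is built as x1 + 2*x2, so {x1, x2, x3} is LD
x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, -1.0])
x3 = x1 + 2 * x2

# rank < number of columns  <=>  linear dependence
print(np.linalg.matrix_rank(np.column_stack([x1, x2, x3])))   # 2 < 3 -> LD
print(np.linalg.matrix_rank(np.column_stack([x1, x2])))       # 2 = 2 -> LI
```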


A subset S spans a vector space V if you can build any element of V as a linear combination of vectors in S. A subset S of a vector space V is a basis if it is LI and it spans V.

Example. Take the unit vectors in $\mathbb{R}^2$, that is

$$e_x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad e_y = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

Now it is easy to see that $S = \{e_x, e_y\}$ is a basis for $\mathbb{R}^2$, which means that any point $p \in \mathbb{R}^2$ can be written as a linear combination of elements of S. For example

$$p = \begin{pmatrix} a \\ b \end{pmatrix} = a e_x + b e_y$$

for any two real numbers (a, b).


Orthogonality

Assume $X = \{x_1, x_2, \dots, x_p\}$ is a set of non-zero vectors that are orthogonal (pairwise); then

1. the vectors of X are LI
2. if $p = K$, X is a basis for $\mathbb{R}^K$, in particular an orthogonal basis

In the example before, $S = \{e_x, e_y\}$ is an orthogonal basis for $\mathbb{R}^2$.

Having an orthogonal basis for a vector space V is terribly cool! Why? It means that we can compress V in a convenient way. Namely:

• each $x \in V$ can be written as a linear combination of the elements of the orthogonal basis
• each $x \in V$ can be separated into independent contributions


This is a key concept for the rest of the course. Let's consider (wlog) $\mathbb{R}^2$, and take two bases $S_t = \{t_1, t_2\}$ and $S_u = \{u_1, u_2\}$, where only $S_u$ is an orthogonal basis. Now consider any point $x \in \mathbb{R}^2$. Since these are bases, it must be true that there are scalars (a, b) and (c, d) such that

$$x = a t_1 + b t_2 = c u_1 + d u_2$$

Question: given x, can we always determine its components (a, b) or (c, d)? Do these components uniquely relate to x?


For the orthogonal basis

$$\langle x, u_1 \rangle = \langle c u_1 + d u_2, u_1 \rangle = c \langle u_1, u_1 \rangle + d \langle u_2, u_1 \rangle = c \langle u_1, u_1 \rangle$$

$$\langle x, u_2 \rangle = \langle c u_1 + d u_2, u_2 \rangle = c \langle u_1, u_2 \rangle + d \langle u_2, u_2 \rangle = d \langle u_2, u_2 \rangle$$

Therefore, given x, we can always find

$$c = \frac{\langle x, u_1 \rangle}{\langle u_1, u_1 \rangle} \qquad \text{and} \qquad d = \frac{\langle x, u_2 \rangle}{\langle u_2, u_2 \rangle}$$

Of course this doesn't happen for the non-orthogonal basis $S_t$. This example extends to more general spaces, not just $\mathbb{R}^K$, and it teaches us the beauty of the concept:

orthogonality = the possibility to represent each member of the space with a linear decomposition, where its individual components can be uniquely identified
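A minimal numerical sketch of this derivation (not from the slides); the orthogonal basis and the point x below are made-up examples.

```python
import numpy as np

# Made-up orthogonal basis of R^2 (u1'u2 = 0) and an arbitrary point x
u1 = np.array([1.0, 1.0])
u2 = np.array([1.0, -1.0])
x = np.array([3.0, 5.0])

# Coefficients recovered through inner products, as in the derivation above
c = (x @ u1) / (u1 @ u1)
d = (x @ u2) / (u2 @ u2)

print(c, d)                               # 4.0, -1.0
print(np.allclose(c * u1 + d * u2, x))    # True: x = c*u1 + d*u2
```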


Beauty of vector spaces with an inner product

If a vector space is endowed with a norm, then it has a distance. But recall that if a vector space has an inner product, we can use it to define a norm... and a distance as a consequence. Therefore, having an inner product means that we can measure angles, size, distance, and define orthogonality.

However complicated the vector space is, we know that if it has an orthogonal basis, it is very simple to map all its elements into simpler individual components!


Matrix algebra

An algebra is a set of objects that can be combined with one or more operations, based on rules that make the results well defined.

The matrix sum and the matrix product, along with the determinant and the inverse, allow us to construct an algebraic system that mimics what happens with real numbers:

• there exists a unitary element such that $1 \times x = x \times 1 = x$ for any $x \in \mathbb{R}$
• one can construct a reciprocal $x^*$ such that $x \times x^* = x^* \times x = 1$; in $\mathbb{R}$ you can set $x^* = 1/x$ for any $x \neq 0$

All the burden of certain matrix algebra calculations is essentially needed to achieve these kinds of things.


An $m \times n$ matrix is a 2-dimensional array; we will look at matrices with real numbers.

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = (a_{ij}) \in \mathbb{R}^{m \times n}$$

Useful notations:

• jth column vector: $A_{\cdot j} = (a_{1j}, a_{2j}, \dots, a_{mj})' \in \mathbb{R}^m$
• ith row vector: $A_{i \cdot} = (a_{i1}, a_{i2}, \dots, a_{in})' \in \mathbb{R}^n$


Basic operations

• Addition: $A, B \in \mathbb{R}^{m \times n}$, then $A + B = (a_{ij} + b_{ij})$
• Scalar multiplication: $c \in \mathbb{R}$, $A \in \mathbb{R}^{m \times n}$, then $cA = Ac = (c\,a_{ij})$
• Transposition: $A \in \mathbb{R}^{m \times n}$ with $A = (a_{ij})$, then $A' = (a_{ji})$. Note that $A' \in \mathbb{R}^{n \times m}$

A key operation is the matrix product. Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ (compatible); then the matrix product $P \in \mathbb{R}^{m \times p}$ is obtained from all pairs of inner products between rows of A and columns of B:

$$p_{ij} = A_{i \cdot} B_{\cdot j} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

Note: in general $AB \neq BA$ (matrix multiplication is not commutative).

Suggestion: take any math book and do a few matrix product calculations to warm up.
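A small sketch (not from the slides) of the matrix product and its non-commutativity, using two made-up matrices:

```python
import numpy as np

# Two small made-up matrices
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A @ B)                         # each entry is sum_k a_ik * b_kj
print(B @ A)                         # a different result
print(np.allclose(A @ B, B @ A))     # False: AB != BA in general
```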


Useful properties

• Associativity of the matrix product: $A(BC) = (AB)C$
• Distributivity of the matrix product: $A(B + C) = AB + AC$ and $(B + C)A = BA + CA$
• For any $\delta \in \mathbb{R}$, $\delta(AB) = (\delta A)B = A(\delta B)$
• $(A')' = A$
• $(A + B)' = A' + B'$
• $(AB)' = B'A'$, $(ABC)' = C'B'A'$


Square matrices (n = m) are particularly important.

Diagonal matrix: a matrix with all zeros outside the main diagonal

$$A = \begin{pmatrix} a_1 & & & \\ & a_2 & & \\ & & \ddots & \\ & & & a_n \end{pmatrix}$$

A particularly important diagonal matrix is the identity matrix of order n

$$I_n = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}$$

For any A compatible with $I_n$ we have that $I_n A = A$ and $A I_n = A$.


Symmetric matrices are square matrices with elements symmetrically placed around the diagonal, that is $a_{ij} = a_{ji}$ for all $i \neq j$:

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{12} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{13} & a_{23} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & a_{3n} & \cdots & a_{nn} \end{pmatrix}$$

For symmetric matrices $A = A'$.


An important real-valued function in matrix computing is the determinant

$$\det : \{\text{square matrices}\} \to \mathbb{R}$$

denoted with either $\det(A)$ or $|A|$. There are various methods to compute the determinant, but it remains a horrible calculation to perform by hand. Fortunately we have computers these days!

However, $\det(\cdot)$ is of huge interest because:

• it "encodes" several important properties of matrices
• it is essential to define the inverse of a matrix... in a minute you will see that the inverse closes the circle!

If $\det(A) = 0$ we say that A is singular.


Useful properties

• $\det(A) = \det(A')$
• $\det(AB) = \det(A)\det(B)$
• For $\delta \in \mathbb{R}$ and $A \in \mathbb{R}^{n \times n}$, $\det(\delta A) = \delta^n \det(A)$

Let A be a $(p \times p)$ square matrix; the following statements are equivalent:

• A is invertible
• $\det(A) \neq 0$
• the columns/rows of A are linearly independent
• the columns/rows of A span $\mathbb{R}^p$
• the columns/rows of A are a basis of $\mathbb{R}^p$


A square matrix $A \in \mathbb{R}^{n \times n}$ is called invertible or non-singular if there exists a (unique) matrix $B \in \mathbb{R}^{n \times n}$ such that

$$AB = BA = I_n$$

In practice the inverse closes the circle of the matrix algebra: it allows us to define the reciprocal element in our collection of objects! When it exists, the inverse of A is denoted by $A^{-1}$.

An orthogonal matrix is a square real matrix whose columns and rows are orthogonal vectors with unit norm (orthonormal vectors). For orthogonal matrices

$$A' = A^{-1} \implies AA' = A'A = I$$
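A quick sketch (not from the slides) checking these definitions numerically; the invertible matrix is made up, and the orthogonal matrix is a standard 2x2 rotation.

```python
import numpy as np

# Made-up invertible matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True: A A^{-1} = I

# A 2x2 rotation matrix is orthogonal: its columns/rows are orthonormal
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T, np.linalg.inv(Q)))  # True: Q' = Q^{-1}
print(np.allclose(Q @ Q.T, np.eye(2)))     # True: Q Q' = Q'Q = I
```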


Useful properties

• $\det(A^{-1}) = \dfrac{1}{\det(A)}$
• $(AB)^{-1} = B^{-1}A^{-1}$, $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$
• $(A')^{-1} = (A^{-1})'$


Another real-valued function of a square matrix is the trace = sum of the diagonal elements. Let $A = (a_{ij}) \in \mathbb{R}^{n \times n}$; then

$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$$

Useful properties

• $\mathrm{tr}(A) = \mathrm{tr}(A')$
• $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$
• $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
• $\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$ (cyclic property)
• For $x \in \mathbb{R}^n$, $\langle x, x \rangle = \|x\|_2^2 = x'x = \mathrm{tr}(xx')$
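A short numerical check of the cyclic property and of $x'x = \mathrm{tr}(xx')$ (an illustration, not from the slides), using random made-up matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 4, 4))    # three made-up 4x4 matrices

# Cyclic property of the trace
print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))   # True
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))   # True

# For a vector x, x'x = tr(xx')
x = rng.normal(size=4)
print(np.isclose(x @ x, np.trace(np.outer(x, x))))            # True
```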


Rank of a matrix

The rank of A, denoted by $\mathrm{rank}(A)$, is the dimension of the vector space spanned by its columns/rows. Therefore, the column/row rank of A is the number of linearly independent column/row vectors of A.

It can be shown that the column rank and the row rank are always equal. Therefore $\mathrm{rank}(A)$ conventionally denotes the column rank.


Assume A is an $n \times K$ matrix:

• $\mathrm{rank}(A) \le \min(n, K)$
• if $\mathrm{rank}(A) = \min(n, K)$, A is said to be full-rank
• $\mathrm{rank}(A) = 0$ if and only if A is the zero matrix
• if A is a square matrix, then A is invertible if and only if A is full-rank

Additionally, for any compatible matrix B:

• $\mathrm{rank}(AB) \le \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}$
• $\mathrm{rank}(A) + \mathrm{rank}(B) - K \le \mathrm{rank}(AB)$
• $\mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$


Systems of linear equations

A general system of n linear equations with K unknowns $\{x_1, x_2, \dots, x_K\}$ can be written as

$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \dots + a_{1K}x_K = b_1 \\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2K}x_K = b_2 \\ \quad \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nK}x_K = b_n \end{cases}$$

A solution of the system (if it exists) is a vector $x = (x_1, x_2, \dots, x_K)'$ such that the equations hold jointly. There are three possibilities:

• the system does not have a solution
• the system has a unique solution
• the system has infinitely many solutions


The following system of 3 equations in 2 unknowns

$$\begin{cases} 4x_1 + 2x_2 = 6 \\ 4x_1 - 2x_2 = 0 \\ 2x_1 - 4x_2 = 8 \end{cases} \implies \begin{cases} x_2 = -2x_1 + 3 \\ x_2 = 2x_1 \\ x_2 = \tfrac{1}{2}x_1 - 2 \end{cases}$$

Each of these is a straight line in $\mathbb{R}^2$. A solution would be a point where all three straight lines intersect. Here a solution does not exist: the three lines have no common intersection point.


Now consider this system

$$\begin{cases} 6x_1 + 6x_2 = 6 \\ 6x_1 - 3x_2 = 15 \\ 3x_1 - 6x_2 = 12 \end{cases}$$

We have more equations than unknowns, but there is a unique solution: the three lines meet at a single point.


Now consider this system

$$\begin{cases} x_1 + 0.5x_2 = 1.5 \\ 2x_1 + x_2 = 3 \\ 3x_1 + 1.5x_2 = 4.5 \end{cases}$$

Again, more equations than unknowns, but now all the equations define the same straight line, therefore there are infinitely many solutions.


Define the column vectors

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}, \qquad a_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix}$$

and the coefficient matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nK} \end{pmatrix} = \begin{pmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_K \\ | & | & & | \end{pmatrix}$$


The augmented matrix

$$\tilde{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1K} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2K} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nK} & b_n \end{pmatrix} = \begin{pmatrix} | & | & & | & | \\ a_1 & a_2 & \cdots & a_K & b \\ | & | & & | & | \end{pmatrix}$$

The system can be written in vector equation form

$$x_1 a_1 + x_2 a_2 + \dots + x_K a_K = b$$

or in matrix equation form

$$Ax = b$$


Conditions for the existence and uniqueness of solutions are given in the famous Rouché–Capelli theorem:

• a system of linear equations with K variables has a solution if and only if $\mathrm{rank}(A) = \mathrm{rank}(\tilde{A})$
• if a solution exists and $\mathrm{rank}(A) = K$, then the solution is unique
• if a solution exists and $\mathrm{rank}(A) < K$, then there are infinitely many solutions
• if a solution exists and $\mathrm{rank}(A) < K$, the set of solutions forms a vector space of dimension $K - \mathrm{rank}(A)$

There are various algorithms to solve linear systems. Note that if $K = n$ and $A^{-1}$ exists, the unique solution of the system can be easily computed as $x = A^{-1}b$. Why does such a solution exist?
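A hedged numerical sketch (not on the slide) of these checks for a made-up square system; the rank comparison mirrors the Rouché–Capelli conditions.

```python
import numpy as np

# A made-up square system (K = n = 2), not one of the examples above
A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
b = np.array([6.0, 5.0])

A_aug = np.column_stack([A, b])                 # augmented matrix
rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(A_aug)

print(rank_A == rank_aug)      # True -> a solution exists (Rouché–Capelli)
print(rank_A == A.shape[1])    # True -> rank(A) = K, so the solution is unique

x = np.linalg.solve(A, b)      # numerically preferable to computing inv(A) @ b
print(np.allclose(A @ x, b))   # True
```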


Idempotent matrices and projections

A square matrix A is idempotent iff $A^2 = AA = A$.

• The only full-rank, symmetric idempotent matrix is the identity I
• If $A \neq I$ is symmetric idempotent $\implies \det(A) = 0$
• If A is symmetric idempotent, then $\mathrm{rank}(A) = \mathrm{tr}(A)$

A projection is a linear transformation from a vector space to itself. Suppose $x \in \mathbb{R}^K$; then a projection looks like $y = Px$, which transforms x into another element of the same space. The projection has to fulfill the "primitive notion" that "we don't change anything if we re-project the same object twice", so we want

$$Py = y \implies P(Px) = P^2 x = Px \implies \text{a projection matrix needs to be idempotent}$$
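As a sketch (not stated on the slide), a standard symmetric idempotent matrix is the projector onto the column space of a full-column-rank matrix X, $P = X(X'X)^{-1}X'$; the matrix X below is made up.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))            # made-up full-column-rank 6x2 matrix

# Projector onto the column space of X: symmetric and idempotent
P = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(P @ P, P))           # True: idempotent
print(np.allclose(P, P.T))             # True: symmetric
print(np.isclose(np.trace(P), np.linalg.matrix_rank(P)))   # True: rank(P) = tr(P) = 2
```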


Orthogonal projections


Quadratic forms

Let $x \in \mathbb{R}^p$ and $A \in \mathbb{R}^{p \times p}$ with A symmetric; the associated quadratic form is a function $q_A : \mathbb{R}^p \to \mathbb{R}$

$$q_A(x) = \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij} x_i x_j = x'Ax$$

• A real symmetric matrix A is positive definite (write PD or $A \succ 0$) if for all $x \neq 0$ the quadratic form $x'Ax > 0$
• A is positive semi-definite (write PSD or $A \succeq 0$) if for all $x \neq 0$ the quadratic form $x'Ax \ge 0$
• If $A \succ 0$ then $\det(A) > 0$; therefore PD matrices are non-singular

Loewner ordering. $A \succeq B$ means that $A - B \succeq 0$, that is $A - B$ is PSD, or $x'Ax \ge x'Bx$ for all $x \neq 0$. This formalizes the concept that "A is overall bigger than B".
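A minimal sketch (not from the slides): for a symmetric matrix, positive definiteness is equivalent to all eigenvalues being strictly positive, a standard characterization used here to check a made-up matrix.

```python
import numpy as np

# Made-up symmetric matrix
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# PD check via eigenvalues (equivalent to x'Ax > 0 for all x != 0)
print(np.all(np.linalg.eigvalsh(A) > 0))    # True -> A is PD

rng = np.random.default_rng(2)
x = rng.normal(size=2)
print(x @ A @ x > 0)                        # True: the quadratic form is positive
```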


Multivariate probability

In this course we consider vector-valued random variables (rv). Take a rv $X = (X_1, X_2, \dots, X_p)'$ with values in $\mathbb{R}^p$. The probability law of a (vector) rv is represented by its multivariate distribution function: for $t \in \mathbb{R}^p$

$$F(t) = \Pr\{X_1 \le t_1, X_2 \le t_2, \dots, X_p \le t_p\}$$

In most cases F is smooth enough that it has a derivative f, called the density function: at any $x \in \mathbb{R}^p$

$$f(x) = f(x_1, x_2, \dots, x_p) = \left. \frac{\partial^p F(t_1, t_2, \dots, t_p)}{\partial t_1 \partial t_2 \cdots \partial t_p} \right|_{t = x}$$

These are absolutely continuous rv, for which

$$F(t) = \int_{-\infty}^{t_1} \int_{-\infty}^{t_2} \cdots \int_{-\infty}^{t_p} f(x_1, x_2, \dots, x_p)\, dx_1 dx_2 \cdots dx_p$$


Multivariate moments

First-order moments, that is, the unconditional expectation, can be easily extended from the univariate case. For a rv $X = (X_1, X_2, \dots, X_p)'$ in $\mathbb{R}^p$

$$\mathrm{E}[X] = (\mathrm{E}[X_1], \mathrm{E}[X_2], \dots, \mathrm{E}[X_p])'$$

E[X] is a centrality/location measure.

The definition of second-order moments is less obvious. There are several possibilities. We work with $\mathrm{E}[XX']$, but the argument is a $(p \times p)$ matrix, so here we mean the expectation element-wise. For example, take $X \in \mathbb{R}^2$:

$$\mathrm{E}[XX'] = \mathrm{E}\begin{pmatrix} X_1^2 & X_1 X_2 \\ X_2 X_1 & X_2^2 \end{pmatrix} = \begin{pmatrix} \mathrm{E}[X_1^2] & \mathrm{E}[X_1 X_2] \\ \mathrm{E}[X_2 X_1] & \mathrm{E}[X_2^2] \end{pmatrix}$$

Exercise: let $X \in \mathbb{R}^p$; write down the second moment of X (as above).

What about the variance? Consider X in $\mathbb{R}^p$ with expectation $\mu \in \mathbb{R}^p$. Then for all $i, j = 1, 2, \dots, p$

$$\sigma_{i,j} = \mathrm{E}[(X_i - \mu_i)(X_j - \mu_j)] = \mathrm{E}[X_i X_j] - \mathrm{E}[X_i]\,\mathrm{E}[X_j]$$

The variance of X is summarized in the $(p \times p)$ variance-covariance matrix

$$\mathrm{Var}(X) = \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} & \cdots & \sigma_{1,p} \\ \sigma_{2,1} & \sigma_2^2 & \cdots & \sigma_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p,1} & \sigma_{p,2} & \cdots & \sigma_p^2 \end{pmatrix}$$

Usually we say "covariance matrix" or simply "variance of X". The covariance matrix is connected to the second-order moment:

$$\Sigma = \mathrm{E}[(X - \mu)(X - \mu)'] = \mathrm{E}[XX'] - \mu\mu'$$

It describes a lot about the linear dependence between the margins.


det(Σ) is called the "generalized variance":

• large det(Σ): large overall dispersion around the mean vector
• small det(Σ): the distribution of X is extremely concentrated around the mean vector

Singular covariance, that is det(Σ) = 0, means we can find a direction in the Euclidean space where there is an absence of variability. For example:

• a marginal $X_i$ is degenerate
• some of the components of X are collinear

Usually we want to rule out singular cases, and we always look at rv with a nice PD covariance.


Correlation matrix

The covariance matrix contains information about the correlation structure. Given Σ, let

$$Q = (\mathrm{diag}(\Sigma))^{-\frac{1}{2}} = \begin{pmatrix} 1/\sigma_1 & & & \\ & 1/\sigma_2 & & \\ & & \ddots & \\ & & & 1/\sigma_p \end{pmatrix}$$

The correlation matrix is simply given by

$$\mathrm{Cor}(X) = R = Q \Sigma Q$$


Now check (wlog you can take $X \in \mathbb{R}^2$) that

$$R_{(i,j)} = \rho_{i,j} = \frac{\sigma_{i,j}}{\sigma_i \sigma_j}$$

In general the correlation matrix of a rv $X \in \mathbb{R}^p$ is a $(p \times p)$ symmetric matrix

$$\mathrm{Cor}(X) = R = \begin{pmatrix} 1 & \rho_{1,2} & \cdots & \rho_{1,p} \\ \rho_{2,1} & 1 & \cdots & \rho_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p,1} & \rho_{p,2} & \cdots & 1 \end{pmatrix}$$

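A small sketch (not from the slides) of the construction $R = Q\Sigma Q$ for a made-up covariance matrix:

```python
import numpy as np

# Made-up covariance matrix with sigma_1 = 2, sigma_2 = 3, sigma_12 = 1.2
Sigma = np.array([[4.0, 1.2],
                  [1.2, 9.0]])

Q = np.diag(1.0 / np.sqrt(np.diag(Sigma)))   # Q = diag(Sigma)^(-1/2)
R = Q @ Sigma @ Q                            # correlation matrix

print(R)                                          # unit diagonal
print(np.isclose(R[0, 1], 1.2 / (2.0 * 3.0)))     # True: rho_12 = sigma_12/(sigma_1*sigma_2)
```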

Sampling estimators of mean and covariance

Let $\{x_1, x_2, \dots, x_n\}$ be n sampling replicas of the rv X defined in $\mathbb{R}^p$. Sampling estimators are simply obtained by replacing the distribution of X (called the population distribution) with the empirical distribution. The empirical distribution is a uniform (discrete) distribution that attaches a probability of 1/n to each sample point.

sample mean: $\bar{x}_n = \dfrac{1}{n} \sum_{i=1}^{n} x_i$

sample cov (biased): $S_n = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}_n)(x_i - \bar{x}_n)' = \left( \dfrac{1}{n} \sum_{i=1}^{n} x_i x_i' \right) - \bar{x}_n \bar{x}_n'$

sample cov (unbiased): $S_n = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)(x_i - \bar{x}_n)'$
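A hedged NumPy sketch (not from the slides) of these estimators on made-up data; note that np.cov uses the unbiased 1/(n-1) version by default.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))            # n = 200 made-up observations in R^2

xbar = X.mean(axis=0)                    # sample mean vector
S_biased = (X - xbar).T @ (X - xbar) / X.shape[0]       # 1/n version
S_unbiased = np.cov(X, rowvar=False)                     # 1/(n-1) version

print(xbar)
print(np.allclose(S_unbiased, S_biased * X.shape[0] / (X.shape[0] - 1)))   # True
```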


Orthogonality: probabilistic viewpoint

Consider the set

$$H = \{X : X \text{ is a real-valued random variable such that } \mathrm{E}[X^2] < +\infty\}$$

This set can be framed as a vector space with an inner product! Therefore one can define notions of length/size, distance, angle... and orthogonality as we do for the Euclidean space.

A simple way to define an inner product on H is

$$\langle X, Y \rangle = \mathrm{E}[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])] = \mathrm{Cov}(X, Y)$$

We say X and Y are orthogonal if $\mathrm{Cov}(X, Y) = 0$.

For random variables we use orthogonality as equivalent to linear independence. This is confusing if we think about the geometric definition.

The confusion needs to be fixed:

• $\mathrm{Cov}(X, Y) = 0 \implies$ X and Y are two orthogonal members of H. There is no linear (conditional mean) dependence between X and Y.
• $|\mathrm{Cov}(X, Y)| > 0 \implies$ X and Y are non-orthogonal. In this case there may be a linear conditional mean representation of the dependence. That's why we say that there is "linear dependence".

The general abstract notion of orthogonality is also essential in linear statistical models. This will be understood as the course progresses.


Linear maps between random variables

Linear transformations are common in econometrics. Take a rv $X \in \mathbb{R}^p$ with mean µ and covariance Σ. Consider $t \in \mathbb{R}^p$ and a $(p \times p)$ matrix A, and consider another rv $Y \in \mathbb{R}^p$ obtained as the linear map $Y = t + AX$. Then

$$\mathrm{E}[Y] = t + A\mu \qquad\qquad \mathrm{Var}[Y] = A \Sigma A'$$


Multivariate Normal Distribution (MVN)

The probability law is governed by first- and second-order moments. $X \sim N(\mu, \Sigma)$ has density

$$\varphi(x; \mu, \Sigma) = \frac{1}{(2\pi)^{\frac{p}{2}} \det(\Sigma)^{\frac{1}{2}}} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}$$

where $\mathrm{E}[X] = \mu$ and $\mathrm{Cov}[X] = \Sigma$.

For $\mu = 0$ and $\Sigma = I_p$ we obtain the standard (spherical) MVN. A MVN produces elliptically symmetric scatters.


The MVN is extremely important in the "linear world" because of the following properties. Assume $X \sim N(\mu, \Sigma)$:

• Any linear combination of X is again MVN: if $Y = t + AX$ then $Y \sim N(t + A\mu, A\Sigma A')$
• Any subset of marginals of X has a joint MVN distribution
• Uncorrelation $\iff$ independence
• Conditional distributions are MVN
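A Monte Carlo sketch (not from the slides) of the first property, with made-up values of µ, Σ, t and A; the sample moments of Y should match $t + A\mu$ and $A\Sigma A'$ up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)   # draws from N(mu, Sigma)

# Linear map Y = t + A X should be (approximately, in the sample) N(t + A mu, A Sigma A')
t = np.array([0.5, 2.0])
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
Y = t + X @ A.T

print(Y.mean(axis=0), t + A @ mu)                   # close to each other
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)     # close to each other
```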


Quadratic forms of rv

An important quadratic form is the "squared Mahalanobis distance" (SMD)

$$(x - \mu)' \Sigma^{-1} (x - \mu)$$

This gives the squared "weighted" distance of x from µ, taking into account the correlation structure. Note that in the spherical case, that is $\Sigma = I_p$, the SMD becomes

$$(x - \mu)'(x - \mu) = \|x - \mu\|_2^2 = d_2(x, \mu)^2$$

In the SMD, $\Sigma^{-1}$ is used to "rotate" the axes so that the scatter becomes spherical, and then the usual Euclidean norm is used to measure distances between points.


Quadratic forms of rv with a MVN distribution occur all the time in econometrics. The following results will be used for the rest of the course:

• If $X \sim N(\mu, \Sigma)$ then $(X - \mu)' \Sigma^{-1} (X - \mu) \sim \chi^2(p)$
• A special case: if $Z \sim N(0, I_p)$, then $Z'Z \sim \chi^2(p)$
• The more general result: if $Z \sim N(0, I_p)$ and A is symmetric idempotent (that is $AA = A$) with $\mathrm{rank}(A) = m$, then $Z'AZ \sim \chi^2(m)$
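A simulation sketch (not from the slides) of the special case $Z'Z \sim \chi^2(p)$, comparing simulated quantiles with SciPy's chi-square quantiles; the dimension and sample size are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
p, n_sim = 4, 100_000

Z = rng.standard_normal(size=(n_sim, p))
q = np.sum(Z ** 2, axis=1)                # Z'Z for each simulated Z ~ N(0, I_p)

probs = [0.5, 0.9, 0.99]
print(np.quantile(q, probs))              # simulated quantiles
print(stats.chi2.ppf(probs, df=p))        # chi-square(p) quantiles: should be close
```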


Conditional expectation

Suppose that X is a continuous/discrete rv, and Y is a discrete rv. We all know what $\mathrm{E}[X \mid Y = y]$ means and we know how to compute it. For instance, assume that X is continuous and you know $f_X(x \mid Y = y)$:

$$\mathrm{E}[X \mid Y = y] = \int x \, f_X(x \mid Y = y)\, dx$$

Suppose now that Y is continuous, and consider $\mathrm{E}[X \mid Y]$. Have you ever thought about what this really means? If not, you should seriously think about it, because 99% of econometric models are specified in terms of conditional expectations like that.


In elementary statistics you learned $\Pr\{A \mid B\}$ where A and B are events. And in fact, we only know how to condition on events, not on a random variable (stochastic conditioning).

It turns out that stochastic conditioning is the same as the usual conditioning. In order to better understand the latter, we need to better understand how we formalize the information generated from an experiment.


Intuitive notion of information

Consider a probability space $(\Omega, \mathcal{F}, P)$:

• Ω is the corresponding sample/outcome space
• $\mathcal{F}$ is the set of all possible events for which you are able to compute probabilities
• P assigns probabilities to the events

$\mathcal{F}$ is the "biggest information set" you can manage with P; it represents "knowing everything". Any $\mathcal{H} \subseteq \mathcal{F}$ is a "smaller information set"; it represents "knowing something". $\mathcal{B} = \{\emptyset, \Omega\}$ is the "smallest information set" and it is contained in any $\mathcal{H} \subseteq \mathcal{F}$; it represents "knowing nothing".


Intuitive notion of conditional expectation

Consider a rv X; you are asked to compute its (unconditional) expectation E[X]. Assuming you know the probability law of X, you know how to do it.

Now suppose that events in $\mathcal{H}$ occurred. You don't want to waste this information, so you immediately update the probability law of X with the posterior probability law of X. Now you are asked to recompute the expectation of X. What do you do? Since you know how to use information efficiently, you redo your calculations replacing the original probability law of X with the posterior law. Doing this, you are just computing the conditional expectation $\mathrm{E}[X \mid \mathcal{H}]$.


Stochastic conditioning

Let X and Y be rv. Define the information set generated when we perform the experiment Y as $\sigma(Y) \subseteq \mathcal{F}$. This information can be seen as:

• $\sigma(Y)$ = the information contained in Y
• $\sigma(Y)$ = what we know when we see one of the many results of Y
• for serious math nerds: $\sigma(Y)$ = the σ-algebra generated by Y

We write $\mathrm{E}[X \mid Y]$, but this is (a terribly bad) short notation for $\mathrm{E}[X \mid \sigma(Y)]$. Summarizing:

• stochastic conditioning is not that different from the usual concept
• $\mathrm{E}[X \mid Y] = g(Y)$ is a rv itself: each random experimental result produced by Y will produce a different value for $\mathrm{E}[X \mid Y]$


Note that unconditional expectation = knowing nothing, where knowing nothing = $\mathcal{B} = \{\emptyset, \Omega\}$. In fact, E[X] is also short notation for

$$\mathrm{E}[X] = \mathrm{E}[X \mid \mathcal{B}] = \mathrm{E}[X \mid \{\emptyset, \Omega\}]$$

It is also now intuitive why $\mathrm{E}[Y \mid \mathcal{F}] = Y$.


Law of total expectation (aka Tower rule)

If X and Y are rv, we know that $\mathrm{E}[X \mid Y] = g(Y)$ is a rv. Therefore it makes sense to compute expectations of $\mathrm{E}[X \mid Y]$.

Math statement: consider two "smaller" information sets $\mathcal{H}_1 \subseteq \mathcal{H}_2 \subseteq \mathcal{F}$; then

$$\mathrm{E}[\,\mathrm{E}[X \mid \mathcal{H}_2] \mid \mathcal{H}_1\,] = \mathrm{E}[X \mid \mathcal{H}_1] = \mathrm{E}[\,\mathrm{E}[X \mid \mathcal{H}_1] \mid \mathcal{H}_2\,]$$

Practical interpretation: in sequential conditioning the smallest information set always wins! As a special case we obtain the "Law of iterated expectations".


Law of iterated expectation

Statement: consider two rv X and Y; then

$$\mathrm{E}[\,\mathrm{E}[X \mid Y]\,] = \mathrm{E}[X]$$

To see why this is a consequence of the previous result, define $\mathcal{H}_2 = \sigma(Y) \subseteq \mathcal{F}$ and $\mathcal{H}_1 = \mathcal{B} = \{\emptyset, \Omega\}$; by definition $\mathcal{H}_1 \subseteq \mathcal{H}_2 \subseteq \mathcal{F}$. The Law of total expectation implies

$$\mathrm{E}[\,\mathrm{E}[X \mid \mathcal{H}_2] \mid \mathcal{H}_1\,] = \mathrm{E}[X \mid \mathcal{H}_1]$$

that is

$$\mathrm{E}[\,\mathrm{E}[X \mid \sigma(Y)] \mid \mathcal{B}\,] = \mathrm{E}[X \mid \mathcal{B}]$$

Using the "short notation" for σ(Y), and omitting $\mathcal{B}$ as usual,

$$\mathrm{E}[\,\mathrm{E}[X \mid Y]\,] = \mathrm{E}[X]$$
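A Monte Carlo sketch (not from the slides) of the law of iterated expectations, using a made-up model in which $\mathrm{E}[X \mid Y]$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

# Made-up model: Y ~ N(0,1) and X = 2*Y + e with e ~ N(0,1) independent of Y,
# so E[X | Y] = 2*Y is a known function g(Y), and E[X] = 0
Y = rng.standard_normal(n)
X = 2 * Y + rng.standard_normal(n)

g_of_Y = 2 * Y                 # E[X | Y] evaluated at each draw of Y
print(g_of_Y.mean())           # Monte Carlo estimate of E[ E[X | Y] ]
print(X.mean())                # Monte Carlo estimate of E[X]; both are near 0
```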


Pull out what’s known

Another useful property of conditional expectation is the so-called "pull out what's known" property (sometimes called "linearity of conditional expectations"):

$$\mathrm{E}[XY \mid Y] = Y\, \mathrm{E}[X \mid Y]$$

Can you formulate an intuitive explanation of this result?
