

SLIDE 1

Matrices A brief introduction

Basilio Bona

DAUIN – Politecnico di Torino

September 2013

Basilio Bona (DAUIN) Matrices September 2013 1 / 74

SLIDE 2

Definitions

Definition

A matrix is a set of N real or complex numbers organized in m rows and n columns, with N = mn:

A = [ a11 a12 ... a1n ]
    [ a21 a22 ... a2n ]
    [ ...  aij  ...   ]
    [ am1 am2 ... amn ]  ≡  [aij],   i = 1, ..., m;  j = 1, ..., n

A matrix is always written as a boldface capital letter, as in A. To indicate matrix dimensions we use the following symbols:

A_{m×n}   or   A ∈ F^{m×n}

where F = R for real elements and F = C for complex elements.

SLIDE 3

Transpose matrix

Given a matrix A_{m×n}, we define its transpose as the matrix obtained by exchanging rows and columns:

A^T_{n×m} = [ a11 a21 ... am1 ]
            [ a12 a22 ... am2 ]
            [ ...             ]
            [ a1n a2n ... amn ]

The following property holds: (A^T)^T = A.

SLIDE 4

Square matrix

A matrix is said to be square when m = n.

A square n × n matrix is upper triangular when aij = 0 for all i > j:

A_{n×n} = [ a11 a12 ... a1n ]
          [  0  a22 ... a2n ]
          [ ...             ]
          [  0   0  ... ann ]

If a square matrix is upper triangular, its transpose is lower triangular, and vice versa:

A^T_{n×n} = [ a11  0  ...  0  ]
            [ a12 a22 ...  0  ]
            [ ...             ]
            [ a1n a2n ... ann ]

SLIDE 5

Symmetric matrix

A real square matrix is said to be symmetric if A = A^T, i.e., A − A^T = O. A real symmetric matrix has at most n(n + 1)/2 independent elements.

If a matrix K has complex elements kij = aij + j bij (where j = √−1), its conjugate is K̄, with elements k̄ij = aij − j bij.

Given a complex matrix K, the adjoint matrix K* is defined as the conjugate transpose:

K* = (K̄)^T

A complex matrix is called self-adjoint or Hermitian when K = K*. Some textbooks indicate this matrix as K† or K^H.

SLIDE 6

Diagonal matrix

A square matrix is diagonal if aij = 0 for i ≠ j:

A_{n×n} = diag(ai) = [ a1  0 ...  0 ]
                     [  0 a2 ...  0 ]
                     [ ...          ]
                     [  0  0 ... an ]

A diagonal matrix is always symmetric.

SLIDE 7

Skew-symmetric matrix

A square matrix is skew-symmetric or antisymmetric if

A + A^T = O  →  A = −A^T

Given the constraints of the above relation, a generic skew-symmetric matrix has the following structure:

A_{n×n} = [  0    a12 ... a1n ]
          [ −a12   0  ... a2n ]
          [ ...               ]
          [ −a1n −a2n ...  0  ]

A skew-symmetric matrix has at most n(n − 1)/2 nonzero independent elements. We will see in the following some important properties of the skew-symmetric 3 × 3 matrices.

SLIDE 8

Block matrix

It is possible to represent a matrix with blocks as

A = [ A11 ... A1n ]
    [ ... Aij ... ]
    [ Am1 ... Amn ]

where the blocks Aij have suitable dimensions. Given the following matrices

A1 = [ A11 ... A1n ]   A2 = [ A11  O  ...  O  ]   A3 = [ A11  O   O  ]
     [  O  Aij ... ]        [ ... Aij ...  O  ]        [  O  Aij  O  ]
     [  O   O  Amn ]        [ Am1 ...  ... Amn ]       [  O   O  Amn ]

A1 is upper block triangular, A2 is lower block triangular, and A3 is block diagonal.

SLIDE 9

Matrix algebra

Matrices are elements of an algebra, i.e., a vector space together with a product operator. The main operations of this algebra are: product by a scalar, sum, and matrix product.

Product by a scalar

Each element of A is multiplied by α:

αA = [ αa11 αa12 ... αa1n ]
     [ ...                ]
     [ αam1 αam2 ... αamn ]

Sum

The sum is element-wise:

A + B = [ a11+b11 ... a1n+b1n ]
        [ ...                 ]
        [ am1+bm1 ... amn+bmn ]

SLIDE 10

Matrix sum

Sum properties

A + O = A
A + B = B + A
(A + B) + C = A + (B + C)
(A + B)^T = A^T + B^T

The neutral (zero) element O is called the null matrix. The subtraction (difference) operation is defined using the scalar α = −1:

A − B = A + (−1)B

SLIDE 11

Matrix product

The operation is performed using the well-known "rows by columns" rule: the generic element cij of the matrix product C_{m×p} = A_{m×n} · B_{n×p} is

cij = Σ_{k=1}^{n} aik bkj

The bilinearity of the matrix product is guaranteed, since it is immediate to verify that, given a generic scalar α, the following identity holds:

α(A · B) = (αA) · B = A · (αB)
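The rows-by-columns rule can be checked numerically; a minimal NumPy sketch (the matrix sizes and random seed are arbitrary illustrations, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # A is m x n
B = rng.standard_normal((4, 2))   # B is n x p

# c_ij = sum_k a_ik * b_kj, computed explicitly element by element
C = np.zeros((3, 2))
for i in range(3):
    for j in range(2):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(4))

assert np.allclose(C, A @ B)                      # matches the built-in product
assert np.allclose(2.0 * (A @ B), (2.0 * A) @ B)  # bilinearity in the scalar
```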

SLIDE 12

Product

Product properties

A · B · C = (A · B) · C = A · (B · C)
A · (B + C) = A · B + A · C
(A + B) · C = A · C + B · C
(A · B)^T = B^T · A^T

In general:

the matrix product is non-commutative: A · B ≠ B · A, apart from particular cases;
A · B = A · C does not imply B = C, apart from particular cases;
A · B = O does not imply A = O or B = O, apart from particular cases.

SLIDE 13

Identity matrix

A neutral element with respect to the product exists; it is called the identity matrix and written I_n, or simply I when no ambiguity arises. Given a rectangular matrix A_{m×n}, the following identities hold:

A_{m×n} = I_m A_{m×n} = A_{m×n} I_n

Identity matrix

I = [ 1 0 ... 0 ]
    [ 0 1 ... 0 ]
    [ ...       ]
    [ 0 0 ... 1 ]

SLIDE 14

Idempotent matrix

Given a square matrix A ∈ R^{n×n}, the k-th power is

A^k = A · A · ... · A   (k factors)

A matrix is said to be idempotent if

A² = A  ⇒  A^n = A

SLIDE 15

Trace

The trace of a square matrix A_{n×n} is the sum of its diagonal elements:

tr(A) = Σ_{k=1}^{n} akk

The matrix trace satisfies the following properties:

tr(αA + βB) = α tr(A) + β tr(B)
tr(AB) = tr(BA)
tr(A) = tr(A^T)
tr(A) = tr(T^{-1}AT) for nonsingular T (see below)
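The trace properties above are easy to verify numerically; a minimal NumPy sketch (random matrices and the shift that keeps T nonsingular are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # comfortably nonsingular

assert np.isclose(np.trace(2*A + 3*B), 2*np.trace(A) + 3*np.trace(B))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.trace(A), np.trace(np.linalg.inv(T) @ A @ T))  # similarity invariance
```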

SLIDE 16

Minor

A minor of order p of a matrix A_{m×n} is the determinant Dp of a square sub-matrix obtained by selecting any p rows and p columns of A_{m×n}. The formal definition of determinant is presented below. There are as many minors as there are possible choices of p rows out of m and p columns out of n. Given a matrix A_{m×n}, the principal minors of order k are the determinants Dk, with k = 1, ..., min{m, n}, obtained selecting the first k rows and k columns of A_{m×n}.

SLIDE 17

Minor and cofactor

Given A ∈ R^{n×n}, we indicate with A^{(ij)} ∈ R^{(n−1)×(n−1)} the matrix obtained by removing the i-th row and the j-th column of A.

We define the minor Drc of a generic element arc of a square matrix A_{n×n} as the determinant of the matrix obtained removing the r-th row and the c-th column:

Drc = det A^{(rc)}

We define the cofactor of an element arc of a square matrix A_{n×n} as the product

Arc = (−1)^{r+c} Drc

SLIDE 18

Determinant

Once the cofactor is defined, the determinant of a square matrix A can be defined "by row", i.e., choosing a generic row i:

det(A) = Σ_{k=1}^{n} aik (−1)^{i+k} det(A^{(ik)}) = Σ_{k=1}^{n} aik Aik

or, choosing a generic column j, we have the definition "by column":

det(A) = Σ_{k=1}^{n} akj (−1)^{k+j} det(A^{(kj)}) = Σ_{k=1}^{n} akj Akj

Since these definitions are recursive and assume the computation of determinants of smaller-order minors, it is necessary to define the determinant of a 1 × 1 matrix (a scalar), which is simply det(aij) = aij.
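The recursive cofactor expansion can be sketched directly in code; a minimal NumPy illustration expanding along the first row (didactic only, since the cost is O(n!); the test matrix is an arbitrary choice):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]            # base case: det of a scalar
    total = 0.0
    for k in range(n):
        # A^(1k): remove row 0 and column k
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        total += A[0, k] * (-1) ** k * det_cofactor(minor)
    return total

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
assert np.isclose(det_cofactor(A), np.linalg.det(A))
```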

SLIDE 19

Properties of determinant

det(A · B) = det(A) det(B)
det(A^T) = det(A)
det(kA) = k^n det(A) for A_{n×n}
if one makes a number s of exchanges between rows or columns of A, obtaining a new matrix As, we have det(As) = (−1)^s det(A)
if A has two equal or proportional rows/columns, we have det(A) = 0
if A has a row or a column that is a linear combination of other rows or columns, we have det(A) = 0
if A is upper or lower triangular, we have det(A) = Π_{i=1}^{n} aii
if A is block triangular, with p blocks Aii on the diagonal, we have det(A) = Π_{i=1}^{p} det(Aii)

SLIDE 20

Singular matrix and rank

A matrix A is singular if det(A) = 0.

We define the rank of a matrix A_{m×n} as the number ρ(A_{m×n}), computed as the maximum integer p such that at least one nonzero minor Dp exists. The following properties hold:

ρ(A) ≤ min{m, n}
if ρ(A) = min{m, n}, A is said to have full rank
if ρ(A) < min{m, n}, the matrix does not have full rank and is said to be rank deficient
ρ(AB) ≤ min{ρ(A), ρ(B)}
ρ(A) = ρ(A^T)
ρ(AA^T) = ρ(A^TA) = ρ(A)
if A_{n×n} and det(A) = 0, then A does not have full rank
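The rank properties can be checked with NumPy's rank routine; a minimal sketch (random matrices of arbitrary sizes, full rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 5))
rank = np.linalg.matrix_rank

assert rank(A) <= min(4, 6)
assert rank(A @ B) <= min(rank(A), rank(B))
assert rank(A) == rank(A.T)
assert rank(A @ A.T) == rank(A.T @ A) == rank(A)
```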

SLIDE 21

Invertible matrix

Given a square matrix A ∈ R^{n×n}, it is invertible or nonsingular if an inverse matrix A^{-1}_{n×n} exists such that

A A^{-1} = A^{-1} A = I_n

The matrix is invertible iff ρ(A) = n, i.e., it has full rank; this implies det(A) ≠ 0. The inverse matrix can be computed as

A^{-1} = (1 / det(A)) Adj(A)

The following properties hold: (A^{-1})^{-1} = A; (A^T)^{-1} = (A^{-1})^T.

The inverse matrix, if it exists, allows one to solve the matrix equation y = Ax, obtaining the unknown x as x = A^{-1} y.

SLIDE 22

Orthonormal matrix

A square matrix is orthonormal if A^{-1} = A^T; the following identity holds:

A^T A = A A^T = I

Given two square matrices A and B of equal dimension n × n, the following identity holds:

(AB)^{-1} = B^{-1} A^{-1}

An important result, called the inversion lemma, establishes the following: if A, C are square invertible matrices and B, D are matrices of suitable dimensions, then

(A + BCD)^{-1} = A^{-1} − A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1}

provided the matrix (D A^{-1} B + C^{-1}) is invertible. The inversion lemma is useful to compute the inverse of a sum of matrices A1 + A2 when A2 is decomposable into the product BCD and C is easily invertible, for instance diagonal or triangular.
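The inversion lemma can be verified numerically; a minimal NumPy sketch (the sizes, the diagonal choice of C, and the shift keeping A invertible are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)  # comfortably invertible
B = rng.standard_normal((n, k))
C = np.diag(rng.uniform(1.0, 2.0, k))            # easily invertible (diagonal)
D = rng.standard_normal((k, n))

Ai = np.linalg.inv(A)
# right-hand side of the inversion lemma
lemma = Ai - Ai @ B @ np.linalg.inv(D @ Ai @ B + np.linalg.inv(C)) @ D @ Ai
assert np.allclose(lemma, np.linalg.inv(A + B @ C @ D))
```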

SLIDE 23

Matrix derivative

If a matrix A(t) is composed of elements aij(t) that are all differentiable with respect to t, then the matrix derivative is

(d/dt) A(t) = Ȧ(t) = [ (d/dt) aij(t) ] = [ ȧij(t) ]

If a square matrix A(t) has rank ρ(A(t)) = n for every time t, then the derivative of its inverse is

(d/dt) A^{-1}(t) = −A^{-1}(t) Ȧ(t) A^{-1}(t)

Since the inverse operator is nonlinear, in general

( dA(t)/dt )^{-1} ≠ (d/dt) A^{-1}(t)

SLIDE 24

Symmetric Skew-symmetric decomposition

Given a real matrix A ∈ R^{m×n}, the two matrices A^T A ∈ R^{n×n} and A A^T ∈ R^{m×m} are both symmetric.

Given a square matrix A, it is always possible to factor it into a sum of two matrices, as follows:

A = As + Aa

where

As = (1/2)(A + A^T)   symmetric matrix
Aa = (1/2)(A − A^T)   skew-symmetric matrix
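The decomposition above can be sketched in a few lines of NumPy (the random matrix is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
As = 0.5 * (A + A.T)   # symmetric part
Aa = 0.5 * (A - A.T)   # skew-symmetric part

assert np.allclose(A, As + Aa)     # the parts recompose A exactly
assert np.allclose(As, As.T)       # As is symmetric
assert np.allclose(Aa, -Aa.T)      # Aa is skew-symmetric
```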

SLIDE 25

Similarity transformation

Given a square matrix A ∈ R^{n×n} and a nonsingular square matrix T ∈ R^{n×n}, the new matrix B ∈ R^{n×n} obtained as

B = T^{-1} A T   or   B = T A T^{-1}

is said to be similar to A, and the transformation T is called a similarity transformation.

SLIDE 26

Eigenvalues and eigenvectors

Consider the similarity transformation between A and Λ, where the latter is diagonal, Λ = diag(λi):

A = U Λ U^{-1},   U = [ u1 u2 ... un ]

Multiplying A on the right by U, one obtains

A U = U Λ   and then   A ui = λi ui

This identity is the well-known formula that relates the matrix eigenvalues to the eigenvectors; the constant quantities λi are the eigenvalues of A, while the vectors ui are the eigenvectors of A, usually with non-unit norm.

SLIDE 27

Eigenvalues and eigenvectors

Given a square matrix A_{n×n}, the solutions λi (real or complex) of the characteristic equation

det(λI − A) = 0

are the eigenvalues of A. det(λI − A) is a polynomial in λ, called the characteristic polynomial. If the eigenvalues are all distinct, the vectors ui that satisfy the identity

A ui = λi ui

are the eigenvectors of A.
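The defining identity A ui = λi ui and the characteristic equation can be checked numerically; a minimal NumPy sketch (the random 4 × 4 matrix is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
lam, U = np.linalg.eig(A)          # eigenvalues and matrix of eigenvectors

# each column u_i satisfies A u_i = lambda_i u_i
for i in range(4):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])

# the characteristic polynomial vanishes at each eigenvalue
for i in range(4):
    assert abs(np.linalg.det(lam[i] * np.eye(4) - A)) < 1e-8
```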

SLIDE 28

Generalized eigenvectors

If the eigenvalues are not all distinct, one obtains the so-called generalized eigenvectors, whose characterization goes beyond the scope of these notes.

From a geometrical point of view, the eigenvectors define those directions in R^n (the domain of the linear transformation represented by A) that are invariant with respect to the transformation A, while the eigenvalues provide the related "scale factors" along these directions.

The set of eigenvalues of a matrix A will be indicated as Λ(A), or {λi(A)}; the set of eigenvectors of A will be indicated as {ui(A)}.

In general, since the eigenvectors represent the invariant directions of the transformation, they are defined up to a constant factor, so they are usually normalized; this is a tacit assumption that will be made here, unless otherwise stated.

SLIDE 29

Eigenvalues

Properties

Given a matrix A and its eigenvalues {λi(A)}, the following hold:

{λi(A + cI)} = {λi(A) + c}
{λi(cA)} = {c λi(A)}

Given an upper or lower triangular matrix

[ a11 a12 ... a1n ]        [ a11  0  ...  0  ]
[  0  a22 ... a2n ]   or   [ a21 a22 ...  0  ]
[ ...             ]        [ ...             ]
[  0   0  ... ann ]        [ an1 an2 ... ann ]

its eigenvalues are the terms on the diagonal, {λi(A)} = {aii}; the same applies to a diagonal matrix.

SLIDE 30

Invariance of the eigenvalues

Given a matrix A_{n×n} and its eigenvalues {λi(A)}, the following hold:

det(A) = Π_{i=1}^{n} λi   and   tr(A) = Σ_{i=1}^{n} λi

Given a general invertible transformation, represented by the matrix T, the eigenvalues of A are invariant under the similarity transformation B = T^{-1} A T, i.e.,

{λi(B)} = {λi(A)}
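These invariants can be confirmed numerically; a minimal NumPy sketch (random matrices and the shift keeping T nonsingular are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # nonsingular
lam = np.linalg.eigvals(A)

assert np.isclose(np.prod(lam), np.linalg.det(A))  # det is the product of eigenvalues
assert np.isclose(np.sum(lam), np.trace(A))        # trace is the sum of eigenvalues

# similarity leaves the spectrum unchanged (compare as sorted multisets)
mu = np.linalg.eigvals(np.linalg.inv(T) @ A @ T)
assert np.allclose(np.sort_complex(lam), np.sort_complex(mu))
```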

SLIDE 31

Modal matrix

If we build a matrix M whose columns are the unit eigenvectors ui(A) of A,

M = [ u1 ... un ]

then the similarity transformation with respect to M results in a diagonal matrix:

Λ = diag(λi) = M^{-1} A M

M takes the name of modal matrix. If A is symmetric, its eigenvalues are all real and the following identity holds:

Λ = M^T A M

In this particular case M is orthonormal.

SLIDE 32

Singular value decomposition – SVD

Given a generic matrix A ∈ R^{m×n}, having rank r = ρ(A) ≤ s, with s = min{m, n}, it can be factorized according to the singular value decomposition (SVD) in the following way:

A = U Σ V^T = Σ_{i=1}^{s} σi ui vi^T

The important elements of this decomposition are σi, ui and vi.
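The factorization and the rank-one sum can be reproduced with NumPy's SVD routine; a minimal sketch (the 5 × 3 random matrix is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)        # full SVD: U is 5x5, Vt is 3x3, s holds sigma_i

# A equals the sum of rank-one terms sigma_i * u_i * v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(3))
assert np.allclose(A, A_sum)

# singular values are the square roots of the eigenvalues of A^T A
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A)))
```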

SLIDE 33

SVD

σi(A) ≥ 0 are called singular values and are equal to the non-negative square roots of the eigenvalues of the symmetric matrix A^T A:

{σi(A)} = { √λi(A^T A) }

The σi ≥ 0 are listed in decreasing order:

σ1 ≥ σ2 ≥ ... ≥ σs ≥ 0

If the rank r < s, there are only r positive singular values; the remaining ones are zero:

σ1 ≥ σ2 ≥ ... ≥ σr > 0;  σ_{r+1} = ... = σs = 0

U is an orthonormal square matrix (m × m),

U = [ u1 u2 ... um ]

whose columns are the eigenvectors ui of A A^T.

SLIDE 34

SVD

V is an orthonormal square matrix (n × n),

V = [ v1 v2 ... vn ]

whose columns are the eigenvectors vi of A^T A.

Σ is a rectangular matrix (m × n) with the following structure:

if m < n:  Σ = [ Σs  O ]
if m = n:  Σ = Σs
if m > n:  Σ = [ Σs ]
               [ O  ]

Σs = diag(σi) is s × s diagonal, and its diagonal terms are the singular values.

SLIDE 35

SVD

Otherwise we can decompose A in a way that exposes the positive singular values alone:

A = U Σ V^T = [ P  P̄ ] [ Σr O ] [ Q^T  ]
                       [ O  O ] [ Q̄^T ]   = P Σr Q^T

where P is an m × r orthonormal matrix; P̄ is an m × (m − r) orthonormal matrix; Q is an n × r orthonormal matrix; Q̄ is an n × (n − r) orthonormal matrix; Σr is an r × r diagonal matrix whose diagonal elements are the positive singular values σi, i = 1, ..., r.

SLIDE 36

SVD and rank

The rank r of A is equal to the number r ≤ s of nonzero singular values.

Given a generic matrix A ∈ R^{m×n}, the two matrices A^T A and A A^T are symmetric, have the same positive singular values, and differ only in the number of zero singular values.

SLIDE 37

Linear operators representation

Given two vector spaces X ⊆ R^n and Y ⊆ R^m, with dimensions n and m, and given two generic vectors x ∈ X and y ∈ Y, a generic linear transformation between the two spaces can be represented by the matrix operator A ∈ R^{m×n} as follows:

y = Ax;  x ∈ R^n;  y ∈ R^m

Therefore a matrix can always be interpreted as an operator that transforms a vector from the domain X to the range Y. Conversely, every linear operator has at least one matrix that represents it.

SLIDE 38

Image space and null space

The image space or range of a transformation A is the subspace of Y defined by the following property:

R(A) = { y | y = Ax, x ∈ X };  R(A) ⊆ Y

The null space or kernel of a transformation A is the subspace of X defined by the following property:

N(A) = { x | 0 = Ax, x ∈ X };  N(A) ⊆ X

The null space represents all the vectors in X that are transformed into the origin of Y.

The dimensions of the range and kernel are called, respectively, rank ρ(A) and nullity ν(A):

ρ(A) = dim(R(A));  ν(A) = dim(N(A)).

SLIDE 39

Image space and null space

If X and Y have finite dimensions, then the following equalities hold:

N(A) = R(A^T)⊥
R(A) = N(A^T)⊥
N(A)⊥ = R(A^T)
R(A)⊥ = N(A^T)

where ⊥ indicates the orthogonal complement of the corresponding (sub)space. We recall that {0}⊥ = R^n.

The following orthogonal decompositions of the subspaces X and Y hold:

X = N(A) ⊕ N(A)⊥ = N(A) ⊕ R(A^T)
Y = R(A) ⊕ R(A)⊥ = R(A) ⊕ N(A^T)

where the symbol ⊕ represents the direct sum of subspaces.

SLIDE 40

Generalized inverse

Given a generic real matrix A ∈ R^{m×n}, with m ≠ n, the inverse matrix is not defined. Nevertheless, it is possible to define a class of matrices A⁻, called pseudo-inverses or generalized inverses, that satisfy the following relation:

A A⁻ A = A

If A has full rank, i.e., ρ(A) = min{m, n}, it is possible to define two classes of generalized inverses:

if m < n (i.e., ρ(A) = m), the right inverse of A is a matrix Ar ∈ R^{n×m} such that A Ar = I_{m×m}
if n < m (i.e., ρ(A) = n), the left inverse of A is a matrix Aℓ ∈ R^{n×m} such that Aℓ A = I_{n×n}

SLIDE 41

Pseudo-inverse matrix

Among the possible left or right inverses, two classes are important:

right pseudo-inverse (m < n):  A+_r = A^T (A A^T)^{-1}
When ρ(A) = m, (A A^T)^{-1} exists.

left pseudo-inverse (n < m):  A+_ℓ = (A^T A)^{-1} A^T
When ρ(A) = n, (A^T A)^{-1} exists; this particular left pseudo-inverse (A^T A)^{-1} A^T is also known as the Moore-Penrose pseudo-inverse.

SLIDE 42

Moore-Penrose pseudo-inverse

In general, even if A^T A is not invertible, it is always possible to define a Moore-Penrose pseudo-inverse A+ that satisfies the following relations:

A A+ A = A
A+ A A+ = A+
(A A+)^T = A A+
(A+ A)^T = A+ A      (1)

SLIDE 43

Left and right pseudo-inverses

The two pseudo-inverses A+_r and A+_ℓ coincide with the traditional inverse matrix A^{-1} when A is square and full-rank:

A^{-1} = A+_r = A+_ℓ = A+

The linear transformation associated to A ∈ R^{m×n}, y = Ax, with x ∈ R^n and y ∈ R^m, is equivalent to a system of m linear equations in n unknowns, whose coefficients are the elements of A; this linear system can admit one solution, no solution, or an infinite number of solutions. If we use the pseudo-inverses to solve the linear system y = Ax, we must distinguish two cases, assuming that A has full rank.

SLIDE 44

Linear systems solution – 1

When n > m there are more unknowns than equations; among the infinitely many possible solutions x ∈ R^n, we choose the one with minimum norm ‖x‖, given by

x* = A+_r y = A^T (A A^T)^{-1} y

All the other possible solutions of y = Ax are obtained as

x̄ = x* + v = A+_r y + v

where v ∈ N(A) is a vector belonging to the null space of A, which has dimension n − m. These other possible solutions can also be expressed as

x̄ = A+_r y + (I − A+_r A) w

where w ∈ R^n is a generic n × 1 vector. The matrix I − A+_r A projects w onto the null space of A, transforming w into v ∈ N(A); this matrix is called a projection matrix.
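The minimum-norm solution and the null-space parametrization can be sketched numerically; a minimal NumPy illustration (the 2 × 4 underdetermined system is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((2, 4))        # n > m: underdetermined system
y = rng.standard_normal(2)

Ar = A.T @ np.linalg.inv(A @ A.T)      # right pseudo-inverse
x_star = Ar @ y                        # minimum-norm solution

assert np.allclose(A @ x_star, y)      # it solves the system exactly

# a null-space perturbation still solves the system but has larger norm
w = rng.standard_normal(4)
v = (np.eye(4) - Ar @ A) @ w           # projection of w onto N(A)
x_bar = x_star + v
assert np.allclose(A @ x_bar, y)
assert np.linalg.norm(x_bar) >= np.linalg.norm(x_star)
```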

SLIDE 45

Figure: Solution of y = Ax when n > m.

SLIDE 46

Linear systems solution – 2

When m > n there are more equations than unknowns; in general no exact solution of y = Ax exists, but only approximate solutions, with an error e = y − Ax ≠ 0. Among these approximate solutions we choose the one minimizing the norm of the error, i.e.,

x̂ = arg min_{x ∈ R^n} ‖y − Ax‖

The solution is

x̂ = A+_ℓ y = (A^T A)^{-1} A^T y

Geometrically, A x̂ is the orthogonal projection of y onto the range R(A); the solution x̂ itself lies in N(A)⊥ = R(A^T). The approximation error, also called projection error, is

ê = (I − A A+_ℓ) y

and its norm is the lowest possible, as said above.
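The least-squares solution can be checked against NumPy's solver; a minimal sketch (the 6 × 3 overdetermined system is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((6, 3))        # m > n: overdetermined system
y = rng.standard_normal(6)

Al = np.linalg.inv(A.T @ A) @ A.T      # left (Moore-Penrose) pseudo-inverse
x_hat = Al @ y

# x_hat minimizes ||y - A x||: compare with NumPy's least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_hat, x_lstsq)

# the projection error (I - A Al) y is orthogonal to the range of A
e_hat = (np.eye(6) - A @ Al) @ y
assert np.allclose(A.T @ e_hat, np.zeros(3))
```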

SLIDE 47

Figure: Solution of y = Ax when m > n.

SLIDE 48

Linear systems solution – 3

The similarity between the projection matrix I − A+_r A and the matrix that gives the projection error, I − A A+_ℓ, is important and will be studied when projection matrices are treated.

In order to compute the generalized inverses, one can use the SVD. In particular, the pseudo-inverse is computed as

A+ = V [ Σr^{-1} O ] U^T = Q Σr^{-1} P^T.
       [ O       O ]

SLIDE 49

Projections and projection matrices

The geometrical concept of the projection of a segment onto a plane can be extended and generalized to the elements of a vector space. This concept is important for the solution of a large number of problems, such as approximation, estimation, prediction and filtering.

Given an n-dimensional real vector space V(R^n), endowed with a scalar product, and a subspace W(R^k) of dimension k ≤ n, it is possible to define the projection operator of vectors v ∈ V onto the subspace W. The projection operator is the square projection matrix P ∈ R^{n×n}, whose columns are the projections of the basis elements of V onto W.

A matrix P is a projection matrix iff P² = P, i.e., it is idempotent.

The projection can be orthogonal or non-orthogonal; in the first case P is symmetric, in the second case it is generic. If P is a projection matrix, I − P is also a projection matrix.
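The idempotence and symmetry properties can be checked on an explicit orthogonal projector; a minimal NumPy sketch (the subspace is spanned by the columns of an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((5, 2))              # columns span a 2D subspace of R^5
P = A @ np.linalg.inv(A.T @ A) @ A.T         # orthogonal projector onto R(A)

assert np.allclose(P @ P, P)                 # idempotent: P^2 = P
assert np.allclose(P, P.T)                   # orthogonal projection => symmetric
Q = np.eye(5) - P
assert np.allclose(Q @ Q, Q)                 # I - P is a projector too
```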

SLIDE 50

Projection matrices

Some examples of projection matrices are those associated to the left pseudo-inverse,

P1 = A A+_ℓ   and   P2 = I − A A+_ℓ

and to the right pseudo-inverse,

P3 = A+_r A   and   P4 = I − A+_r A

From a geometrical point of view, P1 projects every vector v ∈ V onto the range space R(A), while P2 projects v onto its orthogonal complement R(A)⊥ = N(A^T).

SLIDE 51

Matrix norm – 1

Similarly to what is done for vectors, it is possible to provide a "measure" of a matrix, i.e., its "magnitude", by defining the matrix norm. Since a matrix represents a linear transformation between vectors, the matrix norm measures how large this transformation is; but it must in some way "normalize" the result, to avoid that the magnitude of the transformed vector affects the norm. Hence the following definition:

‖A‖ := sup_{x ≠ 0} ‖Ax‖ / ‖x‖ = sup_{‖x‖ = 1} ‖Ax‖.

SLIDE 52

Matrix norm – 2

Given a square matrix A ∈ R^{n×n}, its norm must satisfy the following general (norm) axioms:

1. ‖A‖ > 0 for every A ≠ O;
2. ‖A‖ = 0 iff A = O;
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ (triangle inequality);
4. ‖αA‖ = |α| ‖A‖ for any scalar α and any matrix A;
5. ‖AB‖ ≤ ‖A‖ ‖B‖.

Given A ∈ R^{n×n} and its eigenvalues {λi(A)}, the following inequality holds:

1 / ‖A^{-1}‖ ≤ |λi| ≤ ‖A‖,   ∀i = 1, ..., n

SLIDE 53

Matrix norm – 3

Taking into account only real matrices, the most used norms are:

Spectral norm:       ‖A‖₂ = √( max_i λi(A^T A) )

Frobenius norm:      ‖A‖F = √( Σi Σj aij² ) = √( tr(A^T A) )

Max singular value:  ‖A‖σ = max_i σi(A)

SLIDE 54

Matrix norm – 4

1-norm or max-norm:  ‖A‖₁ = max_j Σ_{i=1}^{n} |aij|   (maximum column sum)

∞-norm:              ‖A‖∞ = max_i Σ_{j=1}^{n} |aij|   (maximum row sum)

In general, ‖A‖₂ = ‖A‖σ and ‖A‖₂² ≤ ‖A‖₁ ‖A‖∞.
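The norms above map directly onto NumPy's `norm` orders; a minimal sketch checking the definitions and the final inequality (the random matrix is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(12)
A = rng.standard_normal((4, 4))

n2   = np.linalg.norm(A, 2)          # spectral norm
nF   = np.linalg.norm(A, 'fro')      # Frobenius norm
n1   = np.linalg.norm(A, 1)          # max column sum
ninf = np.linalg.norm(A, np.inf)     # max row sum

assert np.isclose(n2, np.linalg.svd(A, compute_uv=False).max())  # ||A||2 = sigma_max
assert np.isclose(nF, np.sqrt(np.trace(A.T @ A)))
assert n2**2 <= n1 * ninf + 1e-12
```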

SLIDE 55

Skew-symmetric matrices

A square matrix S is called skew-symmetric or antisymmetric when

S + S^T = O,  i.e.,  S = −S^T

A skew-symmetric matrix has the following structure:

S_{n×n} = [  0    s12 ... s1n ]
          [ −s12   0  ... s2n ]
          [ ...               ]
          [ −s1n −s2n ...  0  ]

Therefore it has at most n(n − 1)/2 independent elements.

SLIDE 56

Skew-symmetric matrices

For n = 3 we have n(n − 1)/2 = 3, hence a skew-symmetric matrix has as many elements as a 3D vector v. Given a vector v = [v1 v2 v3]^T it is possible to build S, and given a matrix S it is possible to extract the associated vector v. We indicate this fact using the symbol S(v), where, by convention,

S(v) = [  0  −v3  v2 ]
       [  v3  0  −v1 ]
       [ −v2  v1  0  ]

SLIDE 57

Skew-symmetric matrices

Some properties:

Given any vector v ∈ R^3:  S^T(v) = −S(v) = S(−v)
Given two scalars λ1, λ2 ∈ R:  S(λ1 u + λ2 v) = λ1 S(u) + λ2 S(v)
Given any two vectors v, u ∈ R^3:  S(u) v = u × v = −v × u = S(−v) u = S^T(v) u

Therefore S(u) is the representation of the operator (u ×) and vice versa.
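The correspondence between S(v) and the cross product can be verified in a few lines; a minimal NumPy sketch (the two test vectors are arbitrary):

```python
import numpy as np

def S(v):
    """Skew-symmetric matrix associated with a 3D vector: S(u) @ v == u x v."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

u = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.7, -1.1])

assert np.allclose(S(u) @ v, np.cross(u, v))          # S(u) v = u x v
assert np.allclose(S(u).T, -S(u))                     # S^T(u) = -S(u)
assert np.allclose(S(2*u + 3*v), 2*S(u) + 3*S(v))     # linearity
# the identity S^2(u) = u u^T - ||u||^2 I of the next slide
assert np.allclose(S(u) @ S(u), np.outer(u, u) - np.dot(u, u) * np.eye(3))
```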

SLIDE 58

Skew-symmetric matrices

The matrix S(u)S(u) = S²(u) is symmetric, and

S²(u) = u u^T − ‖u‖² I

Hence the dyadic product

D(u, u) = u u^T = S²(u) + ‖u‖² I

SLIDE 59

Eigenvalues and eigenvectors of skew-symmetric matrices

Given a skew-symmetric matrix S(v), its eigenvalues are imaginary or zero:

λ1 = 0,  λ2,3 = ±j ‖v‖

The eigenvector related to the eigenvalue λ1 = 0 is v itself; the other two are complex conjugate.

The set of 3 × 3 skew-symmetric matrices is a vector space, denoted as so(3). Given two skew-symmetric matrices S1 and S2, we call commutator or Lie bracket the following operator:

[S1, S2] := S1 S2 − S2 S1

which is itself skew-symmetric. Skew-symmetric matrices form a Lie algebra, which is related to the Lie group of orthogonal matrices.

SLIDE 60

Orthogonal matrices

A square matrix A ∈ R^{n×n} is called orthogonal when

A^T A = [ α1  0 ...  0 ]
        [  0 α2 ...  0 ]
        [ ...          ]
        [  0  0 ... αn ]

with αi ≠ 0. A square orthogonal matrix U ∈ R^{n×n} is called orthonormal when all the constants αi are 1:

U^T U = U U^T = I

Therefore U^{-1} = U^T.

SLIDE 61

Orthonormal matrices

Other properties:

The columns, as well as the rows, of U are orthogonal to each other and have unit norm.
‖U‖ = 1.
The determinant of U has unit modulus: |det(U)| = 1, therefore it can be +1 or −1.
Given a vector x, its orthonormal transformation is y = Ux.

SLIDE 62

Orthonormal matrices

If U is an orthonormal matrix, then ‖AU‖ = ‖UA‖ = ‖A‖. This property in general also holds for unitary matrices, i.e., matrices such that U*U = I.

When U ∈ R^{3×3}, only 3 out of its 9 elements are independent.

The scalar product is invariant under orthonormal transformations:

(Ux) · (Uy) = (Ux)^T (Uy) = x^T U^T U y = x^T y = x · y

This means that vector lengths are invariant with respect to orthonormal transformations:

‖Ux‖² = (Ux)^T (Ux) = x^T U^T U x = x^T I x = x^T x = ‖x‖²

SLIDE 63

Orthonormal matrices

When considering orthonormal transformations, it is important to distinguish two cases: when det(U) = +1, U represents a proper rotation, or simply a rotation; when det(U) = −1, U represents an improper rotation or reflection.

The set of rotations forms a continuous, non-commutative (with respect to the product) group; the set of reflections does not have this "quality". Intuitively this means that infinitesimal rotations exist, while infinitesimal reflections have no meaning.

Reflections are the most basic transformations in 3D spaces, in the sense that translations, rotations and roto-reflections (glides) are obtained from the composition of two or three reflections.

SLIDE 64

Figure: Reflections.

SLIDE 65

Orthonormal matrices

If U is an orthonormal matrix, the distributive property with respect to the cross product holds:

U(x × y) = (Ux) × (Uy)

(with generic matrices A this is not true). For any proper rotation matrix U and a generic vector x the following holds:

U S(x) U^T y = U ( x × (U^T y) ) = (Ux) × (U U^T y) = (Ux) × y = S(Ux) y

where S(x) is the skew-symmetric matrix associated with x; therefore:

U S(x) U^T = S(Ux)
U S(x) = S(Ux) U

SLIDE 66

Bilinear and quadratic forms

A bilinear form associated to the matrix A ∈ R^{m×n} is the scalar quantity defined as

b(x, y) := x^T A y = y^T A^T x

A quadratic form associated to the square matrix A ∈ R^{n×n} is the scalar quantity defined as

q(x) := x^T A x = x^T A^T x

Every quadratic form associated to a skew-symmetric matrix S(y) is identically zero:

x^T S(y) x ≡ 0   ∀x

Indeed, setting w = S(y)x = y × x, one obtains x^T S(y) x = x^T w; but since, by definition, w is orthogonal to both y and x, the scalar product x^T w is always zero, and so is the quadratic form on the left-hand side.

SLIDE 67

Definite positive matrices – 1

Recalling the standard decomposition of a generic square matrix A into a symmetric term As and a skew-symmetric term Aa, one concludes that a quadratic form depends only on the symmetric part of the matrix:

q(x) = x^T A x = x^T (As + Aa) x = x^T As x

A square matrix A is said to be positive definite if the associated quadratic form x^T A x satisfies the following conditions:

x^T A x > 0  ∀x ≠ 0;   x^T A x = 0  ⟺  x = 0

A square matrix A is said to be positive semidefinite if the associated quadratic form satisfies

x^T A x ≥ 0  ∀x

A square matrix A is said to be negative definite if −A is positive definite; similarly, a square matrix A is negative semidefinite if −A is positive semidefinite.


SLIDE 68

Definite positive matrices – 2

Often we use the following notation:

    positive definite matrix:      A ≻ 0
    positive semidefinite matrix:  A ⪰ 0
    negative definite matrix:      A ≺ 0
    negative semidefinite matrix:  A ⪯ 0

A necessary but not sufficient condition for a square matrix A to be positive definite is that the elements on its diagonal are all strictly positive. A necessary and sufficient condition for a square matrix A to be positive definite is that all its eigenvalues are strictly positive.
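The "necessary but not sufficient" remark can be illustrated with a small counterexample (a sketch using NumPy; the matrices are made-up example values):

```python
import numpy as np

# A has a strictly positive diagonal...
A = np.array([[1.0, 3.0],
              [3.0, 1.0]])
eigs = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, ascending
assert eigs.min() < 0          # ...yet A is NOT positive definite (eigs: -2, 4)

# B also has a positive diagonal, and it IS positive definite
B = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
assert np.linalg.eigvalsh(B).min() > 0   # eigenvalues 1 and 3, both > 0
```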


SLIDE 69

Sylvester criterion

The Sylvester criterion states that a square matrix A is positive definite iff all its leading principal minors are strictly positive. A positive definite matrix has full rank and is always invertible. The associated quadratic form x^T A x satisfies the following inequality:

    λ_min(A) ‖x‖² ≤ x^T A x ≤ λ_max(A) ‖x‖²

where λ_min(A) and λ_max(A) are, respectively, the minimum and maximum eigenvalues of A.
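Both the Sylvester criterion and the eigenvalue bounds can be checked on a small symmetric matrix (a sketch using NumPy; the matrix and test vector are example values):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Sylvester criterion: all leading principal minors strictly positive
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
assert all(m > 0 for m in minors)          # minors: 4, 11, 18

# Rayleigh-quotient bounds on the quadratic form
lam = np.linalg.eigvalsh(A)                # ascending eigenvalues
x = np.array([1.0, -2.0, 0.5])
q = x @ A @ x
n2 = x @ x                                 # ||x||^2
assert lam[0] * n2 <= q <= lam[-1] * n2
```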


SLIDE 70

Semidefinite matrix and rank

A positive semidefinite matrix A_{n×n} with rank ρ(A) = r < n has r strictly positive eigenvalues and n − r zero eigenvalues. The quadratic form goes to zero for every vector x ∈ N(A). Given a real matrix A_{m×n} of generic dimensions, we have seen that both A^T A and A A^T are symmetric; in addition we know that

    ρ(A^T A) = ρ(A A^T) = ρ(A)

These matrices have all real, non-negative eigenvalues, and therefore they are positive definite or positive semidefinite; in particular, if A_{m×n} has full rank, then

    if m < n,  A^T A ⪰ 0 and A A^T ≻ 0;
    if m = n,  A^T A ≻ 0 and A A^T ≻ 0;
    if m > n,  A^T A ≻ 0 and A A^T ⪰ 0.
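The m < n case can be illustrated numerically (a sketch using NumPy; the random 2×4 matrix is an example, full rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 4))            # m = 2 < n = 4
assert np.linalg.matrix_rank(A) == 2       # full (row) rank

# A A^T is 2x2 and positive definite: all eigenvalues strictly positive
assert np.linalg.eigvalsh(A @ A.T).min() > 0

# A^T A is 4x4 and only positive semidefinite: rank 2, so two zero eigenvalues
e = np.linalg.eigvalsh(A.T @ A)
assert e.min() > -1e-10                    # non-negative up to rounding
assert np.sum(e > 1e-10) == 2              # exactly r = 2 positive eigenvalues
```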


SLIDE 71

Matrix derivatives – 1

If the elements of a matrix A are functions of a scalar quantity x, the derivative of the matrix with respect to x is defined element-wise:

    d/dx A(x) := [ da_ij(x)/dx ]

If x is the time t, one writes

    d/dt A(t) ≡ Ȧ(t) := [ da_ij(t)/dt ] = [ ȧ_ij ]

If A depends on time through the variable x(t), then, by the chain rule,

    d/dt A(x(t)) ≡ Ȧ(x(t)) := [ (∂a_ij(x)/∂x) (dx(t)/dt) ] = [ (∂a_ij(x)/∂x) ẋ(t) ]


SLIDE 72

Matrix derivatives – 2

Given a scalar function of a vector argument, φ(·) : R^n → R, the gradient of φ with respect to x is the column vector

    ∇_x φ = ∂φ/∂x := [ ∂φ(x)/∂x_1  ···  ∂φ(x)/∂x_n ]^T

i.e., the gradient operator is

    ∇_x := [ ∂/∂x_1  ···  ∂/∂x_n ]^T = grad_x

If x(t) is a differentiable time function, then

    dφ(x)/dt ≡ φ̇(x) = ∇_x^T φ(x) ẋ

(Notice the convention: for us the gradient is a column vector, although many textbooks assume it is a row vector.)
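The chain rule φ̇ = ∇_x^T φ · ẋ can be verified against a numerical derivative (a sketch using NumPy; the function φ and the trajectory x(t) are made-up examples):

```python
import numpy as np

# Example scalar function phi(x) = x1^2 + 3 x1 x2 and its analytic gradient
phi = lambda x: x[0]**2 + 3 * x[0] * x[1]
grad = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

# Example trajectory x(t) and its velocity xdot(t)
x_t = lambda t: np.array([np.cos(t), t**2])
xdot = lambda t: np.array([-np.sin(t), 2 * t])

t, h = 0.7, 1e-6
# Central-difference approximation of d/dt phi(x(t))
num = (phi(x_t(t + h)) - phi(x_t(t - h))) / (2 * h)
# Chain rule: phi_dot = grad^T(x) xdot
assert abs(num - grad(x_t(t)) @ xdot(t)) < 1e-6
```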


SLIDE 73

Jacobian matrix

Given an m × 1 vector function f(x) = [ f_1(x) ··· f_m(x) ]^T, with x ∈ R^n, the Jacobian matrix (or simply the Jacobian) is the m × n matrix defined as

    J_f(x) := [ ∂f_i(x)/∂x_j ]    i = 1, …, m;  j = 1, …, n

whose i-th row is (grad_x f_i)^T. If x(t) is a differentiable time function, then

    ḟ(x) ≡ df(x)/dt = (df(x)/dx) ẋ(t) = J_f(x) ẋ(t)

Notice that the rows of J_f are the transposes of the gradients of the component functions.
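The relation ḟ = J_f(x) ẋ can be checked with a directional finite difference (a sketch using NumPy; f, its Jacobian, and the test vectors are example values):

```python
import numpy as np

# Example vector function f : R^2 -> R^3 and its analytic 3x2 Jacobian
f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1]**2])
J = lambda x: np.array([[x[1], x[0]],
                        [np.cos(x[0]), 0.0],
                        [0.0, 2 * x[1]]])

x = np.array([0.3, 1.5])
xdot = np.array([2.0, -1.0])

# Directional derivative of f along xdot approximates J(x) @ xdot
h = 1e-6
num = (f(x + h * xdot) - f(x)) / h
assert np.allclose(num, J(x) @ xdot, atol=1e-5)
```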


SLIDE 74

Gradient

Given a bilinear form b(x, y) = x^T A y, we call gradients the following vectors:

    gradient wrt x:  grad_x b(x, y) := ∂b(x, y)/∂x = A y
    gradient wrt y:  grad_y b(x, y) := ∂b(x, y)/∂y = A^T x

Given the quadratic form q(x) = x^T A x, the gradient wrt x is the vector

    ∇_x q(x) ≡ grad_x q(x) := ∂q(x)/∂x = (A + A^T) x

which reduces to 2Ax when A is symmetric.
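These gradient formulas can be confirmed by component-wise finite differences (a sketch using NumPy; the random matrix and vectors are example values, and A is deliberately non-symmetric to show the (A + A^T)x form):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))        # generic, non-symmetric
x, y = rng.standard_normal(3), rng.standard_normal(3)
h = 1e-6

# Gradient of the bilinear form b(x, y) = x^T A y with respect to x is A y
gx = np.array([((x + h * e) @ A @ y - x @ A @ y) / h for e in np.eye(3)])
assert np.allclose(gx, A @ y, atol=1e-4)

# Gradient of the quadratic form q(x) = x^T A x is (A + A^T) x
gq = np.array([((x + h * e) @ A @ (x + h * e) - x @ A @ x) / h
               for e in np.eye(3)])
assert np.allclose(gq, (A + A.T) @ x, atol=1e-4)
```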
