Appendix B Matrices and Matrix Algebra

Interested readers may find a complete treatment in several Italian [33, 40] or English textbooks; a classical one is [15], while for linear algebra my preference goes to the book by Strang [45]. His video lessons are available at MIT OpenCourseWare [50].

B.1 Definitions

A matrix is a set of N real or complex numbers organized in m rows and n columns, with N = mn:

$$ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & a_{ij} & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \equiv [a_{ij}] \qquad i = 1, \ldots, m; \quad j = 1, \ldots, n $$

A matrix is always written as a boldface capital letter, as in A. To indicate matrix dimensions we use the symbols A_{m×n} or A ∈ F^{m×n}, where the field F is F = R when the matrix has real elements and F = C when the matrix has complex elements.

Given a matrix A_{m×n}, we define its transpose as the matrix obtained exchanging rows and columns:

$$ A^T_{n\times m} = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} $$


The following property holds: (A^T)^T = A.

A matrix is said to be square when m = n. A square n × n matrix is upper triangular when a_{ij} = 0 for all i > j:

$$ A_{n\times n} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ & a_{22} & \cdots & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{bmatrix} $$

A square matrix is lower triangular if its transpose is upper triangular, and vice versa.

A real square matrix is said to be symmetric if A = A^T, i.e., A − A^T = O. A real symmetric matrix has at most n(n + 1)/2 independent elements.

If a matrix K has complex elements k_{ij} = a_{ij} + j b_{ij} (where j = √−1), its conjugate is K̄, with elements k̄_{ij} = a_{ij} − j b_{ij}. Given a complex matrix K, an adjoint matrix K* is defined as the conjugate transpose, K* = K̄^T. A complex matrix is called self-adjoint or Hermitian when K = K*; some textbooks indicate this matrix as K† or K^H.

A square matrix is diagonal if a_{ij} = 0 for i ≠ j:

$$ A_{n\times n} = \mathrm{diag}(a_i) = \begin{bmatrix} a_1 & & \\ & a_2 & \\ & & \ddots \\ & & & a_n \end{bmatrix} $$

A diagonal matrix is always symmetric and has at most n nonzero elements.

A square matrix is skew-symmetric or antisymmetric if A + A^T = O, i.e., A = −A^T. Antisymmetric matrix properties will be described in Section B.6.
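These definitions can be checked numerically; the following is a minimal NumPy sketch (NumPy and the sample matrices are illustrative choices, not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])    # a real 3x2 matrix
assert A.T.shape == (2, 3)                            # transpose swaps rows/columns
assert np.array_equal(A.T.T, A)                       # (A^T)^T = A

D = np.array([[1, 2, -3], [2, 4, 5], [-3, 5, -6]])    # symmetric: D = D^T
assert np.array_equal(D, D.T)

K = np.array([[1 + 2j, 3j], [4, 5 - 1j]])             # complex matrix
K_star = K.conj().T                                   # adjoint K* = conjugate transpose

S = np.array([[0, -3, 2], [3, 0, -1], [-2, 1, 0]])    # skew-symmetric: S + S^T = O
assert np.array_equal(S + S.T, np.zeros((3, 3)))
```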


B.1.1 Block matrix

It is possible to represent a matrix by blocks, as

$$ A = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & A_{ij} & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} $$

where the blocks A_{ij} have suitable dimensions. Given the following matrices

$$ A_1 = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ O & A_{ij} & \vdots \\ O & O & A_{mn} \end{bmatrix} \quad A_2 = \begin{bmatrix} A_{11} & O & O \\ \vdots & A_{ij} & O \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} \quad A_3 = \begin{bmatrix} A_{11} & O & O \\ O & A_{ij} & O \\ O & O & A_{mn} \end{bmatrix} $$

A_1 is upper block triangular, A_2 is lower block triangular, and A_3 is block diagonal.

Example B.1.1

Given the two matrices A and B

$$ A = \begin{bmatrix} -1 & 3 \\ 7 & -2 \\ 4 & -5 \end{bmatrix}; \qquad B = \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \end{bmatrix} $$

matrix A is 3 × 2, while B is 2 × 3. The transpose matrices are

$$ A^T = \begin{bmatrix} -1 & 7 & 4 \\ 3 & -2 & -5 \end{bmatrix}; \qquad B^T = \begin{bmatrix} 0 & -3 \\ -5 & 1 \\ 4 & 6 \end{bmatrix} $$

The matrix

$$ C = \begin{bmatrix} -4 & 6 & 3 \\ 9 & 2 & -5 \\ 7 & -1 & 8 \end{bmatrix} $$

is square and its transpose is

$$ C^T = \begin{bmatrix} -4 & 9 & 7 \\ 6 & 2 & -1 \\ 3 & -5 & 8 \end{bmatrix} $$

The matrix

$$ D = \begin{bmatrix} 1 & 2 & -3 \\ 2 & 4 & 5 \\ -3 & 5 & -6 \end{bmatrix} $$

is symmetric, since

$$ D - D^T = \begin{bmatrix} 1 & 2 & -3 \\ 2 & 4 & 5 \\ -3 & 5 & -6 \end{bmatrix} - \begin{bmatrix} 1 & 2 & -3 \\ 2 & 4 & 5 \\ -3 & 5 & -6 \end{bmatrix} = O $$

The matrix E = diag(1, 2, 3) is diagonal and symmetric. The matrix

$$ F = \begin{bmatrix} 0 & 1 & -4 \\ -1 & 0 & 6 \\ 4 & -6 & 0 \end{bmatrix} $$

is skew-symmetric, since

$$ F + F^T = \begin{bmatrix} 0 & 1 & -4 \\ -1 & 0 & 6 \\ 4 & -6 & 0 \end{bmatrix} + \begin{bmatrix} 0 & -1 & 4 \\ 1 & 0 & -6 \\ -4 & 6 & 0 \end{bmatrix} = O $$

The 4 × 4 matrix

$$ G = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 0 & 1 & 2 \\ 3 & 4 & 5 & 6 \end{bmatrix} $$

can be written as the block matrix

$$ G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \quad\text{with}\quad G_{11} = \begin{bmatrix} 1 & 2 \\ 5 & 6 \end{bmatrix} \; G_{12} = \begin{bmatrix} 3 & 4 \\ 7 & 8 \end{bmatrix} \; G_{21} = \begin{bmatrix} 9 & 0 \\ 3 & 4 \end{bmatrix} \; G_{22} = \begin{bmatrix} 1 & 2 \\ 5 & 6 \end{bmatrix} $$

or, with a different partition, as

$$ G_{11} = \begin{bmatrix} 1 & 2 & 3 \\ 5 & 6 & 7 \\ 9 & 0 & 1 \end{bmatrix} \; G_{12} = \begin{bmatrix} 4 \\ 8 \\ 2 \end{bmatrix} \; G_{21} = \begin{bmatrix} 3 & 4 & 5 \end{bmatrix} \; G_{22} = \begin{bmatrix} 6 \end{bmatrix} $$
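The block partition of G can be reproduced with np.block, which assembles a matrix from its blocks; a small sketch (illustrative only):

```python
import numpy as np

G11 = np.array([[1, 2], [5, 6]])
G12 = np.array([[3, 4], [7, 8]])
G21 = np.array([[9, 0], [3, 4]])
G22 = np.array([[1, 2], [5, 6]])

# Assemble the 4x4 matrix G from its four 2x2 blocks.
G = np.block([[G11, G12], [G21, G22]])
print(G)
# [[1 2 3 4]
#  [5 6 7 8]
#  [9 0 1 2]
#  [3 4 5 6]]
```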


B.1.2 Matrix algebra

Matrices are elements of an algebra, i.e., a vector space together with a product operator. The main operations of matrix algebra are the product by a scalar, the sum, and the matrix product.

Product by a scalar:

$$ \alpha A = \alpha \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \alpha a_{11} & \alpha a_{12} & \cdots & \alpha a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha a_{m1} & \alpha a_{m2} & \cdots & \alpha a_{mn} \end{bmatrix} $$

Sum:

$$ A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{bmatrix} $$

The matrix sum properties are:

A + O = A (existence of a null element)
A + B = B + A (commutativity)
(A + B) + C = A + (B + C) (associativity)
(A + B)^T = A^T + B^T

The null (neutral, zero) element O takes the name of null matrix. The subtraction (difference) operation is defined using the scalar α = −1: A − B = A + (−1)B.

Example B.1.2

Given the two square matrices A and B

$$ A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 8 \\ 1 & 3 & -2 \end{bmatrix}; \qquad B = \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \\ 9 & -2 & 5 \end{bmatrix} $$


and the two scalars α = −2 and β = 5, the matrix C obtained as the linear combination C = αA + βB is

$$ C = -2 \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 8 \\ 1 & 3 & -2 \end{bmatrix} + 5 \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \\ 9 & -2 & 5 \end{bmatrix} = \begin{bmatrix} 2 & -31 & 10 \\ -29 & 9 & 14 \\ 43 & -16 & 29 \end{bmatrix} $$

Matrix product

The operation is performed using the well-known "rows by columns" rule: the generic element c_{ij} of the matrix product C_{m×p} = A_{m×n} · B_{n×p} is

$$ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} $$

The bilinearity of the matrix product is guaranteed, since it is immediate to verify that, given a generic scalar α, the following identity holds:

$$ \alpha(A \cdot B) = (\alpha A) \cdot B = A \cdot (\alpha B) $$

The properties of the matrix product are:

A · B · C = (A · B) · C = A · (B · C)
A · (B + C) = A · B + A · C
(A + B) · C = A · C + B · C
(A · B)^T = B^T · A^T

In general it is important to notice that:

- the matrix product is not commutative: A · B ≠ B · A, apart from particular cases;
- A · B = A · C does not imply B = C, apart from particular cases;
- A · B = O does not imply A = O or B = O, apart from particular cases.

It is a common habit to omit the · sign of product, so that A · B ≡ AB


The product between block matrices follows the same rules; for instance, given

$$ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \qquad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} $$

the product AB yields

$$ AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix} $$

Example B.1.3

Given the two matrices A and B

$$ A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 8 \\ 1 & 3 & -2 \end{bmatrix}; \qquad B = \begin{bmatrix} 0 & -5 & 4 \\ -3 & 1 & 6 \\ 9 & -2 & 5 \end{bmatrix} $$

we compute C_1 = AB and C_2 = BA:

$$ C_1 = AB = \begin{bmatrix} 36 & -2 & 39 \\ 78 & -53 & 56 \\ -27 & 2 & 12 \end{bmatrix} \qquad C_2 = BA = \begin{bmatrix} -31 & 22 & -48 \\ 16 & 7 & -19 \\ -18 & 46 & 19 \end{bmatrix} $$

Now we compute B^T A^T and see that it is equal to C_1^T:

$$ B^T A^T = \begin{bmatrix} 36 & 78 & -27 \\ -2 & -53 & 2 \\ 39 & 56 & 12 \end{bmatrix} = C_1^T $$

We present a case where AB = AC does not imply B = C. Given

$$ A = \begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 2 \end{bmatrix} \quad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 1 & 3 \end{bmatrix} \quad C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

we compute the products, obtaining

$$ AB = AC = \begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 2 \end{bmatrix} $$

We present a case where AB = O does not imply A = O or B = O:

$$ A = \begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 2 \end{bmatrix} \neq O \qquad B = \begin{bmatrix} 0 & 0 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} \neq O \qquad AB = O $$
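The three caveats above are easy to reproduce numerically; a minimal NumPy sketch using the matrices of the example (plus two arbitrary 2 × 2 matrices for non-commutativity):

```python
import numpy as np

A = np.array([[1, 1, -1], [0, -2, 2]])

# AB = AC does not imply B = C:
B = np.array([[1, 0, 0], [0, 2, 2], [0, 1, 3]])
C = np.eye(3, dtype=int)
assert np.array_equal(A @ B, A @ C) and not np.array_equal(B, C)

# AB = O does not imply A = O or B = O:
B0 = np.array([[0, 0], [1, 2], [1, 2]])
assert np.array_equal(A @ B0, np.zeros((2, 2)))

# The product is not commutative in general:
P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])
assert not np.array_equal(P @ Q, Q @ P)
```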


B.1.3 Special matrices

Some matrices have special structures.

Identity matrix. A neutral element with respect to the product exists and is called the identity matrix. It is a square n × n matrix, written I_n, or simply I when no ambiguity arises:

$$ I = \begin{bmatrix} 1 & & \\ & 1 & \\ & & \ddots \\ & & & 1 \end{bmatrix} $$

Given a rectangular matrix A_{m×n}, the following identities hold:

$$ A_{m\times n} = I_m A_{m\times n} = A_{m\times n} I_n $$

Idempotent matrix. Given a square matrix A ∈ R^{n×n}, its k-th power is

$$ A^k = \prod_{\ell=1}^{k} A $$

A matrix is said to be idempotent if A² = A, which implies A^n = A.

Example B.1.4

Given the matrix

$$ A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & -2 & 4 \\ 1 & 3 & -2 \end{bmatrix} $$

we compute the power A³:

$$ A^3 = AAA = \begin{bmatrix} 12 & 60 & 165 \\ 295 & -68 & 25 \\ -60 & 135 & 12 \end{bmatrix} $$

The matrix

$$ B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0.5 \end{bmatrix} $$

is idempotent, since

$$ BB = B^2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0.5 \\ 0 & 0.5 & 0.5 \end{bmatrix} = B $$

B.1.4 Trace

The trace of a square matrix A_{n×n} is the sum of its diagonal elements:

$$ \mathrm{tr}(A) = \sum_{k=1}^{n} a_{kk} $$

The trace satisfies the following properties, where α and β are scalar parameters:

$$ \mathrm{tr}(\alpha A + \beta B) = \alpha\,\mathrm{tr}(A) + \beta\,\mathrm{tr}(B) \qquad \mathrm{tr}(AB) = \mathrm{tr}(BA) $$

$$ \mathrm{tr}(A) = \mathrm{tr}(A^T) \qquad \mathrm{tr}(A) = \mathrm{tr}(T^{-1}AT) \ \text{for nonsingular } T \ \text{(for the definition see Section B.1.9)} $$

Example B.1.5

Given the two matrices A and B

$$ A = \begin{bmatrix} -1 & 3 & 5 \\ 7 & 2 & 4 \\ 1 & 3 & -2 \end{bmatrix}; \qquad B = \begin{bmatrix} 1 & 3 & 5 \\ 7 & -2 & 4 \\ 1 & 3 & -2 \end{bmatrix} $$

we compute the traces:

$$ \mathrm{tr}(A) = -1 + 2 - 2 = -1; \qquad \mathrm{tr}(B) = 1 - 2 - 2 = -3 $$

Notice that

$$ \mathrm{tr}(A + B) = \mathrm{tr}\begin{bmatrix} 0 & 6 & 10 \\ 14 & 0 & 8 \\ 2 & 6 & -4 \end{bmatrix} = -4 = \mathrm{tr}(A) + \mathrm{tr}(B) $$

and that

$$ \mathrm{tr}(AB) = \mathrm{tr}\begin{bmatrix} 25 & 6 & -3 \\ 25 & 29 & 35 \\ 20 & -9 & 21 \end{bmatrix} = 75 = \mathrm{tr}(BA) = \mathrm{tr}\begin{bmatrix} 25 & 24 & 7 \\ -17 & 29 & 19 \\ 18 & 3 & 21 \end{bmatrix} $$
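A quick numerical confirmation of the trace properties on the same matrices (a NumPy sketch):

```python
import numpy as np

A = np.array([[-1, 3, 5], [7, 2, 4], [1, 3, -2]])
B = np.array([[1, 3, 5], [7, -2, 4], [1, 3, -2]])

assert np.trace(A) == -1 and np.trace(B) == -3
assert np.trace(A + B) == np.trace(A) + np.trace(B)   # linearity
assert np.trace(A @ B) == np.trace(B @ A) == 75       # tr(AB) = tr(BA)
assert np.trace(A) == np.trace(A.T)                   # tr(A) = tr(A^T)
```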

B.1.5 Minors and cofactors

A minor of a generic matrix A ∈ R^{m×n} is the determinant of a smaller square matrix, obtained from A by removing one or more of its rows or columns.

More precisely, a minor of order p of a matrix A ∈ R^{m×n} is the determinant¹ D_p of a square sub-matrix B ∈ R^{p×p} obtained cancelling any (m − p) rows and (n − p) columns of A, or equivalently selecting p rows and p columns. There are as many minors as there are possible choices of p out of m rows and of p out of n columns.

Given a matrix A ∈ R^{m×n}, the principal minors of order k are the determinants D_k, with k = 1, …, min{m, n}, obtained selecting the first k rows and the first k columns of A.

Minors obtained by removing just one row and one column from a square matrix (also called first minors) are required for calculating matrix cofactors, which are necessary for computing both the determinant and the inverse of square matrices. Given a square matrix A ∈ R^{n×n}, we indicate with A_{(ij)} ∈ R^{(n−1)×(n−1)} the sub-matrix obtained taking out the i-th row and the j-th column of A. We define the minor D_{rc} of a generic element a_{rc} of a square matrix A ∈ R^{n×n} as the determinant of the matrix obtained deleting the r-th row and the c-th column, i.e., D_{rc} = det A_{(rc)}. We define the cofactor of an element a_{rc} of a square matrix A ∈ R^{n×n} as the product

$$ A_{rc} = (-1)^{r+c} D_{rc} $$

¹ The formal definition of the determinant is presented in Section B.1.6.

Example B.1.6

Given the 3 × 3 matrix

$$ A = \begin{bmatrix} 1 & -3 & 5 \\ 7 & 2 & 4 \\ -1 & 3 & 2 \end{bmatrix} $$

we compute a minor of order 2, for instance the one obtained removing the second row and the third column. The reduced matrix is

$$ B = \begin{bmatrix} 1 & -3 \\ -1 & 3 \end{bmatrix} $$

whose determinant is D₂ = det(B) = 3 × 1 − (−3 × −1) = 0.

Example B.1.7

Given the 3 × 4 matrix

$$ A = \begin{bmatrix} 1 & -3 & 5 & 1 \\ 7 & 2 & -4 & 2 \\ -1 & 3 & 2 & 3 \end{bmatrix} $$

we compute the principal minors D₁, D₂, D₃:

$$ D_1 = \det\begin{bmatrix} 1 \end{bmatrix} = 1 \qquad D_2 = \det\begin{bmatrix} 1 & -3 \\ 7 & 2 \end{bmatrix} = 1 \times 2 - (-3 \times 7) = 23 $$

$$ D_3 = \det\begin{bmatrix} 1 & -3 & 5 \\ 7 & 2 & -4 \\ -1 & 3 & 2 \end{bmatrix} = 1 \times (4 + 12) + 3 \times (14 - 4) + 5 \times (21 + 2) = 161 $$

We notice that the principal minors of A are equal to the principal minors of A^T.

Example B.1.8

Given the 3 × 3 matrix

$$ A = \begin{bmatrix} 1 & 5 & 1 \\ 7 & -4 & 2 \\ -1 & 3 & 3 \end{bmatrix} $$

we compute the minor of a₃₂,

$$ D_{32} = \det A_{(32)} = \det\begin{bmatrix} 1 & 1 \\ 7 & 2 \end{bmatrix} = 1 \times 2 - (1 \times 7) = -5 $$

and the cofactor of a₂₁,

$$ A_{21} = (-1)^{2+1} \det A_{(21)} = -\det\begin{bmatrix} 5 & 1 \\ 3 & 3 \end{bmatrix} = -(5 \times 3 - 3 \times 1) = -12 $$
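Minors and cofactors translate directly into code; a small NumPy sketch (the helper names minor_det and cofactor are illustrative):

```python
import numpy as np

def minor_det(A, r, c):
    """Minor D_rc: determinant of A with row r and column c removed (0-based)."""
    sub = np.delete(np.delete(A, r, axis=0), c, axis=1)
    return np.linalg.det(sub)

def cofactor(A, r, c):
    """Cofactor A_rc = (-1)^(r+c) D_rc."""
    return (-1) ** (r + c) * minor_det(A, r, c)

A = np.array([[1.0, 5.0, 1.0], [7.0, -4.0, 2.0], [-1.0, 3.0, 3.0]])
print(minor_det(A, 2, 1))   # D_32 -> -5 (up to floating-point rounding)
print(cofactor(A, 1, 0))    # A_21 -> -12
```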


B.1.6 Determinants

Once the cofactor has been introduced, the determinant of a square matrix A can be defined as a linear combination of elements and cofactors "by row", i.e., choosing a generic row i:

$$ \det(A) = \sum_{k=1}^{n} a_{ik} (-1)^{i+k} \det(A_{(ik)}) = \sum_{k=1}^{n} a_{ik} A_{ik} $$

or as a linear combination of elements and cofactors "by column", i.e., choosing a generic column j:

$$ \det(A) = \sum_{k=1}^{n} a_{kj} (-1)^{k+j} \det(A_{(kj)}) = \sum_{k=1}^{n} a_{kj} A_{kj} $$

Since these definitions are recursive and assume the computation of determinants of smaller order, it is necessary to define the determinant of a 1 × 1 matrix, i.e., of a scalar: simply, det(a_{ij}) = a_{ij}.

Properties of the determinant:

- det(AB) = det(A) det(B)
- det(A^T) = det(A)
- det(kA) = k^n det(A)
- if one exchanges s rows or s columns of A, obtaining a new matrix A_s, then det(A_s) = (−1)^s det(A)
- if A has two equal or proportional rows/columns, then det(A) = 0
- if A has a row or a column that is a linear combination of other rows or columns, then det(A) = 0
- if A is upper or lower triangular, then det(A) = ∏_{i=1}^{n} a_{ii}
- if A is block triangular, with p blocks A_{ii} on the diagonal, then det(A) = ∏_{i=1}^{p} det(A_{ii})

Example B.1.9

Given the 3 × 3 matrix

$$ A = \begin{bmatrix} 1 & 5 & 1 \\ 7 & -4 & 2 \\ -1 & 3 & 3 \end{bmatrix} $$

we compute its determinant by expansion along the first row:

$$ \det(A) = (-1)^{1+1} \cdot 1 \cdot \det\begin{bmatrix} -4 & 2 \\ 3 & 3 \end{bmatrix} + (-1)^{1+2} \cdot 5 \cdot \det\begin{bmatrix} 7 & 2 \\ -1 & 3 \end{bmatrix} + (-1)^{1+3} \cdot 1 \cdot \det\begin{bmatrix} 7 & -4 \\ -1 & 3 \end{bmatrix} = -116 $$

Given the 3 × 3 matrices

$$ A = \begin{bmatrix} 1 & 5 & 1 \\ 7 & -4 & 2 \\ -1 & 3 & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -1 & 1 \\ 3 & -4 & 2 \\ -1 & 3 & 2 \end{bmatrix} $$

we compute the determinant of the product as the product of the two determinants:

$$ \det(A) = -116; \quad \det(B) = -1; \quad \det(AB) = \det(BA) = 116 $$

where

$$ AB = \begin{bmatrix} 15 & -18 & 13 \\ -7 & 15 & 3 \\ 5 & -2 & 11 \end{bmatrix}; \qquad BA = \begin{bmatrix} -7 & 12 & 2 \\ -27 & 37 & 1 \\ 18 & -11 & 11 \end{bmatrix} $$
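The determinant properties can be verified numerically; a NumPy sketch with the matrices of the example:

```python
import numpy as np

A = np.array([[1.0, 5.0, 1.0], [7.0, -4.0, 2.0], [-1.0, 3.0, 3.0]])
B = np.array([[1.0, -1.0, 1.0], [3.0, -4.0, 2.0], [-1.0, 3.0, 2.0]])

print(np.linalg.det(A), np.linalg.det(B))   # -116.0, -1.0 (up to rounding)
# det(AB) = det(A) det(B) = det(BA):
assert np.allclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.allclose(np.linalg.det(A @ B), np.linalg.det(B @ A))
# det(kA) = k^n det(A), here with k = 2 and n = 3:
assert np.allclose(np.linalg.det(2 * A), 2**3 * np.linalg.det(A))
```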

B.1.7 Rank and singular matrix

We define the rank of a matrix A ∈ R^{m×n} as the maximum integer ρ(A) such that at least one nonzero minor D_p of order p = ρ(A) exists. The following properties hold:

- ρ(A) ≤ min{m, n};
- if ρ(A) = min{m, n}, A is said to have full rank;
- if ρ(A) < min{m, n}, the matrix does not have full rank and one says that it is rank deficient;
- ρ(AB) ≤ min{ρ(A), ρ(B)};
- ρ(A) = ρ(A^T);
- ρ(AA^T) = ρ(A^TA) = ρ(A);
- if A ∈ R^{n×n} and det(A) = 0, then A does not have full rank.

A square matrix A is singular if its rank is not full, i.e., if det(A) = 0.

Example B.1.10

Given the two matrices

$$ A = \begin{bmatrix} 1 & 5 & 1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -1 & 1 & 2 \\ 2 & -2 & 2 & 4 \\ -1 & 1 & 3 & -3 \end{bmatrix} $$

their ranks are

$$ \rho(A) = 2; \quad \rho(B) = 2; \quad \rho(AB) = 2 $$

We note immediately that the first and third columns of A are equal, so its rank is not full; the second row of B is twice the first, and since the rank of B cannot be larger than 3, this linear dependence lowers it to 2.
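The same ranks can be computed with np.linalg.matrix_rank (a numerical rank based on the SVD):

```python
import numpy as np

A = np.array([[1, 5, 1], [2, -4, 2], [-1, 2, -1]])
B = np.array([[1, -1, 1, 2], [2, -2, 2, 4], [-1, 1, 3, -3]])

print(np.linalg.matrix_rank(A))      # 2: first and third columns coincide
print(np.linalg.matrix_rank(B))      # 2: second row is twice the first
print(np.linalg.matrix_rank(A @ B))  # 2: rho(AB) <= min{rho(A), rho(B)}
```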

B.1.8 Adjoint matrix

Given a square matrix A ∈ R^{n×n}, the adjoint matrix (also called the adjugate) is defined as the square matrix Adj(A) = {α_{ij}} whose elements are

$$ \alpha_{ij} = (-1)^{i+j} D_{ji} $$

namely, the matrix that has in its row i and column j the cofactor of the corresponding element a_{ji} of row j and column i.

Example B.1.11

Given the two square matrices

$$ A = \begin{bmatrix} 1 & 5 \\ 2 & -4 \end{bmatrix}; \qquad B = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -2 & 2 \\ -1 & 2 & 3 \end{bmatrix} $$

we compute their adjoint matrices H and G:

$$ \begin{aligned} h_{11} &= (-1)^{1+1} D_{A,11} = (1)(-4) = -4 & h_{12} &= (-1)^{1+2} D_{A,21} = (-1)(5) = -5 \\ h_{21} &= (-1)^{2+1} D_{A,12} = (-1)(2) = -2 & h_{22} &= (-1)^{2+2} D_{A,22} = (1)(1) = 1 \end{aligned} $$

$$ H = \begin{bmatrix} -4 & -5 \\ -2 & 1 \end{bmatrix} $$

and

$$ \begin{aligned} g_{11} &= (-1)^{1+1} D_{B,11} = (1)[(-6) + (-4)] = -10 \\ g_{12} &= (-1)^{1+2} D_{B,21} = (-1)[(-3) + (-4)] = 7 \\ g_{13} &= (-1)^{1+3} D_{B,31} = (1)[(-2) + 4] = 2 \\ g_{21} &= (-1)^{2+1} D_{B,12} = (-1)[6 + 2] = -8 \\ g_{22} &= (-1)^{2+2} D_{B,22} = (1)[3 + 2] = 5 \\ g_{23} &= (-1)^{2+3} D_{B,32} = (-1)[2 + (-4)] = 2 \\ g_{31} &= (-1)^{3+1} D_{B,13} = (1)[4 + (-2)] = 2 \\ g_{32} &= (-1)^{3+2} D_{B,23} = (-1)[2 + (-1)] = -1 \\ g_{33} &= (-1)^{3+3} D_{B,33} = (1)[(-2) + 2] = 0 \end{aligned} $$

$$ G = \begin{bmatrix} -10 & 7 & 2 \\ -8 & 5 & 2 \\ 2 & -1 & 0 \end{bmatrix} $$

B.1.9 Invertible matrix

A square matrix A ∈ R^{n×n} is invertible or nonsingular if an inverse matrix A⁻¹ ∈ R^{n×n} exists such that

$$ A A^{-1} = A^{-1} A = I_n $$

The matrix is invertible iff ρ(A) = n, i.e., iff it has full rank; this implies det(A) ≠ 0. The inverse matrix can be computed as

$$ A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A) $$

The following properties hold: (A⁻¹)⁻¹ = A and (A^T)⁻¹ = (A⁻¹)^T. Given two square matrices A and B of equal dimension n × n, the following identity holds:

$$ (AB)^{-1} = B^{-1} A^{-1} $$

An important result, called the inversion lemma, establishes the following: if A and C are square invertible matrices and B and D are matrices of suitable dimensions, then

$$ (A + BCD)^{-1} = A^{-1} - A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1} $$

provided the matrix (DA⁻¹B + C⁻¹) is invertible. The inversion lemma is useful for computing the inverse of a sum of matrices A₁ + A₂ when A₂ is decomposable into the product BCD and C is easily invertible, being, for instance, diagonal or triangular.
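The inversion lemma is easy to check numerically; a NumPy sketch on arbitrary well-conditioned data (the dimensions and the random seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible, well conditioned
C = np.diag(rng.uniform(1.0, 2.0, p))               # easily invertible (diagonal)
B = rng.standard_normal((n, p))
D = rng.standard_normal((p, n))

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ C @ D)
rhs = Ai - Ai @ B @ np.linalg.inv(D @ Ai @ B + np.linalg.inv(C)) @ D @ Ai
assert np.allclose(lhs, rhs)
```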

Example B.1.12

Let us take the two matrices A and B of Example B.1.11,

$$ A = \begin{bmatrix} 1 & 5 \\ 2 & -4 \end{bmatrix}; \qquad B = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -2 & 2 \\ -1 & 2 & 3 \end{bmatrix} $$

whose adjoints are known; since

$$ A^{-1} = \frac{1}{\det(A)} \mathrm{Adj}(A); \qquad B^{-1} = \frac{1}{\det(B)} \mathrm{Adj}(B) $$

we compute det(A) = −14 and det(B) = 2 to obtain

$$ A^{-1} = \begin{bmatrix} 0.2857 & 0.3571 \\ 0.1429 & -0.0714 \end{bmatrix}; \qquad B^{-1} = \begin{bmatrix} -5.0 & 3.5 & 1.0 \\ -4.0 & 2.5 & 1.0 \\ 1.0 & -0.5 & 0.0 \end{bmatrix} $$

B.1.10 Similarity transformation

Given a square matrix A ∈ R^{n×n} and a nonsingular square matrix T ∈ R^{n×n}, the new matrix B ∈ R^{n×n} obtained as

$$ B = T^{-1} A T \qquad\text{or}\qquad B = T A T^{-1} $$

is said to be similar to A, and the transformation T is called a similarity transformation.


B.2 Eigenvalues and eigenvectors

Consider the similarity transformation between A and Λ = diag(λ_i),

$$ A = U \Lambda U^{-1}, \qquad U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} $$

Multiplying A on the right by U one obtains AU = UΛ which, given the matrix structures, implies the following identity:

$$ A u_i = \lambda_i u_i $$

This identity is the well-known formula that relates the matrix eigenvalues to the eigenvectors: the constant quantities λ_i are the eigenvalues of A, while the vectors u_i are the eigenvectors of A, in general with non-unit norm.

Given a square matrix A_{n×n}, the solutions λ_i (real or complex) of the characteristic equation

$$ P(\lambda) \stackrel{\text{def}}{=} \det(\lambda I - A) = 0 $$

are the eigenvalues of A. P(λ) is a polynomial in λ, called the characteristic polynomial. If the eigenvalues are all distinct, the vectors u_i that satisfy the identity A u_i = λ_i u_i are the eigenvectors of A.
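Numerically, np.linalg.eig returns both eigenvalues and normalized eigenvectors; a minimal sketch (the 2 × 2 matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, U = np.linalg.eig(A)     # eigenvalues and unit-norm eigenvectors
print(lam)                    # 3 and 1 (the order is not guaranteed)

for i in range(2):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])   # A u_i = lambda_i u_i

# A = U Lambda U^-1 (valid here because the eigenvalues are distinct)
assert np.allclose(U @ np.diag(lam) @ np.linalg.inv(U), A)
```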

B.2.1 Generalized eigenvectors

If the eigenvalues are not all distinct, one obtains the so-called generalized eigenvectors, whose characterization goes beyond the scope of these notes. From a geometrical point of view, the eigenvectors define those directions in R^n (i.e., in the domain of the linear transformation represented by the matrix operator A) that are invariant with respect to the transformation A, while the eigenvalues provide the related "scale factors" along these directions. The set of eigenvalues of a matrix A will be indicated as Λ(A), or rather {λ_i(A)}; the set of eigenvectors of A will be indicated as {u_i(A)}. Since the eigenvectors give the invariant directions of the transformation, they are defined up to a constant factor, so they are usually normalized; this will be implicitly assumed in these notes, unless otherwise stated.


B.2.2 Eigenvalue properties

Given a matrix A and its eigenvalues {λ_i(A)}, the following hold true:

$$ \{\lambda_i(A + cI)\} = \{\lambda_i(A) + c\} \qquad \{\lambda_i(cA)\} = \{c\,\lambda_i(A)\} $$

Given an upper or lower triangular matrix

$$ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ & a_{22} & \cdots & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{bmatrix}, \qquad \begin{bmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & \vdots & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} $$

its eigenvalues are the elements on the diagonal, {λ_i(A)} = {a_{ii}}; the same applies to a diagonal matrix.

B.2.3 Invariance of the eigenvalues

Given a matrix A_{n×n} and its eigenvalues {λ_i(A)}, the following hold true:

$$ \det(A) = \prod_{i=1}^{n} \lambda_i \qquad\text{and}\qquad \mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i $$

Given a general invertible transformation, represented by the matrix T, the eigenvalues of A are invariant under the similarity transformation B = T⁻¹AT, i.e.,

$$ \{\lambda_i(B)\} = \{\lambda_i(A)\} $$
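Both invariants and the similarity invariance can be verified on random data; a NumPy sketch (the seed and the size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
T = rng.standard_normal((3, 3))            # almost surely nonsingular

lam = np.linalg.eigvals(A)
assert np.isclose(np.linalg.det(A), np.prod(lam).real)   # det = product of eigenvalues
assert np.isclose(np.trace(A), np.sum(lam).real)         # tr = sum of eigenvalues

B = np.linalg.inv(T) @ A @ T               # similar to A
lamB = np.linalg.eigvals(B)
assert np.allclose(np.sort_complex(lam), np.sort_complex(lamB))
```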

B.2.4 Modal matrix

If we build a matrix M whose columns are the unit eigenvectors u_i(A) of A,

$$ M = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} $$

then the similarity transformation with respect to M results in a diagonal matrix:

$$ \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \ddots \\ & & & \lambda_n \end{bmatrix} = M^{-1} A M $$

M takes the name of modal matrix. If A is symmetric, its eigenvalues are all real and the following identity holds:

$$ \Lambda = M^T A M $$

In this particular case M is orthonormal (orthonormal matrices are described in Section B.7).
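For a symmetric matrix the modal matrix can be obtained with np.linalg.eigh, which returns an orthonormal M directly; a minimal sketch on an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # symmetric
lam, M = np.linalg.eigh(A)          # real eigenvalues, orthonormal eigenvectors

assert np.allclose(M.T @ M, np.eye(3))           # M^T M = I
assert np.allclose(M.T @ A @ M, np.diag(lam))    # Lambda = M^T A M
```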

B.3 Singular value decomposition – SVD

Given a generic matrix A ∈ R^{m×n} with rank r = ρ(A) ≤ s, where s = min{m, n}, it can be factorized according to the singular value decomposition (SVD) as

$$ A = U \Sigma V^T = \sum_{i=1}^{s} \sigma_i u_i v_i^T \qquad (B.1) $$

The important elements of this decomposition are σ_i, u_i and v_i:

- σ_i(A) ≥ 0 are called singular values and are equal to the non-negative square roots of the eigenvalues of the symmetric matrix A^T A: {σ_i(A)} = {√(λ_i(A^T A))}. They are listed in decreasing order, σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_s ≥ 0; if the rank r is smaller than s, there are only r positive singular values, the remaining ones being zero: σ_1 ≥ ⋯ ≥ σ_r > 0, σ_{r+1} = ⋯ = σ_s = 0.

- U ∈ R^{m×m} is an orthonormal square matrix U = [u_1 u_2 ⋯ u_m] whose columns are the eigenvectors u_i of AA^T.

- V ∈ R^{n×n} is an orthonormal square matrix V = [v_1 v_2 ⋯ v_n] whose columns are the eigenvectors v_i of A^T A.

- Σ ∈ R^{m×n} is a rectangular matrix with the following structure:

$$ \Sigma = \begin{bmatrix} \Sigma_s & O \end{bmatrix} \ \text{if } m < n; \qquad \Sigma = \Sigma_s \ \text{if } m = n; \qquad \Sigma = \begin{bmatrix} \Sigma_s \\ O \end{bmatrix} \ \text{if } m > n $$

where Σ_s ∈ R^{s×s} = diag(σ_i) is diagonal, its diagonal terms being the singular values:

$$ \Sigma_s = \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \ddots \\ & & & \sigma_s \end{bmatrix} $$

Otherwise we can decompose A to put in evidence the positive singular values alone:

$$ A = \begin{bmatrix} P & \bar{P} \end{bmatrix} \begin{bmatrix} \Sigma_r & O \\ O & O \end{bmatrix} \begin{bmatrix} Q^T \\ \bar{Q}^T \end{bmatrix} = P \Sigma_r Q^T $$

where

- P is an m × r orthonormal matrix and P̄ is an m × (m − r) orthonormal matrix;
- Q is an n × r orthonormal matrix and Q̄ is an n × (n − r) orthonormal matrix;
- Σ_r is an r × r diagonal matrix whose diagonal elements are the positive singular values σ_i > 0, i = 1, …, r ≤ s.

B.3.1 SVD and rank

The rank r of A is equal to the number r ≤ s of nonzero singular values. Given a generic matrix A ∈ R^{m×n}, the two matrices A^TA and AA^T are symmetric, have the same positive eigenvalues, and differ only in the number of zero eigenvalues.

Example B.3.1

Given the matrix

$$ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} $$

where m = 3, n = 2 and r = ρ(A) = s = 2, its SVD A = UΣV^T gives

$$ U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} = \begin{bmatrix} -\sqrt{6}/3 & 0 & -\sqrt{3}/3 \\ -\sqrt{6}/6 & -\sqrt{2}/2 & \sqrt{3}/3 \\ -\sqrt{6}/6 & \sqrt{2}/2 & \sqrt{3}/3 \end{bmatrix} $$

$$ \Sigma = \begin{bmatrix} \Sigma_s \\ O \end{bmatrix} = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \qquad V = \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} -\sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & -\sqrt{2}/2 \end{bmatrix} $$

If we compute the eigenvalues of

$$ A^T A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} $$

we obtain λ_1 = 3, λ_2 = 1, and we notice that the elements on the diagonal of Σ_s are exactly the square roots of the eigenvalues, i.e., σ_1 = √λ_1, σ_2 = √λ_2. We now apply the right-hand side of (B.1), obtaining

$$ \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T = \sqrt{3}\begin{bmatrix} -\sqrt{6}/3 \\ -\sqrt{6}/6 \\ -\sqrt{6}/6 \end{bmatrix}\begin{bmatrix} -\sqrt{2}/2 & -\sqrt{2}/2 \end{bmatrix} + 1\begin{bmatrix} 0 \\ -\sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix}\begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \end{bmatrix} $$

$$ = \begin{bmatrix} 1.0 & 1.0 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} + \begin{bmatrix} 0.0 & 0.0 \\ -0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = A $$
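The same decomposition is obtained with np.linalg.svd (the signs of the columns of U and V may differ from the example, since eigenvectors are defined up to sign):

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
U, s, Vt = np.linalg.svd(A)          # full SVD: U is 3x3, Vt is 2x2
print(s)                             # [1.7320..., 1.0] = [sqrt(3), 1]

# Rebuild A from the rank-1 terms sigma_i u_i v_i^T of (B.1):
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(2))
assert np.allclose(A_rebuilt, A)

# Singular values = square roots of the eigenvalues of A^T A:
assert np.allclose(s, np.sqrt(np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]))
```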


B.4 Linear Transformations

Given the notion of vector space, introduced in A.1, we can now define a linear transformation. Given two vector spaces X ⊆ R^n and Y ⊆ R^m, with dimensions n and m respectively, and given two generic vectors x ∈ X and y ∈ Y, a generic linear transformation between the two spaces can be represented by the matrix operator A ∈ R^{m×n} as follows:

$$ y = Ax; \qquad x \in \mathbb{R}^n, \; y \in \mathbb{R}^m $$

Therefore a matrix can always be interpreted as a linear operator that transforms a vector from the domain space X to the image space Y. Conversely, any linear operator has at least one matrix that represents it.

B.4.1 Image space and null space

The image space or range (space) of a transformation A is the subspace of Y defined by the following property:

$$ R(A) = \{ y \mid y = Ax, \; x \in X \}; \qquad R(A) \subseteq Y $$

The null space or kernel of a transformation A is the subspace of X defined by the following property:

$$ N(A) = \{ x \mid Ax = 0, \; x \in X \}; \qquad N(A) \subseteq X $$

The null space contains all the vectors in X that are transformed into the null element (or origin) of Y. The dimensions of the range and kernel are called, respectively, rank ρ(A) and nullity ν(A):

$$ \rho(A) = \dim(R(A)); \qquad \nu(A) = \dim(N(A)) $$

If X and Y have finite dimensions, the following equalities hold:

$$ N(A) = R(A^T)^\perp \qquad R(A) = N(A^T)^\perp \qquad N(A)^\perp = R(A^T) \qquad R(A)^\perp = N(A^T) $$

where ⊥ indicates the orthogonal complement of the corresponding (sub-)space; recall that {0}^⊥ is the whole space. The following orthogonal decompositions of the subspaces X and Y hold:

$$ X = N(A) \oplus N(A)^\perp = N(A) \oplus R(A^T) \qquad Y = R(A) \oplus R(A)^\perp = R(A) \oplus N(A^T) $$

where the symbol ⊕ represents the direct sum operator between subspaces.
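Bases for R(A) and N(A) can be extracted from the SVD; a NumPy sketch (the 2 × 3 matrix and the tolerance 1e-12 are arbitrary choices):

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0], [0.0, -2.0, 2.0]])   # a 2x3 example

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))          # numerical rank rho(A)
range_basis = U[:, :r]              # columns span R(A)
null_basis = Vt[r:, :].T            # columns span N(A)

print(r, A.shape[1] - r)            # rank = 2, nullity = 1
assert np.allclose(A @ null_basis, 0.0)
```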


B.4.2 Generalized inverse

Given a generic real matrix A ∈ R^{m×n} with m ≠ n, the inverse matrix is not defined. Nevertheless, it is possible to define a class of matrices A⁻, called pseudo-inverses or generalized inverses, that satisfy the following relation:

$$ A A^- A = A $$

If A has full rank, i.e., ρ(A) = min{m, n}, it is possible to define two classes of generalized inverses:

- if m < n (i.e., ρ(A) = m), the right inverse of A is a matrix A_r ∈ R^{n×m} such that A A_r = I_{m×m};
- if n < m (i.e., ρ(A) = n), the left inverse of A is a matrix A_ℓ ∈ R^{n×m} such that A_ℓ A = I_{n×n}.

B.4.3 Pseudo-inverse matrix

Among the possible left or right inverses, two classes are important:

- right pseudo-inverse (m < n):

$$ A_r^+ = A^T (A A^T)^{-1} $$

When ρ(A) = m, (AA^T)⁻¹ exists.

- left pseudo-inverse (n < m):

$$ A_\ell^+ = (A^T A)^{-1} A^T $$

When ρ(A) = n, (A^TA)⁻¹ exists; this particular left pseudo-inverse (A^TA)⁻¹A^T is also known as the Moore-Penrose pseudo-inverse.

B.4.4 Moore-Penrose pseudo-inverse

In general, even if A^TA is not invertible, it is always possible to define a Moore-Penrose pseudo-inverse A⁺ that satisfies the following relations:

$$ A A^+ A = A \qquad A^+ A A^+ = A^+ \qquad (A A^+)^T = A A^+ \qquad (A^+ A)^T = A^+ A \qquad (B.2) $$
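np.linalg.pinv computes the Moore-Penrose pseudo-inverse, and the four relations (B.2) can be checked directly; a sketch with an arbitrary full-row-rank matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0], [0.0, -2.0, 2.0]])   # m < n, full row rank
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap @ A, A)            # A A+ A = A
assert np.allclose(Ap @ A @ Ap, Ap)          # A+ A A+ = A+
assert np.allclose((A @ Ap).T, A @ Ap)       # A A+ symmetric
assert np.allclose((Ap @ A).T, Ap @ A)       # A+ A symmetric

# With full row rank, pinv coincides with the right pseudo-inverse:
assert np.allclose(Ap, A.T @ np.linalg.inv(A @ A.T))
```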


B.4.5 Left and right pseudo-inverses

When A is square and full rank, the two pseudo-inverses A_r⁺ and A_ℓ⁺ and the Moore-Penrose pseudo-inverse coincide with the traditional inverse matrix A⁻¹:

$$ A_r^+ = A_\ell^+ = A^+ = A^{-1} $$

B.4.6 Linear equation systems

The linear transformation y = Ax associated to A ∈ R^{m×n}, with x ∈ R^n and y ∈ R^m, is equivalent to a system of m linear equations in n unknowns, whose coefficients are the elements of A; this linear system may admit one solution, no solution, or an infinite number of solutions. If we use the pseudo-inverses to solve the linear system y = Ax, we must distinguish three cases, assuming that A has full rank.

Case n = m. There are as many unknowns as equations, and X, Y have the same dimensions. The inverse matrix exists, since A has full rank, so the unknown x is obtained as x = A⁻¹y.

Case n > m. The dimension of the space of the unknowns X is larger than the dimension of Y, hence the system has more unknowns than equations; the system is underconstrained. Among the infinitely many solutions x ∈ R^n, we choose the one with minimum norm ∥x∥, given by

$$ x^* = A_r^+ y = A^T (A A^T)^{-1} y $$

All the other solutions of y = Ax are obtained as

$$ \bar{x} = x^* + v = A_r^+ y + v $$

where v ∈ N(A) is a vector belonging to the null space of A, whose dimension is n − m. These other solutions can also be expressed by defining v = (I − A_r^+ A) w, so that

$$ \bar{x} = A_r^+ y + (I - A_r^+ A) w $$

Figure B.1: Solution of y = Ax when m < n; in this case m = 1, n = 2.

where w ∈ R^n is a generic n × 1 vector. The matrix I − A_r⁺A projects w onto the null space of A, transforming w into v ∈ N(A); this matrix is called a projection matrix. An example of such a linear transformation is sketched in Figure B.1.

Case m > n. The dimension of the space of the unknowns X is smaller than the dimension of Y, hence the system has more equations than unknowns; the system is overconstrained. No exact solution of y = Ax exists, only approximate solutions with an error e = y − Ax ≠ 0. Among these approximate solutions we choose the one minimizing the norm of the error, i.e.,

$$ \hat{x} = \arg\min_{x \in \mathbb{R}^n} \lVert y - Ax \rVert $$

The solution is

$$ \hat{x} = A_\ell^+ y = (A^T A)^{-1} A^T y $$

Geometrically, Ax̂ is the orthogonal projection of y onto the subspace R(A). The approximation error, also called projection error, is

$$ \hat{e} = (I - A A_\ell^+) y $$

and its norm is the lowest among all possible error norms, as said above. An example of such a linear transformation is sketched in Figure B.2.


Figure B.2: Solution of y = Ax when m > n.

The similarity between the projection matrix I − A_r⁺A and the matrix I − AA_ℓ⁺ that gives the projection error is important and will be studied when projection matrices are treated. In order to compute the generalized inverses, one can use the SVD; in particular, the pseudo-inverse is computed as

$$ A^+ = V \begin{bmatrix} \Sigma_r^{-1} & O \\ O & O \end{bmatrix} U^T = Q \Sigma_r^{-1} P^T $$
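A worked numerical sketch of the under- and overconstrained cases (NumPy; the data are illustrative):

```python
import numpy as np

# Underconstrained (m < n): minimum-norm solution plus null-space family.
A = np.array([[1.0, 2.0]])                  # m = 1, n = 2
y = np.array([3.0])
x_star = A.T @ np.linalg.inv(A @ A.T) @ y   # x* = A_r+ y, minimum norm
assert np.allclose(A @ x_star, y)

w = np.array([5.0, -1.0])                   # any vector in R^n
x_bar = x_star + (np.eye(2) - np.linalg.pinv(A) @ A) @ w
assert np.allclose(A @ x_bar, y)            # still an exact solution

# Overconstrained (m > n): least-squares solution via the left pseudo-inverse.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 2.0])
x_hat = np.linalg.inv(A.T @ A) @ A.T @ y
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_hat, x_ls)
```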

B.5 Projections and projection matrices

The geometrical concept of the projection of a vector on a plane can be extended and generalized to the elements of a vector space. This concept is important for the solution of a large number of problems, such as approximation, estimation, prediction, and filtering problems.

Given an n-dimensional real vector space V(R^n), endowed with a scalar product, and a subspace W(R^k) of dimension k ≤ n, it is possible to define the projection operator of vectors v ∈ V onto the subspace W. The projection operator is the square projection matrix P ∈ R^{n×n}, whose columns are the projections of the basis elements of V onto W. A matrix is a projection matrix iff it is idempotent, i.e.,

$$ P^2 = P $$

Notice that the projection matrix is a square matrix: the projected vector, although it belongs to a lower-dimensional subspace, is nonetheless defined in R^n. The projection can be orthogonal or non-orthogonal; in the first case P is symmetric, in the second case it is generic. If P is a projection matrix, I − P is also a projection matrix. Some examples of projection matrices are those associated to the left pseudo-inverse,

$$ P_1 = A A_\ell^+ \qquad\text{and}\qquad P_2 = I - A A_\ell^+ $$

and to the right pseudo-inverse,

$$ P_3 = A_r^+ A \qquad\text{and}\qquad P_4 = I - A_r^+ A $$

From a geometrical point of view, P_1 projects every vector v ∈ V onto the range space R(A), while P_2 projects v onto its orthogonal complement R(A)^⊥ = N(A^T).
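The defining properties of these projection matrices can be verified directly; a NumPy sketch with an arbitrary full-column-rank A:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # full column rank
P1 = A @ np.linalg.pinv(A)        # projects onto R(A)
P2 = np.eye(3) - P1               # projects onto R(A)-perp = N(A^T)

assert np.allclose(P1 @ P1, P1)   # idempotent: P^2 = P
assert np.allclose(P1.T, P1)      # orthogonal projection => symmetric

v = np.array([1.0, 2.0, 3.0])
assert np.allclose(A.T @ (P2 @ v), 0.0)   # P2 v lies in N(A^T)
```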

B.5.1 Matrix norm

Similarly to what is done for vectors, it is possible to provide a "measure" of a matrix, i.e., its "magnitude", by defining a matrix norm. Since a matrix represents a linear transformation between vectors, the matrix norm measures how "big" this transformation is; it must however be "normalized", so that the magnitude of the transformed vector does not affect the norm. Hence the following definition:

$$ \lVert A \rVert \stackrel{\text{def}}{=} \sup_{x \neq 0} \frac{\lVert Ax \rVert}{\lVert x \rVert} = \sup_{\lVert x \rVert = 1} \lVert Ax \rVert $$

Given a square matrix A ∈ R^{n×n}, its norm must satisfy the following general (norm) axioms:

1. ∥A∥ > 0 for every A ≠ O;
2. ∥A∥ = 0 iff A = O;
3. ∥A + B∥ ≤ ∥A∥ + ∥B∥ (triangle inequality);
4. ∥αA∥ = |α| ∥A∥ for any scalar α and any matrix A;
5. ∥AB∥ ≤ ∥A∥ ∥B∥.

Given A ∈ R^{n×n} and its eigenvalues {λ_i(A)}, the following inequalities hold:

$$ \frac{1}{\lVert A^{-1} \rVert} \leq |\lambda_i| \leq \lVert A \rVert \qquad \forall i = 1, \ldots, n $$

Considering only real matrices, the most used matrix norms are:

- Spectral norm: ∥A∥₂ = √(max_i λ_i(A^T A))
- Frobenius norm: ∥A∥_F = √(Σ_i Σ_j a_{ij}²) = √(tr(A^T A))


- Max singular value norm: ∥A∥_σ = max_i σ_i(A)
- 1-norm or max-norm (maximum column sum): ∥A∥₁ = max_j Σ_{i=1}^{n} |a_{ij}|
- ∞-norm (maximum row sum): ∥A∥_∞ = max_i Σ_{j=1}^{n} |a_{ij}|

In general, ∥A∥₂ = ∥A∥_σ and ∥A∥₂² ≤ ∥A∥₁ ∥A∥_∞.
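All these norms are available through np.linalg.norm; a quick sketch on an arbitrary matrix:

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0]])

two = np.linalg.norm(A, 2)          # spectral norm = max singular value
fro = np.linalg.norm(A, 'fro')      # Frobenius norm
one = np.linalg.norm(A, 1)          # max column sum
inf = np.linalg.norm(A, np.inf)     # max row sum

assert np.isclose(two, np.linalg.svd(A, compute_uv=False)[0])
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))
assert two**2 <= one * inf + 1e-12  # ||A||_2^2 <= ||A||_1 ||A||_inf
```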

B.6 Antisymmetric matrices

A square matrix S is called antisymmetric or skew-symmetric when

$$ S + S^T = O, \qquad\text{i.e.,}\qquad S = -S^T $$

A skew-symmetric matrix has the following structure:

$$ S_{n\times n} = \begin{bmatrix} 0 & s_{12} & \cdots & s_{1n} \\ -s_{12} & 0 & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -s_{1n} & -s_{2n} & \cdots & 0 \end{bmatrix} $$

Therefore it has at most n(n − 1)/2 independent elements. For n = 3 it results n(n − 1)/2 = 3, hence an antisymmetric matrix has as many independent elements as a 3D vector v. Given a vector v = [v₁ v₂ v₃]^T it is possible to build S, and given a matrix S it is possible to extract the associated vector v. We indicate this fact using the symbol S(v), where, by convention,

$$ S(v) = \begin{bmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{bmatrix} $$

Some properties:

- Given any vector v ∈ R³: S^T(v) = −S(v) = S(−v).
- Given two scalars λ₁, λ₂ ∈ R:

$$ S(\lambda_1 u + \lambda_2 v) = \lambda_1 S(u) + \lambda_2 S(v) \qquad (B.3) $$

- Given any two vectors v, u ∈ R³, it follows from simple inspection that

$$ S(u)v = u \times v = -v \times u = S(-v)u = S^T(v)u $$

Therefore S(u) is the matrix representation of the cross product operator (u ×), and vice versa.

The matrix S(u)S(u) = S²(u) is symmetric, and

$$ S^2(u) = uu^T - \lVert u \rVert^2 I $$

Hence we can define the dyadic product as

$$ D(u, u) = uu^T = S^2(u) + \lVert u \rVert^2 I $$

Another property, which will be used to characterize the energy of a rigid body, is the following: any quadratic form associated to a skew-symmetric matrix is identically zero, i.e.,

$$ x^T S(u) x \equiv 0 \qquad (B.4) $$

for any x and any u. The proof is quite simple: let us define w = S(u)x = u × x; then x^T S(u) x = x^T w, but since w is orthogonal to both u and x, the scalar product is zero, x^T w = 0.

If R is an orthonormal/rotation matrix, the distributive property with respect to the cross product holds³:

$$ R(v \times u) = (Rv) \times (Ru) \qquad (B.5) $$

For any orthonormal matrix R and any vector v, the above properties allow us to state the following relations:

$$ R S(v) R^T u = R\left(v \times (R^T u)\right) = (Rv) \times (R R^T u) = (Rv) \times u = S(Rv)\,u \qquad (B.6) $$

Since these are valid for any generic u, we obtain

$$ R S(v) R^T = S(Rv) \qquad R S(v) = S(Rv) R \qquad (B.7) $$

³ This property is not always true, but it holds when R is orthogonal.
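The operator S(·) and properties (B.4) and (B.7) in code; a NumPy sketch (the vectors and the rotation angle are arbitrary):

```python
import numpy as np

def S(v):
    """Skew-symmetric matrix such that S(v) @ u = np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

u = np.array([1.0, 2.0, 3.0])
v = np.array([3.0, 2.0, 1.0])
x = np.array([-1.0, 0.5, 2.0])

assert np.allclose(S(u) @ v, np.cross(u, v))                           # cross-product operator
assert np.allclose(S(u) @ S(u), np.outer(u, u) - (u @ u) * np.eye(3))  # S^2 = uu^T - |u|^2 I
assert np.isclose(x @ S(u) @ x, 0.0)                                   # quadratic form (B.4)

c, s_ = np.cos(0.3), np.sin(0.3)                                       # rotation about z
R = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
assert np.allclose(R @ S(v) @ R.T, S(R @ v))                           # property (B.7)
```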


B.6.1 Eigenvalues and eigenvectors of antisymmetric matrices

Given an antisymmetric matrix S(v) ∈ R^{3×3}, its eigenvalues are imaginary or zero:

$$ \lambda_1 = 0, \qquad \lambda_{2,3} = \pm j \lVert v \rVert $$

The eigenvector related to the eigenvalue λ₁ = 0 is v; the other two are complex conjugate. The set of antisymmetric matrices is a vector space, denoted as so(3). Antisymmetric matrices form a Lie algebra, which is related to the Lie group of orthogonal matrices. Given two antisymmetric matrices S₁ and S₂, we call commutator or Lie bracket the following operator:

$$ [S_1, S_2] \stackrel{\text{def}}{=} S_1 S_2 - S_2 S_1 $$

which is itself antisymmetric.

B.6.2 Symmetric-antisymmetric factorization

Given a square matrix A, it is always possible to factor it into the sum of two matrices, as follows:

$$ A = A_s + A_a $$

where

$$ A_s = \frac{1}{2}(A + A^T) \ \text{(symmetric matrix)} \qquad A_a = \frac{1}{2}(A - A^T) \ \text{(skew-symmetric matrix)} $$

Example B.6.1

Given the following vectors

$$ v_1 = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^T \qquad v_2 = \begin{bmatrix} 3 & 2 & 1 \end{bmatrix}^T $$

we have the skew-symmetric matrices

$$ S_1(v_1) = \begin{bmatrix} 0 & -3 & 2 \\ 3 & 0 & -1 \\ -2 & 1 & 0 \end{bmatrix} \qquad S_2(v_2) = \begin{bmatrix} 0 & -1 & 2 \\ 1 & 0 & -3 \\ -2 & 3 & 0 \end{bmatrix} $$

Now we compute the commutator:

$$ [S_1, S_2] = S_1 S_2 - S_2 S_1 = \begin{bmatrix} -7 & 6 & 9 \\ 2 & -6 & 6 \\ 1 & 2 & -7 \end{bmatrix} - \begin{bmatrix} -7 & 2 & 1 \\ 6 & -6 & 2 \\ 9 & 6 & -7 \end{bmatrix} = \begin{bmatrix} 0 & 4 & 8 \\ -4 & 0 & 4 \\ -8 & -4 & 0 \end{bmatrix} $$

B.7 Orthogonal matrices

A square matrix A ∈ R^{n×n} is called orthogonal when

$$ A^T A = \begin{bmatrix} \alpha_1 & & \\ & \alpha_2 & \\ & & \ddots \\ & & & \alpha_n \end{bmatrix} $$

with α_i ≠ 0.

B.7.1 Orthonormal matrices

A square orthogonal matrix U ∈ R^{n×n} is called orthonormal when all the constants α_i are 1:

$$ U^T U = U U^T = I $$

Therefore

$$ U^{-1} = U^T $$

Other properties:

- The columns, as well as the rows, of U are orthogonal to each other and have unit norm.
- ∥U∥ = 1.
- The determinant of U has unit modulus, |det(U)| = 1; therefore it can be +1 or −1.
- Given a vector x, its orthonormal transformation is y = Ux.
If U is an orthonormal matrix, then ∥AU∥ = ∥UA∥ = ∥A∥. When U ∈ R^{3×3}, only 3 out of its 9 elements are independent. The scalar product is invariant under orthonormal transformations:

$$ (Ux) \cdot (Uy) = (Ux)^T (Uy) = x^T U^T U y = x^T y = x \cdot y $$

This means that vector lengths are invariant with respect to orthonormal transformations:

$$ \lVert Ux \rVert^2 = (Ux)^T (Ux) = x^T U^T U x = x^T I x = x^T x = \lVert x \rVert^2 $$

When considering orthonormal transformations, it is important to distinguish two cases:

- When det(U) = +1, U represents a proper rotation, or simply a rotation.
- When det(U) = −1, U represents an improper rotation, or reflection.

The set of rotations forms a continuous non-commutative group (with respect to the product); the set of reflections does not have this "quality"; intuitively this means that infinitesimal rotations exist, while infinitesimal reflections are not defined. Reflections are the most basic transformations in 3D space, in the sense that translations, rotations and roto-reflections (slidings) are obtained from the composition of two or three reflections: as Figure B.3 shows, a composition of two reflections with respect to intersecting axes gives a rotation, while a composition of two reflections with respect to parallel axes results in a translation.

Figure B.3: Compositions of reflections give both rotations and translations.

If U is an orthonormal matrix, the distributive property with respect to the cross product holds:

$$ U(x \times y) = (Ux) \times (Uy) $$

while this property does not hold for generic square matrices A. For any proper rotation matrix U and generic vector x the following holds:

$$ U S(x) U^T y = U\left(x \times (U^T y)\right) = (Ux) \times (U U^T y) = (Ux) \times y = S(Ux)\,y $$

where S(x) is the antisymmetric matrix associated with x; therefore

$$ U S(x) U^T = S(Ux) \qquad U S(x) = S(Ux)\,U $$

B.8 Bilinear and quadratic forms

A bilinear form associated to the matrix A ∈ R^{m×n} is the scalar quantity defined as

$$ b(x, y) \stackrel{\text{def}}{=} x^T A y = y^T A^T x $$

A quadratic form associated to the square matrix A ∈ R^{n×n} is the scalar quantity defined as

$$ q(x) \stackrel{\text{def}}{=} x^T A x = x^T A^T x $$

Every quadratic form associated to a skew-symmetric matrix S(y) is identically zero:

$$ x^T S(y) x \equiv 0 \qquad \forall x $$

Indeed, setting w = S(y)x = y × x, one obtains x^T S(y) x = x^T w; but since, by definition, w is orthogonal to both y and x, the scalar product x^T w is always zero, and so is the quadratic form on the left-hand side.

B.8.1 Positive definite matrices

Recalling the standard decomposition of a generic square matrix A into a symmetric term A_s and an antisymmetric term A_a, one concludes that a quadratic form depends only on the symmetric part of the matrix:

$$ q(x) = x^T A x = x^T (A_s + A_a) x = x^T A_s x $$

A square matrix A is said to be positive definite if the associated quadratic form x^T A x satisfies the following conditions:

$$ x^T A x > 0 \quad \forall x \neq 0; \qquad x^T A x = 0 \ \text{iff} \ x = 0 $$


A square matrix A is said to be positive semidefinite if the associated quadratic form x^T A x satisfies

$$ x^T A x \geq 0 \quad \forall x $$

A square matrix A is said to be negative definite if −A is positive definite; similarly, a square matrix A is negative semidefinite if −A is positive semidefinite. Often the following notation is used:

positive definite matrix: A ≻ 0
positive semidefinite matrix: A ≽ 0
negative definite matrix: A ≺ 0
negative semidefinite matrix: A ≼ 0

A necessary but not sufficient condition for a square matrix A to be positive definite is that the elements on its diagonal are all strictly positive. A necessary and sufficient condition for a symmetric matrix A to be positive definite is that all its eigenvalues are strictly positive.

B.8.2 Sylvester criterion

The Sylvester criterion states that a square matrix A is positive definite iff all its principal minors (as defined in Section B.1.5) are strictly positive. A positive definite matrix has full rank and is always invertible. The associated quadratic form x^T A x satisfies the following identity:

$$ \lambda_{\min}(A) \lVert x \rVert^2 \leq x^T A x \leq \lambda_{\max}(A) \lVert x \rVert^2 $$

where λ_min(A) and λ_max(A) are, respectively, the minimum and the maximum eigenvalues of A.
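Both tests, Sylvester's criterion and the eigenvalue condition, in a short NumPy sketch (the symmetric matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])  # symmetric

# Sylvester criterion: all principal minors strictly positive.
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(minors)                       # [2.0, 3.0, 4.0] -> positive definite

lam = np.linalg.eigvalsh(A)         # real eigenvalues, ascending
assert np.all(lam > 0)              # equivalent test

x = np.array([0.5, -1.0, 2.0])
q = x @ A @ x
assert lam[0] * (x @ x) <= q <= lam[-1] * (x @ x)   # bounds on the quadratic form
```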

B.8.3 Semidefinite matrices and rank

A positive semidefinite matrix A_{n×n} with rank ρ(A) = r < n has r strictly positive eigenvalues and n − r zero eigenvalues. The quadratic form goes to zero for every vector x ∈ N(A).

Given a real matrix of generic dimensions A_{m×n}, we have seen that both A^TA and AA^T are symmetric; in addition we know that

$$ \rho(A^T A) = \rho(A A^T) = \rho(A) $$

These matrices have all real non-negative eigenvalues, and therefore they are positive definite or semidefinite; in particular, if A_{m×n} has full rank, then:

- if m < n, A^TA ≽ 0 and AA^T ≻ 0;
- if m = n, A^TA ≻ 0 and AA^T ≻ 0;
- if m > n, A^TA ≻ 0 and AA^T ≽ 0.

B.9 Matrix derivatives

If a matrix A(t) is composed of elements a_{ij}(t) that are all differentiable with respect to a parameter t, then the matrix derivative is

$$ \frac{d}{dt} A(t) = \left[ \frac{d a_{ij}(t)}{dt} \right] $$

If t is time, the matrix time derivative is also written as

$$ \frac{d}{dt} A(t) = \dot{A}(t) = [\dot{a}_{ij}(t)] $$

If a square matrix A(t) has rank ρ(A(t)) = n at all times t, then the derivative of its inverse is

$$ \frac{d}{dt} A(t)^{-1} = -A^{-1}(t)\, \dot{A}(t)\, A(t)^{-1} $$

Since the inverse operator is a nonlinear operator, in general it results that

$$ \left[ \frac{dA(t)}{dt} \right]^{-1} \neq \frac{d}{dt} \left[ A(t)^{-1} \right] $$

Example B.9.1

Given the matrix

$$ A(t) = \begin{bmatrix} 2 & t \\ t & t^2 \end{bmatrix} $$

its derivative with respect to t is

$$ \frac{d}{dt} A(t) = \begin{bmatrix} 0 & 1 \\ 1 & 2t \end{bmatrix} $$

We compute its inverse A⁻¹:

$$ A^{-1} = \frac{1}{t^2} \begin{bmatrix} t^2 & -t \\ -t & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1/t \\ -1/t & 2/t^2 \end{bmatrix} $$

We compute the derivative of the inverse:

$$ \frac{d}{dt} A(t)^{-1} = -\begin{bmatrix} 1 & -1/t \\ -1/t & 2/t^2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 2t \end{bmatrix} \begin{bmatrix} 1 & -1/t \\ -1/t & 2/t^2 \end{bmatrix} = \begin{bmatrix} 0 & 1/t^2 \\ 1/t^2 & -4/t^3 \end{bmatrix} $$

and we can see that it is different from

$$ \left( \frac{d}{dt} A(t) \right)^{-1} = \begin{bmatrix} -2t & 1 \\ 1 & 0 \end{bmatrix} $$

If A is a function of time through the variable x(t), then

$$ \frac{d}{dt} A(x(t)) \equiv \dot{A}(x(t)) \stackrel{\text{def}}{=} \left[ \frac{\partial a_{ij}(x)}{\partial x} \frac{dx(t)}{dt} \right] \equiv \left[ \frac{\partial a_{ij}(x)}{\partial x} \right] \dot{x}(t) $$

Given a scalar function φ(x) of a vector argument, φ(·): R^n → R, the gradient of the function φ with respect to x is a column vector:

$$ \nabla_x \varphi = \frac{\partial \varphi}{\partial x} \stackrel{\text{def}}{=} \begin{bmatrix} \dfrac{\partial \varphi(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial \varphi(x)}{\partial x_n} \end{bmatrix}, \qquad\text{i.e.,}\qquad \nabla_x \stackrel{\text{def}}{=} \begin{bmatrix} \dfrac{\partial}{\partial x_1} \\ \vdots \\ \dfrac{\partial}{\partial x_n} \end{bmatrix} = \mathrm{grad}_x $$

If x(t) is a differentiable function of time, then

$$ \frac{d\varphi(x)}{dt} \equiv \dot{\varphi}(x) = \nabla_x^T \varphi(x)\, \dot{x} $$

(Notice the convention: for us the gradient is a column vector, although many textbooks assume it is a row vector.)

B.9.1 Gradient

Given a bilinear form b(x, y) = x^T A y, we call gradients the following vectors:

$$ \mathrm{grad}_x\, b(x, y) \stackrel{\text{def}}{=} \frac{\partial b(x, y)}{\partial x} = Ay \qquad \mathrm{grad}_y\, b(x, y) \stackrel{\text{def}}{=} \frac{\partial b(x, y)}{\partial y} = A^T x $$

Given the quadratic form q(x) = x^T A x, we call gradient with respect to x the following vector:

$$ \nabla_x q(x) \equiv \mathrm{grad}_x\, q(x) \stackrel{\text{def}}{=} \frac{\partial q(x)}{\partial x} = (A + A^T)x = 2Ax \ \text{for symmetric } A $$


B.9.2 Jacobian matrix

Given an m × 1 vector function f(x) = [f₁(x) ⋯ f_m(x)]^T, with x ∈ R^n, the Jacobian matrix (or simply the Jacobian) is the m × n matrix defined as

$$ J_f(x) = \begin{bmatrix} \dfrac{\partial f_1(x)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n} \\ \vdots & \dfrac{\partial f_i(x)}{\partial x_j} & \vdots \\ \dfrac{\partial f_m(x)}{\partial x_1} & \cdots & \dfrac{\partial f_m(x)}{\partial x_n} \end{bmatrix} = \begin{bmatrix} (\mathrm{grad}_x f_1)^T \\ \vdots \\ (\mathrm{grad}_x f_m)^T \end{bmatrix} $$

and if x(t) is a differentiable time function, then

$$ \dot{f}(x) \equiv \frac{df(x)}{dt} = \frac{df(x)}{dx}\, \dot{x}(t) = J_f(x)\, \dot{x}(t) $$

Notice that the rows of J_f are the transposes of the gradients of the component functions.
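A closing numerical sketch: an analytic Jacobian checked against finite differences, and the chain rule ḟ = J_f ẋ (the function f and the numbers are arbitrary examples):

```python
import numpy as np

def f(x):
    """Example vector function f: R^2 -> R^2."""
    return np.array([x[0] ** 2 * x[1], 5.0 * x[0] + np.sin(x[1])])

def J_f(x):
    """Analytic Jacobian: row i is the transposed gradient of f_i."""
    return np.array([[2.0 * x[0] * x[1], x[0] ** 2],
                     [5.0, np.cos(x[1])]])

x = np.array([1.0, 2.0])
eps = 1e-6
# Central finite differences, column by column.
J_num = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(2)])
assert np.allclose(J_num, J_f(x), atol=1e-6)

x_dot = np.array([0.3, -0.1])
print(J_f(x) @ x_dot)    # f_dot = J_f(x) x_dot
```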