Machine Learning for Signal Processing: Fundamentals of Linear Algebra - 2 (PowerPoint PPT Presentation)



slide-1
SLIDE 1

Machine Learning for Signal Processing

Fundamentals of Linear Algebra - 2

Class 3. 8 Sep 2016 Instructor: Bhiksha Raj

11-755/18-797 1

slide-2
SLIDE 2

Overview

  • Vectors and matrices
  • Vector spaces
  • Basic vector/matrix operations
  • Various matrix types
  • Projections
  • More on matrix types
  • Matrix determinants
  • Matrix inversion
  • Eigenanalysis
  • Singular value decomposition
  • Matrix Calculus

11-755/18-797 2

slide-3
SLIDE 3

The importance of Bases

  • Conventional 3D representation

  – Each point (vector) is just a triplet of coordinates
  – In reality, the coordinates are weights: X = x·u1 + y·u2 + z·u3
  – u1 = [1 0 0], u2 = [0 1 0], u3 = [0 0 1]

  • Unit vectors in each of the three directions

11-755/18-797 3

(Figure: a point (x, y, z) with unit vectors u1, u2, u3 along the three axes)
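The "coordinates are weights" view can be checked directly. A minimal numpy sketch (numpy and the particular point are illustrative choices, not from the slides):

```python
import numpy as np

# Standard basis: the "weights" in X = x*u1 + y*u2 + z*u3 are just the coordinates.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
u3 = np.array([0.0, 0.0, 1.0])

# An arbitrary point, built as a weighted sum of the basis vectors
point = 2.0 * u1 + 3.0 * u2 - 1.0 * u3
print(point)  # [ 2.  3. -1.]
```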

slide-4
SLIDE 4

The importance of Bases

  • Specialty of u1, u2, u3

– Every point in the space can be expressed as some x.u1 + y.u2 + z.u3 – All three “bases” u1, u2, u3 are required

11-755/18-797 4

(Figure: the same point (x, y, z) expressed over u1, u2, u3)

u1 = [1 0 0]^T, u2 = [0 1 0]^T, u3 = [0 0 1]^T;   X = [x y z]^T = x·u1 + y·u2 + z·u3

slide-5
SLIDE 5

The importance of Bases

  • Is there any other set v1, v2, …, vn which shares this property?

  – Any point can be expressed as a·v1 + b·v2 + c·v3 …
  – How many “v”s will we require?

11-755/18-797 5

(Figure: the same point expressed as (a, b, c) over bases v1, v2, v3)

X = a·v1 + b·v2 + c·v3

slide-6
SLIDE 6

(Figure: a data set shown in both the (u1, u2, u3) basis and the (v1, v2, v3) basis)

Basis based representation

  • A “good” basis captures data structure
  • Here u1, u2 and u3 all take large values for data in the set
  • But in the (v1, v2, v3) set, coordinate values along v3 are always small for data on the blue sheet

  – v3 likely represents a “noise subspace” for these data

11-755/18-797 6

slide-7
SLIDE 7

Basis based representation

  • The most important challenge in ML: Find the best set of bases for a given data set

11-755/18-797 7


slide-8
SLIDE 8

Matrix as a Basis transform

  • A matrix transforms a representation in terms of a standard basis u1, u2, u3 to a representation in terms of a different basis v1, v2, v3

  • Finding best bases: find the matrix that transforms the standard representation to these bases

11-755/18-797 8

[a b c]^T = T [x y z]^T;   X = a·v1 + b·v2 + c·v3 = x·u1 + y·u2 + z·u3

slide-9
SLIDE 9
  • Going on to more mundane stuff..

11-755/18-797 9

slide-10
SLIDE 10

Orthogonal/Orthonormal vectors

  • Two vectors are orthogonal if they are perpendicular to one another

  – A·B = 0
  – A vector that is perpendicular to a plane is orthogonal to every vector on the plane

  • Two vectors are orthonormal if

  – They are orthogonal
  – The length of each vector is 1.0
  – Orthogonal vectors can be made orthonormal by scaling their lengths to 1.0

11-755/18-797 10

A = [x y z]^T,   B = [u v w]^T,   A·B = xu + yv + zw

slide-11
SLIDE 11

Orthogonal matrices

  • Orthogonal Matrix : AAT = ATA = I

– The matrix is square – All row vectors are orthonormal to one another

  • Every vector is perpendicular to the hyperplane formed by all other vectors

  – All column vectors are also orthonormal to one another
  – Observation: In an orthogonal matrix, if the length of the row vectors is 1.0, the length of the column vectors is also 1.0
  – Observation: In an orthogonal matrix, no more than one row can have all entries with the same polarity (+ve or –ve)

11-755/18-797 11

            5 . 75 . 375 . 125 . 5 . 375 . 125 . 5 .

All 3 at 90o to

  • n another
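The defining property AA^T = A^T A = I can be verified numerically. A small numpy sketch using a 2-D rotation matrix (numpy and the angle are illustrative choices, not from the slides):

```python
import numpy as np

theta = 0.3  # any angle works; rotations are always orthogonal
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Both products give the identity for an orthogonal matrix
assert np.allclose(R @ R.T, np.eye(2))
assert np.allclose(R.T @ R, np.eye(2))

# Every row and every column has unit length
assert np.allclose(np.linalg.norm(R, axis=0), 1.0)
assert np.allclose(np.linalg.norm(R, axis=1), 1.0)
```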
slide-12
SLIDE 12

Orthogonal Matrices

  • Orthogonal matrices will retain the length and relative angles between transformed vectors

– Essentially, they are combinations of rotations, reflections and permutations – Rotation matrices and permutation matrices are all orthogonal

11-755/18-797 12

(Figure: a vector x rotated by angle θ to Ax)

slide-13
SLIDE 13

Orthogonal and Orthonormal Matrices

  • If the vectors in the matrix are not unit length, it cannot be orthogonal

  – AA^T ≠ I, A^T A ≠ I
  – AA^T = Diagonal or A^T A = Diagonal, but not both
  – If all the vectors are the same length, we can get AA^T = A^T A = Diagonal, though

  • A non-square matrix cannot be orthogonal

  – AA^T = I or A^T A = I, but not both

11-755/18-797 13

            5 . 75 . 375 . 125 . 5 . 1875 . 0675 . 1

slide-14
SLIDE 14

Matrix Rank and Rank-Deficient Matrices

  • Some matrices will eliminate one or more dimensions during transformation

  – These are rank-deficient matrices
  – The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object

11-755/18-797 14

P * Cone =

slide-15
SLIDE 15

Matrix Rank and Rank-Deficient Matrices

  • Some matrices will eliminate one or more dimensions during transformation

  – These are rank-deficient matrices
  – The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object

11-755/18-797 15

Rank = 2 Rank = 1

slide-16
SLIDE 16

Projections are often examples of rank-deficient transforms

11-755/18-797 16

 P = W (W^T W)^-1 W^T;  Projected Spectrogram = P*M
 The original spectrogram can never be recovered

  – P is rank deficient

 P explains all vectors in the new spectrogram as a mixture of only the 4 vectors in W

  – There are only a maximum of 4 linearly independent bases
  – Rank of P is 4

(Figures: the spectrogram M and the four note bases W)
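The claim "rank of P is 4" can be checked numerically. A numpy sketch with a random W standing in for the four note bases (numpy, the dimensions, and the random data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 4))       # 4 basis vectors ("notes") in a 10-D space
P = W @ np.linalg.inv(W.T @ W) @ W.T   # P = W (W^T W)^-1 W^T

print(np.linalg.matrix_rank(P))        # 4: P is rank deficient in the 10-D space
assert np.allclose(P @ P, P)           # projecting an already-projected vector changes nothing
```

The idempotence check P·P = P is what makes P a projection: once a vector lies in the span of W, projecting again leaves it untouched.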

slide-17
SLIDE 17

Non-square Matrices

  • Non-square matrices add or subtract axes

  – More rows than columns → add axes

  • But does not increase the dimensionality of the data

  – Fewer rows than columns → reduce axes

  • May reduce dimensionality of the data

11-755/18-797 17

X = [x1 x2 … xN; y1 y2 … yN]   (2-D data)

P = a 3×2 transform

P X = [x̂1 x̂2 … x̂N; ŷ1 ŷ2 … ŷN; ẑ1 ẑ2 … ẑN]   (3-D, rank 2)

slide-18
SLIDE 18

Non-square Matrices

  • Non-square matrices add or subtract axes

  – More rows than columns → add axes

  • But does not increase the dimensionality of the data

  – Fewer rows than columns → reduce axes

  • May reduce dimensionality of the data

11-755/18-797 18

X = [x1 x2 … xN; y1 y2 … yN; z1 z2 … zN]   (3-D data, rank 3)

P = a 2×3 transform

P X = [x̂1 x̂2 … x̂N; ŷ1 ŷ2 … ŷN]   (2-D, rank 2)

slide-19
SLIDE 19

The Rank of a Matrix

  • The matrix rank is the dimensionality of the transformation of a full-dimensioned object in the original space

  • The matrix can never increase dimensions

  – Cannot convert a circle to a sphere or a line to a circle

  • The rank of a matrix can never be greater than the lower of its two dimensions

11-755/18-797 19

      1 1 5 . 2 . 1 3 .           6 . 9 . 1 . 9 . 8 .

slide-20
SLIDE 20

The Rank of a Matrix

11-755/18-797 20

 Projected Spectrogram = P * M
 Every vector in it is a combination of only 4 bases
 The rank of the matrix is the smallest no. of bases required to describe the output
 E.g. if note no. 4 in P could be expressed as a combination of notes 1, 2 and 3, it provides no additional information

  – Eliminating note no. 4 would give us the same projection
  – The rank of P would be 3!

M =

slide-21
SLIDE 21

Matrix rank is unchanged by transposition

  • If an N-dimensional object is compressed to a K-dimensional object by a matrix, it will also be compressed to a K-dimensional object by the transpose of the matrix

11-755/18-797 21

          86 . 44 . 42 . 9 . 4 . 1 . 8 . 5 . 9 .           86 . 9 . 8 . 44 . 4 . 5 . 42 . 1 . 9 .

slide-22
SLIDE 22

Matrix Determinant

  • The determinant is the “volume” of a matrix
  • Actually the volume of a parallelepiped formed from its row vectors

  – Also the volume of the parallelepiped formed from its column vectors

  • Standard formula for determinant: in text book

11-755/18-797 22

(Figure: parallelepipeds formed from row vectors (r1), (r2), and (r1+r2))

slide-23
SLIDE 23

Matrix Determinant: Another Perspective

  • The determinant is the ratio of N-volumes

  – If V1 is the volume of an N-dimensional sphere “O” in N-dimensional space

  • O is the complete set of points or vertices that specify the object

  – If V2 is the volume of the N-dimensional ellipsoid specified by A*O, where A is a matrix that transforms the space
  – |A| = V2 / V1

11-755/18-797 23

Volume = V1 Volume = V2

          7 . 9 . 7 . 8 . 8 . . 1 7 . 8 .

slide-24
SLIDE 24

Matrix Determinants

  • Matrix determinants are only defined for square matrices

  – They characterize volumes in linearly transformed space of the same dimensionality as the vectors

  • Rank-deficient matrices have determinant 0

  – Since they compress full-volumed N-dimensional objects into zero-volume N-dimensional objects

  • E.g. a 3-D sphere into a 2-D ellipse: the ellipse has 0 volume (although it does have area)

  • Conversely, all matrices of determinant 0 are rank deficient

  – Since they compress full-volumed N-dimensional objects into zero-volume objects

11-755/18-797 24

slide-25
SLIDE 25

Multiplication properties

  • Properties of vector/matrix products

  – Associative
  – Distributive
  – NOT commutative!!!

  • left multiplications ≠ right multiplications

  – Transposition

11-755/18-797 25

฀ A (BC)  (A B)C

฀ A B  BA ฀ A (B  C)  A B  A C

 

T T T

A B B A   

slide-26
SLIDE 26

Determinant properties

  • Associative for square matrices

  – Scaling volume sequentially by several matrices is equal to scaling once by the product of the matrices

  • Volume of sum ≠ sum of volumes
  • Commutative

  – The order in which you scale the volume of an object is irrelevant

11-755/18-797 26

|A B C| = |A|·|B|·|C|

|A B| = |B A| = |A|·|B|,   |B + C| ≠ |B| + |C|

slide-27
SLIDE 27

Matrix Inversion

  • A matrix transforms an N-dimensional object to a different N-dimensional object
  • What transforms the new object back to the original?

  – The inverse transformation

  • The inverse transformation is called the matrix inverse

11-755/18-797 28

(Figure: a transform T maps the original object to a new one; the unknown inverse T^-1 = [? ? ?; ? ? ?; ? ? ?] maps it back)

slide-28
SLIDE 28

Matrix Inversion

  • The product of a matrix and its inverse is the identity matrix

  – Transforming an object, and then inverse transforming it, gives us back the original object

11-755/18-797 29

T^-1 (T D) = D,   T^-1 T = I

T (T^-1 D) = D,   T T^-1 = I

slide-29
SLIDE 29
  • Given the transform T and transformed vector Y, how do we determine X?

11-755/18-797 30

The Inverse Transform and Simultaneous Equations

T X = Y

slide-30
SLIDE 30

Inverse Transform and Simultaneous Equation

  • Inverting the transform is identical to solving simultaneous equations

11-755/18-797 31

[a b c]^T = U [x y z]^T

a = U11·x + U12·y + U13·z
b = U21·x + U22·y + U23·z
c = U31·x + U32·y + U33·z

Given [a b c], find [x y z]

U = [U11 U12 U13; U21 U22 U23; U31 U32 U33]
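Solving the three simultaneous equations above is exactly a linear solve. A numpy sketch (numpy and the particular U and right-hand side are illustrative assumptions):

```python
import numpy as np

U = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])    # full rank, so a unique solution exists
abc = np.array([5.0, 3.0, 3.0])    # the transformed vector [a, b, c]

xyz = np.linalg.solve(U, abc)       # solve U [x, y, z]^T = [a, b, c]^T
assert np.allclose(U @ xyz, abc)    # plugging back in reproduces [a, b, c]
```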

slide-31
SLIDE 31

Inverting rank-deficient matrices

  • Rank-deficient matrices “flatten” objects

  – In the process, multiple points in the original object get mapped to the same point in the transformed object

  • It is not possible to go “back” from the flattened object to the original object

  – Because of the many-to-one forward mapping

  • Rank-deficient matrices have no inverse

11-755/18-797 32

            75 . 433 . 433 . 25 . 1

slide-32
SLIDE 32

Inverse Transform and Simultaneous Equation

  • Inverting the transform is identical to solving simultaneous equations
  • Rank-deficient transforms result in too few equations

  – Cannot be inverted to obtain a unique solution

11-755/18-797 33

[a b]^T = U [x y z]^T

Given [a b], find [x y z]

U = [U11 U12 U13; U21 U22 U23]

a = U11·x + U12·y + U13·z
b = U21·x + U22·y + U23·z

slide-33
SLIDE 33

Inverse Transform and Simultaneous Equation

  • Inverting the transform is identical to solving simultaneous equations
  • Rank-deficient transforms result in too few equations

  – Cannot be inverted to obtain a unique solution

  • Or too many equations

  – Cannot be inverted to obtain an exact solution

11-755/18-797 34

[a b c]^T = U [x y]^T

U = [U11 U12; U21 U22; U31 U32]

a = U11·x + U12·y
b = U21·x + U22·y
c = U31·x + U32·y

Given [a b c], find [x y]

slide-34
SLIDE 34

Rank Deficient Matrices

11-755/18-797 35

 The projection matrix is rank deficient
 You cannot recover the original spectrogram from the projected one..

M =

slide-35
SLIDE 35

Revisiting Projections and Least Squares

  • Projection computes a least squared error estimate
  • For each vector V in the music spectrogram matrix

  – Approximation: V_approx = a·note1 + b·note2 + c·note3 …
  – Error vector E = V − V_approx
  – Squared error energy for V: e(V) = norm(E)^2

  • Projection computes V_approx for all vectors such that the total error is minimized

  • But WHAT ARE “a”, “b” and “c”?

11-755/18-797 36

T = [note1 note2 note3],   V_approx = T [a b c]^T

slide-36
SLIDE 36

The Pseudo Inverse (PINV)

  • We are approximating spectral vectors V as the transformation of the vector [a b c]^T

  – Note: we’re viewing the collection of bases in T as a transformation

  • The solution is obtained using the pseudo inverse

  – This gives us a LEAST SQUARES solution

  • If T were square and invertible, Pinv(T) = T^-1, and V = V_approx

11-755/18-797 37

V_approx = T [a b c]^T,   V ≈ T [a b c]^T

[a b c]^T = Pinv(T) V
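The pseudo-inverse giving the least-squares weights can be checked against an explicit least-squares solver. A numpy sketch (numpy, the dimensions, and the random stand-ins for T and V are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((8, 3))   # 3 "note" bases, stacked as columns, in 8-D
V = rng.standard_normal(8)        # a spectral vector to approximate

abc = np.linalg.pinv(T) @ V       # weights [a, b, c] from the pseudo-inverse
V_approx = T @ abc

# pinv gives exactly the weights that minimize ||V - T @ abc||^2
lstsq_sol = np.linalg.lstsq(T, V, rcond=None)[0]
assert np.allclose(abc, lstsq_sol)
```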

slide-37
SLIDE 37

Generalization to matrices

11-755/18-797 38

Exact case (unique exact solution exists; T must be square):

Y = T X → X = T^-1 Y   (left multiplication)
Y = X T → X = Y T^-1   (right multiplication)

General case (no unique exact solution exists; T may or may not be square):

Y = T X → X = Pinv(T) Y   (left multiplication)
Y = X T → X = Y Pinv(T)   (right multiplication)

  – At least one (if not both) of the forward and backward equations may be inexact
slide-38
SLIDE 38

The Pseudo Inverse

  • Case 1: Too many solutions
  • The pseudo-inverse picks the shortest solution

11-755/18-797 39

[a b]^T = U [x y z]^T

a = U11·x + U12·y + U13·z
b = U21·x + U22·y + U23·z

[x y z]^T = Pinv(U) [a b]^T

(Figure: a plane of solutions with the shortest solution marked. The figure is only meant for illustration; for the above equations the actual set of solutions is a line, not a plane. Pinv(U) [a b]^T will be the point on the line closest to the origin)

slide-39
SLIDE 39

The Pseudo Inverse

  • Case 2: No exact solution
  • The pseudo-inverse picks the solution that results in the lowest error

11-755/18-797 40

[a b c]^T = U [x y]^T,   U = [U11 U12; U21 U22; U31 U32]

a = U11·x + U12·y
b = U21·x + U22·y
c = U31·x + U32·y

[x y]^T = Pinv(U) [a b c]^T

(Figure: the squared error plotted against the entries of Pinv(U), with the “optimal” Pinv(U) at the minimum. The figure is only meant for illustration; for the above equations Pinv(U) actually has 6 components, and the error is a quadratic in 6 dimensions)

slide-40
SLIDE 40

Explaining music with one note

11-755/18-797 41

Recap: P = W (W^T W)^-1 W^T,  Projected Spectrogram = P*M

Approximation: M = W*X

The amount of W in each vector = X = PINV(W)·M

W·PINV(W)·M = Projected Spectrogram

W·PINV(W) = Projection matrix!!

PINV(W) = (W^T W)^-1 W^T,   X = PINV(W)·M

slide-41
SLIDE 41

Explanation with multiple notes

11-755/18-797 42

 X = Pinv(W).M; Projected matrix = W.X = W.Pinv(W).M

M = W = X=PINV(W)M

slide-42
SLIDE 42

How about the other way?

11-755/18-797 43

 W = M·Pinv(V);   approximation U = W·V

M = W =

? ?

V = U =

slide-43
SLIDE 43

Pseudo-inverse (PINV)

  • Pinv() applies to non-square matrices
  • Pinv(Pinv(A)) = A
  • A·Pinv(A) = projection matrix!

  – Projection onto the columns of A

  • If A is a K x N matrix and K > N, A projects N-D vectors into a higher-dimensional K-D space

  – Pinv(A) is an N x K matrix
  – Pinv(A)·A = I in this case

  • Otherwise A·Pinv(A) = I
11-755/18-797 44

slide-44
SLIDE 44

Finding the Transform

  • Given examples

  – T.X1 = Y1
  – T.X2 = Y2
  – ..
  – T.XN = YN

  • Find T

11-755/18-797 45

slide-45
SLIDE 45

Finding the Transform

  • Pinv works here too

11-755/18-797 46

X = [X1 X2 ⋯ XN],   Y = [Y1 Y2 ⋯ YN]   (the example inputs and outputs stacked as columns)

Y = T X   →   T = Y Pinv(X)
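When the examples span the input space, T = Y Pinv(X) recovers the transform exactly. A numpy sketch (numpy, the dimensions, and the random data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
T_true = rng.standard_normal((3, 3))   # the unknown transform
X = rng.standard_normal((3, 5))        # 5 example inputs as columns (full row rank)
Y = T_true @ X                         # the observed outputs

T_est = Y @ np.linalg.pinv(X)          # T = Y Pinv(X)
assert np.allclose(T_est, T_true)      # exact recovery: X Pinv(X) = I here
```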

slide-46
SLIDE 46

Finding the Transform: Inexact

  • Even works for inexact solutions
  • We desire to find a linear transform T that maps X to Y

  – But such a linear transform doesn’t really exist

  • Pinv will give us the “best guess” for T that minimizes the total squared error between Y and TX

11-755/18-797 47

X = [X1 X2 ⋯ XN],   Y = [Y1 Y2 ⋯ YN]

Y ≈ T X   →   T = Y Pinv(X)   minimizes   Σi ||Yi − T Xi||^2

slide-47
SLIDE 47

Matrix inversion (division)

  • The inverse of matrix multiplication

  – Not element-wise division!!

  • Provides a way to “undo” a linear transformation

  – Inverse of the unit matrix is itself
  – Inverse of a diagonal is diagonal
  – Inverse of a rotation is a (counter)rotation (its transpose!)
  – Inverse of a rank-deficient matrix does not exist!

  • But the pseudoinverse exists
  • For square matrices: pay attention to multiplication side!
  • If the matrix is not square, or the matrix is not invertible, use a matrix pseudoinverse:

11-755/18-797 48

฀ A B  C, A  CB1, B  A 1 C

C A B B C A C B A      

 

, ,

slide-48
SLIDE 48

Eigenanalysis

  • If something can go through a process mostly unscathed in character, it is an eigen-something

  – Sound example:

  • A vector that can undergo a matrix multiplication and keep pointing the same way is an eigenvector

  – Its length can change though

  • How much its length changes is expressed by its corresponding eigenvalue

  – Each eigenvector of a matrix has its eigenvalue

  • Finding these “eigenthings” is called eigenanalysis

11-755/18-797 49

slide-49
SLIDE 49

EigenVectors and EigenValues

  • Vectors that do not change angle upon transformation

  – They may change length
  – V = eigen vector
  – λ = eigen value

M V = λ V

M = [1.5  0.7; 0.7  1.0]

(Figure: black vectors are eigen vectors)

11-755/18-797 50
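The defining relation M V = λV can be verified with numpy (numpy is an illustrative choice; the entries of M below are a reconstruction of the garbled example matrix and may differ from the original slide):

```python
import numpy as np

M = np.array([[1.5, 0.7],
              [0.7, 1.0]])          # example symmetric matrix (assumed values)

vals, vecs = np.linalg.eig(M)       # eigenvalues and eigenvectors (as columns)
for lam, v in zip(vals, vecs.T):
    # M v points the same way as v, scaled by the eigenvalue lambda
    assert np.allclose(M @ v, lam * v)
```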

slide-50
SLIDE 50

Eigen vector example

11-755/18-797 51

slide-51
SLIDE 51

Matrix multiplication revisited

  • Matrix transformation “transforms” the space

  – Warps the paper so that the normals to the two vectors now lie along the axes

11-755/18-797 52

A = [1.0  0.07; 1.1  1.2]

slide-52
SLIDE 52

A stretching operation

  • Draw two lines
  • Stretch / shrink the paper along these lines by factors λ1 and λ2

  – The factors could be negative – implies flipping the paper

  • The result is a transformation of the space

11-755/18-797 53

(Figure: stretch factors 1.4 and 0.8 along the two lines)

slide-53
SLIDE 53

A stretching operation

11-755/18-797 54

 Draw two lines
 Stretch / shrink the paper along these lines by factors λ1 and λ2
 The factors could be negative – implies flipping the paper
 The result is a transformation of the space

slide-54
SLIDE 54

A stretching operation

11-755/18-797 55

 Draw two lines
 Stretch / shrink the paper along these lines by factors λ1 and λ2
 The factors could be negative – implies flipping the paper
 The result is a transformation of the space

slide-55
SLIDE 55

Physical interpretation of eigen vector

  • The result of the stretching is exactly the same as transformation by a matrix
  • The axes of stretching/shrinking are the eigenvectors

  – The degree of stretching/shrinking are the corresponding eigenvalues

  • The EigenVectors and EigenValues convey all the information about the matrix

11-755/18-797 56

slide-56
SLIDE 56

Physical interpretation of eigen vector

  • The result of the stretching is exactly the same as transformation by a matrix
  • The axes of stretching/shrinking are the eigenvectors

  – The degree of stretching/shrinking are the corresponding eigenvalues

  • The EigenVectors and EigenValues convey all the information about the matrix

11-755/18-797 57

M [V1 V2] = [λ1·V1  λ2·V2] = [V1 V2] [λ1 0; 0 λ2]

slide-57
SLIDE 57

Eigen Analysis

  • Not all square matrices have nice eigen values and vectors

  – E.g. consider a rotation matrix
  – This rotates every vector in the plane

  • No vector that remains unchanged
  • In these cases the Eigen vectors and values are complex

11-755/18-797 58

X = [x; y],   X_new = [x′; y′] = [cos θ  −sin θ; sin θ  cos θ] X = R_θ X

slide-58
SLIDE 58

Singular Value Decomposition

  • Matrix transformations convert circles to ellipses
  • Eigen vectors are vectors that do not change direction in the process
  • There is another key feature of the ellipse to the left that carries information about the transform

  – Can you identify it?

11-755/18-797 59

A = [1.0  0.07; 1.1  1.2]

slide-59
SLIDE 59

Singular Value Decomposition

  • The major and minor axes of the transformed ellipse define the ellipse

  – They are at right angles

  • These are transformations of right-angled vectors on the original circle!

11-755/18-797 60

A = [1.0  0.07; 1.1  1.2]

slide-60
SLIDE 60

Singular Value Decomposition

  • U and V are orthonormal matrices

  – Columns are orthonormal vectors

  • S is a diagonal matrix
  • The right singular vectors in V are transformed to the left singular vectors in U

  – And scaled by the singular values that are the diagonal entries of S

11-755/18-797 61

A = [1.0  0.07; 1.1  1.2]

MATLAB: [U, S, V] = svd(A);   A = U S V^T
(Figure: V1 and V2 on the circle map to s1·U1 and s2·U2 on the ellipse)
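The same decomposition in numpy (an illustrative alternative to the MATLAB call on the slide; the entries of A are a reconstruction of the garbled example and may differ from the original):

```python
import numpy as np

A = np.array([[1.0, 0.07],
              [1.1, 1.2]])          # example matrix (assumed values)

U, s, Vt = np.linalg.svd(A)         # numpy returns V transposed and s as a vector
assert np.allclose(A, U @ np.diag(s) @ Vt)   # A = U S V^T
assert np.allclose(U.T @ U, np.eye(2))       # U orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(2))     # V orthonormal

# A right singular vector maps to the scaled left singular vector: A v1 = s1 u1
assert np.allclose(A @ Vt[0], s[0] * U[:, 0])
```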

slide-61
SLIDE 61

Singular Value Decomposition

  • The left and right singular vectors are not the same

  – If A is not a square matrix, the left and right singular vectors will be of different dimensions

  • The singular values are always real
  • The largest singular value is the largest amount by which a vector is scaled by A

  – max(|Ax| / |x|) = s_max

  • The smallest singular value is the smallest amount by which a vector is scaled by A

  – min(|Ax| / |x|) = s_min
  – This can be 0 (for low-rank or non-square matrices)

11-755/18-797 62

slide-62
SLIDE 62

The Singular Values

  • Square matrices: product of singular values = determinant of the matrix

  – This is also the product of the eigen values
  – I.e. there are two different sets of axes whose products give you the area of an ellipse

  • For any “broad” rectangular matrix A, the largest singular value of any square submatrix B cannot be larger than the largest singular value of A

  – An analogous rule applies to the smallest singular value
  – This property is utilized in various problems, such as compressive sensing

11-755/18-797 63

slide-63
SLIDE 63

SVD vs. Eigen Analysis

  • Eigen analysis of a matrix A:

  – Find two vectors such that their absolute directions are not changed by the transform

  • SVD of a matrix A:

  – Find two vectors such that the angle between them is not changed by the transform

  • For one class of matrices, these two operations are the same

11-755/18-797 64

slide-64
SLIDE 64

A matrix vs. its transpose

  • Multiplication by matrix A:

  – Transforms right singular vectors in V to left singular vectors U

  • Multiplication by its transpose A^T:

  – Transforms left singular vectors U to right singular vectors V

  • A A^T: converts V to U, then brings it back to V

  – Result: only scaling

11-755/18-797 65

slide-65
SLIDE 65

Symmetric Matrices

  • Matrices that do not change on transposition

  – Row and column vectors are identical

  • The left and right singular vectors are identical

  – U = V
  – A = U S U^T

  • They are identical to the Eigen vectors of the matrix
  • Symmetric matrices do not rotate the space

  – Only scaling and, if Eigen values are negative, reflection

11-755/18-797 66

A = [1.5  0.7; 0.7  1.0]

slide-66
SLIDE 66

Symmetric Matrices

  • Matrices that do not change on transposition

  – Row and column vectors are identical

  • Symmetric matrix: Eigen vectors and Eigen values are always real
  • Eigen vectors are always orthogonal

  – At 90 degrees to one another

11-755/18-797 67

A = [1.5  0.7; 0.7  1.0]

slide-67
SLIDE 67

Symmetric Matrices

  • Eigen vectors point in the direction of the major and minor axes of the ellipsoid resulting from the transformation of a spheroid

  – The eigen values are the lengths of the axes

11-755/18-797 68

A = [1.5  0.7; 0.7  1.0]

slide-68
SLIDE 68

Symmetric matrices

  • Eigen vectors Vi are orthonormal

  – Vi^T Vi = 1
  – Vi^T Vj = 0, i ≠ j

  • Listing all eigen vectors in matrix form V

  – V^T = V^-1
  – V^T V = I
  – V V^T = I

  • M Vi = λi Vi
  • In matrix form: M V = V Λ

  – Λ is a diagonal matrix with all eigen values

  • M = V Λ V^T

11-755/18-797 69

slide-69
SLIDE 69

Square root of a symmetric matrix

11-755/18-797 70

C = V·Λ·V^T   →   Sqrt(C) = V·Sqrt(Λ)·V^T

Check: Sqrt(C)·Sqrt(C) = V·Sqrt(Λ)·V^T · V·Sqrt(Λ)·V^T = V·Sqrt(Λ)·Sqrt(Λ)·V^T = V·Λ·V^T = C

slide-70
SLIDE 70

Definiteness..

  • SVD: Singular values are always non-negative
  • Eigen analysis: Eigen values can be real or complex

  – Real, positive Eigen values represent stretching of the space along the Eigen vector
  – Real, negative Eigen values represent stretching and reflection (across the origin) of the Eigen vector
  – Complex Eigen values occur in conjugate pairs

  • A square (symmetric) matrix is positive definite if all Eigen values are real and greater than 0

  – Transformation can be explained as stretching and rotation
  – If any Eigen value is zero, the matrix is positive semi-definite

11-755/18-797 71

slide-71
SLIDE 71

Positive Definiteness..

  • Property of a positive definite matrix: defines inner product norms

  – x^T A x is always positive for any non-zero vector x if A is positive definite

  • Positive definiteness is a test for validity of Gram matrices

  – Such as correlation and covariance matrices
  – We will encounter these and other Gram matrices later

11-755/18-797 72

slide-72
SLIDE 72

SVD on data-container matrices

  • We can also perform SVD on matrices that are data containers
  • X is a d x N rectangular matrix

  – N vectors of dimension d

  • U is an orthogonal matrix of d vectors of size d

  – All vectors are length 1

  • V is an orthogonal matrix of N vectors of size N
  • S is a d x N diagonal matrix with non-zero entries only on the diagonal

11-755/18-797 73

X = [X1 X2 ⋯ XN],   X = U S V^T

slide-73
SLIDE 73

SVD on data-container matrices

11-755/18-797 74

X = [X1 X2 ⋯ XN],   X = U S V^T

(Figure: the data matrix X and the factors U, S, V^T shown as images)

|Ui| = 1.0 for every vector in U
|Vi| = 1.0 for every vector in V

slide-74
SLIDE 74

SVD on data-container matrices

11-755/18-797 75

X = U S V^T = Σj s_j·U_j·V_j^T

(Figure: X recomposed as a sum of rank-1 terms, one per singular value)

slide-75
SLIDE 75

Expanding the SVD

  • Each left singular vector and the corresponding right singular vector contribute one “basic” component to the data
  • The “magnitude” of its contribution is the corresponding singular value

11-755/18-797 76

X = s1·U1·V1^T + s2·U2·V2^T + s3·U3·V3^T + s4·U4·V4^T + …

slide-76
SLIDE 76

Expanding the SVD

  • Each left singular vector and the corresponding right singular vector contribute one “basic” component to the data
  • The “magnitude” of its contribution is the corresponding singular value
  • Low singular-value components contribute little, if anything

  – Carry little information
  – Are often just “noise” in the data

11-755/18-797 77

X = s1·U1·V1^T + s2·U2·V2^T + s3·U3·V3^T + s4·U4·V4^T + …

slide-77
SLIDE 77

Expanding the SVD

  • Low singular-value components contribute little, if anything

  – Carry little information
  – Are often just “noise” in the data

  • Data can be recomposed using only the “major” components with minimal change of value

  – Minimum squared error between original data and recomposed data
  – Sometimes eliminating the low-singular-value components will, in fact, “clean” the data

11-755/18-797 78

X = s1·U1·V1^T + s2·U2·V2^T + s3·U3·V3^T + s4·U4·V4^T + …

X ≈ s1·U1·V1^T + s2·U2·V2^T
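Truncating the SVD expansion to the major components can be demonstrated on synthetic data. A numpy sketch (numpy, the dimensions, the rank, and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
# A rank-2 "signal" plus small "noise": only 2 components carry real structure
signal = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
X = signal + 0.01 * rng.standard_normal((40, 30))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2]   # keep only the 2 largest components

# The rank-2 recomposition captures nearly all of X; the discarded
# components are dominated by the added noise
assert np.linalg.norm(X - X2) < 0.1 * np.linalg.norm(X)
```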

slide-78
SLIDE 78

An audio example

  • The spectrogram has 974 vectors of dimension 1025

  – A 1025 x 974 matrix!

  • Decompose: M = U S V^T = Σi s_i·U_i·V_i^T
  • U is 1025 x 1025
  • V is 974 x 974
  • There are 974 non-zero singular values s_i

11-755/18-797 79

slide-79
SLIDE 79

Singular Values

  • Singular values for spectrogram M

  – Most singular values are close to zero
  – The corresponding components are “unimportant”

11-755/18-797 80

slide-80
SLIDE 80

An audio example

  • The same spectrogram constructed from only the 25 highest singular-value components

  – Looks similar

  • With 100 components, it would be indistinguishable from the original

  – Sounds pretty close
  – Background “cleaned up”

11-755/18-797 81

slide-81
SLIDE 81

With only 5 components

  • The same spectrogram constructed from only the 5 highest-valued components

  – Corresponding to the 5 largest singular values
  – Highly recognizable
  – Suggests that there are actually only 5 significant unique note combinations in the music

11-755/18-797 82

slide-82
SLIDE 82

Trace

  • The trace of a matrix is the sum of the diagonal entries
  • It is equal to the sum of the Eigen values!

11-755/18-797 83

A = [a11 a12 a13 a14; a21 a22 a23 a24; a31 a32 a33 a34; a41 a42 a43 a44]

Tr(A) = a11 + a22 + a33 + a44 = Σi a_ii

Tr(A) = Σi a_ii = Σi λ_i

slide-83
SLIDE 83

Trace

  • Often appears in Error formulae
  • Useful to know some properties..

11-755/18-797 84

D = [d11 … d14; … ; d41 … d44],   C = [c11 … c14; … ; c41 … c44]

E = D − C

error = Σi,j (E_ij)^2 = Tr(E E^T)

slide-84
SLIDE 84

Properties of a Trace

  • Linearity: Tr(A + B) = Tr(A) + Tr(B),   Tr(c·A) = c·Tr(A)
  • Cycling invariance:

  – Tr(ABCD) = Tr(DABC) = Tr(CDAB) = Tr(BCDA)
  – Tr(AB) = Tr(BA)

  • Frobenius norm: F(A) = Σi,j a_ij^2 = Tr(A A^T)

11-755/18-797 85
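The trace identities above are easy to spot-check numerically. A numpy sketch (numpy and the random matrices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Trace = sum of diagonal entries = sum of eigenvalues
assert np.isclose(np.trace(A), A.diagonal().sum())
assert np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real)

# Cycling invariance
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Squared Frobenius norm: sum of squared entries = Tr(A A^T)
assert np.isclose((A ** 2).sum(), np.trace(A @ A.T))
```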

slide-85
SLIDE 85

Decompositions of matrices

  • Square A: LU decomposition

  – Decompose A = L U
  – L is a lower triangular matrix

  • All elements above the diagonal are 0

  – U is an upper triangular matrix

  • All elements below the diagonal are zero

  – Cholesky decomposition: A is symmetric, L = U^T

  • QR decomposition: A = QR

  – Q is orthogonal: Q Q^T = I
  – R is upper triangular

  • Generally used as tools to compute Eigen decomposition or least squares solutions

11-755/18-797 86

slide-86
SLIDE 86

Calculus of Matrices

  • Derivative of scalar w.r.t. vector
  • For any scalar z that is a function of a vector x
  • The dimensions of dz/dx are the same as the dimensions of x

11-755/18-797 87

x = [x1 … xN]^T   (N x 1 vector)

dz/dx = [dz/dx1 … dz/dxN]^T   (N x 1 vector)

slide-87
SLIDE 87

Calculus of Matrices

  • Derivative of scalar w.r.t. matrix
  • For any scalar z that is a function of a matrix X
  • The dimensions of dz/dX are the same as the dimensions of X

11-755/18-797 88

X = [x11 x12 x13; x21 x22 x23]   (N x M matrix)

dz/dX = [dz/dx11 dz/dx12 dz/dx13; dz/dx21 dz/dx22 dz/dx23]   (N x M matrix)

slide-88
SLIDE 88

Calculus of Matrices

  • Derivative of vector w.r.t. vector
  • For any M x 1 vector y that is a function of an N x 1 vector x
  • dy/dx is an M x N matrix

11-755/18-797 89

x = [x1 … xN]^T,   y = [y1 … yM]^T

dy/dx = [dy1/dx1 … dy1/dxN; ⋮ ; dyM/dx1 … dyM/dxN]   (M x N matrix)

slide-89
SLIDE 89

Calculus of Matrices

  • Derivative of vector w.r.t. matrix
  • For any M x 1 vector y that is a function of an N x L matrix X
  • dy/dX is an M x L x N tensor (note order)

11-755/18-797 90

y = [y1 … yM]^T,   X = [x11 x12 x13; x21 x22 x23]

dy/dX is an M x 3 x 2 tensor whose (i, j, k)-th element is dy_i / dx_{k,j}

slide-90
SLIDE 90

Calculus of Matrices

  • Derivative of matrix w.r.t. matrix
  • For any M x K matrix Y that is a function of an N x L matrix X
  • dY/dX is an M x K x L x N tensor (note order)

11-755/18-797 91

Y = [y11 y12 y13; y21 y22 y23]

For the component y11, the (i, j)-th element of its slice of the tensor is dy11 / dx_{j,i}

slide-91
SLIDE 91

In general

  • The derivative of an N1 x N2 x N3 x … tensor w.r.t. an M1 x M2 x M3 x … tensor
  • Is an N1 x N2 x N3 x … x ML x ML-1 x … x M1 tensor

11-755/18-797 92

slide-92
SLIDE 92

Compound Formulae

  • Let Y = f(g(h(X)))
  • Chain rule (note order of multiplication):

dY/dX = [df(g(h(X)))/dg]# · [dg(h(X))/dh]# · [dh(X)/dX]

  • The # represents a transposition operation

  – That is appropriate for the tensor

11-755/18-797 93

slide-93
SLIDE 93

Example

  • z = ||y − A x||^2
  • y is N x 1
  • x is M x 1
  • A is N x M
  • Compute dz/dA

  – On board

11-755/18-797 94
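The board derivation gives the standard closed form dz/dA = −2 (y − A x) x^T. A numpy sketch that checks it against a finite-difference approximation (numpy, the sizes, and the random values are illustrative assumptions; the closed form itself is the standard result, not taken from the slide):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 4, 3
A = rng.standard_normal((N, M))
x = rng.standard_normal(M)
y = rng.standard_normal(N)

# z = ||y - A x||^2; closed-form gradient w.r.t. A (same shape as A)
grad = -2.0 * np.outer(y - A @ x, x)

# Central finite differences, one entry of A at a time
eps = 1e-6
num = np.zeros_like(A)
for i in range(N):
    for j in range(M):
        Ap = A.copy(); Ap[i, j] += eps
        Am = A.copy(); Am[i, j] -= eps
        num[i, j] = (np.sum((y - Ap @ x) ** 2) - np.sum((y - Am @ x) ** 2)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-4)
```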