Machine Learning for Signal Processing
Fundamentals of Linear Algebra - 2
Class 3. 8 Sep 2015 Instructor: Bhiksha Raj
Overview: Vectors and matrices; basic vector/matrix operations; various matrix types.
Orthogonal vectors:
– A.B = 0
– A vector that is perpendicular to a plane is orthogonal to every vector on the plane
Orthonormal vectors:
– They are orthogonal
– The length of each vector is 1.0
– Orthogonal vectors can be made orthonormal by normalizing their lengths to 1.0
For A = [x y z] and B = [u v w]^T: A.B = xu + yv + zw, which is 0 when A and B are orthogonal
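A minimal matlab sketch of both ideas (the vectors are illustrative):
A = [1 0 0];  B = [0 2 0];
dot(A, B)             % = 0: A and B are orthogonal
A = A / norm(A);  B = B / norm(B);   % normalize lengths to 1.0
norm(A), norm(B)      % both 1.0: the pair is now orthonormal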
Orthogonal (orthonormal) matrix:
– The matrix is square
– All row vectors are orthonormal to one another
– All column vectors are also orthonormal to one another
– Observation: in an orthogonal matrix, if the length of the row vectors is 1.0, the length of the column vectors is also 1.0
– Observation: in an orthogonal matrix, no more than one row can have all entries with the same polarity (+ve or -ve)
[Example: a 3x3 orthonormal matrix; all 3 row vectors are at 90° to one another]
Orthonormal matrices rotate the space without changing volumes:
– Essentially, they are combinations of rotations, reflections and permutations
– Rotation matrices and permutation matrices are all orthonormal
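A quick matlab check of these claims:
theta = pi/6;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
R' * R                      % identity: rows and columns are orthonormal
P = [0 1 0; 0 0 1; 1 0 0];  % a permutation matrix
P' * P                      % also identity
abs(det(R))                 % 1: the rotation preserves volume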
If the rows/columns are orthogonal but not orthonormal (not unit length):
– A A^T != I and A^T A != I
– A A^T is diagonal, or A^T A is diagonal, but not both
– If all the vectors have the same length, we can get A A^T = A^T A = diagonal, though
If the matrix is not square but its rows (or columns) are orthonormal:
– A A^T = I or A^T A = I, but not both
Matrix rank: some matrices flatten full-dimensional objects into lower-dimensional ones in the transformation
– These are rank-deficient matrices
– The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object
[Figure: P * Cone = a flattened version of the 3-D cone; example transformed objects with Rank = 2 and Rank = 1]
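A matlab sketch of rank (the projection matrix here is illustrative):
u = [cosd(60); sind(60)];   % unit vector along a line at 60 degrees
P = u * u';                 % projects 2-D points onto that line
rank(P)                     % = 1: the plane is flattened onto a line
rank(eye(3))                % = 3: full rank, nothing is flattened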
Projections cannot be undone: P = W (W^T W)^-1 W^T ; Projected Spectrogram = P*M
The original spectrogram can never be recovered
– P is rank deficient
– P explains all vectors in the new spectrogram as a mixture of only the 4 note bases in W
– There are only a maximum of 4 linearly independent bases, so the rank of P is 4
[Figure: spectrogram M and the four note bases W]
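A matlab sketch of the projection (W and M are random stand-ins for the note bases and spectrogram):
W = rand(1025, 4);             % 4 note spectra
M = rand(1025, 1000);          % spectrogram, one column per frame
P = W * inv(W' * W) * W';      % P = W (W^T W)^-1 W^T
projected = P * M;             % every column is a mixture of the 4 notes
rank(P)                        % = 4: P is 1025x1025 but rank deficient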
– More rows than columns add axes
X = 2-D data: X = [x1 x2 ... xN ; y1 y2 ... yN]
P = a 3x2 transform; PX = 3-D data, rank 2: PX = [x̂1 x̂2 ... x̂N ; ŷ1 ŷ2 ... ŷN ; ẑ1 ẑ2 ... ẑN]
– More rows than columns add axes
– Fewer rows than columns reduce axes
X = 3-D data, rank 3: X = [x1 x2 ... xN ; y1 y2 ... yN ; z1 z2 ... zN]
P = a 2x3 transform; PX = 2-D data, rank 2: PX = [x̂1 x̂2 ... x̂N ; ŷ1 ŷ2 ... ŷN]
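A matlab sketch of both directions (random stand-in data):
X2 = rand(2, 100);          % 2-D data, 100 points
Pup = rand(3, 2);           % more rows than columns: adds an axis
rank(Pup * X2)              % = 2: the 3-D output still spans only a plane
X3 = rand(3, 100);          % 3-D data of rank 3
Pdn = rand(2, 3);           % fewer rows than columns: removes an axis
rank(Pdn * X3)              % = 2: a dimension has been lost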
A transform cannot create a higher-dimensional object out of a lower-dimensioned object in the original space
– It cannot convert a circle to a sphere, or a line to a circle
– The transformed object can, however, have fewer "independent" dimensions
Projected Spectrogram = P * M
– Every vector in it is a combination of only 4 bases
– The rank of the matrix is the smallest number of bases required to compose all vectors in it
– E.g. if note no. 4 in P could be expressed as a combination of notes 1, 2 and 3, it provides no additional information
– Eliminating note no. 4 would give us the same projection; the rank of P would be 3!
The determinant is the "volume" of a matrix:
– The volume of the parallelepiped formed from its row vectors
– Also the volume of the parallelepiped formed from its column vectors
[Figure: parallelogram spanned by row vectors r1 and r2, with diagonal r1+r2]
– If V1 is the volume of an N-dimensional sphere "O" in N-dimensional space
– If V2 is the volume of the N-dimensional ellipsoid specified by A*O, where A is a matrix that transforms the space
– |A| = V2 / V1
[Figure: sphere of Volume = V1 transformed to ellipsoid of Volume = V2]
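A matlab illustration of the volume interpretation (matrices illustrative):
A = [2 0; 0 0.5];    % stretches x by 2, shrinks y by half
det(A)               % = 1: areas are preserved overall
B = [3 1; 0 2];      % a shear plus stretch
det(B)               % = 6: every area is scaled by 6, i.e. V2/V1 = 6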
Determinant properties:
– Determinants are only defined for square matrices: they characterize volumes in the linearly transformed space of the same dimensionality as the vectors
– Rank-deficient matrices have zero determinant, since they compress full-volumed N-dimensional objects into zero-volume N-dimensional objects (e.g. a 3-D sphere into a 2-D ellipse: zero volume, though it does have area)
Matrix multiplication properties:
– Associative
– Distributive
– NOT commutative!!!
– Transposition: (A B)^T = B^T A^T
Determinant of a product: |A B| = |A| |B|
– Scaling volume sequentially by several matrices is equal to scaling once by the product of the matrices
– The order in which you scale the volume of an object is irrelevant
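A quick matlab check of these properties (matrices illustrative):
A = [2 1; 0 3];  B = [1 4; 2 0];
det(A * B), det(A) * det(B)   % both -48: |AB| = |A||B|
det(A * B), det(B * A)        % also equal: scaling order is irrelevant
isequal(A * B, B * A)         % false: multiplication is NOT commutative
(A * B)', B' * A'             % equal: (AB)^T = B^T A^T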
Matrix inversion:
– A transform T maps an object to a new object; what maps the new object back to the original?
– The inverse transformation: find the (unknown) matrix Q such that Q*T = I; Q = T^-1
T transforms the object; T^-1 transforms it back: T^-1 T = T T^-1 = I
Rank-deficient matrices cannot be inverted:
– In the transform, multiple points in the original object get mapped to the same point in the transformed object
– The transform cannot be undone because of the many-to-one forward mapping
– E.g. the rank-1 projection matrix [0.25 0.433; 0.433 0.75] has no inverse
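A matlab sketch of invertible vs. non-invertible transforms:
R = [cosd(45) -sind(45); sind(45) cosd(45)];
norm(inv(R) - R', 'fro')    % ~0: the inverse of a rotation is its transpose
u = [cosd(60); sind(60)];
P = u * u';                 % the rank-deficient projection above
det(P)                      % ~0: P is singular, so no inverse exists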
The projection matrix is rank deficient: you cannot recover the original spectrogram from the projected one
Expressing a vector V in terms of the notes:
– Approximation: V_approx = a*note1 + b*note2 + c*note3 ...
– Error vector: E = V - V_approx
– Squared error energy for V: e(V) = norm(E)^2
– The weights a, b, c, ... must be chosen so that e(V) is minimized
In matrix form, with T = [note1 note2 note3]:
V_approx = T * [a b c]^T
– Note: we are viewing the collection of bases in T as a transformation
– Solving V ≈ T * [a b c]^T for the weights gives us a LEAST SQUARES solution:
[a b c]^T = PINV(T) * V
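A matlab sketch of the least squares solution (random stand-ins for the notes and the vector):
T = rand(1025, 3);           % stand-in for [note1 note2 note3]
V = rand(1025, 1);           % vector to be approximated
abc = pinv(T) * V;           % least squares weights [a; b; c]
Vapprox = T * abc;           % best approximation in the span of T
err = norm(V - Vapprox)^2    % the minimized squared error e(V)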
Recap: P = W (W^T W)^-1 W^T ; Projected Spectrogram = P*M
– Approximation: M ≈ W*X
– The amount of W in each vector: X = PINV(W)*M
– W*PINV(W)*M = Projected Spectrogram
– So W*PINV(W) = the projection matrix!! (PINV(W) = (W^T W)^-1 W^T)
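A matlab check that the pseudoinverse gives the same projection (stand-in data):
W = rand(1025, 4);  M = rand(1025, 1000);
X = pinv(W) * M;                            % amount of W in each vector
norm(W*pinv(W) - W*inv(W'*W)*W', 'fro')     % ~0: W*PINV(W) is the projection matrix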
The same machinery works in reverse: given M and bases V, the best weights are W = M * Pinv(V), and the corresponding explanation of M is U = W*V
[Figure: spectrograms M, W, V and U]
Matrix inversion is NOT element-wise division!!
– The inverse of the unit matrix is itself
– The inverse of a diagonal matrix is diagonal
– The inverse of a rotation is a (counter-)rotation (its transpose!)
– The inverse of a rank-deficient matrix does not exist!
Eigenanalysis:
– An eigenvector of a matrix M does not change direction under the transform: M*V = λ*V
– Its length can change, though
– Each eigenvector of a matrix has its own eigenvalue: the factor λ by which its length is scaled
Example: M = [1.5 0.7; 0.7 1.0]
[Figure: the original and transformed space; the black vectors are eigenvectors, whose directions are unchanged]
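For the example matrix above, a minimal matlab check:
M = [1.5 0.7; 0.7 1.0];
[V, L] = eig(M);               % columns of V: eigenvectors; diag(L): eigenvalues
M * V(:,1), L(1,1) * V(:,1)    % equal: direction unchanged, only the length scales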
Example transform: A = [1.0 0.07; 1.1 1.2]
A physical interpretation: stretching the paper
– Draw two lines
– Stretch / shrink the paper along these lines by factors λ1 and λ2 (e.g. 1.4 and 0.8)
– The factors could be negative; this implies flipping the paper
– The result is a transformation of the space
– The lines along which the paper is stretched are the eigenvectors of the matrix
– The degrees of stretching/shrinking are the corresponding eigenvalues
– The eigenvectors and eigenvalues convey all the information about the matrix
Rotation as a transform:
X = [x; y] is rotated to the new X' = [x'; y'] by X' = R X, where R = [cos θ  -sin θ; sin θ  cos θ]
– No real vector keeps its direction in this process, so the eigenvectors and eigenvalues of R are complex
– They nevertheless carry all the information about the transform
– Can you identify it?
Singular Value Decomposition: A = U S V^T (matlab: [U,S,V] = svd(A))
– The columns of U and V are orthonormal vectors
– A maps the right singular vectors in V onto the left singular vectors in U: V1 → s1*U1, V2 → s2*U2
– They are scaled by the singular values, the diagonal entries of S
– If A is not a square matrix, the left and right singular vectors will be of different dimensions
– max(|Ax| / |x|) = s_max, the largest singular value
– min(|Ax| / |x|) = s_min, the smallest singular value; this can be 0 (for low-rank or non-square matrices)
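A matlab sketch of these facts (the 2x2 matrix is illustrative):
A = [1.0 0.07; 1.1 1.2];
[U, S, V] = svd(A);            % A = U S V^T
A * V(:,1), S(1,1) * U(:,1)    % equal: A maps V1 to s1*U1
x = rand(2, 1);
norm(A*x) / norm(x)            % always between min(diag(S)) and max(diag(S))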
– The product of the singular values gives the (absolute) volume scaling of the transform; this is also the product of the eigenvalues
– I.e. there are two different sets of axes whose products give you the area of the transformed ellipse
– The largest singular value of any square submatrix B of A cannot be larger than the largest singular value of A
– An analogous rule applies to the smallest singular value
– This property is utilized in various problems, such as compressive sensing
Two ways to characterize a transform:
– Eigen decomposition: find vectors such that their absolute directions are not changed by the transform
– SVD: find two orthogonal vectors such that the angle between them is not changed by the transform (V1, V2 map to s1*U1, s2*U2)
Symmetric matrices: A = A^T
– Row and column vectors are identical
– In the SVD, U = V, so A = U S U^T
– The transform is only scaling and, if eigenvalues are negative, reflection; there is no rotation
Example: A = [1.5 0.7; 0.7 1.0]
The eigenvectors of a symmetric matrix are orthonormal:
– Vi^T Vi = 1
– Vi^T Vj = 0 for i != j
– Hence V^T = V^-1: V^T V = I and V V^T = I
– A = V Λ V^T, where Λ is a diagonal matrix with all the eigenvalues
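A matlab check of the orthonormal eigendecomposition:
A = [1.5 0.7; 0.7 1.0];        % symmetric: A = A'
[V, L] = eig(A);
V' * V                         % identity: eigenvectors are orthonormal
norm(A - V * L * V', 'fro')    % ~0: A = V Λ V^T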
What the eigenvalues mean:
– Real, positive eigenvalues represent stretching of the space along the eigenvector
– Real, negative eigenvalues represent stretching plus reflection (across the origin) of the eigenvector
– Complex eigenvalues occur in conjugate pairs; the transformation can be explained as stretching plus rotation
– If all eigenvalues are non-negative, the matrix is positive semi-definite; a zero eigenvalue makes the matrix singular (rank deficient)
Correlation and covariance:
– If A is a matrix whose columns are signal vectors, C = (1/N) A A^T is the correlation matrix
– Its eigenvectors represent the directions in which the "energy" in the signal lies
– If the vectors in A are mean-removed, C is the covariance matrix: covariance = correlation - mean * mean^T
– Its eigenvectors represent the directions in which the "spread" of the signal lies
– The diagonal elements give the energy of the individual components; the off-diagonal elements represent how two components are related
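A matlab sketch (random stand-in signal; requires implicit expansion, R2016b+):
A = randn(16, 1000);                     % 1000 signal vectors of dimension 16
Ccorr = (1/1000) * (A * A');             % correlation matrix
mu = mean(A, 2);
Ac = A - mu;                             % remove the mean from every vector
Ccov = (1/1000) * (Ac * Ac');            % covariance matrix
norm(Ccov - (Ccorr - mu*mu'), 'fro')     % ~0: covariance = correlation - mean*mean^T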
C = (1/N) A A^T: the diagonal entries are C_kk = (1/N) Σ_i a_{k,i}^2 (component energies) and the off-diagonal entries are C_kj = (1/N) Σ_i a_{k,i} a_{j,i} (component correlations)
Eigen decomposition of the correlation / covariance matrix:
– Any vector V = a_{V,1} * eigenvec1 + a_{V,2} * eigenvec2 + ...
– Averaged over the data, the energy of the coefficients along the i-th eigenvector, Σ_V a_{V,i}^2, is given by eigenvalue(i)
– The eigenvectors with the largest eigenvalues therefore capture most of the energy in the data
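Continuing the covariance sketch above, the coefficient energies match the eigenvalues:
[V, L] = eig(Ccov);       % eigenvectors of the covariance matrix
a = V' * Ac;              % a(i,:) holds the weights a_{V,i} for every vector
mean(a.^2, 2), diag(L)    % match: the energy along eigenvector i is eigenvalue(i)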
Karhunen-Loève (eigen) representation of a spectrogram:
– Full representation: M (1025 x 1000) = V (1025 x 1025) * W (1025 x 1000), where V holds all the eigenvectors
– Reduced representation: M_reconstructed ≈ V_reduced (1025 x 25) * W_reduced (25 x 1000), keeping only the 25 eigenvectors with the largest eigenvalues
– The average magnitude of the weights a_i is proportional to the square root of the corresponding eigenvalue
– Ignoring the terms with small eigenvalues will not significantly affect the composition of the spectrogram
Eigenvector-based dimensionality reduction:
– Any vector: Vec = a1*eigenvec1 + a2*eigenvec2 + a3*eigenvec3 ...
– Using only the 25 eigenvectors with the highest eigenvalues, we can compose a least squares approximation to the spectrogram
– The reconstruction looks similar and sounds pretty close
– But now it is sufficient to store 25 numbers per vector (instead of 1025)
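A matlab sketch of the reduction (D is a random stand-in for the spectrogram):
D = randn(1025, 1000);                 % stand-in spectrogram
C = (1/1000) * (D * D');
[V, L] = eig(C);
[~, idx] = sort(diag(L), 'descend');
Vred = V(:, idx(1:25));                % 25 eigenvectors with largest eigenvalues
Wred = Vred' * D;                      % 25 numbers per vector instead of 1025
Drecon = Vred * Wred;                  % least squares reconstruction of D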
SVD gives the same result directly, without forming the correlation matrix:
– SVD: D = U S V^T, so D D^T = U S V^T V S U^T = U S^2 U^T
– The left singular vectors U are the eigenvectors of D D^T: they show the directions of greatest importance
– The squared singular values (diagonal of S^2) are the eigenvalues: they show the importance of each eigenvector
Thin (economy) SVD:
– Full SVD: A (N x M) = U (N x N) S (N x M) V^T (M x M)
– The thin decomposition computes only the singular vectors for the min(N, M) possible non-zero singular values; this is all that is required if N < M
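In matlab the thin decomposition and partial computation look like this:
A = randn(1025, 1000);
[U, S, V] = svd(A, 'econ');   % thin SVD: U is 1025x1000, S is 1000x1000
s = svds(A, 25);              % just the 25 largest singular values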
What does eigen analysis give us?
– Strong statistical grounding
– Can display complex interactions between the data
– Can uncover irrelevant parts of the data that we can throw out
– Gives a set of elements to compactly describe our data; indispensable for performing compression and classification
– In practice it works amazingly well
Eigenfaces: using a linear transform of the "eigenface" eigenvectors, we can compose various faces
Trace of a matrix: the sum of its diagonal entries
– For the 4x4 matrix A = [a11 a12 a13 a14; a21 a22 a23 a24; a31 a32 a33 a34; a41 a42 a43 a44]: Tr(A) = a11 + a22 + a33 + a44 = Σ_i a_{i,i}
– The sum of the squares of all entries equals the trace of A A^T: Σ_{i,j} a_{i,j}^2 = Tr(A A^T)
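A one-line matlab check of the trace identity:
A = magic(4);
trace(A)                       % sum of the diagonal entries
sum(A(:).^2), trace(A * A')    % equal: sum of squared entries = Tr(A A^T)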
[Figures: data and index layouts for an N x 1 vector, an N x M matrix (entries a11 a12 a13; a21 a22 a23 ...), an M x N matrix, and an M x 3 x 2 tensor with entries indexed a_{i,j,k}]