Machine Learning for Signal Processing
Fundamentals of Linear Algebra - 2
Class 3. 8 Sep 2016 Instructor: Bhiksha Raj
Overview
– Vectors and matrices
– Vector spaces
– Basic vector/matrix operations
– Various matrix types
A point (x, y, z) in 3-D space is conventionally described in terms of the standard basis u1, u2, u3: unit vectors along the X, Y and Z axes.

u1 = [1, 0, 0]ᵀ   u2 = [0, 1, 0]ᵀ   u3 = [0, 0, 1]ᵀ
X = [x, y, z]ᵀ = x·u1 + y·u2 + z·u3

[Figure: the point (x, y, z) with the basis vectors u1, u2, u3 along the axes]
The same point can be described in terms of a different set of bases v1, v2, v3, with new coordinates (a, b, c):

X = a·v1 + b·v2 + c·v3

– v3 likely represents a "noise subspace" for these data
Conversion from the representation in terms of the standard basis u1, u2, u3 to a representation in terms of different bases v1, v2, v3 is a change of bases: we transform the standard representation to the new one.

X = x·u1 + y·u2 + z·u3 = a·v1 + b·v2 + c·v3

The new coordinates are obtained from the old ones through a linear transform T:

[a, b, c]ᵀ = T [x, y, z]ᵀ
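A minimal MATLAB/Octave sketch of this change of bases; the basis vectors and the point are made-up values for illustration:

  % Columns of V are the new basis vectors v1, v2, v3 (must be invertible)
  V = [1 0 1; 0 1 1; 0 0 1];
  X = [2; 3; 4];          % the point in the standard basis: x=2, y=3, z=4
  abc = V \ X;            % new coordinates [a; b; c]; here T = inv(V)
  disp(V * abc);          % a*v1 + b*v2 + c*v3 reconstructs X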
Orthogonal vectors:
– A·B = 0
– A vector that is perpendicular to a plane is orthogonal to every vector on the plane

Orthonormal vectors:
– They are orthogonal
– The length of each vector is 1.0
– Orthogonal vectors can be made orthonormal by scaling their lengths to 1.0

The dot product: for A = [x, y, z]ᵀ and B = [u, v, w]ᵀ, A·B = AᵀB = xu + yv + zw

Orthogonal matrix:
– The matrix is square
– All row vectors are orthonormal to one another
– All column vectors are also orthonormal to one another
– Observation: in an orthogonal matrix, if the length of the row vectors is 1.0, the length of the column vectors is also 1.0
– Observation: in an orthogonal matrix, no more than one row can have all entries with the same polarity (+ve or -ve)
[Example: a 3×3 orthogonal matrix; the exact entries did not survive extraction]

An orthogonal matrix maps the three axes to three new directions that remain at 90° to one another; it does not change lengths or the angles between transformed vectors
– Essentially, they are combinations of rotations, reflections and permutations
– Rotation matrices and permutation matrices are all orthogonal
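A quick MATLAB/Octave check that a rotation matrix is orthogonal; the angle is arbitrary:

  theta = pi/6;                          % a 30-degree rotation
  R = [cos(theta) -sin(theta); ...
       sin(theta)  cos(theta)];
  disp(R' * R);                          % identity: rows/columns orthonormal
  disp(norm(R * [3; 4]));                % 5: lengths are preserved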
A square matrix whose rows (or columns) are orthogonal but not unit length will not be orthogonal:
– AAᵀ ≠ I, AᵀA ≠ I
– AAᵀ = diagonal or AᵀA = diagonal, but not both
– If all the entries are the same length, we can get AAᵀ = AᵀA = diagonal, though

A non-square matrix with orthonormal rows (or columns) satisfies:
– AAᵀ = I or AᵀA = I, but not both
[Example: a 3×3 rank-deficient matrix; the exact entries did not survive extraction]

Some matrices flatten objects in the transformation, e.g. a 3-D object onto a 2-D plane:
– These are rank-deficient matrices
– The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object
P * Cone = [figure: the cone is flattened onto a plane by the rank-deficient transform P]
Rank = 2: the transform flattens objects onto a plane. Rank = 1: onto a line.
Projections are often examples of rank-deficient transforms
P = W (WᵀW)⁻¹ Wᵀ ; Projected Spectrogram = P*M
– The original spectrogram can never be recovered: P is rank deficient
– P explains all vectors in the new spectrogram as a mixture of only the four bases in W
– There are only a maximum of 4 linearly independent bases, so the rank of P is 4

[Figure: the spectrogram M and the four note bases W]
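A sketch of this projection in MATLAB/Octave, with random stand-ins for the four note bases W and the spectrogram M of the music example:

  W = randn(1024, 4);            % stand-in for the 4 note bases
  M = randn(1024, 50);           % stand-in for a 50-frame spectrogram
  P = W * ((W' * W) \ W');       % P = W (W'W)^-1 W'
  projM = P * M;                 % projected spectrogram
  disp(rank(P));                 % 4: P is rank deficient
  disp(norm(P*P - P, 'fro'));    % ~0: projecting twice changes nothing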
Matrix transformations can also change the dimensionality of the data:
– More rows than columns add axes

X = [x1 x2 … xN; y1 y2 … yN] is 2-D data
P = a 3×2 transform (the exact entries did not survive extraction)
P X = [x̂1 x̂2 … x̂N; ŷ1 ŷ2 … ŷN; ẑ1 ẑ2 … ẑN] is 3-D, but has rank 2

– Fewer rows than columns reduce axes
X = [x1 x2 … xN; y1 y2 … yN; z1 z2 … zN] is 3-D data, rank 3
P = a 2×3 transform (the exact entries did not survive extraction)
P X = [x̂1 x̂2 … x̂N; ŷ1 ŷ2 … ŷN] is 2-D data, rank 2

The transformed object can never have a higher intrinsic dimensionality than the full-dimensioned object in the original space:
– Cannot convert a circle to a sphere or a line to a circle
– The rank of the output is limited by the number of truly independent dimensions in the input
Projected Spectrogram = P * M
– Every vector in it is a combination of only 4 bases
– The rank of the matrix is the smallest number of bases required to describe the output
– E.g. if note no. 4 in P could be expressed as a combination of notes 1, 2 and 3, it provides no additional information; eliminating note no. 4 would give us the same projection, and the rank of P would be 3!

[Figure: the spectrogram M]
The determinant of a square matrix is the volume of the parallelepiped formed from its row vectors
– Also the volume of the parallelepiped formed from its column vectors: |A| = |Aᵀ|

Example (entries reconstructed from the slide): A = [0.9 0.5 0.8; 0.1 0.4 0.9; 0.42 0.44 0.86] and its transpose have the same determinant

[Figure: the parallelogram spanned by rows (r1), (r2) has the same area as the one spanned by (r1), (r1+r2)]
The determinant is the ratio of volumes under the transform:
– If V1 is the volume of an N-dimensional sphere "O" in N-dimensional space
– If V2 is the volume of the N-dimensional ellipsoid specified by A*O, where A is a matrix that transforms the space
– |A| = V2 / V1

[Figure: sphere with Volume = V1, ellipsoid with Volume = V2]
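A small MATLAB/Octave illustration of |A| as a volume (here: area) scaling factor; the matrix is made up:

  A = [2 1; 0 3];                 % made-up 2-D transform, det(A) = 6
  sq = [0 1 1 0; 0 0 1 1];        % corners of the unit square (area 1)
  par = A * sq;                   % transformed corners: a parallelogram
  disp(abs(det(A)));              % 6 = area of the parallelogram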
[Example matrix; the exact entries did not survive extraction]

Determinants are defined only for square matrices:
– They characterize volumes in linearly transformed space of the same dimensionality as the vectors

Rank-deficient matrices have a determinant of 0
– Since they compress full-volumed N-dimensional objects into zero-volume N-dimensional objects (e.g. a 3-D sphere into a 2-D ellipse: the ellipse has zero volume, though it does have area)
Multiplication of matrices is:
– Associative: (AB)C = A(BC)
– Distributive: A(B + C) = AB + AC
– NOT commutative!!! AB ≠ BA in general
– Transposition: (AB)ᵀ = BᵀAᵀ
The determinant of a product: |AB| = |A|·|B|
– Scaling volume sequentially by several matrices is equal to scaling it once by the product of the matrices
– The order in which you scale the volume of an object is irrelevant
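Checking |AB| = |A|·|B| numerically with made-up matrices:

  A = [2 1; 0 3];
  B = [1 4; 2 1];
  disp(det(A*B));                 % -42
  disp(det(A) * det(B));          % -42: same volume scaling either way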
A matrix transforms an N-dimensional object to a different N-dimensional object. The transform that "undoes" this is the inverse transformation; the matrix that performs it is called the matrix inverse.
The inverse is the matrix Q of (initially unknown) entries such that Q T = I
[Example: a 3×3 transform T; the exact entries did not survive extraction]
T⁻¹ T = T T⁻¹ = I
– Applying the inverse after (or before) T restores every vector to where it started: the product is the identity matrix
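In MATLAB/Octave, with a made-up invertible matrix:

  T = [2 1; 1 1];
  Tinv = inv(T);                  % [1 -1; -1 2]
  disp(T * Tinv);                 % the identity matrix
  disp(Tinv * (T * [5; -2]));     % the original vector [5; -2] comes back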
[Figure: the transform T maps X to Y; the inverse transform maps Y back to X]
Revisiting the change of bases: [a, b, c]ᵀ = T [x, y, z]ᵀ, where

T = [T11 T12 T13; T21 T22 T23; T31 T32 T33]
a = T11·x + T12·y + T13·z
b = T21·x + T22·y + T23·z
c = T31·x + T32·y + T33·z

Given [a, b, c], find [x, y, z]: invert the transform, [x, y, z]ᵀ = T⁻¹ [a, b, c]ᵀ

But a transform that flattens objects cannot be inverted:
– In the process, multiple points in the original object get mapped to the same point in the transformed object
– The inverse does not exist because of the many-to-one forward mapping
Example (entries reconstructed from the slide): the rank-1 projection P = [0.25 0.433; 0.433 0.75] maps many inputs onto each output, so the corresponding simultaneous equations
– Cannot be inverted to obtain a unique solution
Non-square matrices cannot be inverted either. Fewer rows than columns:

[a, b]ᵀ = T [x, y, z]ᵀ, where T = [T11 T12 T13; T21 T22 T23]
a = T11·x + T12·y + T13·z
b = T21·x + T22·y + T23·z

Given [a, b], find [x, y, z]: this is an underdetermined set of equations
– Cannot be inverted to obtain a unique solution
More rows than columns:

[a, b, c]ᵀ = T [x, y]ᵀ, where T = [T11 T12; T21 T22; T31 T32]
a = T11·x + T12·y
b = T21·x + T22·y
c = T31·x + T32·y

Given [a, b, c], find [x, y]: this is an overdetermined set of equations
– Cannot be inverted to obtain an exact solution
The projection matrix is rank deficient: you cannot recover the original spectrogram from the projected one.

The best we can do is a least-squares approximation in terms of the note bases:
– Approximation: Vapprox = a·note1 + b·note2 + c·note3 …
– Error vector: E = V - Vapprox
– Squared error energy for V: e(V) = norm(E)²
– Find a, b, c … such that the total squared error is minimized
Treating the collection of bases as a matrix T = [note1 note2 note3], the approximation is a transformation of the vector [a b c]ᵀ:

Vapprox = T [a, b, c]ᵀ

– Note: we're viewing the collection of bases in T as a transformation
– The error-minimizing coefficients give us a LEAST SQUARES solution
Vapprox = T [a, b, c]ᵀ with [a, b, c]ᵀ = Pinv(T)·V, where Pinv(T) is the pseudo-inverse of T
For an invertible transform:
X = T Y ⟹ Y = T⁻¹ X (left multiplication); X = Y T ⟹ Y = X T⁻¹ (right multiplication)

For a non-invertible transform, the pseudo-inverse takes the inverse's place:
X = T Y ⟹ Y = Pinv(T) X (left multiplication); X = Y T ⟹ Y = X Pinv(T) (right multiplication)
– At least one (if not both) of the forward and backward equations may be inexact
Underdetermined case:

[a, b]ᵀ = T [x, y, z]ᵀ
a = T11·x + T12·y + T13·z
b = T21·x + T22·y + T23·z
[x, y, z]ᵀ = Pinv(T) [a, b]ᵀ

The pseudo-inverse returns the shortest of the infinitely many solutions: the point on the solution set closest to the origin
[Figure only meant for illustration: for the above equations the actual set of solutions is a line, not a plane]
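A MATLAB/Octave sketch of the underdetermined case; T and the observation are made up:

  T = [1 2 1; 2 0 1];            % 2 equations, 3 unknowns
  ab = [3; 4];                   % observed [a; b]
  xyz = pinv(T) * ab;            % the shortest solution
  disp(T * xyz);                 % reproduces [3; 4] exactly
  disp(norm(xyz));               % no other solution has smaller norm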
Overdetermined case:

[a, b, c]ᵀ = T [x, y]ᵀ, where T = [T11 T12; T21 T22; T31 T32]
a = T11·x + T12·y
b = T21·x + T22·y
c = T31·x + T32·y
[x, y]ᵀ = Pinv(T) [a, b, c]ᵀ

The pseudo-inverse returns the [x, y] that minimizes the squared error ||A - T·[x, y]ᵀ||², where A = [a, b, c]ᵀ
[Figure only meant for illustration: the error surface over the entries of Pinv(T), with the "optimal" Pinv(T) at its minimum; for the above equations Pinv(T) will actually have 6 components, and the error is a quadratic in 6 dimensions]
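The corresponding sketch for the overdetermined case, again with made-up values:

  T = [1 0; 1 1; 0 2];           % 3 equations, 2 unknowns
  abc = [1; 2; 5];               % observed [a; b; c]
  xy = pinv(T) * abc;            % least squares [x; y]
  err = abc - T * xy;            % residual: in general not zero
  disp(norm(err)^2);             % the minimized squared error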
Recap: P = W (WᵀW)⁻¹ Wᵀ, Projected Spectrogram = P*M
– Approximation: M ≈ W*X
– The amount of W in each vector: X = Pinv(W)·M
– W·Pinv(W)·M = Projected Spectrogram
– W·Pinv(W) = projection matrix!! (Pinv(W) = (WᵀW)⁻¹Wᵀ, so W·Pinv(W) = P)
X = Pinv(W)·M; Projected matrix = W·X = W·Pinv(W)·M
Pseudo-inverses can also be applied from the right, e.g. W = M·Pinv(V), U = W·V
[Figure: the matrices M, W, V, U of the music example]

For a tall K×N matrix A (more rows than columns, full column rank):
– A·Pinv(A) is a projection onto the columns of A
– Pinv(A) is an N×K matrix
– Pinv(A)·A = I in this case
Finding a common transform: given input-output pairs (X1, Y1) … (XN, YN), find the single matrix T such that
– T·X1 = Y1
– T·X2 = Y2
– …
– T·XN = YN
Stack the vectors as the columns of matrices:

X = [X1 X2 … XN],  Y = [Y1 Y2 … YN]

We want T·X = Y
– But such a linear transform doesn't really exist in general
– Instead, find the T that minimizes the squared error between Y and T·X
X = [X1 X2 … XN],  Y = [Y1 Y2 … YN]

T = Y·Pinv(X) minimizes Σi ||Yi - T·Xi||²
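A MATLAB/Octave sketch of learning such a transform from made-up paired data:

  X = randn(3, 100);                    % 100 input vectors
  Ttrue = [1 2 0; 0 1 1; 1 0 1];        % hypothetical transform to recover
  Y = Ttrue * X + 0.01 * randn(3, 100); % noisy outputs
  T = Y * pinv(X);                      % least squares estimate
  disp(norm(T - Ttrue, 'fro'));         % small: T is close to Ttrue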
Matrix inversion is NOT element-wise division!!
– Inverse of the unit matrix is itself
– Inverse of a diagonal matrix is diagonal
– Inverse of a rotation is a (counter)rotation (its transpose!)
– Inverse of a rank-deficient matrix does not exist!
– For non-square and rank-deficient matrices, the matrix pseudo-inverse provides the least-squared-error substitute
Eigen-analysis: if something is subjected to a process and emerges unscathed in character, it is an eigen-something
– Sound example: [audio demo played in class]

A vector that, when transformed by a matrix, keeps pointing the same way is an eigenvector of that matrix
– Its length can change though
– The factor by which the length changes is the corresponding eigenvalue
– Each eigenvector of a matrix has its eigenvalue
M·V = λ·V
– Eigenvectors are not rotated by the transform, but they may change length
– V = eigen vector, λ = eigen value

Example (entries reconstructed from the slide): M = [1.5 0.7; 0.7 1.0]
[Figure: black vectors are eigen vectors]
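In MATLAB/Octave, eig returns both eigenvectors and eigenvalues:

  M = [1.5 0.7; 0.7 1.0];
  [V, L] = eig(M);        % columns of V: eigen vectors; diag(L): eigen values
  v1 = V(:, 1);
  disp(M * v1);           % same direction as v1 ...
  disp(L(1,1) * v1);      % ... scaled by the corresponding eigen value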
Example (entries reconstructed from the slide): A = [1.0 0.07; 1.1 1.2]

Imagine the transform operating on a sheet of paper: draw two lines, and stretch/shrink the paper along these lines by factors λ1 and λ2
– The factors could be negative, which implies flipping the paper
– For this A, the stretch factors are λ1 = 1.4 and λ2 = 0.8
Draw two lines. Stretch/shrink the paper along these lines by factors λ1 and λ2
– The factors could be negative, which implies flipping the paper
– The result is a transformation of the space

The lines along which the paper is stretched are the eigenvectors of the matrix
– The degrees of stretching/shrinking are the corresponding eigenvalues
Not all matrices have real eigenvectors
– E.g. consider a rotation matrix
– This rotates every vector in the plane: no vector keeps pointing in the same direction
X = [x; y] is rotated to X′ = [x′; y′]:

X′ = Rθ·X,  Rθ = [cos θ  -sin θ; sin θ  cos θ]

No real vector is unchanged in direction by this process; the eigenvalues are complex, but they still carry information about the transform
– Can you identify it?
A = [1.0 0.07; 1.1 1.2] transforms the unit circle into an ellipse
– The major and minor axes of the ellipse define the transform
– They are at right angles
– And they are the images of two directions that were also at right angles in the original circle!
This is the geometry behind the singular value decomposition: A = U S Vᵀ
– U and V are orthonormal matrices: their columns are orthonormal vectors
– A maps each right singular vector in V onto the corresponding left singular vector in U
– Each is scaled by the singular value, the corresponding diagonal entry of S
For A = [1.0 0.07; 1.1 1.2], in matlab: [U,S,V] = svd(A), giving A = U S Vᵀ
[Figure: V1, V2 on the unit circle map to s1·U1, s2·U2 on the ellipse]
– If A is not a square matrix, the left and right singular vectors will be of different dimensions

The singular values bound how much a vector can be scaled by A:
– Max(|Ax| / |x|) = smax: the largest factor by which any vector is scaled by A
– Min(|Ax| / |x|) = smin: the smallest factor by which a vector is scaled by A
– smin can be 0 (for low-rank or non-square matrices)
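Verifying these properties numerically in MATLAB/Octave, with A as reconstructed above:

  A = [1.0 0.07; 1.1 1.2];
  [U, S, V] = svd(A);
  disp(norm(A - U*S*V', 'fro'));        % ~0: the decomposition is exact
  disp([A * V(:,1), S(1,1) * U(:,1)]);  % identical columns
  x = randn(2, 1);
  disp(norm(A*x) / norm(x));            % always between S(2,2) and S(1,1)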
The product of the singular values is the area of the ellipse into which A transforms the unit circle
– This is also the product of the eigen values
– I.e. there are two different sets of axes whose products give you the area of an ellipse

The largest singular value of any square submatrix B cannot be larger than the largest singular value of A
– An analogous rule applies to the smallest singular value
– This property is utilized in various problems, such as compressive sensing

[Figure: the ellipse with axes s1·U1 and s2·U2]
Two related questions about a transform:
– Find vectors whose absolute directions are not changed by the transform (eigenvectors)
– Find two vectors such that the angle between them is not changed by the transform (singular vectors)
[Figure: the ellipse with axes s1·U1 and s2·U2]
The SVD view of the transform:
– A transforms the right singular vectors in V to the (scaled) left singular vectors in U
– Aᵀ transforms the left singular vectors in U back to the (scaled) right singular vectors in V
– Result: only scaling along these directions, no rotation
Now consider a symmetric matrix, A = Aᵀ (the example's entries did not survive extraction)
– Row and column vectors are identical
– U = V: the left and right singular vectors are the same
– A = U S Uᵀ
– The transform is only scaling and, if eigen values are negative, reflection
Symmetric matrices:
– Row and column vectors are identical
– Their eigen values are always real
– Their eigenvectors are at 90 degrees to one another
Example: the symmetric matrix M = [1.5 0.7; 0.7 1.0]

The eigen decomposition of a symmetric matrix: M = V Λ Vᵀ, where the columns of V are the eigenvectors:
– ViᵀVi = 1
– ViᵀVj = 0, i ≠ j
– Vᵀ = V⁻¹, VᵀV = I, V·Vᵀ = I
– Λ is a diagonal matrix with all the eigen values
[Slide garbled; it builds identities from M = V Λ Vᵀ using VᵀV = I, e.g. M² = V Λ Vᵀ·V Λ Vᵀ = V Λ² Vᵀ]
Eigen values in general:
– Real, positive eigen values represent stretching of the space along the eigen vector
– Real, negative eigen values represent stretching and reflection (across the origin) of the eigen vector
– Complex eigen values occur in conjugate pairs

A matrix is positive definite if all its eigen values are real and greater than 0
– The transformation can be explained as stretching and rotation
– If any eigen value is zero, the matrix is positive semi-definite
Positive definiteness:
– xᵀAx is always positive for any non-zero vector x if A is positive definite
– Many matrices we encounter are positive (semi-)definite, such as correlation and covariance matrices
– We will encounter these and other gram matrices later
SVD of data: consider a data matrix comprising N vectors of dimension d
[Figure: the data matrix and its SVD]

In the decomposition, all the singular vectors are unit length:
– |Ui| = 1.0 for every vector in U
– |Vi| = 1.0 for every vector in V
Each pair of singular vectors (Ui, Vi) contributes one "basic" component to the data
– Its contribution is scaled by the corresponding singular value
A = s1·U1·V1ᵀ + s2·U2·V2ᵀ + s3·U3·V3ᵀ + s4·U4·V4ᵀ + …

Each pair of singular vectors contributes one "basic" component to the data, scaled by its singular value. The components with the lowest singular values:
– Carry little information
– Are often just "noise" in the data
A = s1·U1·V1ᵀ + s2·U2·V2ᵀ + s3·U3·V3ᵀ + s4·U4·V4ᵀ + …

The low-singular-value components carry little information and are often just "noise"; eliminating them recomposes the data with minimal change of value
– Minimum squared error between original data and recomposed data
– Sometimes eliminating the low-singular-value components will, in fact, "clean" the data
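A MATLAB/Octave sketch of this truncation on made-up data:

  A = randn(100, 80);
  [U, S, V] = svd(A);
  k = 10;                                  % keep the 10 largest components
  Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';  % sum of k terms si*Ui*Vi'
  disp(norm(A - Ak, 'fro'));               % no rank-10 matrix does better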
Applying this to the music spectrogram M (a 1024x974 matrix!):

M = s1·U1·V1ᵀ + s2·U2·V2ᵀ + s3·U3·V3ᵀ + s4·U4·V4ᵀ + …
M ≈ s1·U1·V1ᵀ + s2·U2·V2ᵀ (keeping only the largest components)
– Most singular values are close to zero
– The corresponding components are "unimportant"
Reconstructing the spectrogram without the near-zero components:
– Looks similar to the original
– Sounds pretty close
– Background "cleaned up"
Reconstruction with just 5 components:
– Corresponding to the 5 largest singular values
– Highly recognizable
– Suggests that there are actually only 5 significant unique note combinations in the music
The trace of a square matrix is the sum of its diagonal entries:

A = [a11 a12 a13 a14; a21 a22 a23 a24; a31 a32 a33 a34; a41 a42 a43 a44]
Tr(A) = a11 + a22 + a33 + a44 = Σi a(i,i)

The trace also equals the sum of the eigen values:
Tr(A) = Σi λi
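Checking both identities in MATLAB/Octave on a made-up matrix:

  A = magic(4);
  disp(trace(A));          % 34: sum of the diagonal entries
  disp(sum(eig(A)));       % 34 (up to round-off): sum of the eigen values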
The trace is useful for measuring error. Suppose C is an approximation to a data matrix D:

D = [d11 … d44], C = [c11 … c44]
Error matrix: E = D - C
Total squared error: error = Σi,j E(i,j)² = Tr(E·Eᵀ)

In general, for any matrix A: Σi,j A(i,j)² = Tr(A·Aᵀ)
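Numerically, with made-up matrices:

  D = rand(4); C = rand(4);
  E = D - C;
  disp(sum(E(:).^2));      % total squared error
  disp(trace(E * E'));     % the same number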
Other matrix factorizations:

LU decomposition: A = L U
– L is a lower triangular matrix
– U is an upper triangular matrix
– Cholesky decomposition: when A is symmetric, L = Uᵀ, i.e. A = L Lᵀ

QR decomposition: A = Q R
– Q is orthogonal: QQᵀ = I
– R is upper triangular

These decompositions are used, e.g., to compute eigen decompositions or least squares solutions
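The corresponding MATLAB/Octave calls, on made-up matrices:

  A = rand(4);
  [L, U] = lu(A);          % A = L*U (L may absorb row permutations)
  S = A * A' + eye(4);     % a symmetric positive definite matrix
  R = chol(S);             % S = R'*R, with R upper triangular
  [Q, Rq] = qr(A);         % A = Q*Rq, with Q'*Q = I, Rq upper triangular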
Representing data:
– As an N x 1 vector
– As an N x M matrix, (i, j)th element = a(i,j)
– As a tensor, e.g. an M x 3 x 2 tensor, (i, j, k)th element = a(i,j,k)

[The remaining slides are figures illustrating these representations]