SLIDE 1
compsci 514: algorithms for data science
Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 14.
SLIDE 2 logistics
- Midterm grades are on Moodle.
- Average was 32.67, median 33, standard deviation 6.8.
- Come to office hours if you would like to see your exam/discuss solutions.
SLIDE 3 summary
Last Few Weeks: Low-Rank Approximation and PCA
- Compress data that lies close to a k-dimensional subspace.
- Equivalent to finding a low-rank approximation of the data matrix X: X ≈ XVVT.
- Optimal solution via PCA (eigendecomposition of XTX or, equivalently, SVD of X).
This Class: Non-linear dimensionality reduction.
- How do we compress data that does not lie close to a k-dimensional subspace?
- Spectral methods (SVD and eigendecomposition) are still key techniques in this setting.
- Spectral graph theory, spectral clustering.
SLIDE 7 entity embeddings
End of Last Class: Embedding objects other than vectors into Euclidean space.
- Documents (for topic-based search and classification)
- Words (to identify synonyms, translations, etc.)
- Nodes in a social network
Usual Approach: Convert each item into a high-dimensional feature vector and then apply low-rank approximation.
SLIDE 10
example: latent semantic analysis
SLIDE 12 example: latent semantic analysis
- If the error ∥X − YZT∥F is small, then on average, Xi,a ≈ (YZT)i,a = ⟨⃗yi, ⃗za⟩.
- ⟨⃗yi, ⃗za⟩ ≈ 1 when doci contains worda.
- If doci and docj both contain worda, ⟨⃗yi, ⃗za⟩ ≈ ⟨⃗yj, ⃗za⟩ = 1.
SLIDE 16
example: latent semantic analysis
If doci and docj both contain worda, ⟨⃗yi, ⃗za⟩ ≈ ⟨⃗yj, ⃗za⟩ = 1.
Another View: Each column of Y represents a 'topic'. ⃗yi(j) indicates how much doci belongs to topic j. ⃗za(j) indicates how much worda associates with that topic.
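A minimal numpy sketch of this setup (not from the lecture; the toy document-word matrix and the convention Y = Uk, Z = VkΣk are illustrative assumptions):

```python
import numpy as np

# Toy binary document-word matrix X: X[i, a] = 1 if doc_i contains word_a.
# (Illustrative data, not from the lecture.)
X = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

k = 2  # embedding dimension

# SVD of X, truncated to the top k singular directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Y = U[:, :k]             # document embeddings (one convention: Y = U_k)
Z = Vt[:k, :].T * s[:k]  # word embeddings, Z = V_k Sigma_k

# Y @ Z.T is the best rank-k approximation of X.
print("||X - Y Z^T||_F =", np.linalg.norm(X - Y @ Z.T))

# Words that appear in many of the same documents get similar embeddings:
print("word-word dot products Z Z^T:\n", np.round(Z @ Z.T, 2))
```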
SLIDE 18 example: latent semantic analysis
- Just like with documents, ⃗za and ⃗zb will tend to have high dot product if worda and wordb appear in many of the same documents.
- In an SVD decomposition we set ZT = ΣkVTk, i.e., Z = VkΣk.
- The columns of Vk are equivalently the top k eigenvectors of XXT. The eigendecomposition of XXT is XXT = VΣ2VT.
- What is the best rank-k approximation of XXT, i.e., arg minrank-k B ∥XXT − B∥F? It is VkΣ2kVTk = ZZT.
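A small numeric check of this equivalence (again a sketch, reusing the toy matrix from above; with rows of X indexing documents, the word-word matrix written as XXT on the slide is computed as XTX here):

```python
import numpy as np

# Toy document-word matrix from the previous sketch.
X = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)
k = 2

# Word-word co-occurrence (Gram) matrix: entry (a, b) counts documents
# containing both word_a and word_b.
M = X.T @ X

# Top-k eigenvectors of M (np.linalg.eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(M)
V_k = eigvecs[:, -k:]    # top k eigenvectors
lam_k = eigvals[-k:]     # top k eigenvalues (squared singular values of X)

# The best rank-k approximation of M equals Z Z^T for Z = V_k Sigma_k.
M_k = V_k @ np.diag(lam_k) @ V_k.T
Z = V_k * np.sqrt(lam_k)
print(np.allclose(M_k, Z @ Z.T))  # True
```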
SLIDE 24 example: word embedding
LSA gives a way of embedding words into k-dimensional space.
- Embedding is via low-rank approximation of XXT, where (XXT)a,b is the number of documents that both worda and wordb appear in.
- Think about XXT as a similarity matrix (gram matrix, kernel matrix) with entry (a, b) being the similarity between worda and wordb.
- Many ways to measure similarity: number of sentences both occur in, number of times both appear in the same window of w words, appearing in similar positions of documents in different languages, etc.
- Replacing XXT with these different metrics (sometimes appropriately transformed) leads to popular word embedding algorithms: word2vec, GloVe, fastText, etc.
SLIDE 28
example: word embedding
Note: word2vec is typically described as a neural-network method, but it is really just low-rank approximation of a specific similarity matrix. See Neural Word Embedding as Implicit Matrix Factorization, Levy and Goldberg (2014).
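In that spirit, here is a hedged sketch of a matrix-factorization word embedding: build a windowed word-word co-occurrence matrix from a toy corpus and take a truncated SVD. The corpus and window size are assumptions, and the actual Levy-Goldberg result concerns a shifted PMI transform of such counts, not the raw counts used here.

```python
import numpy as np
from collections import Counter

# Toy corpus and window size -- purely illustrative.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]
w = 2  # co-occurrence window

vocab = sorted({tok for s in sentences for tok in s})
idx = {tok: i for i, tok in enumerate(vocab)}

# Count how often two words appear within w positions of each other.
counts = Counter()
for s in sentences:
    for i, tok in enumerate(s):
        for j in range(i + 1, min(i + w + 1, len(s))):
            counts[(idx[tok], idx[s[j]])] += 1
            counts[(idx[s[j]], idx[tok])] += 1

C = np.zeros((len(vocab), len(vocab)))
for (a, b), c in counts.items():
    C[a, b] = c

# k-dimensional word embeddings from a truncated SVD of the (symmetric)
# co-occurrence matrix; word2vec/GloVe factor transformed versions of C.
k = 3
U, s, Vt = np.linalg.svd(C)
Z = U[:, :k] * np.sqrt(s[:k])
print("cat:", np.round(Z[idx["cat"]], 2))
print("dog:", np.round(Z[idx["dog"]], 2))
```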
SLIDE 30 similarity via graphs
A common way of encoding similarity is via a graph, e.g., a k-nearest neighbor graph.
- Connect items to similar items, possibly with higher-weight edges when they are more similar.
Is this set of points compressible? Does it lie close to a low-dimensional subspace?
SLIDE 35
linear algebraic representation of a graph
Once we have connected n data points x1, . . . , xn into a graph, we can represent that graph by its (weighted) adjacency matrix A ∈ Rn×n, with Ai,j = edge weight between nodes i and j.
In the LSA example, when X is the term-document matrix, XTX is like an adjacency matrix, where worda and wordb are connected if they appear in at least 1 document together (the edge weight is the number of documents they appear in together).
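A sketch of one way to build such a weighted k-nearest-neighbor adjacency matrix (not code from the course; the Gaussian edge weights and the toy two-cluster data are assumed choices):

```python
import numpy as np

def knn_adjacency(points, k=5, sigma=1.0):
    """Weighted adjacency matrix A of a symmetric k-nearest-neighbor graph.

    A[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) if j is among i's k nearest
    neighbors (or vice versa), else 0.  Unweighted 0/1 edges also work.
    """
    n = len(points)
    # All pairwise squared distances.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # k nearest neighbors, excluding i itself
        A[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    return np.maximum(A, A.T)  # symmetrize: keep an edge if either endpoint chose it

# Example: two well-separated noisy clusters (illustrative data).
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
A = knn_adjacency(pts, k=5)
print(A.shape, "edges:", int((A > 0).sum()) // 2)
```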
SLIDE 38
normalized adjacency matrix
What is the sum of entries in the ith column of A? The (weighted) degree of vertex i.
Often, A is normalized as Ā = D−1/2AD−1/2, where D is the degree matrix.
Spectral graph theory is the field of representing graphs as matrices and applying linear algebraic techniques to them.
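A small sketch of this normalization (assuming a weighted adjacency matrix A like the one built above; the zero-degree guard is an added assumption):

```python
import numpy as np

def normalized_adjacency(A):
    """Compute A_bar = D^{-1/2} A D^{-1/2} for a weighted adjacency matrix A.

    D is the diagonal degree matrix with D[i, i] = sum_j A[i, j], the
    (weighted) degree of vertex i, i.e. the i-th row/column sum of A.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])  # guard isolated vertices
    # Scale row i by 1/sqrt(d_i) and column j by 1/sqrt(d_j).
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# All eigenvalues of the normalized adjacency matrix lie in [-1, 1].
```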
SLIDE 44 adjacency matrix eigenvectors
How do we compute an optimal low-rank approximation of A?
- Project onto the top k eigenvectors of ATA = A2. These are just the eigenvectors of A.
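A hedged sketch of this embedding step: since A is symmetric, take its eigenvectors with the largest eigenvalues and use row i of Vk as the embedding of vertex i. Whether to use A or the normalized Ā is a design choice not fixed here.

```python
import numpy as np

def adjacency_embedding(A, k=2):
    """Embed each vertex by its row in V_k, the top-k eigenvectors of A.

    A is symmetric, so np.linalg.eigh applies; it returns eigenvalues in
    ascending order, so the eigenvectors with the largest eigenvalues
    (the slides' "top k") are the last k columns.
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    V_k = eigvecs[:, -k:]
    return V_k  # row i is the k-dimensional embedding of vertex i

# Vertices with similar neighborhoods (similar rows of A) get nearby rows in
# V_k; e.g., for the two-cluster k-NN graph above, the rows of V_k separate
# the two clusters.
```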
SLIDE 47 adjacency matrix eigenvectors
- Similar vertices (close with regard to graph proximity) should have similar embeddings. I.e., Vk(i) should be similar to Vk(j).
SLIDE 49
spectral embedding
SLIDE 50
spectral clustering
A very common task, aside from just embedding points via graph-based similarity and SVD, is to partition or cluster vertices based on this similarity. Examples:
- Non-linearly separable data.
- Community detection in naturally occurring networks.
SLIDE 53 cut minimization
Simple Idea: Partition clusters along a minimum cut in the graph. Small cuts are often not informative. Solution: Encourage cuts that separate large sections of the graph.
Let ⃗v ∈ Rn represent a cut: ⃗v(i) = 1 if i ∈ S and ⃗v(i) = −1 if i ∈ T. Want ⃗v to have roughly equal numbers of 1s and −1s, i.e., ⃗vT⃗1 ≈ 0.
SLIDE 57 the laplacian view
For a graph with adjacency matrix A and degree matrix D, L = D − A is the graph Laplacian. For any vector ⃗v,
⃗vTL⃗v = ⃗vTD⃗v − ⃗vTA⃗v = ∑i d(i)⃗v(i)2 − ∑i ∑j A(i, j) ·⃗v(i) ·⃗v(j), with sums over i, j = 1, . . . , n.
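A quick numerical check of this identity on a toy graph (assumed example; for an unweighted graph the quadratic form also equals the sum of (⃗v(i) −⃗v(j))2 over edges, as used on the next slide):

```python
import numpy as np

# Toy unweighted graph on 4 vertices: path 0-1-2-3 plus the edge 0-2.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # graph Laplacian

# Cut indicator for S = {0, 2}, T = {1, 3}.
v = np.array([1.0, -1.0, 1.0, -1.0])

quad_form = v @ L @ v                                  # v^T L v
edge_sum = sum((v[i] - v[j]) ** 2 for i, j in edges)   # sum over edges
print(quad_form, edge_sum)  # both 12.0 = 4 * cut(S, T), since cut(S, T) = 3
```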
SLIDE 59 the laplacian view
For a cut indicator vector ⃗v ∈ {−1, 1}n with ⃗v(i) = −1 for i ∈ S and ⃗v(i) = 1 for i ∈ T:
⃗vTL⃗v = ∑(i,j)∈E (⃗v(i) −⃗v(j))2 = 4 · cut(S, T).
So minimizing ⃗vTL⃗v corresponds to minimizing the cut size:
arg min⃗v∈{−1,1}n ⃗vTL⃗v.
Relaxing to arg min⃗v∈Rd with ∥⃗v∥=1 ⃗vTL⃗v, by the Courant-Fischer theorem the minimizing ⃗v is the smallest eigenvector of L = D − A.
SLIDE 63
smallest laplacian eigenvector
We have: ⃗vn = (1/√n) ·⃗1 = arg min⃗v∈Rd with ∥⃗v∥=1 ⃗vTL⃗v, with ⃗vTnL⃗vn = 0.
SLIDE 65 second smallest laplacian eigenvector
By Courant-Fischer, the second smallest eigenvector is obtained greedily:
⃗v1 = arg min⃗v∈Rd with ∥⃗v∥=1 ⃗vTL⃗v,   ⃗v2 = arg min⃗v∈Rd with ∥⃗v∥=1, ⃗vT2⃗v1=0 ⃗vTL⃗v.
If ⃗v2 were binary, in {−1, 1}d, the orthogonality condition would ensure that there are an equal number of vertices on each side of the cut. When ⃗v2 ∈ Rd, it enforces a 'relaxed' version of this constraint.
SLIDE 68 cutting with the second laplacian eigenvector
Find a good partition of the graph by computing
⃗v2 = arg min⃗v∈Rd with ∥⃗v∥=1, ⃗vT2⃗1=0 ⃗vTL⃗v.
Set S to be all nodes with ⃗v2(i) < 0 and T to be all nodes with ⃗v2(i) ≥ 0.
The Shi-Malik normalized cuts algorithm is a commonly used variant of this approach, using the normalized Laplacian D−1/2LD−1/2.
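An end-to-end sketch of this partitioning step (illustrative; it uses the unnormalized Laplacian L = D − A rather than the Shi-Malik normalized version, and the two-triangle graph is an assumed example):

```python
import numpy as np

def spectral_partition(A):
    """Split a graph into S, T using the second-smallest Laplacian eigenvector.

    A: symmetric (weighted) adjacency matrix.
    Returns a boolean mask: True for vertices i with v2(i) < 0.
    """
    D = np.diag(A.sum(axis=1))
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    v2 = eigvecs[:, 1]                    # second-smallest eigenvector
    return v2 < 0

# Example: two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(spectral_partition(A))  # the two triangles land on opposite sides of the cut
```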
SLIDE 72 laplacian embedding
The smallest eigenvectors of L = D − A give the orthogonal 'functions' that are smoothest over the graph, i.e., that minimize ⃗vTL⃗v = ∑(i,j)∈E [⃗v(i) −⃗v(j)]2.
Embedding points with coordinates given by [⃗vn−1(j), ⃗vn−2(j), . . . , ⃗vn−k(j)] ensures that points connected by edges have minimum Euclidean distance.
- Laplacian Eigenmaps
- Locally linear embedding
- Isomap
- Etc.
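A sketch in the spirit of Laplacian eigenmaps (illustrative; the convention of skipping the constant eigenvector and the reuse of the toy graph above are assumptions):

```python
import numpy as np

def laplacian_embedding(A, k=2):
    """Embed vertex j by the j-th entries of the k smoothest nontrivial
    eigenvectors of L = D - A (skipping the constant, smallest eigenvector).

    Vertices joined by edges get nearby coordinates, since these eigenvectors
    minimize the sum over edges of (v(i) - v(j))^2 subject to orthonormality.
    """
    D = np.diag(A.sum(axis=1))
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues
    return eigvecs[:, 1:k + 1]            # the slides' v_{n-1}, ..., v_{n-k}

# With the two-triangle graph from the previous sketch, the 1-dimensional
# embedding laplacian_embedding(A, k=1) is just the cut vector v2 again.
```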
SLIDE 76
Questions?