Large scale graph learning from smooth signals
Kalofolias Vassilis, Nathanael Perraudin
13 November 2019
Graph learning: given a data matrix $X$ (rows: objects $x_i$, $x_j$), learn a graph $G$ with weighted adjacency matrix $W$.
Dimensionality - manifolds
n = 60K, m = 784
data smoothness
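Dirichlet energy is small:
$$\|\nabla_G X\|_F^2 = \operatorname{tr}(X^\top L X) = \frac{1}{2}\sum_{i,j} W_{i,j}\,\|x_i - x_j\|_2^2, \qquad L = \nabla_G^\top \nabla_G = D - W$$
graph sparsity
Data lives on a low-dimensional manifold
$Z_{ij} = \|x_i - x_j\|_2^2$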
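The identity between the trace form and the pairwise form of the Dirichlet energy can be checked numerically. The following is a minimal sketch, assuming a symmetric $W$ with zero diagonal and NumPy; the helper names `pairwise_sq_dists` and `dirichlet_energy` are illustrative and do not come from the slides.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Z[i, j] = ||x_i - x_j||_2^2 for the rows x_i of X."""
    sq = np.sum(X**2, axis=1)
    Z = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(Z, 0.0)
    return np.maximum(Z, 0.0)  # clip tiny negatives caused by round-off

def dirichlet_energy(X, W):
    """tr(X^T L X) with the combinatorial Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.trace(X.T @ L @ X)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))      # 6 objects, 3 features
W = np.triu(rng.random((6, 6)), 1)
W = W + W.T                          # symmetric weights, zero diagonal

Z = pairwise_sq_dists(X)
energy_trace = dirichlet_energy(X, W)
energy_pairs = 0.5 * np.sum(W * Z)   # (1/2) * sum_ij W_ij * Z_ij
```

Both quantities agree up to floating-point error, which is why the smoothness term can be written directly in terms of $W$ and $Z$.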
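Related formulations use Frobenius-norm terms $\|\cdot\|_F^2$ with constraints such as $\operatorname{tr}(L) = n$ or an upper bound $\le \alpha n$:
[Kalofolias 2016] [Hu et al. 2015, Dong et al. 2016] [Lake & Tenenbaum 2010] [Daitch et al. 2009]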
$$\min_{W \in \mathcal{W}_m} \|W \circ Z\|_{1,1} - \alpha\,\mathbf{1}^\top \log(W\mathbf{1}) + \frac{\beta}{2}\|W\|_F^2$$
Goal: scale it further!
Cost: $O(n^2)$
Sketch of algorithm: Approximately minimize each function
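Objective can be split in 3 functions:
$$\min_{W \in \mathcal{W}_m} \underbrace{\|W \circ Z\|_{1,1}}_{O(n^2)} \; \underbrace{{}- \alpha\,\mathbf{1}^\top \log(W\mathbf{1})}_{O(n^2)} + \underbrace{\frac{\beta}{2}\|W\|_F^2}_{O(n^2)}$$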
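As an illustration of the three-way split, the sketch below evaluates each term separately on a dense matrix, which is where the $O(n^2)$ cost per term comes from. It assumes a non-negative $W$ with strictly positive row sums; the function name `objective_terms` and the toy data are hypothetical.

```python
import numpy as np

def objective_terms(W, Z, alpha, beta):
    """Return the three terms of the objective for a dense, non-negative W."""
    f1 = np.sum(np.abs(W * Z))                    # ||W o Z||_{1,1}
    f2 = -alpha * np.sum(np.log(W.sum(axis=1)))   # -alpha * 1^T log(W 1)
    f3 = 0.5 * beta * np.sum(W**2)                # (beta / 2) * ||W||_F^2
    return f1, f2, f3

# Toy data: 4 points on a line, fully connected graph with unit weights.
x = np.arange(4.0)
Z = (x[:, None] - x[None, :])**2
W = np.ones((4, 4)) - np.eye(4)

f1, f2, f3 = objective_terms(W, Z, alpha=2.0, beta=0.5)
total = f1 + f2 + f3
```

Every term touches all $n^2$ entries of $W$, so restricting the computation to a sparse set of candidate edges is the natural way to reduce the cost.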
$$\min_{W \in \mathcal{W}_m} \|M \circ W \circ Z\|_{1,1} - \alpha\,\mathbf{1}^\top \log((M \circ W)\mathbf{1}) + \frac{\beta}{2}\|M \circ W\|_F^2$$
With the mask $M$, the objective can still be split in 3 functions, each evaluated only on the allowed edges.
Compute approximate 30-NN graph (binary)
$|E_{\text{allowed}}| \approx 3kn = 30n$
“I want a graph with 10 edges per node on average”
Learn weights for allowed edges
Some of them are deleted! ($W_{ij} = 0$)
Final 10-NN graph
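The set of allowed edges can be pictured as a binary mask built from a $3k$-nearest-neighbour search. The sketch below is only a stand-in: it uses an exact brute-force search in NumPy, which is itself $O(n^2)$, whereas the slides rely on an approximate nearest-neighbour method. The function name `knn_mask` and the toy data are assumptions made for illustration.

```python
import numpy as np

def knn_mask(X, n_neighbors):
    """Symmetric boolean mask: M[i, j] is True if j is among the n_neighbors
    nearest points of i, or i is among those of j."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(D, np.inf)                     # exclude self-loops
    # indices of the n_neighbors smallest distances in each row (unordered)
    idx = np.argpartition(D, n_neighbors, axis=1)[:, :n_neighbors]
    M = np.zeros((n, n), dtype=bool)
    M[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = True
    return M | M.T

k = 10                                   # desired edges per node
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
M = knn_mask(X, 3 * k)                   # 3k = 30 candidate neighbours per node
directed_edges = int(M.sum())            # each undirected edge is counted twice
```

Weights are then learned only where `M` is true, and the learning step may set some of them to zero.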
$O(n \log(n)\, m)$ for the approximate NN search (one time), $O(nk)$ per iteration
$k = 10$
Grid search? ✘
“I want a graph with 20 edges per node on average”
$$W^*(Z, \alpha, \beta) = \arg\min_{W \in \mathcal{W}_m} \|W \circ Z\|_{1,1} - \alpha\,\mathbf{1}^\top \log(W\mathbf{1}) + \frac{\beta}{2}\|W\|_F^2$$
$$= \delta \, \arg\min_{W \in \mathcal{W}_m} \|W \circ \theta Z\|_{1,1} - \mathbf{1}^\top \log(W\mathbf{1}) + \frac{1}{2}\|W\|_F^2$$
$$= \delta \, W^*(\theta Z, 1, 1), \qquad \delta = \sqrt{\frac{\alpha}{\beta}}, \quad \theta = \sqrt{\frac{1}{\alpha\beta}}$$
$\delta$ only changes the scale
Only $\theta$ changes sparsity
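The reparametrisation can be sanity-checked without a solver: substituting $W = \delta V$ into the $(\alpha, \beta)$ objective gives $\alpha$ times the $(1, 1)$ objective on $\theta Z$, plus the constant $-\alpha n \log \delta$, so both problems share their minimisers up to the scale $\delta$. The sketch below verifies this on random symmetric matrices; the function name `objective` and the test values are illustrative assumptions.

```python
import numpy as np

def objective(W, Z, alpha, beta):
    """||W o Z||_{1,1} - alpha * 1^T log(W 1) + (beta/2) ||W||_F^2, for W >= 0."""
    return (np.sum(W * Z)
            - alpha * np.sum(np.log(W.sum(axis=1)))
            + 0.5 * beta * np.sum(W**2))

def random_symmetric(rng, n):
    A = np.triu(rng.random((n, n)), 1)
    return A + A.T

alpha, beta = 3.0, 0.7
delta = np.sqrt(alpha / beta)
theta = 1.0 / np.sqrt(alpha * beta)

rng = np.random.default_rng(1)
n = 5
Z = random_symmetric(rng, n)

gaps = []
for _ in range(4):
    V = random_symmetric(rng, n)
    lhs = objective(delta * V, Z, alpha, beta)        # original problem at W = delta * V
    rhs = alpha * objective(V, theta * Z, 1.0, 1.0)   # rescaled problem at V
    gaps.append(lhs - rhs)

expected_gap = -alpha * n * np.log(delta)             # does not depend on V
```

Because the gap is the same for every `V`, minimising one objective is equivalent to minimising the other.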
So δ is not important. How do we find θ?
$$\min_{W \in \mathcal{W}_m} \|W \circ \theta Z\|_{1,1} - \mathbf{1}^\top \log(W\mathbf{1}) + \frac{1}{2}\|W\|_F^2$$
Take 1 node: 1 column
ignore symmetry
$$\min_{w \succeq 0} \; \theta\, w^\top z - \log(w^\top \mathbf{1}) + \frac{1}{2}\|w\|_2^2$$
Analyse role of θ on simpler problem!
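Theorem: Let $z$ be sorted in ascending order and $b_k = \sum_{i=1}^{k} z_i$. By setting $\theta$ in the range
$$\theta \in \left( \frac{1}{\sqrt{k z_{k+1}^2 - b_k z_{k+1}}}, \; \frac{1}{\sqrt{k z_k^2 - b_k z_k}} \right),$$
$w^*$ has exactly $k$ non-zero elements.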
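The theorem can be illustrated on a single column. The sketch below rests on an assumption that is derived here from the optimality conditions of the one-node problem rather than stated on the slides: the minimiser has the form $w^* = \max(0, \lambda - \theta z)$, where $\lambda$ is the positive root of $k\lambda^2 - \theta b_k \lambda - 1 = 0$ for the correct support size $k$. The names `solve_single_node` and `theta_bounds` are hypothetical.

```python
import numpy as np

def solve_single_node(z, theta):
    """Minimise theta * w@z - log(sum(w)) + 0.5 * ||w||^2 over w >= 0.
    Assumes z is sorted in ascending order with positive entries."""
    b = np.cumsum(z)
    n = len(z)
    for k in range(1, n + 1):
        tb = theta * b[k - 1]
        lam = (tb + np.sqrt(tb**2 + 4 * k)) / (2 * k)
        # support size k is consistent if entry k is active and entry k+1 is not
        if lam > theta * z[k - 1] and (k == n or lam <= theta * z[k]):
            return np.maximum(0.0, lam - theta * z)
    raise ValueError("no consistent support found")

def theta_bounds(z, k):
    """Bounds on theta for exactly k non-zeros (z ascending, 2 <= k < len(z))."""
    b_k = np.sum(z[:k])
    lower = 1.0 / np.sqrt(k * z[k]**2 - b_k * z[k])          # uses z_{k+1}
    upper = 1.0 / np.sqrt(k * z[k - 1]**2 - b_k * z[k - 1])  # uses z_k
    return lower, upper

rng = np.random.default_rng(0)
z = np.sort(rng.random(20) + 0.1)     # toy squared distances, ascending
k = 5
lower, upper = theta_bounds(z, k)
theta = np.sqrt(lower * upper)        # a value strictly inside the range
w = solve_single_node(z, theta)
n_nonzero = int(np.count_nonzero(w))
n_nonzero_small_theta = int(np.count_nonzero(solve_single_node(z, 0.9 * lower)))
```

Choosing $\theta$ below the lower bound weakens the distance penalty and yields more than $k$ non-zero weights.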
[Figure: number of non-zeros $k$ versus $\theta$, on linear and logarithmic axes; curves: measured sparsity and theoretical bounds.]
If $\theta$ is in this range we obtain 10 non-zeros.
To obtain 10 edges/node, $\theta$ must be in this range.
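For a whole distance matrix, each node has its own admissible range for $\theta$, so a single value has to be aggregated from the per-node bounds. The sketch below uses one possible heuristic, the geometric mean of the averaged lower and upper bounds; this aggregation rule is an assumption for illustration and is not taken from the slides, and the function name `theta_for_sparsity` is hypothetical.

```python
import numpy as np

def theta_for_sparsity(Z, k):
    """Heuristic theta for about k edges per node (requires 2 <= k < n - 1).
    Z is the matrix of squared pairwise distances."""
    n = Z.shape[0]
    lowers, uppers = [], []
    for i in range(n):
        z = np.sort(np.delete(Z[i], i))   # distances to the other nodes, ascending
        b_k = z[:k].sum()
        lowers.append(1.0 / np.sqrt(k * z[k]**2 - b_k * z[k]))
        uppers.append(1.0 / np.sqrt(k * z[k - 1]**2 - b_k * z[k - 1]))
    lower_avg, upper_avg = np.mean(lowers), np.mean(uppers)
    return float(np.sqrt(lower_avg * upper_avg)), lower_avg, upper_avg

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
sq = np.sum(X**2, axis=1)
Z = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

theta, lower_avg, upper_avg = theta_for_sparsity(Z, k=5)
```

With non-uniformly sampled data the per-node ranges can differ widely, so a single $\theta$ only controls the sparsity on average.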
“Failing” case:
[Figure: panels of $k$ versus $\theta$ (logarithmic axis), each showing measured sparsity and theoretical bounds. Recoverable panel titles: 1001 USPS images (non-uniform); 400 ATT faces; $X \sim \mathcal{N}(0, 1)^{1000 \times 2}$.]
“spherical” data (n = 260K, m = 2K)
[Figures: original 2-D manifold; manifold recovered with ANN.]
“spherical” data (n = 4K, m = 2K)
[Figure labels: "diameter =" (value not recoverable) and "diameter = n − 1".]
Word2vec (n = 10K, m = 300)
[Figure labels: 29 words; 69 words.]
MNIST (n = 60K)
Label propagation (1% known labels)
[Figure: comparison of the smoothness terms $\|LX\|_F^2$ and $\operatorname{tr}(X^\top L X) = \frac{1}{2}\|W \circ Z\|_{1,1}$.]
1. Good manifold recovery
2. Scalable!
   ✔ $O(nk)$ per iteration
   ✔ $O(n \log(n)\, m)$ (one time)
3. No need for parameter tuning
   ✔ Automatic parameter selection for desired sparsity
Code: Matlab & Python (GSP box, pyGSP)