

SLIDE 1

Large Graph Limits of Learning Algorithms

Andrew M Stuart

Computing and Mathematical Sciences, Caltech

Andrea Bertozzi, Michael Luo (UCLA) and Kostas Zygalakis (Edinburgh); Matt Dunlop (Caltech), Dejan Slepčev (CMU) and Matt Thorpe (CMU)

SLIDE 2

References

• X Zhu, Z Ghahramani and J Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, ICML, 2003. [Harmonic Functions]
• C Rasmussen and C Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. [Probit]
• AL Bertozzi and A Flenner, Diffuse interface models on graphs for classification of high dimensional data, SIAM MMS, 2012. [Ginzburg-Landau]
• MA Iglesias, Y Lu and AM Stuart, Bayesian level set method for geometric inverse problems, Interfaces and Free Boundaries, 2016. [Level Set]
• AL Bertozzi, M Luo, AM Stuart and K Zygalakis, Uncertainty quantification in the classification of high dimensional data, https://arxiv.org/abs/1703.08816, 2017. [Probit on a graph]
• N Garcia-Trillos and D Slepčev, A variational approach to the consistency of spectral clustering, ACHA, 2017.
• M Dunlop, D Slepčev, AM Stuart and M Thorpe, Large data and zero noise limits of graph based semi-supervised learning algorithms, in preparation, 2017.
• N Garcia-Trillos and D Sanz-Alonso, Continuum limit of posteriors in graph Bayesian inverse problems, https://arxiv.org/abs/1706.07193, 2017.

SLIDE 3

Talk Overview

• Learning and Inverse Problems
• Optimization
• Theoretical Properties
• Probability
• Conclusions

SLIDE 4

Talk Overview

• Learning and Inverse Problems
• Optimization
• Theoretical Properties
• Probability
• Conclusions

SLIDE 5

Regression

Let $D \subset \mathbb{R}^d$ be a bounded open set. Let $D' \subset D$.

Ill-Posed Inverse Problem

Find $u : D \to \mathbb{R}$ given $y(x) = u(x)$, $x \in D'$. Strong prior information needed.

SLIDE 6

Classification

Let $D \subset \mathbb{R}^d$ be a bounded open set. Let $D' \subset D$.

Ill-Posed Inverse Problem

Find $u : D \to \mathbb{R}$ given $y(x) = \mathrm{sign}\big(u(x)\big)$, $x \in D'$. Strong prior information needed.

SLIDE 7

Figure: $y = \mathrm{sign}(u)$. Red: $+1$. Blue: $-1$. Yellow: no information.

SLIDE 8

Figure: Reconstruction of the function $u$ on $D$.

SLIDE 9

Talk Overview

• Learning and Inverse Problems
• Optimization
• Theoretical Properties
• Probability
• Conclusions

SLIDE 10

Graph Laplacian

Similarity graph $G$ with $n$ vertices $Z = \{1, \dots, n\}$. Weighted adjacency matrix $W = \{w_{j,k}\}$,
$$w_{j,k} = \eta_\varepsilon(x_j - x_k).$$
Diagonal $D = \mathrm{diag}\{d_{jj}\}$, $d_{jj} = \sum_{k \in Z} w_{j,k}$.
$$L = s_n(D - W) \quad \text{(unnormalized)}; \qquad L' = D^{-\frac{1}{2}} L D^{-\frac{1}{2}} \quad \text{(normalized)}.$$

Spectral Properties: $L$ is positive semi-definite:
$$\langle u, Lu \rangle_{\mathbb{R}^n} \propto \sum_{j \sim k} w_{j,k} |u_j - u_k|^2.$$
$L q_j = \lambda_j q_j$; fully connected $\Rightarrow \lambda_1 > \lambda_0 = 0$. Fiedler vector: $q_1$.
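As a concrete illustration, here is a minimal sketch of this construction, assuming a Gaussian choice of kernel $\eta_\varepsilon$ and taking $s_n = 1$ (the scaling is discussed later):

```python
import numpy as np

def graph_laplacian(X, eps, normalized=False):
    """Graph Laplacian from feature vectors X (n x d), Gaussian kernel."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * eps**2))        # w_jk = eta_eps(x_j - x_k)
    np.fill_diagonal(W, 0.0)              # no self-loops
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                  # unnormalized: L = D - W
    if normalized:
        Dinv = np.diag(deg ** -0.5)
        L = Dinv @ L @ Dinv               # L' = D^{-1/2} L D^{-1/2}
    return L

X = np.random.randn(200, 2)               # toy features
L = graph_laplacian(X, eps=0.5)
lam, Q = np.linalg.eigh(L)                # lam[0] ≈ 0 if fully connected
fiedler = Q[:, 1]                         # Fiedler vector q_1
```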


SLIDE 11

Problem Statement (Optimization)

Semi-Supervised Learning

Input:
• Unlabelled data: $x_j \in \mathbb{R}^d$, $j \in Z := \{1, \dots, n\}$.
• Labelled data: $y_j \in \{\pm 1\}$, $j \in Z' \subseteq Z$.

Output:
• Labels: $y_j \in \{\pm 1\}$, $j \in Z$.

Classification is based on $\mathrm{sign}(u)$, with $u$ the optimizer of
$$J(u; y) = \tfrac{1}{2}\langle u, C^{-1} u \rangle_{\mathbb{R}^n} + \Phi(u; y).$$
• $u$ is an $\mathbb{R}$-valued function on the graph nodes.
• $C = (L + \tau^2 I)^{-\alpha}$ comes from the unlabelled data: $w_{j,k} = \eta_\varepsilon(x_j - x_k)$.
• $\Phi(u; y)$ links the real-valued $u$ to the binary-valued labels $y$.
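The quadratic term can be evaluated through the eigendecomposition of $L$; a minimal sketch, assuming the `graph_laplacian` helper above and illustrative parameters `tau`, `alpha`:

```python
import numpy as np

def prior_term(u, L, tau, alpha):
    """Evaluate (1/2) <u, C^{-1} u> with C = (L + tau^2 I)^{-alpha}."""
    lam, Q = np.linalg.eigh(L)                            # L q_j = lambda_j q_j
    Cinv_u = Q @ (((lam + tau**2) ** alpha) * (Q.T @ u))  # apply C^{-1} spectrally
    return 0.5 * (u @ Cinv_u)
```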


SLIDE 12

Example: Voting Records

U.S. House of Representatives, 1984, 16 key votes. Each representative has an associated feature vector $x_j \in \mathbb{R}^{16}$, e.g. $x_j = (1, -1, 0, \dots, 1)^T$; $1$ is "yes", $-1$ is "no" and $0$ is abstain/no-show. Hence $d = 16$ and $n = 435$.

Figure: Fiedler vector and spectrum (normalized case).

SLIDE 13

Probit

Rasmussen and Williams, 2006 (MIT Press); Bertozzi, Luo, Stuart and Zygalakis, 2017 (arXiv).

Probit Model

$$J_p^{(n)}(u; y) = \tfrac{1}{2}\langle u, C^{-1} u \rangle_{\mathbb{R}^n} + \Phi_p^{(n)}(u; y).$$

Here $C = (L + \tau^2 I)^{-\alpha}$,
$$\Phi_p^{(n)}(u; y) := -\sum_{j \in Z'} \log \Psi(y_j u_j; \gamma) \quad \text{and} \quad \Psi(v; \gamma) = \frac{1}{\sqrt{2\pi\gamma^2}} \int_{-\infty}^{v} \exp\big(-t^2/2\gamma^2\big)\, dt.$$
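Since $\Psi(v; \gamma)$ is the $N(0, \gamma^2)$ CDF, the misfit can be written with the standard normal log-CDF. A sketch in the same illustrative setup as above (`labelled` indexes $Z'$; `logcdf` avoids underflow for large negative arguments):

```python
import numpy as np
from scipy.stats import norm

def probit_misfit(u, y, labelled, gamma):
    """Phi_p(u; y) = -sum_{j in Z'} log Psi(y_j u_j; gamma)."""
    v = y[labelled] * u[labelled]
    return -np.sum(norm.logcdf(v / gamma))   # Psi(v; gamma) = Phi(v / gamma)
```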

SLIDE 14

Level Set

Iglesias, Lu and Stuart, 2016. (IFB)

Level Set Model

$$J_{ls}^{(n)}(u; y) = \tfrac{1}{2}\langle u, C^{-1} u \rangle_{\mathbb{R}^n} + \Phi_{ls}^{(n)}(u; y).$$

Here $C = (L + \tau^2 I)^{-\alpha}$, and
$$\Phi_{ls}^{(n)}(u; y) := \frac{1}{2\gamma^2} \sum_{j \in Z'} \big| y_j - \mathrm{sign}(u_j) \big|^2.$$
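The corresponding misfit, in the same sketch setup:

```python
import numpy as np

def level_set_misfit(u, y, labelled, gamma):
    """Phi_ls(u; y) = (1 / (2 gamma^2)) sum_{j in Z'} |y_j - sign(u_j)|^2."""
    r = y[labelled] - np.sign(u[labelled])
    return (r @ r) / (2 * gamma**2)
```

Note that this misfit sees $u$ only through $\mathrm{sign}(u)$, which is the source of the non-attainment result in Theorem 1 below.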

SLIDE 15

Talk Overview

• Learning and Inverse Problems
• Optimization
• Theoretical Properties
• Probability
• Conclusions

SLIDE 16

Infimization

Recall that both optimization problems have the form
$$J^{(n)}(u; y) = \tfrac{1}{2}\langle u, C^{-1} u \rangle_{\mathbb{R}^n} + \Phi^{(n)}(u; y).$$
Indeed,
$$\Phi_p^{(n)}(u; y) := -\sum_{j \in Z'} \log \Psi(y_j u_j; \gamma) \quad \text{and} \quad \Phi_{ls}^{(n)}(u; y) := \frac{1}{2\gamma^2} \sum_{j \in Z'} \big| y_j - \mathrm{sign}(u_j) \big|^2.$$

Theorem 1

Probit: $J_p$ is convex. Level Set: $J_{ls}$ does not attain its infimum. (Intuitively, shrinking $u \mapsto tu$ with $t \downarrow 0$ leaves $\mathrm{sign}(u)$, and hence $\Phi_{ls}$, unchanged while strictly decreasing the quadratic term.)

SLIDE 17

Limit Theorem for the Dirichlet Energy

Garcia-Trillos and Slepčev, 2016. (ACHA)

Unlabelled data $\{x_j\}$ sampled i.i.d. from density $\rho$ supported on bounded $D \subset \mathbb{R}^d$. Let
$$\mathcal{L} u = -\frac{1}{\rho} \nabla \cdot \big( \rho^2 \nabla u \big), \ x \in D; \qquad \frac{\partial u}{\partial n} = 0, \ x \in \partial D.$$

Theorem 2

Let $s_n = \frac{2}{C(\eta) n \varepsilon^2}$. Then, under connectivity conditions on $\varepsilon = \varepsilon(n)$ in $\eta_\varepsilon$, the scaled Dirichlet energy $\Gamma$-converges in the $TL^2$ metric:
$$\frac{1}{n} \langle u, Lu \rangle_{\mathbb{R}^n} \to \langle u, \mathcal{L} u \rangle_{L^2_\rho}$$
as $n \to \infty$.
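A quick empirical sanity check of this scaling, as a sketch only: uniform $\rho$ on $[0,1]^2$, Gaussian kernel, a smooth test function, and the constant $C(\eta)$ omitted; the printed values should roughly stabilize as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for n in [500, 1000, 2000]:
    X = rng.uniform(size=(n, 2))                     # rho uniform on [0,1]^2
    eps = n ** (-1 / 4)                              # within the connectivity regime
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * eps**2)) / eps**2          # eta_eps = eps^{-d} eta(|.|/eps), d = 2
    np.fill_diagonal(W, 0.0)
    L0 = np.diag(W.sum(axis=1)) - W                  # unscaled D - W
    u = np.sin(np.pi * X[:, 0])                      # smooth test function
    s_n = 2 / (n * eps**2)                           # s_n, up to the constant C(eta)
    print(n, s_n * (u @ (L0 @ u)) / n)               # (1/n) <u, s_n (D - W) u>
```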

SLIDE 18

Sketch Proof: Quadratic Forms on Graphs

Discrete Dirichlet Energy

$$\langle u, Lu \rangle_{\mathbb{R}^n} \propto \sum_{j \sim k} w_{j,k} |u_j - u_k|^2.$$

Figure: Connectivity stencils for the orange node: PDE, data, localized data.

SLIDE 19

Sketch Proof: Limits of Quadratic Forms on Graphs

Garcia-Trillos and Slepčev, 2016. (ACHA)

$\{x_j\}_{j=1}^n$ i.i.d. from density $\rho$ on $D \subset \mathbb{R}^d$;
$$w_{jk} = \eta_\varepsilon(x_j - x_k), \qquad \eta_\varepsilon = \frac{1}{\varepsilon^d}\, \eta\Big(\frac{|\cdot|}{\varepsilon}\Big).$$

Limiting Discrete Dirichlet Energy

$$\langle u, Lu \rangle_{\mathbb{R}^n} \propto \frac{1}{n^2 \varepsilon^2} \sum_{j \sim k} \eta_\varepsilon(x_j - x_k)\, \big| u(x_j) - u(x_k) \big|^2;$$
$$n \to \infty: \quad \approx \int_D \int_D \eta_\varepsilon(x - y)\, \Big| \frac{u(x) - u(y)}{\varepsilon} \Big|^2 \rho(x)\rho(y)\, dx\, dy;$$
$$\varepsilon \to 0: \quad \approx C(\eta) \int_D |\nabla u(x)|^2 \rho(x)^2\, dx \propto \langle u, \mathcal{L} u \rangle_{L^2_\rho}.$$

SLIDE 20

Limit Theorem for Probit

M Dunlop, D Slepčev, AM Stuart and M Thorpe, in preparation, 2017.

Let $D^\pm$ be two disjoint bounded subsets of $D$; define $D' = D^+ \cup D^-$ and $y(x) = +1$, $x \in D^+$; $y(x) = -1$, $x \in D^-$. For $\alpha > 0$, define $\mathcal{C} = (\mathcal{L} + \tau^2 I)^{-\alpha}$; recall that on the graph $C = (L + \tau^2 I)^{-\alpha}$.

Theorem 3

Let $s_n = \frac{2}{C(\eta) n \varepsilon^2}$. Then, under connectivity conditions on $\varepsilon = \varepsilon(n)$, the scaled probit objective function $\Gamma$-converges in the $TL^2$ metric:
$$\frac{1}{n} J_p^{(n)}(u; y) \to J_p(u; y)$$
as $n \to \infty$, where
$$J_p(u; y) = \tfrac{1}{2}\big\langle u, \mathcal{C}^{-1} u \big\rangle_{L^2_\rho} + \Phi_p(u; y), \qquad \Phi_p(u; y) := -\int_{D'} \log \Psi\big(y(x)\, u(x); \gamma\big)\, \rho(x)\, dx.$$

SLIDE 21

Talk Overview

• Learning and Inverse Problems
• Optimization
• Theoretical Properties
• Probability
• Conclusions

SLIDE 22

Problem Statement (Bayesian Formulation)

Semi-Supervised Learning

Input:
• Unlabelled data: $x_j \in \mathbb{R}^d$, $j \in Z := \{1, \dots, n\}$ (prior).
• Labelled data: $y_j \in \{\pm 1\}$, $j \in Z' \subseteq Z$ (likelihood).

Output:
• Labels: $y_j \in \{\pm 1\}$, $j \in Z$ (posterior).

Connection between probability and optimization:
$$J^{(n)}(u; y) = \tfrac{1}{2}\langle u, C^{-1} u \rangle_{\mathbb{R}^n} + \Phi^{(n)}(u; y),$$
$$\mathbb{P}(u|y) \propto \exp\big(-J^{(n)}(u; y)\big) \propto \exp\big(-\Phi^{(n)}(u; y)\big) \times N(0, C) \propto \mathbb{P}(y|u) \times \mathbb{P}(u).$$
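In code, the unnormalized log-posterior is simply the negative objective. A sketch combining the illustrative helpers defined earlier (`prior_term`, `probit_misfit`):

```python
def log_posterior(u, y, labelled, L, tau, alpha, gamma):
    """log P(u | y) up to an additive constant, i.e. -J^(n)(u; y)."""
    return -(prior_term(u, L, tau, alpha)
             + probit_misfit(u, y, labelled, gamma))
```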

SLIDE 23

Example of Underlying Gaussian (Voting Records)

Figure: Two-point correlation of $\mathrm{sign}(u)$ for three Democrats.

SLIDE 24

Probit (Continuum Limit)

Let $\alpha > \frac{d}{2}$.

Probit Probabilistic Model

Prior: Gaussian $\mathbb{P}(du) = N(0, \mathcal{C})$.
Posterior: $\mathbb{P}^\gamma(du|y) \propto \exp\big(-\Phi_p(u; y)\big)\, \mathbb{P}(du)$, where
$$\Phi_p(u; y) := -\int_{D'} \log \Psi\big(y(x)\, u(x); \gamma\big)\, \rho(x)\, dx.$$

SLIDE 25

Level Set (Continuum Limit)

Let $\alpha > \frac{d}{2}$.

Level Set Probabilistic Model

Prior: Gaussian $\mathbb{P}(du) = N(0, \mathcal{C})$.
Posterior: $\mathbb{P}^\gamma(du|y) \propto \exp\big(-\Phi_{ls}(u; y)\big)\, \mathbb{P}(du)$, where
$$\Phi_{ls}(u; y) := \int_{D'} \frac{1}{2\gamma^2} \big| y(x) - \mathrm{sign}\big(u(x)\big) \big|^2 \rho(x)\, dx.$$

SLIDE 26

Connecting Probit, Level Set and Regression

M Dunlop, D Slepčev, AM Stuart and M Thorpe, in preparation, 2017.

Theorem 4

Let $\alpha > \frac{d}{2}$. We have $\mathbb{P}^\gamma(u|y) \Rightarrow \mathbb{P}(u|y)$ as $\gamma \to 0$, where
$$\mathbb{P}(du|y) \propto \mathbf{1}_A(u)\, \mathbb{P}(du), \qquad \mathbb{P}(du) = N(0, \mathcal{C}), \qquad A = \{u : \mathrm{sign}\big(u(x)\big) = y(x), \ x \in D'\}.$$
Compare with regression (Zhu, Ghahramani and Lafferty, 2003, ICML): $A_0 = \{u : u(x) = y(x), \ x \in D'\}$.

SLIDE 27

Example (PDE Two Moons – Unlabelled Data)

Figure: Sampling density ρ of unlabelled data.


SLIDE 28

Example (PDE Two Moons – Label Data)

Figure: Labelled Data.


SLIDE 29

Example (PDE Two Moons – Fiedler Vector of L)

Figure: Fiedler Vector.


SLIDE 30

Example (PDE Two Moons – Posterior Labelling)

Figure: Posterior mean of u and sign(u).


SLIDE 31

Example (One Data Point Makes All The Difference)

Figure: Sampling density, Label Data 1, Label Data 2.


SLIDE 32

Talk Overview

• Learning and Inverse Problems
• Optimization
• Theoretical Properties
• Probability
• Conclusions

SLIDE 33

Summary

• Single optimization framework for classification algorithms.
• Single Bayesian framework for classification algorithms.
• Comparison of related optimization problems.
• Probit and level set have the same small-noise limit.
• This limit generalizes previous regression-based methods.
• Fast-mixing MCMC algorithms.
• Fast approximations per MCMC step.
• The infinite data limit identifies appropriate parameter choices.
• The infinite data limit leads to (S)PDEs and conditioned Gaussian measures.

SLIDE 34

References

• X Zhu, Z Ghahramani and J Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, ICML, 2003. [Harmonic Functions]
• C Rasmussen and C Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. [Probit]
• AL Bertozzi and A Flenner, Diffuse interface models on graphs for classification of high dimensional data, SIAM MMS, 2012. [Ginzburg-Landau]
• MA Iglesias, Y Lu and AM Stuart, Bayesian level set method for geometric inverse problems, Interfaces and Free Boundaries, 2016. [Level Set]
• AL Bertozzi, M Luo, AM Stuart and K Zygalakis, Uncertainty quantification in the classification of high dimensional data, https://arxiv.org/abs/1703.08816, 2017. [Probit on a graph]
• N Garcia-Trillos and D Slepčev, A variational approach to the consistency of spectral clustering, ACHA, 2017.
• M Dunlop, D Slepčev, AM Stuart and M Thorpe, Large data and zero noise limits of graph based semi-supervised learning algorithms, in preparation, 2017.
• N Garcia-Trillos and D Sanz-Alonso, Continuum limit of posteriors in graph Bayesian inverse problems, https://arxiv.org/abs/1706.07193, 2017.

SLIDE 35

pCN

$$\alpha(u, v) = \min\big\{1, \exp\big(\Phi(u) - \Phi(v)\big)\big\}.$$

The preconditioned Crank-Nicolson (pCN) Method

1: while $k < M$ do
2:   $v^{(k)} = \sqrt{1 - \beta^2}\, u^{(k)} + \beta \xi^{(k)}$, where $\xi^{(k)} \sim N(0, C)$.
3:   Accept: $u^{(k+1)} = v^{(k)}$ with probability $\alpha(u^{(k)}, v^{(k)})$; otherwise
4:   Reject: $u^{(k+1)} = u^{(k)}$.
5: end while

Why pCN? For a given acceptance probability, $\beta$ is independent of $N = |Z|$. Can exploit approximation of the graph Laplacian (Nyström) and …
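A minimal runnable sketch of pCN, assuming a generic misfit `phi` and drawing $\xi \sim N(0, C)$ via a Cholesky factor of a supplied covariance matrix; all names are illustrative:

```python
import numpy as np

def pcn(phi, C, u0, beta, M, rng=None):
    """pCN sampling for the measure with density prop. to exp(-phi(u)) w.r.t. N(0, C)."""
    rng = rng or np.random.default_rng()
    R = np.linalg.cholesky(C)                 # xi ~ N(0, C) as R @ z, z ~ N(0, I)
    u, samples = np.asarray(u0, dtype=float), []
    for _ in range(M):
        xi = R @ rng.standard_normal(len(u))
        v = np.sqrt(1 - beta**2) * u + beta * xi      # pCN proposal
        if np.log(rng.uniform()) < phi(u) - phi(v):   # alpha(u, v)
            u = v                                     # accept
        samples.append(u.copy())                      # else keep u (reject)
    return np.array(samples)
```

Because the proposal preserves the Gaussian prior exactly, the acceptance probability involves only the misfit $\Phi$, which is why $\beta$ need not shrink as the number of nodes grows.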

SLIDE 36

Example of UQ (Two Moons)

Recall that $d = 10^2$ and $N = 2 \times 10^3$.

Figure: Average label posterior variance vs. $\sigma$, the feature-vector noise.

SLIDE 37

Example of UQ (MNIST)

Here $d = 784$ and $N = 4000$.

Figure: "Low confidence" vs. "high confidence" nodes in the MNIST49 graph.

SLIDE 38

Saturation of Spectra in Applications

Karhunen-Loève: if $L q_j = \lambda_j q_j$, then $u \sim N(0, C)$ is
$$u = c^{\frac{1}{2}} \sum_{j=1}^{N-1} (\lambda_j + \tau^2)^{-\frac{\alpha}{2}} q_j z_j, \qquad z_j \sim N(0, 1) \ \text{i.i.d.} \tag{1}$$
The spectrum of the graph Laplacian often saturates as $j \to N - 1$.
Spectral projection $\Leftrightarrow$ $\lambda_k := \infty$, $k \geq \ell$. Spectral approximation: set $\lambda_k$ to some $\bar{\lambda} < \infty$.

Figure: Two Moons, Hyperspectral, Voting Records.
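A sketch of drawing prior samples via this Karhunen-Loève expansion, with both truncation options; here $c = 1$, and `ell`, `lam_bar` stand in for the truncation parameters $\ell$ and $\bar{\lambda}$:

```python
import numpy as np

def sample_prior(L, tau, alpha, ell=None, lam_bar=None, rng=None):
    """Draw u ~ N(0, (L + tau^2 I)^{-alpha}) via the KL expansion (1)."""
    rng = rng or np.random.default_rng()
    lam, Q = np.linalg.eigh(L)                  # L q_j = lambda_j q_j
    coeff = (lam + tau**2) ** (-alpha / 2)
    coeff[0] = 0.0                              # sum in (1) starts at j = 1
    if ell is not None:
        if lam_bar is None:
            coeff[ell:] = 0.0                   # spectral projection: lambda_k = infinity
        else:
            coeff[ell:] = (lam_bar + tau**2) ** (-alpha / 2)  # spectral approximation
    z = rng.standard_normal(len(lam))
    return Q @ (coeff * z)
```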

SLIDE 39

Example of UQ (Voting)

Recall that $d = 16$ and $N = 435$. Mean absolute error: projection 0.1577, approximation 0.0261.

Figure: Mean label posterior. Compare full (black), spectral approximation (red) and spectral projection (blue).

SLIDE 40

Example of UQ (Hyperspectral)

Here $d = 129$ and $N \approx 3 \times 10^5$. Use Nyström.

Figure: Spectral approximation. Uncertain classification in red.