Poisson Learning: Graph-based semi-supervised learning at very low label rates



1. Poisson Learning: Graph-based semi-supervised learning at very low label rates

Jeff Calder¹, Brendan Cook¹, Matthew Thorpe², and Dejan Slepčev³
¹ School of Mathematics, University of Minnesota
² Department of Mathematics, University of Manchester
³ Department of Mathematical Sciences, Carnegie Mellon University

International Conference on Machine Learning (ICML), July 12-18, 2020.
Research supported by the National Science Foundation, European Research Council, and a University of Minnesota Grant in Aid award.

2. Outline
  1. Introduction: graph-based semi-supervised learning; Laplace learning/label propagation; degeneracy in Laplace learning
  2. Poisson learning: random walk perspective; variational interpretation
  3. Experimental results: algorithmic details; datasets and algorithms; results
  4. References

3. Graph-based semi-supervised learning

Graph: G = (X, W)
- X = {x_1, ..., x_n} are the vertices of the graph.
- W = (w_{ij})_{i,j=1}^n are nonnegative, symmetric (w_{ij} = w_{ji}) edge weights, with w_{ij} ≈ 1 if x_i and x_j are similar and w_{ij} ≈ 0 when they are dissimilar.

Labels: We assume the first m ≪ n vertices are given labels y_1, y_2, ..., y_m ∈ {e_1, e_2, ..., e_k} ⊂ R^k (one-hot vectors).

Task: Extend the labels to the rest of the vertices x_{m+1}, ..., x_n.

Semi-supervised smoothness assumption: similar points x_i, x_j ∈ X in high-density regions of the graph should have similar labels.

Laplace learning/label propagation: original work [Zhu et al., 2003]; learning [Zhou et al., 2005], [Ando and Zhang, 2007]; manifold ranking [He et al., 2006], [Zhou et al., 2011], [Xu et al., 2011].
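The slides specify W only abstractly. As a concrete illustration, here is a minimal sketch that builds a symmetric k-nearest-neighbor weight matrix with a Gaussian kernel; the kernel, neighbor count, and bandwidth are my illustrative choices, not the construction used in the paper.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree

def knn_weight_matrix(X, k=10, sigma=1.0):
    """Symmetric k-NN weight matrix with Gaussian weights (illustrative choice)."""
    n = X.shape[0]
    dist, idx = cKDTree(X).query(X, k=k + 1)      # nearest neighbor of each point is itself
    dist, idx = dist[:, 1:], idx[:, 1:]
    rows = np.repeat(np.arange(n), k)
    vals = np.exp(-dist.ravel() ** 2 / (2 * sigma ** 2))
    W = sparse.coo_matrix((vals, (rows, idx.ravel())), shape=(n, n)).tocsr()
    return W.maximum(W.T)                          # symmetrize so that w_ij = w_ji
```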

4. Laplace learning/Label propagation

Laplacian-regularized semi-supervised learning solves the Laplace equation
  L u(x_i) = 0,    if m+1 ≤ i ≤ n,
  u(x_i) = y_i,    if 1 ≤ i ≤ m,
where u : X → R^k and L is the graph Laplacian
  L u(x_i) = Σ_{j=1}^n w_{ij} (u(x_i) − u(x_j)).

The label decision for vertex x_i is determined by the largest component of u(x_i):
  ℓ(x_i) = argmax_{j ∈ {1,...,k}} u_j(x_i).
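In matrix form, the Laplace equation above is a sparse linear system in the unlabeled rows: with the Laplacian split into labeled/unlabeled blocks, L_uu U_u = −L_ul Y_l. A minimal sketch of solving it (the helper name and the SuperLU solver are my choices; a connected graph is assumed so the system is nonsingular):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def laplace_learning(W, Y_labeled, m):
    """Solve L u = 0 on unlabeled vertices with u = y on the first m (labeled) vertices.

    W         : (n, n) sparse symmetric weight matrix
    Y_labeled : (m, k) one-hot labels for the first m vertices
    """
    d = np.asarray(W.sum(axis=1)).ravel()
    L = (sparse.diags(d) - W).tocsr()

    # Block solve: L_uu u_unlabeled = -L_ul y_labeled.
    B = -(L[m:, :m] @ Y_labeled)                       # dense (n - m, k) right-hand side
    U_unlabeled = splu(L[m:, m:].tocsc()).solve(B)

    U = np.vstack([Y_labeled, U_unlabeled])
    return U.argmax(axis=1)                            # label decision: largest component
```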

5. Label propagation

The solution of Laplace learning satisfies
  L u(x_i) = Σ_{j=1}^n w_{ij} (u(x_i) − u(x_j)) = 0,    (m+1 ≤ i ≤ n).
Rearranging, we see that u satisfies the mean-value property
  u(x_i) = ( Σ_{j=1}^n w_{ij} u(x_j) ) / ( Σ_{j=1}^n w_{ij} ).

Label propagation [Zhu, 2005] iterates
  u^{k+1}(x_i) = ( Σ_{j=1}^n w_{ij} u^k(x_j) ) / ( Σ_{j=1}^n w_{ij} ),
and at convergence is equivalent to Laplace learning.
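The iteration has a compact matrix form: replace every row of u by the degree-normalized weighted average of its neighbors, then reset the labeled rows. A minimal sketch under that reading (the tolerance and iteration cap are mine):

```python
import numpy as np
from scipy import sparse

def label_propagation(W, Y_labeled, m, tol=1e-6, max_iter=10_000):
    """Iterate u^{k+1}(x_i) = sum_j w_ij u^k(x_j) / sum_j w_ij with labeled rows held fixed."""
    n, k = W.shape[0], Y_labeled.shape[1]
    d_inv = 1.0 / np.asarray(W.sum(axis=1)).ravel()
    P = sparse.diags(d_inv) @ W                  # row-stochastic averaging operator

    U = np.zeros((n, k))
    U[:m] = Y_labeled
    for _ in range(max_iter):
        U_new = P @ U
        U_new[:m] = Y_labeled                    # re-impose u(x_i) = y_i on labeled vertices
        done = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if done:
            break
    return U.argmax(axis=1)
```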

6. Ill-posed with small amount of labeled data

[Figure: the Laplace learning solution u on [0,1]^2, which is nearly constant at 0.5 away from the two labeled points.]

- The graph consists of n = 10^5 i.i.d. points drawn uniformly from [0,1]^2.
- w_{xy} = 1 if |x − y| < 0.01 and w_{xy} = 0 otherwise.
- Two labels: y_1 = 0 at the red point and y_2 = 1 at the green point.
- Over 95% of the computed label values lie in [0.4975, 0.5025].

[Nadler et al., 2009] [El Alaoui et al., 2016]
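A sketch of this experiment under the construction stated above; the locations of the two labeled points are not given on the slide, so the ones below are hypothetical, and the sparse neighbor query and direct factorization are illustrative choices for handling n = 10^5 points.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree
from scipy.sparse.linalg import splu

rng = np.random.default_rng(0)
n = 10**5
X = rng.uniform(0.0, 1.0, size=(n, 2))
X[0], X[1] = [0.25, 0.25], [0.75, 0.75]        # hypothetical positions of the two labeled points

# epsilon-ball graph: w_xy = 1 if |x - y| < 0.01, else 0
pairs = cKDTree(X).query_pairs(r=0.01, output_type='ndarray')
rows = np.concatenate([pairs[:, 0], pairs[:, 1]])
cols = np.concatenate([pairs[:, 1], pairs[:, 0]])
W = sparse.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()
L = (sparse.diags(np.asarray(W.sum(axis=1)).ravel()) - W).tocsr()

# Laplace learning with y_1 = 0 and y_2 = 1 at the two labeled vertices.
y = np.array([0.0, 1.0])
u = np.empty(n)
u[:2] = y
u[2:] = splu(L[2:, 2:].tocsc()).solve(-(L[2:, :2] @ y))

print(np.mean((u > 0.4975) & (u < 0.5025)))    # fraction of values near the constant 0.5
```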

7. MNIST (70,000 28×28-pixel images of the digits 0-9)

[Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998.]

8. Laplace learning on MNIST

# Labels/class    1            2            3            4            5
Laplace           16.1 (6.2)   28.2 (10)    42.0 (12)    57.8 (12)    69.5 (12)
Graph NN          58.8 (5.6)   66.6 (2.8)   70.2 (4)     71.3 (2.6)   73.4 (1.9)

# Labels/class    10           50           100          500          1000
Laplace           93.2 (2.3)   96.9 (0.1)   97.1 (0.1)   97.6 (0.1)   97.7 (0.0)
Graph NN          82.3 (1.0)   89.0 (0.5)   90.6 (0.4)   93.4 (0.1)   93.7 (0.1)

Average accuracy (%) over 10 trials, with standard deviation in brackets.
Graph NN: 1-nearest neighbor using the graph geodesic distance.

9. Recent work

The low label rate problem was originally identified in [Nadler et al., 2009]. A lot of recent work has attempted to address it with new graph-based classification algorithms designed for low label rates:
- Higher-order regularization: [Zhou and Belkin, 2011], [Dunlop et al., 2019]
- p-Laplace regularization: [El Alaoui et al., 2016], [Calder, 2018, 2019], [Slepčev and Thorpe, 2019]
- Re-weighted Laplacians: [Shi et al., 2017], [Calder and Slepčev, 2019]
- Centered kernel method: [Mai and Couillet, 2018]

While there are now many new models, the degeneracy of Laplace learning at low label rates was still not well understood.

In this talk:
1. We explain the degeneracy in terms of random walks.
2. We propose a new algorithm: Poisson learning.

10. Outline
  1. Introduction: graph-based semi-supervised learning; Laplace learning/label propagation; degeneracy in Laplace learning
  2. Poisson learning: random walk perspective; variational interpretation
  3. Experimental results: algorithmic details; datasets and algorithms; results
  4. References

11. Poisson learning

We propose to replace Laplace learning
  (Laplace equation)   L u(x_i) = 0,     if m+1 ≤ i ≤ n,
                       u(x_i) = y_i,     if 1 ≤ i ≤ m,
with Poisson learning
  (Poisson equation)   L u(x_i) = Σ_{j=1}^m (y_j − c) δ_{ij}   for i = 1, ..., n,
subject to Σ_{i=1}^n d_i u(x_i) = 0, where c = (1/m) Σ_{i=1}^m y_i.

In both cases, the label decision is the same:
  ℓ(x_i) = argmax_{j ∈ {1,...,k}} u_j(x_i).
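A minimal sketch of solving the Poisson equation above with conjugate gradients, one label column at a time, followed by enforcing the degree-weighted zero-mean constraint. This is my own illustrative solver under the stated equations (the Laplacian is singular, but the right-hand side sums to zero, so the system is consistent); it is not the authors' reference implementation.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import cg

def poisson_learning(W, Y_labeled, m):
    """Solve L u(x_i) = sum_j (y_j - c) delta_ij subject to sum_i d_i u(x_i) = 0."""
    n, k = W.shape[0], Y_labeled.shape[1]
    d = np.asarray(W.sum(axis=1)).ravel()
    L = sparse.diags(d) - W

    c = Y_labeled.mean(axis=0)                 # c = (1/m) sum_i y_i
    B = np.zeros((n, k))
    B[:m] = Y_labeled - c                      # point sources at the labeled vertices

    U = np.zeros((n, k))
    for j in range(k):
        U[:, j], _ = cg(L, B[:, j], atol=1e-8)
    U -= d @ U / d.sum()                       # enforce the degree-weighted zero-mean constraint
    return U
```

The label decision is then U.argmax(axis=1), or the class-prior-weighted version shown on the next slide.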

12. Poisson learning

We propose to replace Laplace learning
  (Laplace equation)   L u(x_i) = 0,     if m+1 ≤ i ≤ n,
                       u(x_i) = y_i,     if 1 ≤ i ≤ m,
with Poisson learning
  (Poisson equation)   L u(x_i) = Σ_{j=1}^m (y_j − c) δ_{ij}   for i = 1, ..., n,
subject to Σ_{i=1}^n d_i u(x_i) = 0, where c = (1/m) Σ_{i=1}^m y_i.

For Poisson learning, unbalanced class sizes can be incorporated into the label decision:
  ℓ(x_i) = argmax_{j ∈ {1,...,k}} (p_j / n_j) u_j(x_i),
where p_j is the fraction of the data in class j and n_j is the fraction of the training data from class j.
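A short sketch of that reweighted decision, applied to the output of the hypothetical poisson_learning helper above; the class fractions p_j are assumed to be known or estimated.

```python
import numpy as np

def poisson_label_decision(U, p, n_train):
    """ell(x_i) = argmax_j (p_j / n_j) u_j(x_i).

    U       : (n, k) Poisson learning output
    p       : (k,) fraction of all data in each class (assumed known or estimated)
    n_train : (k,) fraction of the labeled/training data in each class
    """
    weights = np.asarray(p) / np.asarray(n_train)
    return np.argmax(U * weights, axis=1)
```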

13. Random walk perspective

Suppose u solves the Laplace learning equation
  L u(x_i) = 0,    if m+1 ≤ i ≤ n,
  u(x_i) = y_i,    if 1 ≤ i ≤ m.
Let x ∈ X and let X_0, X_1, X_2, ... be a random walk on X started at X_0 = x with transition probabilities
  P(X_k = x_j | X_{k−1} = x_i) = w_{ij} / d_i,   where d_i = Σ_{j=1}^n w_{ij}.
Define the stopping time τ to be the first time the walk hits a labeled vertex, that is,
  τ = inf { k ≥ 0 : X_k ∈ {x_1, x_2, ..., x_m} }.
Let i_τ ≤ m be the index with X_τ = x_{i_τ}. Then, by Doob's optional stopping theorem,
  u(x) = E[ y_{i_τ} | X_0 = x ].
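To make the formula concrete, here is a small Monte Carlo sketch that estimates u(x) = E[y_{i_τ} | X_0 = x] by simulating walks until they hit a labeled vertex; the names and number of walks are my illustrative choices, and a dense W is used only to keep the row sampling simple.

```python
import numpy as np

def estimate_u_by_random_walk(W, Y_labeled, m, start, n_walks=1000, rng=None):
    """Monte Carlo estimate of u(x) = E[y_{i_tau} | X_0 = x] for Laplace learning."""
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)       # transition probabilities w_ij / d_i
    total = np.zeros(Y_labeled.shape[1])
    for _ in range(n_walks):
        i = start
        while i >= m:                           # labeled vertices are the first m indices
            i = rng.choice(n, p=P[i])
        total += Y_labeled[i]                   # record the label that was hit first
    return total / n_walks
```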

14. Classification experiment

15. Random walk experiment

16. Classification experiment

17. The random walk perspective

At low label rates, the random walker reaches its mixing time before hitting a label, so the label it eventually hits is largely independent of where the walk starts.
After walking for a long time, the probability distribution of the walker approaches the invariant distribution π given by
  π_i = d_i / Σ_{j=1}^n d_j.
Thus, the solution of Laplace learning is approximately the constant
  u(x_i) = E[ y_{i_τ} | X_0 = x_i ] ≈ ( Σ_{j=1}^m d_j y_j ) / ( Σ_{j=1}^m d_j ) =: c ∈ R^k.

Bottom line: nearly everything is labeled by the one-hot vector closest to c!
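One numerical way to see this collapse is to compare the Laplace learning decisions against the single constant prediction argmax_j c_j; a sketch using the conventions of the earlier snippets, where labels_pred holds the per-vertex argmax decisions:

```python
import numpy as np

def degeneracy_check(W, Y_labeled, m, labels_pred):
    """Fraction of vertices whose Laplace learning decision equals argmax of the
    degree-weighted average label c over the labeled vertices."""
    d = np.asarray(W.sum(axis=1)).ravel()
    c = (d[:m, None] * Y_labeled).sum(axis=0) / d[:m].sum()
    return np.mean(labels_pred == np.argmax(c))
```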

18. The random walk perspective

Let X_0^{x_j}, X_1^{x_j}, X_2^{x_j}, ... be a random walk on the graph starting from x_j ∈ X, and define
  u_T(x_i) = E[ Σ_{j=1}^m Σ_{k=0}^T y_j 1{X_k^{x_j} = x_i} ].
Idea: We release random walkers from the labeled nodes and record how often each label's walker visits x_i.
We can write
  u_T(x_i) = Σ_{j=1}^m y_j Σ_{k=0}^T P(X_k^{x_j} = x_i).
The inner term is a Green's function for the random walk. As T → ∞, u_T → ∞, so we center u_T by its mean value:
  Σ_{i=1}^n u_T(x_i) = Σ_{k=0}^T Σ_{j=1}^m y_j = Σ_{k=0}^T m c = (T+1) m c,   where c = (1/m) Σ_{j=1}^m y_j.
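A short sketch computing u_T directly from the formula above by propagating the point masses at the labeled vertices through the transition matrix; the dense matrices and the horizon T are illustrative choices of mine.

```python
import numpy as np

def u_T_random_walk(W, Y_labeled, m, T):
    """Accumulate u_T(x_i) = sum_j y_j sum_{k=0}^T P(X_k^{x_j} = x_i) (before centering)."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
    dist = np.zeros((m, n))
    dist[np.arange(m), np.arange(m)] = 1.0     # walker j starts at labeled vertex x_j
    U = np.zeros((n, Y_labeled.shape[1]))
    for _ in range(T + 1):
        U += dist.T @ Y_labeled                # add y_j weighted by P(X_k^{x_j} = x_i)
        dist = dist @ P                        # advance every walker one step
    return U
```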

19. The random walk perspective

Subtracting off the mean of u_T and normalizing by d_i, we arrive at
  u_T(x_i) := (1/d_i) E[ Σ_{j=1}^m Σ_{k=0}^T (y_j − c) 1{X_k^{x_j} = x_i} ],   where c = (1/m) Σ_{j=1}^m y_j.

Theorem. For every T ≥ 0 we have
  u_{T+1}(x_i) = u_T(x_i) + (1/d_i) ( Σ_{j=1}^m (y_j − c) δ_{ij} − L u_T(x_i) ).
If the graph G is connected and the Markov chain induced by the random walk is aperiodic, then u_T → u as T → ∞, where u : X → R^k is the solution of
  L u(x_i) = Σ_{j=1}^m (y_j − c) δ_{ij}   for i = 1, ..., n,
satisfying Σ_{i=1}^n d_i u(x_i) = 0.
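The theorem's update rule can be run directly as an iterative solver; a minimal sketch under the same W, Y_labeled, m conventions as the earlier snippets, with a fixed number of steps as my simplified stopping rule.

```python
import numpy as np
from scipy import sparse

def poisson_learning_iteration(W, Y_labeled, m, T=1000):
    """Iterate u_{T+1}(x_i) = u_T(x_i) + (1/d_i)(sum_j (y_j - c) delta_ij - L u_T(x_i))."""
    n, k = W.shape[0], Y_labeled.shape[1]
    d = np.asarray(W.sum(axis=1)).ravel()
    L = sparse.diags(d) - W

    c = Y_labeled.mean(axis=0)
    F = np.zeros((n, k))
    F[:m] = Y_labeled - c                      # source term: sum_j (y_j - c) delta_ij

    U = np.zeros((n, k))
    for _ in range(T):
        U = U + (F - L @ U) / d[:, None]       # the theorem's degree-normalized update
    return U
```

Labels are then assigned with the same argmax (or class-prior-weighted) decision shown earlier.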
