Graphs, Geometry and Semi-supervised Learning - Mikhail Belkin, The Ohio State University - PowerPoint PPT Presentation



SLIDE 1

Graphs, Geometry and Semi-supervised Learning

Mikhail Belkin, The Ohio State University, Dept. of Computer Science and Engineering and Dept. of Statistics. Collaborators: Partha Niyogi, Vikas Sindhwani.

SLIDE 2

Ubiquity of manifolds

In many domains (e.g., speech, some vision problems) the data explicitly lies on a manifold.

SLIDE 3

Ubiquity of manifolds

In many domains (e.g., speech, some vision problems) the data explicitly lies on a manifold.

For all sources of high-dimensional data, the true dimensionality is much lower than the number of features.

SLIDE 4

Ubiquity of manifolds

In many domains (e.g., speech, some vision problems) the data explicitly lies on a manifold.

For all sources of high-dimensional data, the true dimensionality is much lower than the number of features.

Much of the data is highly nonlinear.

SLIDE 5

Manifold Learning

Important point: only small distances are meaningful. In fact, all large distances are (almost) the same.

SLIDE 6

Manifold Learning

Important point: only small distances are meaningful. In fact, all large distances are (almost) the same. Manifolds (Riemannian manifolds with a measure + noise) provide a natural mathematical language for thinking about high-dimensional data.

SLIDE 7

Manifold Learning

Learning when data ∼ M ⊂ R^N.

Clustering: M → {1, . . . , k} (connected components, min cut, normalized cut).

Classification/Regression: M → {−1, +1} or M → R, with a probability distribution P on M × {−1, +1} or on M × R.

Dimensionality Reduction: f : M → R^n, n ≪ N.

M unknown: what can you learn about M from data? E.g., dimensionality, connected components, holes, handles, homology, curvature, geodesics.

SLIDE 8

Graph-based methods

Data ——– Probability Distribution
Graph ——– Manifold

SLIDE 9

Graph-based methods

Data ——– Probability Distribution
Graph ——– Manifold

SLIDE 10

Graph-based methods

Data ——– Probability Distribution
Graph ——– Manifold

Graph extracts underlying geometric structure.

SLIDE 11

Problems of machine learning

Classification / regression.
Data representation / dimensionality reduction.
Clustering.

Common intuition – similar objects have similar labels.

SLIDE 12

Intuition

SLIDE 13

Intuition

SLIDE 14

Intuition

SLIDE 15

Intuition

Geometry of data changes our notion of similarity.

SLIDE 16

Manifold assumption

SLIDE 17

Manifold assumption

Geometry is important.

SLIDE 18

Manifold assumption

Manifold/geometric assumption:

functions of interest are smooth with respect to the underlying geometry.

SLIDE 19

Manifold assumption

Manifold/geometric assumption:

functions of interest are smooth with respect to the underlying geometry.

Probabilistic setting: a map X → Y with a probability distribution P on X × Y. Regression / (two-class) classification: X → R.

SLIDE 20

Manifold assumption

Manifold/geometric assumption:

functions of interest are smooth with respect to the underlying geometry.

Probabilistic setting: a map X → Y with a probability distribution P on X × Y. Regression / (two-class) classification: X → R.

Probabilistic version:

conditional distributions P(y|x) are smooth with respect to the marginal P(x).

SLIDE 21

What is smooth?

Function f : X → R. Penalty at x ∈ X:

(1/δ^k) ∫_{small δ} (f(x) − f(x + δ))² p(x) dδ ≈ ‖∇f‖² p(x)

Total penalty – Laplace operator:

∫_X ‖∇f‖² p(x) = ⟨f, ∆_p f⟩_X

SLIDE 22

What is smooth?

Function f : X → R. Penalty at x ∈ X:

(1/δ^k) ∫_{small δ} (f(x) − f(x + δ))² p(x) dδ ≈ ‖∇f‖² p(x)

Total penalty – Laplace operator:

∫_X ‖∇f‖² p(x) = ⟨f, ∆_p f⟩_X

Two-class classification – conditional P(1|x). Manifold assumption: ⟨P(1|x), ∆_p P(1|x)⟩_X is small.

SLIDE 23

Laplace operator

Laplace operator is a fundamental geometric object.

∆f = −∑_{i=1}^k ∂²f/∂x_i²

The only differential operator invariant under translations and rotations. Heat, Wave, Schroedinger equations. Fourier analysis.

SLIDE 24

Laplacian on the circle

−d²f/dφ² = λf, where f(0) = f(2π)

Same as in R with periodic boundary conditions. Eigenvalues:

λ_n = n²

Eigenfunctions:

sin(nφ), cos(nφ)

Fourier analysis.
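As a quick numerical sanity check (not part of the original slides), one can discretize −d²f/dφ² on the circle with a periodic finite-difference matrix and verify that the smallest eigenvalues come out near 0, 1, 1, 4, 4, . . .; the grid size n = 400 is an arbitrary choice.

```python
import numpy as np

# Periodic second-difference approximation of -d^2/dphi^2 on [0, 2*pi).
n = 400
h = 2 * np.pi / n
L = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
L[0, -1] -= 1 / h**2   # wrap-around entries enforce f(0) = f(2*pi)
L[-1, 0] -= 1 / h**2

eigvals = np.sort(np.linalg.eigvalsh(L))
# Expect approximately [0, 1, 1, 4, 4, 9, 9]: lambda_n = n^2, each nonzero
# eigenvalue with multiplicity 2 (sin and cos eigenfunctions).
print(np.round(eigvals[:7], 3))
```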

SLIDE 25

Laplace-Beltrami operator

f : M^k → R,  exp_p : T_pM^k → M^k

∆_M f(p) = −∑_i ∂²f(exp_p(x))/∂x_i²

Generalization of Fourier analysis.

SLIDE 26

Key learning question

Machine learning: manifold is unknown. How to do Fourier analysis/reconstruct Laplace operator on an unknown manifold?

SLIDE 27

Algorithmic framework

SLIDE 28

Algorithmic framework

SLIDE 29

Algorithmic framework

W_ij = e^{−‖x_i − x_j‖²/t}    [justification: heat equation]

Lf(x_i) = f(x_i) ∑_j e^{−‖x_i − x_j‖²/t} − ∑_j f(x_j) e^{−‖x_i − x_j‖²/t}

f^t L f = (1/2) ∑_{i,j} e^{−‖x_i − x_j‖²/t} (f_i − f_j)²
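A minimal sketch of this construction (my illustration, not the authors' code): Gaussian weights W, the graph Laplacian L = D − W, and a check that f^t L f equals the weighted sum of squared differences. The helper name graph_laplacian and the value t = 1.0 are placeholders.

```python
import numpy as np

def graph_laplacian(X, t):
    """Gaussian-weight graph Laplacian L = D - W from a point cloud X (n x d)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / t)
    np.fill_diagonal(W, 0.0)           # no self-loops
    D = np.diag(W.sum(axis=1))
    return D - W, W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
L, W = graph_laplacian(X, t=1.0)

# Smoothness penalty of a function sampled at the data points:
# f^t L f = (1/2) * sum_{i,j} W_ij (f_i - f_j)^2.
f = np.sin(X[:, 0])
print(f @ L @ f, 0.5 * (W * (f[:, None] - f[None, :]) ** 2).sum())
```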

SLIDE 30

Data representation

f : G → R

Minimize ∑_{i∼j} w_ij (f_i − f_j)²

Preserve adjacency. Solution: Lf = λf (slightly better: Lf = λDf). Take the lowest eigenfunctions of L (or L̃).

Laplacian Eigenmaps

Related work: LLE (Roweis, Saul 00); Isomap (Tenenbaum, De Silva, Langford 00); Hessian Eigenmaps (Donoho, Grimes 03); Diffusion Maps (Coifman et al. 04).
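The eigenmap step described above can be sketched as follows (illustrative only; it reuses the graph_laplacian helper from the previous sketch and uses SciPy for the generalized problem Lf = λDf):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, t, n_components=2):
    """Embed X by the lowest nontrivial generalized eigenvectors of L f = lambda D f."""
    L, W = graph_laplacian(X, t)              # helper from the previous sketch
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(L, D)                   # eigenvalues returned in ascending order
    # Skip the constant eigenvector at lambda = 0; keep the next n_components.
    return vecs[:, 1:n_components + 1]

# Example: embed noisy points sampled from a circle into 2 dimensions.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
X += 0.01 * np.random.default_rng(1).normal(size=X.shape)
print(laplacian_eigenmaps(X, t=0.1).shape)    # (200, 2)
```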

SLIDE 31

Laplacian Eigenmaps

Visualizing spaces of digits and sounds.

Partiview, Ndaona, Surendran 04

Machine vision: inferring joint angles.

Corazza, Andriacchi, Stanford Biomotion Lab, 05, Partiview, Surendran

Isometrically invariant representation. [link]

Reinforcement Learning: value function approximation.

Mahadevan, Maggioni, 05

SLIDE 32

Semi-supervised learning

Learning from labeled and unlabeled data.

Unlabeled data is everywhere. Need to use it. Natural learning is semi-supervised.

SLIDE 33

Semi-supervised learning

Learning from labeled and unlabeled data.

Unlabeled data is everywhere. Need to use it. Natural learning is semi-supervised.

Labeled data: (x_1, y_1), . . . , (x_l, y_l) ∈ R^N × R. Unlabeled data: x_{l+1}, . . . , x_{l+u} ∈ R^N. Need to reconstruct

f_{L,U} : R^N → R

SLIDE 34

Graph/manifold SSL

A lot of recent work. Here are a few early papers:

Partially labeled classification with Markov random walks. Martin Szummer, Tommi Jaakkola, 01.
Learning from Labeled and Unlabeled Data using Graph Mincuts. A. Blum, S. Chawla, 01.
Cluster kernels for semi-supervised learning. O. Chapelle, J. Weston, B. Schoelkopf, 02.
Using Manifold Structure for Partially Labelled Classification. M. Belkin, P. Niyogi, 02.
Diffusion Kernels on Graphs and Other Discrete Input Spaces. R. Kondor, J. Lafferty, 02.
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Xiaojin Zhu, Zoubin Ghahramani, John Lafferty, 03.
Transductive Learning via Spectral Graph Partitioning. T. Joachims, 03.
SLIDE 35

Manifold Regularization

We will discuss the Manifold Regularization framework.

Extends SVM/RLS to unlabeled data. Standard SVM is a special case of the framework.

Provides a natural out-of-sample extension.

Belkin, Niyogi, Sindhwani 04

SLIDE 36

Example

[Figure: SVM (γ_A = 0.03125, γ_I = 0); Laplacian SVM (γ_A = 0.03125, γ_I = 0.01); Laplacian SVM (γ_A = 0.03125, γ_I = 1).]

SLIDE 37

Example

[Figure: SVM (γ_A = 0.03125, γ_I = 0); Laplacian SVM (γ_A = 0.03125, γ_I = 0.01); Laplacian SVM (γ_A = 0.03125, γ_I = 1).]

SLIDE 38

Regularization

Estimate f : R^N → R. Data: (x_1, y_1), . . . , (x_l, y_l). Regularized least squares (hinge loss for SVM):

f* = argmin_{f∈H} (1/l) ∑_i (f(x_i) − y_i)² + λ‖f‖²_K

fit to data + smoothness penalty

‖f‖_K incorporates our smoothness assumptions. Choice of K is important.

SLIDE 39

Algorithm: RLS/SVM

Solve: f* = argmin_{f∈H} (1/l) ∑_i (f(x_i) − y_i)² + λ‖f‖²_K

‖f‖_K is a Reproducing Kernel Hilbert Space norm with kernel K(x, y). Can solve explicitly (via the Representer theorem):

f*(·) = ∑_{i=1}^l α_i K(x_i, ·)

[α_1, . . . , α_l]^t = (K + λI)^{−1} [y_1, . . . , y_l]^t,   (K)_ij = K(x_i, x_j)
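A minimal sketch of this RLS solve (my illustration, not the authors' implementation; the Gaussian kernel and the convention of absorbing the 1/l factor into λ are assumptions on my part):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

def rls_fit(X, y, lam=1e-2, sigma=1.0):
    """Kernel RLS: alpha = (K + lam*I)^{-1} y, per the formula on the slide."""
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    # Representer theorem: f*(x) = sum_i alpha_i K(x_i, x).
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
f = rls_fit(X, y)
print(f(np.array([[0.0], [1.5]])))   # predictions roughly near sin(0) and sin(1.5)
```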

SLIDE 40

Manifold regularization

Estimate f : R^N → R. Labeled data: (x_1, y_1), . . . , (x_l, y_l). Unlabeled data: x_{l+1}, . . . , x_{l+u}.

f* = argmin_{f∈H} (1/l) ∑_i (f(x_i) − y_i)² + λ_A ‖f‖²_K + λ_I ‖f‖²_I

fit to data + extrinsic smoothness + intrinsic smoothness

Empirical estimate:

‖f‖²_I = (1/(l + u)²) [f(x_1), . . . , f(x_{l+u})] L [f(x_1), . . . , f(x_{l+u})]^t

SLIDE 41

Laplacian RLS/SVM

Representer theorem (discrete case):

f*(·) = ∑_{i=1}^{l+u} α_i K(x_i, ·)

Explicit solution for the quadratic loss:

ᾱ = (JK + λ_A l I + (λ_I l / (u + l)²) LK)^{−1} [y_1, . . . , y_l, 0, . . . , 0]^t

(K)_ij = K(x_i, x_j),   J = diag(1, . . . , 1, 0, . . . , 0) with l ones followed by u zeros
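A corresponding sketch of the Laplacian RLS solve (again illustrative; it reuses gaussian_kernel and graph_laplacian from the earlier sketches, and the hyperparameter values are placeholders):

```python
import numpy as np

def lap_rls_fit(X_lab, y_lab, X_unlab, lam_A=1e-2, lam_I=1e-1, sigma=1.0, t=1.0):
    """Laplacian RLS: alpha = (J K + lam_A*l*I + lam_I*l/(u+l)^2 * L K)^{-1} y_bar."""
    X = np.vstack([X_lab, X_unlab])
    l, u = len(X_lab), len(X_unlab)
    K = gaussian_kernel(X, X, sigma)              # (l+u) x (l+u) kernel matrix
    L, _ = graph_laplacian(X, t)                  # graph Laplacian over all points
    J = np.diag(np.r_[np.ones(l), np.zeros(u)])   # selects the labeled block
    y_bar = np.r_[np.asarray(y_lab, dtype=float), np.zeros(u)]
    A = J @ K + lam_A * l * np.eye(l + u) + (lam_I * l / (u + l) ** 2) * (L @ K)
    alpha = np.linalg.solve(A, y_bar)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ alpha

# Usage: for two-class labels y in {-1, +1}, sign(f(x)) gives the prediction;
# unlabeled points enter only through the graph Laplacian term.
```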

SLIDE 42

Experimental results: USPS

[Figure: error rates on 45 USPS classification problems comparing RLS vs LapRLS, SVM vs LapSVM, and TSVM vs LapSVM; out-of-sample extension for LapRLS and LapSVM (unlabeled vs. test points); standard deviation of error rates for SVM, TSVM, and LapSVM.]

SLIDE 43

Experimental comparisons

Dataset →           g50c    Coil20   Uspst   mac-win   WebKB(link)   WebKB(page)   WebKB(page+link)
Algorithm ↓
SVM (full labels)   3.82    0.0      3.35    2.32      6.3           6.5           1.0
SVM (l labels)      8.32    24.64    23.18   18.87     25.6          22.2          15.6
Graph-Reg           17.30   6.20     21.30   11.71     22.0          10.7          6.6
TSVM                6.87    26.26    26.46   7.44      14.5          8.6           7.8
Graph-density       8.32    6.43     16.92   10.48     -             -             -
∇TSVM               5.80    17.56    17.61   5.71      -             -             -
LDS                 5.62    4.86     15.79   5.13      -             -             -
LapSVM              5.44    3.66     12.67   10.41     18.1          10.5          6.4

SLIDE 44

Key theoretical question

What is the connection between the point-cloud Laplacian L and the Laplace-Beltrami operator ∆_M?

Analysis of algorithms: Eigenvectors of L  ←?→  Eigenfunctions of ∆_M

SLIDE 45

Main result

Theorem [convergence of eigenfunctions]:

Eig[L^{t_n}_n] → Eig[∆_M]

(convergence in probability) as the number of data points n → ∞ and the width of the Gaussian t_n → 0.

Previous work. Pointwise convergence: Belkin 03; Belkin, Niyogi 05, 06; Lafon, Coifman 04, 06; Hein, Audibert, Luxburg 05; Gine, Koltchinskii 06.

Convergence of eigenfunctions for fixed t: Koltchinskii, Gine 00; Luxburg, Belkin, Bousquet 04.

SLIDE 46

Conclusion

  • 1. Geometry controls many aspects of inference.
SLIDE 47

Conclusion

  • 1. Geometry controls many aspects of inference.
  • 2. Our methods should adapt to geometry.

Graph-based representation of data is good at that.

SLIDE 48

Conclusion

  • 1. Geometry controls many aspects of inference.
  • 2. Our methods should adapt to geometry.

Graph-based representation of data is good at that.

  • 3. Laplace operator – graph Laplacian is a key object for various inferential tasks.