ISOMAP and LLE
Fisher 1922
... the objective of statistical methods is the reduction of data. A quantity of data... is to be replaced by relatively few quantities which shall adequately represent ... the relevant information contained in the original data. Since the number of independent facts supplied in the data is usually far greater than the number of facts sought, much of the information supplied by an actual sample is irrelevant. It is the object of the statistical process employed in the reduction of data to exclude this irrelevant information, and to isolate the whole of the relevant information contained in the data. – R.A. Fisher
Python scikit-learn Manifold Learning Toolbox
- http://scikit-learn.org/stable/modules/manifold.html
- PCA/MDS (SMACOF algorithm, not a spectral method)
- ISOMAP/LLE (+MLLE)
- Hessian Eigenmap
- Laplacian Eigenmap
- LTSA
- t-SNE
Matlab Dimensionality Reduction Toolbox
- http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
- Math.pku.edu.cn/teachers/yaoy/Spring2011/matlab/drtoolbox
– Principal Component Analysis (PCA), Probabilistic PCA
– Factor Analysis (FA), Sammon mapping, Linear Discriminant Analysis (LDA)
– Multidimensional Scaling (MDS), Isomap, Landmark Isomap
– Local Linear Embedding (LLE), Laplacian Eigenmaps, Hessian LLE, Conformal Eigenmaps
– Local Tangent Space Alignment (LTSA), Maximum Variance Unfolding (extension of LLE)
– Landmark MVU (LandmarkMVU), Fast Maximum Variance Unfolding (FastMVU)
– Kernel PCA
– Diffusion maps
– …
Recall: PCA
- Principal Component Analysis (PCA)
One Dimensional Manifold
X_{p×n} = [x_1 x_2 ... x_n], columns are the n data points in R^p
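As a quick refresher, here is a minimal PCA sketch via the SVD of the centered data matrix (NumPy only; function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca_embed(X, d):
    """PCA: X is p x n (columns are data points); return the d x n scores."""
    mu = X.mean(axis=1, keepdims=True)           # column mean, p x 1
    Xc = X - mu                                  # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # top-d principal directions are U[:, :d]; scores are projections onto them
    return U[:, :d].T @ Xc                       # d x n embedding

# toy usage: 3-D points near a 1-D line (a one-dimensional manifold)
t = np.linspace(0, 1, 100)
X = np.vstack([t, 2 * t, 3 * t]) + 0.01 * np.random.randn(3, 100)
Y = pca_embed(X, 1)
```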
Recall: MDS
- Given pairwise distances D, where D_ij = d_ij² is the squared distance between points i and j
  – Convert the pairwise distance matrix D (conditionally negative definite) into the dot product matrix B (positive semi-definite)
- B(a) = −0.5 H(a) D H(a)', with centering matrix H(a) = I − 1a'
- a = e_k (the k-th standard basis vector): B_ij = −0.5 (D_ij − D_ik − D_jk)
- a = 1/n:
$$B_{ij} = -\frac{1}{2}\left( D_{ij} - \frac{1}{N}\sum_{s=1}^{N} D_{sj} - \frac{1}{N}\sum_{t=1}^{N} D_{it} + \frac{1}{N^2}\sum_{s,t=1}^{N} D_{st} \right)$$
- Eigendecomposition: B = Y Yᵀ
If we preserve the pairwise Euclidean distances, do we preserve the structure?
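A minimal classical MDS sketch implementing the double centering above, assuming the input is the matrix of squared pairwise distances (names are illustrative):

```python
import numpy as np

def classical_mds(D, d):
    """D: n x n matrix of squared pairwise distances; returns n x d coordinates."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H = I - (1/n) 1 1^T
    B = -0.5 * H @ D @ H                     # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)         # ascending eigenvalues
    idx = np.argsort(evals)[::-1][:d]        # top-d eigenpairs
    L = np.maximum(evals[idx], 0.0)          # clip tiny negative eigenvalues
    return evecs[:, idx] * np.sqrt(L)        # Y = V * sqrt(Lambda), n x d

# sanity check: recover 2-D points (up to rotation/translation) from distances
X = np.random.randn(10, 2)
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Y = classical_mds(D, 2)
```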
Nonlinear Manifolds..
PCA and MDS see the Euclidean distance; what is important is the geodesic distance.
Unfold the manifold.
Intrinsic Description..
- To preserve structure, preserve the geodesic distance and not the Euclidean distance.
Manifold Learning
Learning when data ∼ M ⊂ R^N
- Clustering: M → {1, . . . , k}
  – connected components, min cut
- Classification/Regression: M → {−1, +1} or M → R
  – P on M × {−1, +1} or P on M × R
- Dimensionality Reduction: f : M → R^n, n << N
M unknown: what can you learn about M from data?
e.g. dimensionality, connected components, holes, handles, homology, curvature, geodesics
Generative Models in Manifold Learning
Spectral Geometric Embedding
Given x_1, . . . , x_n ∈ M ⊂ R^N, find y_1, . . . , y_n ∈ R^d where d << N
- ISOMAP (Tenenbaum et al., 2000)
- LLE (Roweis & Saul, 2000)
- Laplacian Eigenmaps (Belkin & Niyogi, 2001)
- Local Tangent Space Alignment (Zhang & Zha, 2002)
- Hessian Eigenmaps (Donoho & Grimes, 2002)
- Diffusion Maps (Coifman, Lafon, et al., 2004)
- Related: Kernel PCA (Schölkopf et al., 1998)
Meta-Algorithm
- Construct a neighborhood graph
- Construct a positive semi-definite kernel
- Find the spectral decomposition
Kernel Spectrum
Two Basic Geometric Embedding Methods: Science 2000
- Tenenbaum-de Silva-Langford Isomap Algorithm
  – Global approach
  – In a low-dimensional embedding:
    - Nearby points should be nearby.
    - Faraway points should be faraway.
- Roweis-Saul Locally Linear Embedding Algorithm
  – Local approach
    - Nearby points should be nearby.
Isomap
- Estimate the geodesic distance between faraway points.
- For neighboring points, Euclidean distance is a good approximation to the geodesic distance.
- For faraway points, estimate the distance by a series of short hops between neighboring points.
  – Find shortest paths in a graph with edges connecting neighboring data points.
Once we have all pairwise geodesic distances, use classical metric MDS.
Isomap - Algorithm
- Construct a neighborhood graph on the n data points
  – connecting points whose distances are within a fixed radius
  – or a K-nearest-neighbor graph
- Compute the shortest-path (geodesic) distances between nodes: D
  – Floyd's Algorithm: O(N³)
  – Dijkstra's Algorithm: O(kN² log N)
- Construct a lower-dimensional embedding
  – Classical MDS (K = −0.5 H D H' = U S U')
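A minimal Isomap sketch following these three steps. It assumes the neighborhood graph is connected and reuses a classical_mds helper like the one sketched in the MDS recall; the scikit-learn and SciPy calls are standard, everything else is illustrative:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=10, d=2):
    """X: n x p data matrix; returns an n x d embedding."""
    # 1. k-nearest-neighbor graph with Euclidean edge weights
    G = kneighbors_graph(X, n_neighbors, mode='distance')
    # 2. geodesic distances approximated by graph shortest paths (Dijkstra);
    #    assumes the graph is connected, otherwise some entries are infinite
    D_geo = shortest_path(G, method='D', directed=False)
    # 3. classical MDS on the squared geodesic distances
    return classical_mds(D_geo ** 2, d)   # classical_mds as sketched earlier
```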
Isomap Examples
Residual variance vs. intrinsic dimension: face images, Swiss roll, hand images
ISOMAP on Alanine-dipeptide
ISOMAP 3D embedding with RMSD metric on 3900 k-centers
Convergence of ISOMAP
- ISOMAP has provable convergence guarantees;
- Given that {x_i} is sampled sufficiently densely, the graph shortest-path distance will closely approximate the original geodesic distance as measured in the manifold M;
- But ISOMAP may suffer from nonconvexity, such as holes in the manifold.
Two step approximations
Convergence Theorem [Bernstein, de Silva, Langford, Tenenbaum 2000]
Main Theorem
Theorem 1: Let M be a compact submanifold of R^n and let {x_i} be a finite set of data points in M. We are given a graph G on {x_i} and positive real numbers λ_1, λ_2 < 1 and δ, ε > 0. Suppose:
1. G contains all edges (x_i, x_j) of length ‖x_i − x_j‖ ≤ ε.
2. The data set {x_i} satisfies a δ-sampling condition – for every point m ∈ M there exists an x_i such that d_M(m, x_i) < δ.
3. M is geodesically convex – the shortest curve joining any two points on the surface is a geodesic curve.
4. ε < (2/π) r_0 √(24 λ_1), where r_0 is the minimum radius of curvature of M – 1/r_0 = max_{γ,t} ‖γ''(t)‖, where γ varies over all unit-speed geodesics in M.
5. ε < s_0, where s_0 is the minimum branch separation of M – the largest positive number for which ‖x − y‖ < s_0 implies d_M(x, y) ≤ π r_0.
6. δ < λ_2 ε / 4.
Then the following is valid for all x, y ∈ M:
(1 − λ_1) d_M(x, y) ≤ d_G(x, y) ≤ (1 + λ_2) d_M(x, y).
Probabilistic Result
- So, short Euclidean-distance hops along G approximate well the actual geodesic distance as measured in M.
- What were the main assumptions we made? The biggest one was the δ-sampling density condition.
- A probabilistic version of the Main Theorem can be shown where each point x_i is drawn from a density function. Then the approximation bounds will hold with high probability. Here's a truncated version of what the theorem looks like now:

Asymptotic Convergence Theorem: Given λ_1, λ_2, µ > 0, then for a density function α sufficiently large,
1 − λ_1 ≤ d_G(x, y)/d_M(x, y) ≤ 1 + λ_2
holds with probability at least 1 − µ for any two data points x, y.
A Shortcoming of ISOMAP
- One needs to compute the pairwise shortest paths between all sample pairs (i, j)
  – Global
  – Non-sparse
  – Cubic complexity O(N³)
Landmark ISOMAP: Nystrom Extension Method
- ISOMAP out of the box is not scalable. Two bottlenecks:
  – All-pairs shortest paths: O(kN² log N).
  – MDS eigenvalue calculation on a full N×N matrix: O(N³).
  – For contrast, LLE is limited by a sparse eigenvalue computation: O(dN²).
- Landmark ISOMAP (L-ISOMAP) idea:
  – Use n << N landmark points from {x_i} and compute an n × N matrix of geodesic distances, D_n, from each data point to the landmark points only.
  – Use a new procedure, Landmark-MDS (LMDS), to find a Euclidean embedding of all the data – it utilizes the idea of triangulation, similar to GPS.
  – Savings: L-ISOMAP has a shortest-paths calculation of O(knN log N) and an LMDS eigenvalue problem of O(n²N).
Landmark Choice
- Random
- MiniMax: k-center
- Hierarchical landmarks: cover-tree
- Nyström extension method
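For the MiniMax (k-center) choice above, a greedy farthest-point sketch is below (a standard 2-approximation to k-center; names and the use of Euclidean distance are illustrative assumptions):

```python
import numpy as np

def kcenter_landmarks(X, n_landmarks, seed=0):
    """Greedy k-center: X is n x p; returns indices of the chosen landmarks."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = [int(rng.integers(n))]                  # arbitrary first landmark
    d = np.linalg.norm(X - X[landmarks[0]], axis=1)     # distance to landmark set
    for _ in range(n_landmarks - 1):
        nxt = int(np.argmax(d))                         # farthest point so far
        landmarks.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return landmarks
```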
Locally Linear Embedding
"A manifold is a topological space which is locally Euclidean."
Fit Locally, Think Globally
We expect each data point and its neighbours to lie on or close to a locally linear patch of the manifold. Each point can then be written as a linear combination of its neighbors. The weights are chosen to minimize the reconstruction error. (Derivation on board.)
Fit Locally…
Important property...
- The weights that minimize the reconstruction errors are invariant to rotation, rescaling and translation of the data points.
  – Invariance to translation is enforced by adding the constraint that the weights sum to one.
- The same weights that reconstruct the data points in D dimensions should reconstruct them in the manifold in d dimensions.
  – The weights characterize the intrinsic geometric properties of each neighborhood.
Think Globally…
LLE Algorithm (I)
(1) Construct a neighborhood graph G = (V, E) such that
V = {x_i : i = 1, . . . , n},
E = {(i, j) : j is a neighbor of i, i.e. j ∈ N_i}, e.g. k-nearest neighbors or ε-neighbors.

(2) Local fitting: pick a point x_i and its neighbors N_i, and compute the local fitting weights
$$\min_{\sum_{j\in N_i} w_{ij}=1}\ \Big\| x_i - \sum_{j\in N_i} w_{ij}\, x_j \Big\|^2.$$
LLE Algorithm (II)

(2) Local fitting (continued): the problem
$$\min_{\sum_{j\in N_i} w_{ij}=1}\ \Big\| x_i - \sum_{j\in N_i} w_{ij}\, x_j \Big\|^2$$
is equivalent to
$$\min_{\sum_{j\in N_i} w_{ij}=1}\ \Big\| \sum_{j\in N_i} w_{ij}\,(x_j - x_i) \Big\|^2,$$
that is, finding a linear combination (possibly not unique!) for the subspace spanned by {(x_j − x_i) : j ∈ N_i}. This can be done by the Lagrange multiplier method, i.e. solving
$$\min_{w_{ij}}\ \frac{1}{2}\Big\| \sum_{j\in N_i} w_{ij}\,(x_j - x_i) \Big\|^2 + \lambda\Big(1 - \sum_{j\in N_i} w_{ij}\Big).$$
Let w_i = [w_{ij_1}, . . . , w_{ij_k}]^T ∈ R^k, X̄_i = [x_{j_1} − x_i, . . . , x_{j_k} − x_i], and let the local Gram (covariance) matrix be C_i(j, k) = ⟨x_j − x_i, x_k − x_i⟩, whence the weights are
$$(80)\qquad w_i = \lambda\, C_i^{\dagger} 1,$$
where the Lagrange multiplier equals the following normalization parameter
$$(81)\qquad \lambda = \frac{1}{1^T C_i^{\dagger} 1},$$
and C_i^† is a Moore–Penrose (pseudo) inverse of C_i. Note that C_i is often ill-conditioned, and to find its Moore–Penrose inverse one can use the regularization (C_i + µI)^{-1} for some µ > 0.
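A sketch of this local fitting step for a single point, using the regularized Gram matrix (C_i + µI) mentioned above; the trace-scaled regularization is one common choice, not prescribed by the slides:

```python
import numpy as np

def lle_weights(xi, neighbors, mu=1e-3):
    """xi: (p,), neighbors: (k, p); returns weights (k,) summing to one."""
    Z = neighbors - xi                                   # rows are x_j - x_i
    C = Z @ Z.T                                          # local Gram matrix C_i, k x k
    C = C + mu * np.trace(C) * np.eye(len(C)) / len(C)   # regularize ill-conditioned C_i
    w = np.linalg.solve(C, np.ones(len(C)))              # solve C w = 1
    return w / w.sum()                                   # enforce the sum-to-one constraint
```

Solving C w = 1 and renormalizing reproduces the closed form (80)-(81), since the scalar λ cancels in the normalization.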
LLE Algorithm (III)
(3) Global alignment: define an n-by-n weight matrix W:
$$W_{ij} = \begin{cases} w_{ij}, & j \in N_i, \\ 0, & \text{otherwise.} \end{cases}$$
Compute the global d-by-n embedding matrix Y:
$$\min_Y \sum_i \Big\| y_i - \sum_{j=1}^n W_{ij}\, y_j \Big\|^2 = \mathrm{trace}\big(Y (I - W)^T (I - W) Y^T\big).$$
In other words, construct a positive semi-definite matrix B = (I − W)^T (I − W) and find the d+1 smallest eigenvectors of B, v_0, v_1, . . . , v_d, associated with the smallest eigenvalues λ_0, . . . , λ_d. Drop the smallest eigenvector, which is the constant vector accounting for the translational degree of freedom, and set Y = [v_1/√λ_1, . . . , v_d/√λ_d]^T. The benefits of LLE are summarized in the remarks below.
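Given the full sparse weight matrix W, a sketch of this global alignment step: form B = (I − W)ᵀ(I − W) and take the d+1 smallest eigenvectors, dropping the constant one. It returns the unscaled eigenvectors; the 1/√λ rescaling mentioned above can be applied afterwards.

```python
import numpy as np
from scipy.sparse import identity, csr_matrix
from scipy.sparse.linalg import eigsh

def lle_embed(W, d):
    """W: sparse n x n weight matrix (rows sum to one); returns an n x d embedding."""
    n = W.shape[0]
    M = identity(n) - csr_matrix(W)
    B = (M.T @ M).asformat('csr')              # B = (I - W)^T (I - W), p.s.d.
    # smallest d+1 eigenpairs via shift-invert around zero
    # (if the factorization fails on the singular B, use a small negative sigma)
    vals, vecs = eigsh(B, k=d + 1, sigma=0, which='LM')
    order = np.argsort(vals)
    return vecs[:, order[1:d + 1]]             # drop the constant eigenvector
```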
Remarks on LLE
- Searching the k nearest neighbors costs O(kN)
- W is sparse: only kN of the N² entries (a fraction k/N) are nonzero
- W might have negative entries; an additional nonnegativity constraint can be imposed
- B = (I − W)^T (I − W) is positive semi-definite (p.s.d.)
- Open Problem: exact reconstruction condition?
Grolier's Encyclopedia
Issues of LLE
Pick a point x_i and its neighbors N_i, and compute the local fitting weights
$$\min_{\sum_{j\in N_i} w_{ij}=1}\ \Big\| x_i - \sum_{j\in N_i} w_{ij}\, x_j \Big\|^2
= \min_{\sum_{j\in N_i} w_{ij}=1}\ \Big\| \sum_{j\in N_i} w_{ij}\,(x_j - x_i) \Big\|^2,$$
via the Lagrangian
$$\min_{w_{ij}}\ \frac{1}{2}\Big\| \sum_{j\in N_i} w_{ij}\,(x_j - x_i) \Big\|^2 + \lambda\Big(1 - \sum_{j\in N_i} w_{ij}\Big),$$
with solution
$$w_i = \lambda\, C_i^{\dagger} 1, \qquad \lambda = \frac{1}{1^T C_i^{\dagger} 1}, \qquad C_i(j, k) = \langle x_j - x_i,\, x_k - x_i\rangle \in \mathbb{R}^{k\times k}.$$
Is this local Gram matrix ill-posed or ill-conditioned?
Issues of LLE
- Low-pass filter of the constant 1-vector:
  – preserves projections on bottom eigenvectors associated with small eigenvalues
  – suppresses projections on top eigenvectors associated with large eigenvalues
- If the 1-vector is not well spread over the null eigenspace, instability and missing directions arise as µ goes down!

$$(82)\qquad w_i(\mu) = \lambda\,(C_i + \mu I)^{-1} 1 = \lambda \sum_j \frac{1}{\lambda^{(i)}_j + \mu}\, v_j v_j^T 1,$$
where the local PCA gives C_i = V Λ V^T (Λ = diag(λ^{(i)}_j), V = [v_j]); w_i(µ) is thus made up mostly of directions with λ^{(i)}_j ≪ µ.
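A small illustration of equation (82): compute w_i(µ) from the eigendecomposition of C_i for several values of µ and watch the weights change as µ decreases on an ill-conditioned local Gram matrix (purely illustrative):

```python
import numpy as np

def lle_weights_mu(C, mu):
    """w_i(mu) = lam * (C + mu I)^{-1} 1, normalized so the weights sum to one."""
    evals, V = np.linalg.eigh(C)                       # C_i = V Lambda V^T
    w = (V / (evals + mu)) @ (V.T @ np.ones(len(C)))   # (C + mu I)^{-1} 1
    return w / w.sum()                                 # lam = 1 / (1^T (C + mu I)^{-1} 1)

# example: an ill-conditioned local Gram matrix
Z = np.random.randn(8, 2)          # neighbors effectively span a 2-D subspace
C = Z @ Z.T                        # rank 2, so six (near-)zero eigenvalues
for mu in (1e-1, 1e-4, 1e-8):
    print(mu, lle_weights_mu(C, mu))
```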
Modified LLE (MLLE)
- Use all the null eigenspace!
MLLE replaces the weight vector above by a weight matrix W_i ∈ R^{k_i×s_i}, a family of s_i weight vectors using the bottom s_i eigenvectors of C_i, V_i = [v_{k_i−s_i+1}, . . . , v_{k_i}] ∈ R^{k_i×s_i}, such that
$$(83)\qquad W_i = (1 - \alpha_i)\, w_i(\mu)\, 1_{s_i}^T + V_i H_i^T,$$
where α_i = ‖V_i^T 1_{k_i}‖₂/√s_i and H_i = I_{s_i} − 2uu^T (‖u‖₂ = 1 or 0) is a Householder matrix (H_i := I_{s_i} if u = 0) such that H_i V_i^T 1_{k_i} = α_i 1_{s_i} (hence W_i^T 1_{k_i} = 1_{s_i}, every column of W_i being a legal weight vector). In fact, one can choose u in the direction of V_i^T 1_{k_i} − α_i 1_{s_i}. An adaptive choice of s_i is given in [ZW] (see Step 2 of the algorithm below). The global alignment problem becomes
$$\min_Y \sum_i \sum_{l=1}^{s_i} \Big\| y_i - \sum_{j\in N_i} W_i(j, l)\, y_j \Big\|^2 = \sum_i \|Y \widehat{W}_i\|_F^2 = \mathrm{trace}\Big[Y \Big(\sum_i \widehat{W}_i \widehat{W}_i^T\Big) Y^T\Big],$$
where \widehat{W}_i is the embedding of W_i ∈ R^{k_i×s_i} into R^{n×s_i}:
$$\widehat{W}_i(j, :) = \begin{cases} 1_{s_i}^T, & j = i, \\ -W_i(j, :), & j \in N_i, \\ 0, & \text{otherwise.} \end{cases}$$
MLLE Algorithm (II)
Step 1 (regularized local weights): compute the regularized weights and normalize, w_i = ŵ_i/(ŵ_i^T 1);

Step 2 (local residue PCA): for each x_i and its neighbors N_i (k_i = |N_i|), let C_i = V Λ V^T be its eigenvalue decomposition, where Λ = diag(λ_1, . . . , λ_{k_i}) with λ_1 ≥ · · · ≥ λ_{k_i}. Find the size s_i of the almost-normal subspace as the maximal size such that the ratio of the residue eigenvalue sum over the principal eigenvalue sum is below a threshold, i.e.
$$s_i = \max\Big\{ l \;:\; l \le k_i - d,\ \ \frac{\sum_{j=k_i-l+1}^{k_i} \lambda_j}{\sum_{j=1}^{k_i-l} \lambda_j} \le \eta \Big\},$$
where η is a parameter, such as the median of the ratios of residue eigenvalue sum over principal eigenvalue sum. Construct the normal-subspace basis matrix as the s_i bottom eigenvectors of C_i, V_i = [v_{k_i−s_i+1}, . . . , v_{k_i}] ∈ R^{k_i×s_i}, and define the weight matrix
$$W_i = (1 - \alpha_i)\, w_i(\mu)\, 1_{s_i}^T + V_i H_i^T \in \mathbb{R}^{k_i\times s_i},$$
where α_i = ‖V_i^T 1_{k_i}‖₂/√s_i and H_i = I_{s_i} − 2uu^T/‖u‖² with u = V_i^T 1_{k_i} − α_i 1_{s_i} (or u = 0 if it is small).
MLLE Algorithm (III)
Step 3 (global alignment): define the weight embedding matrix
$$\widehat{W}_i(j, :) = \begin{cases} 1_{s_i}^T, & j = i, \\ -W_i(j, :), & j \in N_i, \\ 0, & \text{otherwise,} \end{cases}$$
and compute K = Σ_i Ŵ_i Ŵ_i^T, which is a positive semi-definite kernel matrix.

Step 4 (Eigenmap): compute the eigenvalue decomposition K = U Λ U^T with Λ = diag(λ_1, . . . , λ_n), where λ_1 ≥ λ_2 ≥ . . . ≥ λ_{n−1} > λ_n = 0; choose the bottom d + 1 eigenvalues and corresponding eigenvectors and drop the smallest eigenvalue–eigenvector pair (the 0/constant pair), so that U_d = [u_{n−d}, . . . , u_{n−1}], u_j ∈ R^n, Λ_d = diag(λ_{n−d}, . . . , λ_{n−1}). Define Y_d = U_d Λ_d^{1/2}.
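A sketch of the Householder construction of W_i in (83), checking that every column is a legal weight vector (sums to one); V holds the bottom s_i eigenvectors of C_i and w is assumed to be a regularized weight vector w_i(µ) that sums to one:

```python
import numpy as np

def mlle_weight_matrix(V, w):
    """V: k x s bottom eigenvectors of C_i; w: (k,) regularized LLE weights (sum to 1).
    Returns W_i = (1 - alpha) w 1^T + V H^T, whose columns each sum to one."""
    k, s = V.shape
    ones_k, ones_s = np.ones(k), np.ones(s)
    alpha = np.linalg.norm(V.T @ ones_k) / np.sqrt(s)
    u = V.T @ ones_k - alpha * ones_s               # Householder direction
    if np.linalg.norm(u) > 1e-12:
        u = u / np.linalg.norm(u)
        H = np.eye(s) - 2.0 * np.outer(u, u)        # reflects V^T 1_k onto alpha 1_s
    else:
        H = np.eye(s)                               # H := I if u is (near) zero
    W = (1.0 - alpha) * np.outer(w, ones_s) + V @ H.T
    assert np.allclose(W.T @ ones_k, ones_s)        # every column sums to one
    return W
```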
Issues of MLLE
- MLLE computes the bottom eigenvectors of the local Gram (covariance) matrix, which is computationally expensive and sensitive to noise
- How about using only the top eigenvectors of the local PCA?
  – LTSA
  – Hessian LLE
Local Tangent Space Alignment
Find a good approximation of the tangent space of a curve/manifold using discrete samples. — Principal curve/manifold (Hastie-Stuetzle '89, Zha-Zhang '02)
Local SVD
For each x_i ∈ R^p with neighborhood N_i of size |N_i| = k_i, let X^{(i)} = [x_{j_1}, x_{j_2}, . . . , x_{j_{k_i}}] ∈ R^{p×k_i} be the coordinate matrix. Consider the local SVD (PCA)
$$\tilde{X}^{(i)} = [x_{i_1} - \mu_i, \ldots, x_{i_{k_i}} - \mu_i]_{p\times k_i} = X^{(i)} H = \tilde{U}^{(i)} \tilde{\Sigma} (\tilde{V}^{(i)})^T,$$
where H = I − (1/k_i) 1_{k_i} 1_{k_i}^T. The left singular vectors {Ũ^{(i)}_1, ..., Ũ^{(i)}_d} give an orthonormal basis of the approximate d-dimensional tangent space at x_i. The right singular vectors (Ṽ^{(i)}_1, . . . , Ṽ^{(i)}_d) · Σ̃ ∈ R^{k_i×d} give the d coordinates of the k_i samples with respect to the tangent space basis.
LTSA
Let Y_i ∈ R^{d×k_i} be the embedding coordinates of the samples in R^d and L_i ∈ R^{p×d} be an estimated basis of the tangent space at x_i in R^p. Let Θ_i = Ũ^{(i)}_d Σ̃_d (Ṽ^{(i)}_d)^T ∈ R^{p×k_i} be the truncated SVD using the top d components. LTSA looks for the minimizer of the following problem:
$$(84)\qquad \min_{Y, L} \sum_i \|E_i\|^2 = \sum_i \Big\| Y_i \Big(I - \tfrac{1}{k_i} 1 1^T\Big) - L_i^T \Theta_i \Big\|^2.$$
One can estimate L_i^T = Y_i (I − (1/k_i) 1 1^T) Θ_i^†. Hence it reduces to
$$(85)\qquad \min_Y \sum_i \|E_i\|^2 = \sum_i \Big\| Y_i \Big(I - \tfrac{1}{k_i} 1 1^T\Big) \big(I - \Theta_i^{\dagger} \Theta_i\big) \Big\|^2,$$
where I − Θ_i^† Θ_i is the projection onto the normal space at x_i. This is equivalent to defining the kernel below.
LTSA Kernel
$$G_i = \Big[\tfrac{1}{\sqrt{k_i}} 1,\ \tilde{V}^{(i)}_1, \ldots, \tilde{V}^{(i)}_d\Big] \in \mathbb{R}^{k_i\times (d+1)}, \qquad W_i = I - G_i G_i^T \in \mathbb{R}^{k_i\times k_i},$$
$$K_{n\times n} = \Phi = \sum_{i=1}^{n} S_i W_i W_i^T S_i^T,$$
where the selection matrix S_i ∈ R^{n×k_i} satisfies [x_{i_1}, ..., x_{i_{k_i}}] = [x_1, ..., x_n] S_i.

1) The constant vector is an eigenvector of K corresponding to the 0 eigenvalue.
2) So choose the d+1 smallest eigenvectors for the embedding.
LTSA Algorithm (Zha-Zhang’02)
Algorithm 6: LTSA Algorithm
Input: A weighted undirected graph G = (V, E) such that
V = {x_i ∈ R^p : i = 1, . . . , n},
E = {(i, j) : j is a neighbor of i, i.e. j ∈ N_i}, e.g. k-nearest neighbors.
Output: Euclidean d-dimensional coordinates Y = [y_i] ∈ R^{d×n} of the data.

Step 1 (local PCA): compute the local SVD on the neighborhood of x_i, x_{i_j} ∈ N(x_i),
$$\tilde{X}^{(i)} = [x_{i_1} - \mu_i, \ldots, x_{i_k} - \mu_i]_{p\times k} = \tilde{U}^{(i)} \tilde{\Sigma} (\tilde{V}^{(i)})^T, \qquad \mu_i = \frac{1}{k}\sum_{j=1}^{k} x_{i_j}.$$
Define G_i = [\tfrac{1}{\sqrt{k}} 1, \tilde{V}^{(i)}_1, \ldots, \tilde{V}^{(i)}_d] ∈ R^{k×(d+1)}.

Step 2 (tangent space alignment): alignment (kernel) matrix
$$K_{n\times n} = \sum_{i=1}^{n} S_i W_i W_i^T S_i^T, \qquad W_i = I - G_i G_i^T \in \mathbb{R}^{k\times k},$$
where the selection matrix S_i ∈ R^{n×k} satisfies [x_{i_1}, ..., x_{i_k}] = [x_1, ..., x_n] S_i.

Step 3: Find the smallest d + 1 eigenvectors of K and drop the smallest eigenvector; the remaining d eigenvectors give rise to a d-dimensional embedding.
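A compact, dense LTSA sketch following Algorithm 6 (local SVD, alignment kernel built from selection matrices, bottom eigenvectors); unoptimized and for illustration only:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ltsa(X, n_neighbors=10, d=2):
    """X: n x p data; returns an n x d embedding via local tangent space alignment."""
    n, k = X.shape[0], n_neighbors
    nbrs = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nbrs.kneighbors(X)                 # idx[i]: k neighbors of x_i (includes i)
    K = np.zeros((n, n))
    for i in range(n):
        Ni = idx[i]
        Z = X[Ni] - X[Ni].mean(axis=0)          # centered neighborhood, k x p
        # left singular vectors of Z = right singular vectors of the p x k matrix
        # above, i.e. the tangent-space coordinates of the neighbors
        U, _, _ = np.linalg.svd(Z, full_matrices=False)
        G = np.hstack([np.ones((k, 1)) / np.sqrt(k), U[:, :d]])   # k x (d+1)
        W = np.eye(k) - G @ G.T                 # local alignment matrix
        K[np.ix_(Ni, Ni)] += W @ W.T            # accumulate S_i W_i W_i^T S_i^T
    evals, evecs = np.linalg.eigh(K)            # ascending eigenvalues
    return evecs[:, 1:d + 1]                    # drop the constant 0-eigenvector
```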
Comparisons on Swiss Roll
https://nbviewer.jupyter.org/url/math.stanford.edu/~yuany/course/data/plot_compare_methods.ipynb
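A short scikit-learn snippet in the spirit of that notebook, comparing several of the methods above on the Swiss roll (these estimator names and parameters are the standard sklearn API):

```python
from sklearn import manifold, datasets

X, color = datasets.make_swiss_roll(n_samples=1500, random_state=0)

embeddings = {
    'Isomap': manifold.Isomap(n_neighbors=12, n_components=2),
    'LLE':    manifold.LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                                               method='standard'),
    'MLLE':   manifold.LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                                               method='modified'),
    'LTSA':   manifold.LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                                               method='ltsa'),
}
results = {name: est.fit_transform(X) for name, est in embeddings.items()}
```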
Summary..
ISOMAP:
- Do MDS on the geodesic distance matrix.
- Global approach: O(N³) (but see L-ISOMAP).
- Might not work for nonconvex manifolds with holes.
- Extensions: Landmark, Conformal & Isometric ISOMAP.

LLE:
- Model local neighborhoods as linear patches and then embed in a lower-dimensional manifold.
- Local approach: O(N²).
- Works for nonconvex manifolds with holes.
- Extensions: MLLE, LTSA, Hessian LLE, Laplacian Eigenmaps, etc.

Both need the manifold to be finely sampled.
Reference
- Tenenbaum, de Silva, and Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 290:2319-2323, 22 Dec. 2000.
- Roweis and Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 290:2323-2326, 22 Dec. 2000.
- M. Bernstein, V. de Silva, J. Langford, and J. Tenenbaum. Graph Approximations to Geodesics on Embedded Manifolds. Technical Report, Department of Psychology, Stanford University, 2000.
- V. de Silva and J.B. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. Neural Information Processing Systems 15 (NIPS 2002), pp. 705-712, 2003.
- V. de Silva and J.B. Tenenbaum. Unsupervised learning of curved manifolds. Nonlinear Estimation and Classification, 2002.
- V. de Silva and J.B. Tenenbaum. Sparse multidimensional scaling using landmark points. Available at: http://math.stanford.edu/~silva/public/publications.html
- Zhenyue Zhang and Jing Wang. MLLE: Modified Locally Linear Embedding Using Multiple Weights. NIPS 2006.
- Zhenyue Zhang and Hongyuan Zha. Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment. SIAM Journal on Scientific Computing, 2002.
Acknowledgement
- Slides stolen from Ettinger, Vikas C. Raykar, Vin de Silva.