  1. A Geometric View to Optimal Transportation and Generative Model. David Xianfeng Gu, Computer Science & Applied Mathematics, Stony Brook University, and Center of Mathematical Sciences and Applications, Harvard University. Geometric Computation and Applications, Trinity College, Dublin, Ireland.

  2. Thanks: Thanks for the invitation.

  3. Collaborators: These projects are joint work with Shing-Tung Yau, Feng Luo, Zhongxuan Luo, Na Lei, Dimitris Samaras, and others.

  4. Outline: 1. Why does DL work? 2. How to quantify the learning capability of a DNN? 3. How does DL manipulate the probability distributions?

  5. Why does DL work?

  6. Deep Learning: Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, and speech recognition. Despite its success, the theoretical understanding of how it works remains primitive.

  7. Manifold Assumption: We believe the great success of deep learning can be partially explained by the well-accepted manifold assumption and clustering assumption. Manifold assumption: natural high-dimensional data concentrates close to a non-linear, low-dimensional manifold. Clustering assumption: the distances among the probability distributions of the subclasses on the manifold are large enough to discriminate them. Deep learning methods can learn and represent the manifold structure, and transform the probability distributions on it.

  8. General Model: the ambient space $\mathbb{R}^n$ is the image space; the manifold $\Sigma \subset \mathbb{R}^n$ is the support of a distribution $\mu$; the parameter domains $U_i, U_j \subset \mathbb{R}^m$ are the latent spaces; the coordinate maps $\varphi_i, \varphi_j$ are the encoding/decoding maps; the chart transition maps $\varphi_{ij}$ control the probability measure.

  9. Manifold Structure. Definition (Manifold): Suppose $M$ is a topological space covered by a set of open sets, $M \subset \bigcup_\alpha U_\alpha$. For each open set $U_\alpha$ there is a homeomorphism $\varphi_\alpha : U_\alpha \to \mathbb{R}^n$; the pair $(U_\alpha, \varphi_\alpha)$ forms a chart. The union of charts forms an atlas $\mathcal{A} = \{(U_\alpha, \varphi_\alpha)\}$. If $U_\alpha \cap U_\beta \neq \emptyset$, the chart transition map is $\varphi_{\alpha\beta} : \varphi_\alpha(U_\alpha \cap U_\beta) \to \varphi_\beta(U_\alpha \cap U_\beta)$, defined by $\varphi_{\alpha\beta} := \varphi_\beta \circ \varphi_\alpha^{-1}$.
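
To make the chart formalism concrete, here is a minimal numerical sketch (not from the slides): the unit circle $S^1$ plays the role of $M$, covered by two angle charts whose transition map is $\varphi_{\alpha\beta} = \varphi_\beta \circ \varphi_\alpha^{-1}$; all names are illustrative.

```python
import numpy as np

# Two angle charts on the unit circle S^1 (playing the role of M).

def phi_alpha(p):
    """Chart on U_alpha = S^1 minus (-1, 0): angle in (-pi, pi)."""
    return np.arctan2(p[1], p[0])

def phi_beta(p):
    """Chart on U_beta = S^1 minus (1, 0): angle in (0, 2*pi)."""
    t = np.arctan2(p[1], p[0])
    return t if t > 0 else t + 2 * np.pi

def phi_alpha_inv(t):
    """Inverse of the alpha chart: angle back to a point on S^1."""
    return np.array([np.cos(t), np.sin(t)])

def phi_alphabeta(t):
    """Chart transition phi_beta o phi_alpha^{-1} on the overlap."""
    return phi_beta(phi_alpha_inv(t))

p = np.array([0.0, 1.0])              # a point in the overlap U_alpha ∩ U_beta
t = phi_alpha(p)                      # its coordinate in the alpha chart
print(phi_alphabeta(t), phi_beta(p))  # both print pi/2: the charts are compatible
```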

  10. Example: The image space $X$ is $\mathbb{R}^3$; the data manifold $\Sigma$ is the Happy Buddha surface.

  11. Example: The encoding map is $\varphi_i : \Sigma \to Z$; the decoding map is $\varphi_i^{-1} : Z \to \Sigma$.

  12. Example: The automorphism of the latent space $\varphi_{ij} : Z \to Z$ is the chart transition.

  13. Example: A uniform distribution $\zeta$ on the latent space $Z$ yields a non-uniform distribution on $\Sigma$ under one decoding map.

  14. Example: The same uniform distribution $\zeta$ on the latent space $Z$ yields a uniform distribution on $\Sigma$ under another decoding map.

  15. Human Facial Image Manifold: A facial image is determined by a finite number of parameters (genes, lighting conditions, camera parameters); therefore all facial images form a manifold.

  16. Manifold view of Generative Model: Given a parametric representation $\varphi : Z \to \Sigma$, randomly generate a parameter $z \in Z$ (white noise); then $\varphi(z) \in \Sigma$ is a human facial image.

  17. Manifold view of Denoising: Suppose $\tilde{p} \in \mathbb{R}^n$ is a point close to the manifold $\Sigma$ and $p \in \Sigma$ is the point of $\Sigma$ closest to $\tilde{p}$. The projection $\tilde{p} \mapsto p$ can be treated as denoising.

  18. Manifold view of Denoising: $\Sigma$ is the clean facial image manifold; a noisy image $\tilde{p}$ is a point close to $\Sigma$; the closest point $p \in \Sigma$ is the resulting denoised image.

  19. Manifold view of Denoising. Traditional method: Fourier transform the noisy image, filter out the high-frequency components, and inverse Fourier transform back to get the denoised image. ML method: train the neural network on clean facial images to obtain a representation of the manifold, then project the noisy image onto the manifold; the projection point is the denoised image. Key difference: the traditional method is independent of the content of the image, while the ML method depends heavily on it; the prior knowledge is encoded by the manifold.
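
A minimal sketch of the traditional route (illustrative only: the radial cutoff, the image size, and the random "image" are placeholder assumptions); the ML route amounts to decode(encode(noisy)) with the autoencoder sketched later.

```python
import numpy as np

# Traditional denoising: FFT, keep only low frequencies, inverse FFT.

def fourier_denoise(img, keep_frac=0.1):
    """Suppress all frequencies outside a centered disk of relative radius keep_frac."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= (keep_frac * min(h, w)) ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

noisy = np.random.rand(64, 64)   # stand-in for a noisy grayscale image
denoised = fourier_denoise(noisy)
# Note: this is content-independent; an ML denoiser would instead project
# onto the learned manifold, e.g. decoder(encoder(noisy)).
```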

  20. Manifold view of Denoising: If the wrong manifold is chosen, the denoising result is nonsense. Here we use the cat face manifold to denoise a human face image, and the result looks like a cat face.

  21. How does DL learn a manifold?

  22. Learning Task: The central tasks for deep learning are: 1. learn the manifold structure from the data; 2. represent the manifold implicitly or explicitly.

  23. Autoencoder. Figure: auto-encoder architecture, with ambient space $X$, latent space $Z$, encoding map $\varphi_\theta : X \to Z$, and decoding map $\psi_\theta : Z \to X$.

  24. Autoencoder: The encoder takes a sample $x \in X$ and maps it to $z \in F$, $z = \varphi(x)$. The decoder $\psi : F \to X$ maps $z$ to the reconstruction $\tilde{x}$, giving the pipeline $\{(X, x), \mu, M\} \xrightarrow{\varphi} \{(F, z), D\} \xrightarrow{\psi} \{(X, \tilde{x}), \tilde{M}\}$. An autoencoder is trained to minimize the reconstruction error $(\varphi, \psi) = \operatorname{argmin}_{\varphi, \psi} \int_X L(x, \psi \circ \varphi(x))\, d\mu(x)$, where $L(\cdot, \cdot)$ is a loss function, such as the squared error. The reconstructed manifold $\tilde{M} = \psi \circ \varphi(M)$ is used as an approximation of $M$.
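
A minimal PyTorch sketch of this objective, assuming squared error as $L$ and illustrative layer sizes and random stand-in data; it illustrates the training loop, not the implementation behind the slides.

```python
import torch
import torch.nn as nn

ambient_dim, latent_dim = 784, 16     # illustrative dimensions

encoder = nn.Sequential(nn.Linear(ambient_dim, 128), nn.ReLU(),
                        nn.Linear(128, latent_dim))        # phi: X -> F
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, ambient_dim))       # psi: F -> X

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)
loss_fn = nn.MSELoss()                # L(x, psi(phi(x))) = squared error

x = torch.rand(256, ambient_dim)      # stand-in for samples drawn from mu
for _ in range(100):                  # minimize the empirical reconstruction error
    opt.zero_grad()
    loss = loss_fn(decoder(encoder(x)), x)
    loss.backward()
    opt.step()

x_tilde = decoder(encoder(x))         # lies on the reconstructed manifold psi(phi(M))
```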

  25. ReLU DNN. Definition (ReLU DNN): For any number of hidden layers $k \in \mathbb{N}$ and input and output dimensions $w_0, w_{k+1} \in \mathbb{N}$, an $\mathbb{R}^{w_0} \to \mathbb{R}^{w_{k+1}}$ ReLU DNN is given by specifying a sequence of $k$ natural numbers $w_1, w_2, \dots, w_k$ representing the widths of the hidden layers, a set of $k$ affine transformations $T_i : \mathbb{R}^{w_{i-1}} \to \mathbb{R}^{w_i}$ for $i = 1, \dots, k$, and a linear transformation $T_{k+1} : \mathbb{R}^{w_k} \to \mathbb{R}^{w_{k+1}}$, corresponding to the weights of the layers. The mapping $\varphi_\theta : \mathbb{R}^{w_0} \to \mathbb{R}^{w_{k+1}}$ represented by this ReLU DNN is $\varphi = T_{k+1} \circ \sigma \circ T_k \circ \cdots \circ T_2 \circ \sigma \circ T_1$ (1), where $\sigma$ is the ReLU activation applied coordinate-wise, $\circ$ denotes map composition, and $\theta$ represents all the weight and bias parameters.
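
Equation (1) transcribes almost literally into code. A NumPy sketch (the widths and random weights are arbitrary illustrations):

```python
import numpy as np

def relu_dnn(x, params):
    """Evaluate phi = T_{k+1} o sigma o T_k o ... o sigma o T_1.

    params is a list [(W_1, b_1), ..., (W_{k+1}, b_{k+1})] of affine maps.
    """
    for W, b in params[:-1]:          # hidden layers: affine map, then ReLU
        x = np.maximum(W @ x + b, 0.0)
    W, b = params[-1]                 # output layer T_{k+1}: no activation
    return W @ x + b

widths = [3, 8, 8, 2]                 # w_0, w_1, w_2, w_{k+1} with k = 2
rng = np.random.default_rng(0)
params = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(widths, widths[1:])]
print(relu_dnn(np.ones(3), params))   # a piecewise-linear map R^3 -> R^2
```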

  26. Activated Path: Fix the encoding map $\varphi_\theta$; let $S$ denote the set of all neurons in the network, and $2^S$ the set of all its subsets. Definition (Activated Path): Given a point $x \in X$, the activated path of $x$ consists of all the neurons activated when $\varphi_\theta(x)$ is evaluated, and is denoted $\rho(x)$. The activated path thus defines a set-valued function $\rho : X \to 2^S$.
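
The activated path is easy to read off a ReLU network: a hidden neuron is activated exactly when its pre-activation is positive. A sketch in the style of the NumPy network above (sizes and seeds are illustrative):

```python
import numpy as np

def activated_path(x, params):
    """rho(x): the set of (layer, neuron) pairs activated while evaluating x."""
    path = set()
    for layer, (W, b) in enumerate(params[:-1]):   # hidden layers only
        pre = W @ x + b
        path |= {(layer, i) for i in np.flatnonzero(pre > 0)}
        x = np.maximum(pre, 0.0)
    return frozenset(path)                         # an element of 2^S

rng = np.random.default_rng(1)
widths = [2, 5, 5, 1]
params = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(widths, widths[1:])]

x1, x2 = np.array([0.10, 0.20]), np.array([0.11, 0.21])
# Nearby points typically share a path, i.e. lie in the same cell U_alpha
# of the cell decomposition defined on the next slide.
print(activated_path(x1, params) == activated_path(x2, params))
```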

  27. Cell Decomposition. Definition (Cell Decomposition): Fix an encoding map $\varphi_\theta$ represented by a ReLU DNN. Two data points $x_1, x_2 \in X$ are equivalent, denoted $x_1 \sim x_2$, if they share the same activated path, $\rho(x_1) = \rho(x_2)$. This equivalence relation partitions the ambient space $X$ into cells, $D(\varphi_\theta) : X = \bigcup_\alpha U_\alpha$, where each equivalence class corresponds to a cell: $x_1, x_2 \in U_\alpha$ if and only if $x_1 \sim x_2$. $D(\varphi_\theta)$ is called the cell decomposition induced by the encoding map $\varphi_\theta$. Furthermore, $\varphi_\theta$ maps the cell decomposition $D(\varphi_\theta)$ of the ambient space to a cell decomposition of the latent space.

  28. Encoding/Decoding. Figure: auto-encoder pipeline. (a) Input manifold $M \subset X$; (b) latent representation $D = \varphi_\theta(M)$; (c) reconstructed manifold $\tilde{M} = \psi_\theta(D)$.

  29. Piecewise Linear Mapping. Figure: (d) cell decomposition $D(\varphi_\theta)$; (e) cell decomposition of the latent space; (f) cell decomposition $D(\psi_\theta \circ \varphi_\theta)$. Piecewise linear encoding/decoding maps induce cell decompositions of the ambient space and the latent space.

  30. RL Complexity of a DNN. Definition (Rectified Linear Complexity of a ReLU DNN): Given a ReLU DNN $N(w_0, \dots, w_{k+1})$, its rectified linear complexity is the upper bound of the number of pieces over all PL functions $\varphi_\theta$ representable by $N$: $\mathcal{N}(N) := \max_\theta N(\varphi_\theta)$. The rectified linear complexity measures the representation capability of a neural network.
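
This quantity can be probed empirically by combining the two sketches above: since cells are convex, a line meets each cell at most once, so counting distinct activation patterns along a fine grid counts the linear pieces seen on that interval. A rough, purely illustrative probe (it only lower-bounds $N(\varphi_\theta)$):

```python
import numpy as np

rng = np.random.default_rng(2)
widths = [1, 10, 10, 1]               # a random R -> R ReLU DNN
params = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(widths, widths[1:])]

def pattern(x):
    """Activation pattern (activated path) of the scalar input x."""
    v = np.array([x])
    pat = []
    for W, b in params[:-1]:
        pre = W @ v + b
        pat.append(tuple(pre > 0))
        v = np.maximum(pre, 0.0)
    return tuple(pat)

xs = np.linspace(-10.0, 10.0, 20001)
pieces = len({pattern(x) for x in xs})           # distinct cells hit by the line
print("linear pieces found on [-10, 10]:", pieces)
```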

  31. RL Complexity Estimate. Lemma: Let $C(d, n)$ denote the maximum number of parts one can obtain when cutting the $d$-dimensional space $\mathbb{R}^d$ with $n$ hyperplanes; then $C(d, n) = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{d}$ (2). Proof: Suppose $n$ hyperplanes cut $\mathbb{R}^d$ into $C(d, n)$ cells, each of which is a convex polyhedron. Let the $(n+1)$-th hyperplane be $\pi$; the first $n$ hyperplanes intersect $\pi$ and partition it into $C(d-1, n)$ cells, and each such cell on $\pi$ partitions a polyhedron of $\mathbb{R}^d$ into 2 cells. Hence we get the recurrence $C(d, n+1) = C(d, n) + C(d-1, n)$. Since $C(2, 1) = 2$, formula (2) follows easily by induction.
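
Formula (2) and the recurrence from the proof are easy to check against each other numerically (the base cases $C(d, 0) = 1$ and $C(0, n) = 1$ are standard assumptions consistent with the lemma):

```python
from math import comb
from functools import lru_cache

def C_formula(d, n):
    """Formula (2): C(d, n) = sum of binomial(n, i) for i = 0..d."""
    return sum(comb(n, i) for i in range(d + 1))

@lru_cache(maxsize=None)
def C_rec(d, n):
    """Recurrence from the proof: C(d, n) = C(d, n-1) + C(d-1, n-1)."""
    if n == 0 or d == 0:
        return 1
    return C_rec(d, n - 1) + C_rec(d - 1, n - 1)

assert all(C_formula(d, n) == C_rec(d, n)
           for d in range(6) for n in range(10))
print(C_formula(2, 3))   # 3 lines cut the plane into at most 7 parts
```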
