Normalizing Flow Models


  1. Normalizing Flow Models
     Stefano Ermon, Aditya Grover
     Stanford University
     Lecture 7

  2. Recap of likelihood-based learning so far
     Model families:
     Autoregressive models: $p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_{<i})$
     Variational autoencoders: $p_\theta(x) = \int p_\theta(x, z) \, dz$
     Autoregressive models provide tractable likelihoods but no direct mechanism for learning features.
     Variational autoencoders can learn feature representations (via latent variables $z$) but have intractable marginal likelihoods.
     Key question: Can we design a latent variable model with tractable likelihoods? Yes!

  3. Simple Prior to Complex Data Distributions
     Desirable properties of any model distribution:
     Analytic density
     Easy to sample
     Many simple distributions satisfy the above properties, e.g., Gaussian and uniform distributions.
     Unfortunately, data distributions could be much more complex (multi-modal).
     Key idea: Map simple distributions (easy to sample and evaluate densities) to complex distributions (learned via data) using change of variables.

  4. Change of Variables Formula
     Let $Z$ be a uniform random variable $U[0, 2]$ with density $p_Z$. What is $p_Z(1)$? It is $1/2$.
     Let $X = 4Z$, and let $p_X$ be its density. What is $p_X(4)$?
     Is it $p_X(4) = p(X = 4) = p(4Z = 4) = p(Z = 1) = p_Z(1) = 1/2$? No.
     Clearly, $X$ is uniform in $[0, 8]$, so $p_X(4) = 1/8$.

  5. Change of Variables Formula
     Change of variables (1D case): If $X = f(Z)$ and $f(\cdot)$ is monotone with inverse $Z = f^{-1}(X) = h(X)$, then:
     $p_X(x) = p_Z(h(x)) \, |h'(x)|$
     Previous example: If $X = 4Z$ and $Z \sim U[0, 2]$, what is $p_X(4)$?
     Note that $h(X) = X/4$, so
     $p_X(4) = p_Z(1) \, |h'(4)| = 1/2 \times 1/4 = 1/8$
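To make the 1D formula concrete, here is a minimal numerical sketch (assuming NumPy) that evaluates $p_X(4)$ for the example $X = 4Z$, $Z \sim U[0, 2]$, both via the change of variables formula and via a Monte Carlo histogram; the helper names are illustrative, not from the slides.

```python
import numpy as np

# Change of variables in 1D: X = f(Z) = 4Z with Z ~ U[0, 2].
# Inverse: h(x) = x / 4, so p_X(x) = p_Z(h(x)) * |h'(x)|.

def p_Z(z):
    # Density of U[0, 2]
    return np.where((z >= 0) & (z <= 2), 0.5, 0.0)

def p_X(x):
    h = x / 4.0          # inverse map z = h(x)
    dh_dx = 1.0 / 4.0    # |h'(x)|
    return p_Z(h) * dh_dx

print(p_X(4.0))          # 0.125, i.e. 1/8

# Monte Carlo sanity check: histogram of samples of X = 4Z near x = 4
rng = np.random.default_rng(0)
x_samples = 4.0 * rng.uniform(0.0, 2.0, size=1_000_000)
hist, edges = np.histogram(x_samples, bins=80, range=(0.0, 8.0), density=True)
print(hist[np.searchsorted(edges, 4.0) - 1])  # approximately 0.125
```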

  6. Geometry: Determinants and Volumes
     Let $Z$ be a uniform random vector in $[0, 1]^n$.
     Let $X = AZ$ for a square invertible matrix $A$, with inverse $W = A^{-1}$. How is $X$ distributed?
     Geometrically, the matrix $A$ maps the unit hypercube $[0, 1]^n$ to a parallelotope.
     Hypercubes and parallelotopes are generalizations of squares/cubes and parallelograms/parallelepipeds to higher dimensions.
     Figure: The matrix $A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$ maps a unit square to a parallelogram.

  7. Geometry: Determinants and Volumes
     The volume of the parallelotope is equal to the absolute value of the determinant of the transformation $A$:
     $\det(A) = \det \begin{pmatrix} a & c \\ b & d \end{pmatrix} = ad - bc$
     $X$ is uniformly distributed over the parallelotope. Hence, we have
     $p_X(x) = p_Z(Wx) \, |\det(W)| = p_Z(Wx) \, / \, |\det(A)|$
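A minimal sketch of this linear case (assuming NumPy); the specific matrix $A$ is an arbitrary illustrative choice, not from the slides.

```python
import numpy as np

# Linear change of variables: Z ~ U([0, 1]^2), X = A Z.
# Then p_X(x) = p_Z(W x) * |det(W)| = p_Z(W x) / |det(A)|, with W = A^{-1}.
A = np.array([[2.0, 1.0],
              [0.5, 3.0]])           # illustrative invertible matrix
W = np.linalg.inv(A)

def p_Z(z):
    # Uniform density on the unit square
    return float(np.all((z >= 0) & (z <= 1)))

def p_X(x):
    return p_Z(W @ x) * abs(np.linalg.det(W))

x = A @ np.array([0.25, 0.5])        # a point inside the parallelogram
print(abs(np.linalg.det(A)))         # area of the parallelogram: 5.5
print(p_X(x))                        # 1 / 5.5 ~= 0.1818
```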

  8. Generalized Change of Variables
     For linear transformations specified via $A$, the change in volume is given by the determinant of $A$.
     For non-linear transformations $f(\cdot)$, the linearized change in volume is given by the determinant of the Jacobian of $f(\cdot)$.
     Change of variables (general case): The mapping between $Z$ and $X$, given by $f: \mathbb{R}^n \mapsto \mathbb{R}^n$, is invertible such that $X = f(Z)$ and $Z = f^{-1}(X)$:
     $p_X(x) = p_Z\left(f^{-1}(x)\right) \left| \det\left( \frac{\partial f^{-1}(x)}{\partial x} \right) \right|$
     Note 1: $x$, $z$ need to be continuous and have the same dimension. For example, if $x \in \mathbb{R}^n$ then $z \in \mathbb{R}^n$.
     Note 2: For any invertible matrix $A$, $\det(A^{-1}) = \det(A)^{-1}$, so equivalently
     $p_X(x) = p_Z(z) \left| \det\left( \frac{\partial f(z)}{\partial z} \right) \right|^{-1}$ with $z = f^{-1}(x)$.
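Below is a hedged sketch of evaluating the general formula with an automatically computed Jacobian (assuming PyTorch); the invertible map `f` and its fixed-point inversion are illustrative choices made for this example, not part of the lecture.

```python
import torch
from torch.autograd.functional import jacobian

# General change of variables for an invertible f: R^n -> R^n,
#   p_X(x) = p_Z(f^{-1}(x)) * |det(d f^{-1}(x) / dx)|
#          = p_Z(z) * |det(d f(z) / dz)|^{-1}   with z = f^{-1}(x).

def f(z):
    # Illustrative invertible map: element-wise and strictly increasing per coordinate
    return z + 0.5 * torch.tanh(z)

def f_inv(x, n_iter=50):
    # Fixed-point inversion of z + 0.5*tanh(z) = x (the 0.5 factor makes this a contraction)
    z = x.clone()
    for _ in range(n_iter):
        z = x - 0.5 * torch.tanh(z)
    return z

prior = torch.distributions.MultivariateNormal(torch.zeros(2), torch.eye(2))

x = torch.tensor([0.7, -1.2])
z = f_inv(x)
J = jacobian(f, z)                        # d f(z) / d z, a 2x2 matrix
_, logabsdet = torch.linalg.slogdet(J)
# log p_X(x) = log p_Z(z) - log|det(df/dz)|, i.e. + log|det(df^{-1}/dx)|
log_px = prior.log_prob(z) - logabsdet
print(log_px.item())
```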

  9. Two-Dimensional Example
     Let $Z_1$ and $Z_2$ be continuous random variables with joint density $p_{Z_1, Z_2}$.
     Let $u = (u_1, u_2)$ be a transformation and $v = (v_1, v_2)$ the inverse transformation.
     Let $X_1 = u_1(Z_1, Z_2)$ and $X_2 = u_2(Z_1, Z_2)$. Then $Z_1 = v_1(X_1, X_2)$ and $Z_2 = v_2(X_1, X_2)$.
     $p_{X_1, X_2}(x_1, x_2) = p_{Z_1, Z_2}\left(v_1(x_1, x_2), v_2(x_1, x_2)\right) \left| \det \begin{pmatrix} \frac{\partial v_1(x_1, x_2)}{\partial x_1} & \frac{\partial v_1(x_1, x_2)}{\partial x_2} \\ \frac{\partial v_2(x_1, x_2)}{\partial x_1} & \frac{\partial v_2(x_1, x_2)}{\partial x_2} \end{pmatrix} \right|$ (inverse)
     $\qquad\qquad\quad\;\, = p_{Z_1, Z_2}(z_1, z_2) \left| \det \begin{pmatrix} \frac{\partial u_1(z_1, z_2)}{\partial z_1} & \frac{\partial u_1(z_1, z_2)}{\partial z_2} \\ \frac{\partial u_2(z_1, z_2)}{\partial z_1} & \frac{\partial u_2(z_1, z_2)}{\partial z_2} \end{pmatrix} \right|^{-1}$ (forward)

  10. Normalizing Flow Models
     Consider a directed, latent-variable model over observed variables $X$ and latent variables $Z$.
     In a normalizing flow model, the mapping between $Z$ and $X$, given by $f_\theta: \mathbb{R}^n \mapsto \mathbb{R}^n$, is deterministic and invertible such that $X = f_\theta(Z)$ and $Z = f_\theta^{-1}(X)$.
     Using change of variables, the marginal likelihood $p(x)$ is given by
     $p_X(x; \theta) = p_Z\left(f_\theta^{-1}(x)\right) \left| \det\left( \frac{\partial f_\theta^{-1}(x)}{\partial x} \right) \right|$
     Note: $x$, $z$ need to be continuous and have the same dimension.

  11. A Flow of Transformations
     Normalizing: Change of variables gives a normalized density after applying an invertible transformation.
     Flow: Invertible transformations can be composed with each other:
     $z_m := f_\theta^m \circ \cdots \circ f_\theta^1(z_0) = f_\theta^m\left(f_\theta^{m-1}\left(\cdots\left(f_\theta^1(z_0)\right)\right)\right) \triangleq f_\theta(z_0)$
     Start with a simple distribution for $z_0$ (e.g., Gaussian).
     Apply a sequence of $M$ invertible transformations, with $x \triangleq z_M$.
     By change of variables:
     $p_X(x; \theta) = p_Z\left(f_\theta^{-1}(x)\right) \prod_{m=1}^{M} \left| \det\left( \frac{\partial (f_\theta^m)^{-1}(z_m)}{\partial z_m} \right) \right|$
     (Note: the determinant of a product equals the product of the determinants.)
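The composition can be sketched in code as follows (assuming PyTorch); the element-wise affine layers are a deliberately simple illustrative choice, used only because their log-determinants are trivial to accumulate across the flow.

```python
import torch

# A "flow" as a composition of simple invertible maps. Each layer returns the
# log|det| of its Jacobian so the total log-likelihood accumulates one term per layer:
#   log p_X(x) = log p_Z(z_0) + sum_m log|det d(f^m)^{-1}/dz|   (inverse direction)

class AffineLayer:
    """x = exp(log_scale) * z + shift, element-wise and hence trivially invertible."""
    def __init__(self, dim):
        self.log_scale = torch.zeros(dim)
        self.shift = torch.zeros(dim)

    def forward(self, z):            # z -> x, returns x and log|det J|
        return z * self.log_scale.exp() + self.shift, self.log_scale.sum()

    def inverse(self, x):            # x -> z, returns z and log|det J^{-1}|
        return (x - self.shift) * (-self.log_scale).exp(), -self.log_scale.sum()

def log_likelihood(x, layers, prior):
    log_det_total = 0.0
    z = x
    for layer in reversed(layers):   # invert the flow: x = z_M -> ... -> z_0
        z, log_det = layer.inverse(z)
        log_det_total = log_det_total + log_det
    return prior.log_prob(z) + log_det_total

layers = [AffineLayer(2) for _ in range(3)]
prior = torch.distributions.MultivariateNormal(torch.zeros(2), torch.eye(2))
print(log_likelihood(torch.tensor([0.3, -0.8]), layers, prior).item())
```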

  12. Planar Flows (Rezende & Mohamed, 2016)
     Planar flow: invertible transformation
     $x = f_\theta(z) = z + u \, h(w^T z + b)$
     parameterized by $\theta = (w, u, b)$, where $h(\cdot)$ is a non-linearity.
     The absolute value of the determinant of the Jacobian is given by
     $\left| \det \frac{\partial f_\theta(z)}{\partial z} \right| = \left| \det\left( I + h'(w^T z + b) \, u w^T \right) \right| = \left| 1 + h'(w^T z + b) \, u^T w \right|$ (matrix determinant lemma)
     Need to restrict parameters and non-linearity for the mapping to be invertible. For example, $h = \tanh(\cdot)$ and $h'(w^T z + b) \, u^T w \geq -1$.
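A minimal sketch of a planar flow layer (assuming PyTorch); it computes the forward map and the log-determinant via the matrix determinant lemma, but for brevity it does not enforce the invertibility constraint mentioned above. The parameter values are illustrative.

```python
import torch

def planar_flow(z, w, u, b):
    """Planar flow x = z + u * tanh(w^T z + b) and log|det(Jacobian)|.

    Uses the matrix determinant lemma:
        det(I + h'(w^T z + b) u w^T) = 1 + h'(w^T z + b) u^T w.
    Shapes: z is (batch, dim); w and u are (dim,); b is a scalar.
    Invertibility (h'(w^T z + b) u^T w >= -1) is assumed, not enforced here.
    """
    pre = z @ w + b                        # (batch,)
    x = z + u * torch.tanh(pre).unsqueeze(-1)
    h_prime = 1.0 - torch.tanh(pre) ** 2   # derivative of tanh
    log_det = torch.log(torch.abs(1.0 + h_prime * (u @ w)))
    return x, log_det

z = torch.randn(5, 2)
w = torch.tensor([0.5, -1.0])
u = torch.tensor([1.0, 0.3])
b = torch.tensor(0.1)
x, log_det = planar_flow(z, w, u, b)
print(x.shape, log_det.shape)              # torch.Size([5, 2]) torch.Size([5])
```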

  13. Planar Flows (Rezende & Mohamed, 2016)
     [Figure: samples after applying planar transformations to a Gaussian base distribution and to a uniform base distribution.]
     10 planar transformations can transform simple distributions into a more complex one.

  14. Learning and Inference
     Learning via maximum likelihood over the dataset $\mathcal{D}$:
     $\max_\theta \log p_X(\mathcal{D}; \theta) = \sum_{x \in \mathcal{D}} \log p_Z\left(f_\theta^{-1}(x)\right) + \log \left| \det\left( \frac{\partial f_\theta^{-1}(x)}{\partial x} \right) \right|$
     Exact likelihood evaluation via the inverse transformation $x \mapsto z$ and the change of variables formula.
     Sampling via the forward transformation $z \mapsto x$: $z \sim p_Z(z)$, $x = f_\theta(z)$.
     Latent representations inferred via the inverse transformation (no inference network required!): $z = f_\theta^{-1}(x)$.
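A schematic maximum-likelihood loop (assuming PyTorch), using a single learnable element-wise affine map purely to keep the sketch short; a real flow would stack invertible layers such as the planar flow above. The toy dataset and hyperparameters are illustrative.

```python
import torch

# Schematic MLE training of a flow: here x = exp(s) * z + t, element-wise.
dim = 2
s = torch.zeros(dim, requires_grad=True)   # log-scale
t = torch.zeros(dim, requires_grad=True)   # shift
prior = torch.distributions.MultivariateNormal(torch.zeros(dim), torch.eye(dim))
opt = torch.optim.Adam([s, t], lr=1e-2)

data = 3.0 + 0.5 * torch.randn(1024, dim)  # toy dataset

for step in range(1000):
    # Inverse transformation x -> z and change of variables:
    #   log p_X(x) = log p_Z(z) + log|det dz/dx| = log p_Z((x - t) * exp(-s)) - sum(s)
    z = (data - t) * torch.exp(-s)
    log_px = prior.log_prob(z) - s.sum()
    loss = -log_px.mean()                  # maximize likelihood = minimize NLL
    opt.zero_grad()
    loss.backward()
    opt.step()

print(t.detach(), s.exp().detach())        # should approach mean ~3.0 and scale ~0.5
```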

  15. Desiderata for Flow Models
     Simple prior $p_Z(z)$ that allows for efficient sampling and tractable likelihood evaluation, e.g., an isotropic Gaussian.
     Invertible transformations with tractable evaluation:
     Likelihood evaluation requires efficient evaluation of the $x \mapsto z$ mapping.
     Sampling requires efficient evaluation of the $z \mapsto x$ mapping.
     Computing likelihoods also requires evaluating determinants of $n \times n$ Jacobian matrices, where $n$ is the data dimensionality.
     Computing the determinant of an $n \times n$ matrix is $O(n^3)$: prohibitively expensive within a learning loop!
     Key idea: Choose transformations so that the resulting Jacobian matrix has special structure. For example, the determinant of a triangular matrix is the product of the diagonal entries, i.e., an $O(n)$ operation.

  16. Triangular Jacobian
     $x = (x_1, \cdots, x_n) = f(z) = (f_1(z), \cdots, f_n(z))$
     $J = \frac{\partial f}{\partial z} = \begin{pmatrix} \frac{\partial f_1}{\partial z_1} & \cdots & \frac{\partial f_1}{\partial z_n} \\ \cdots & \cdots & \cdots \\ \frac{\partial f_n}{\partial z_1} & \cdots & \frac{\partial f_n}{\partial z_n} \end{pmatrix}$
     Suppose $x_i = f_i(z)$ only depends on $z_{\leq i}$. Then
     $J = \frac{\partial f}{\partial z} = \begin{pmatrix} \frac{\partial f_1}{\partial z_1} & \cdots & 0 \\ \cdots & \cdots & \cdots \\ \frac{\partial f_n}{\partial z_1} & \cdots & \frac{\partial f_n}{\partial z_n} \end{pmatrix}$
     has lower triangular structure. Its determinant can be computed in linear time.
     Similarly, the Jacobian is upper triangular if $x_i$ only depends on $z_{\geq i}$.
     Next lecture: Designing invertible transformations!
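A quick numerical check (assuming NumPy) that the log-determinant of a triangular Jacobian equals the sum of the logs of its diagonal entries, which is the $O(n)$ shortcut referenced above; the random matrix stands in for an actual Jacobian.

```python
import numpy as np

# For a lower-triangular Jacobian, det(J) is the product of the diagonal entries: O(n).
n = 1000
J = np.tril(np.random.default_rng(0).normal(size=(n, n)))   # random lower-triangular matrix

log_det_full = np.linalg.slogdet(J)[1]              # O(n^3) general-purpose algorithm
log_det_diag = np.sum(np.log(np.abs(np.diag(J))))   # O(n), valid because J is triangular
print(np.allclose(log_det_full, log_det_diag))      # True
```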
