SLIDE 1

Generative Adversarial Networks

Mostly adapted from Goodfellow’s 2016 NIPS tutorial: https://arxiv.org/pdf/1701.00160.pdf

SLIDE 2

Story so far: Why generative models?

  • Unsupervised learning means we have more training data
  • Some problems have many right answers, and diversity is desirable
    • Caption generation, image-to-image translation, super-resolution
  • Some tasks intrinsically require generation
    • Machine translation
  • Some generative models allow us to investigate a lower-dimensional manifold of high-dimensional data. This manifold can provide insight into high-dimensional observations
    • Brain activity, gene expression
SLIDE 3

Recap: Factor Analysis

  • Generative model: assumes that data are generated from real-valued latent variables

Bishop – Pattern Recognition and Machine Learning

SLIDE 4

Recap: Factor Analysis

  • We can see from the marginal distribution

      p(x_i | W, μ, Ψ) = N(x_i | μ, Ψ + WW^T)

    that the covariance matrix of the data distribution is broken into 2 terms:
    • A diagonal part Ψ: variance not shared between variables
    • A low-rank matrix WW^T: shared variance due to latent factors
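As a sanity check on this decomposition, here is a small Monte Carlo sketch (pure Python; the loading vector, noise variances, and mean are made-up toy values, with a single latent factor in 2-D): sampling x = μ + Wz + ε and estimating the covariance should recover Ψ + WW^T.

```python
import random

random.seed(0)
mu = [1.0, -2.0]
W = [0.8, 0.5]        # toy loading matrix for one latent factor (2x1)
psi = [0.3, 0.2]      # diagonal noise variances (Psi)

n = 100_000
xs = []
for _ in range(n):
    z = random.gauss(0, 1)  # latent factor z ~ N(0, 1)
    xs.append([mu[d] + W[d] * z + random.gauss(0, psi[d] ** 0.5)
               for d in range(2)])

# Empirical covariance of the sampled data
mean = [sum(x[d] for x in xs) / n for d in range(2)]
cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in xs) / n
        for b in range(2)] for a in range(2)]

# Theoretical covariance: Psi + W W^T
theory = [[W[a] * W[b] + (psi[a] if a == b else 0.0) for b in range(2)]
          for a in range(2)]

for a in range(2):
    for b in range(2):
        assert abs(cov[a][b] - theory[a][b]) < 0.03
```

The off-diagonal entry comes entirely from WW^T, matching the "shared variance due to latent factors" bullet above.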
SLIDE 5

Recap: Evidence Lower Bound (ELBO)

  • From basic probability we have:

      KL(q(z) || p(z|x, θ)) = KL(q(z) || p(x, z|θ)) + log p(x|θ)

  • We can rearrange the terms to get the following decomposition:

      log p(x|θ) = KL(q(z) || p(z|x, θ)) − KL(q(z) || p(x, z|θ))

  • We define the evidence lower bound (ELBO) as:

      ℒ(q, θ) ≜ −KL(q(z) || p(x, z|θ))

    Then: log p(x|θ) = KL(q(z) || p(z|x, θ)) + ℒ(q, θ)

SLIDE 6

Recap: The EM algorithm, E step

  • Maximize ℒ(q, θ^(t−1)) with respect to q by setting q^(t)(z) ← p(z | x, θ^(t−1))

Bishop – Pattern Recognition and Machine Learning

SLIDE 7

Recap: The M step

  • After applying the E step, we increase the likelihood of the data by finding better parameters according to:

      θ^(t) ← argmax_θ E_{q^(t)(z)} [log p(x, z | θ)]

Bishop – Pattern Recognition and Machine Learning

SLIDE 8

Recap: EM in practice

  argmax_{W,Ψ} E_{q^(t)(z)} [log p(X, Z | W, Ψ)]
    = argmax_{W,Ψ} −(N/2) log det(Ψ)
      − Σ_{i=1}^{N} [ (1/2) x_i^T Ψ^{−1} x_i
                      − x_i^T Ψ^{−1} W E_{q^(t)(z_i)}[z_i]
                      + (1/2) tr( W^T Ψ^{−1} W E_{q^(t)(z_i)}[z_i z_i^T] ) ]

  • By looking at what expectations the M step requires, we find out what we need to compute in the E step.
  • For FA, we only need these 2 sufficient statistics, E[z_i] and E[z_i z_i^T], to enable the M step.
  • In practice, sufficient statistics are often what we compute in the E step
SLIDE 9

Recap: From EM to Variational Inference

  • In EM we alternately maximize the ELBO with respect to θ and the probability distribution (functional) q
  • In variational inference, we drop the distinction between hidden variables and the parameters of a distribution
  • I.e., we replace p(x, z|θ) with p(x, z). Effectively this puts a probability distribution on the parameters θ, then absorbs them into z
  • Fully Bayesian treatment instead of a point estimate for the parameters

SLIDE 10

Recap: Variational Autoencoder

  • For t = 1 : b : T
    • Estimate ∂ℒ/∂φ, ∂ℒ/∂θ with either −ℒ̃^A or −ℒ̃^B as the loss
    • Update φ, θ
  • The training procedure uses standard backpropagation with an MC procedure to approximately run EM on the ELBO
  • The reparameterization trick enables the gradient to flow through the network:

      z_i = g(ε_i, x_i, φ), with ε_i ~ p(ε) and decoder p(x_i | z_i, θ)

SLIDE 11

Recap: Requirements of the VAE

  • Note that the VAE requires 2 tractable distributions:
    • The prior distribution p(z) must be easy to sample from
    • The conditional likelihood p(x | z, θ) must be computable
  • In practice this means that the 2 distributions of interest are often simple, for example uniform, Gaussian, or even isotropic Gaussian

SLIDE 12

Recap: The VAE blurry image problem

https://blog.openai.com/generative-models/

  • The samples from the VAE look blurry
  • Three plausible explanations for this:
    • Maximizing the likelihood
    • Restrictions on the family of distributions
    • The lower bound approximation

SLIDE 13

Recap: The maximum likelihood explanation

https://arxiv.org/pdf/1701.00160.pdf

  • Recent evidence suggests that this is not actually the problem
  • GANs can be trained with maximum likelihood and still generate sharp examples

SLIDE 14

A taxonomy of generative models

SLIDE 15

Fully Visible Belief Net (FVBN), e.g. WaveNet

  p(x) = ∏_{t=1}^{T} p(x_t | x_1, …, x_{t−1})

  • No latent variables (hence "fully visible")
  • Tractable log-likelihood
  • Train with an auto-regressive target
  • Easier to optimize well
  • Slower to run (sampling is sequential)
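The chain-rule factorization above can be illustrated with a toy autoregressive model over 3 binary variables (the conditional tables below are made up): the product of conditionals automatically defines a normalized distribution with a tractable log-likelihood.

```python
import itertools
import math

# Toy autoregressive model: p(x) = p(x1) p(x2|x1) p(x3|x1,x2)
def p_cond(t, x_prev):
    # Probability that x_t = 1 given the prefix x_prev (made-up table)
    return min(0.9, 0.3 + 0.1 * t + 0.2 * sum(x_prev))

def p_x(x):
    # Chain rule: multiply the conditionals left to right
    prob = 1.0
    for t in range(len(x)):
        p1 = p_cond(t, x[:t])
        prob *= p1 if x[t] == 1 else (1.0 - p1)
    return prob

# The product of valid conditionals is automatically normalized
total = sum(p_x(x) for x in itertools.product([0, 1], repeat=3))
assert abs(total - 1.0) < 1e-12

# Tractable (exact) log-likelihood of one observation
ll = math.log(p_x((1, 0, 1)))
assert ll < 0
```

Sampling from such a model must proceed one variable at a time (each x_t needs the sampled prefix), which is why the slide notes FVBNs are slower to run.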
SLIDE 16

GAN Advantages

  • Sample in parallel (vs FVBN)
  • Few restrictions on generator function
  • No Markov Chain
  • No variational bound
  • Subjectively better samples
SLIDE 17

GAN Disadvantages

  • Very difficult to train properly
  • Difficult to evaluate
  • Likelihood cannot be computed
  • No encoder (in vanilla GAN)
SLIDE 18

GAN samples look sharp

Real samples vs. generated samples: https://arxiv.org/pdf/1703.10717.pdf

SLIDE 19

GAN samples look sharp

Real vs. generated samples from the Boundary Equilibrium GAN and the Energy-Based GAN: https://arxiv.org/pdf/1703.10717.pdf

SLIDE 20

Interpolation is impressive

https://arxiv.org/pdf/1703.10717.pdf

SLIDE 21

Generative Adversarial Networks: Basic idea

Generator (counterfeiter): creates fake data from random input
Discriminator (detective): distinguishes real data from fake data

"Looks fake!" / "Looks real!"

SLIDE 22

The Generator

  • Faking data
    • To create good fake data, the generator must understand what real data looks like
    • Attempts to generate samples that are likely under the true data distribution
    • Implicitly learns to model the true distribution
  • Latent code
    • Since the sample is determined by the random noise input, the probability distribution is conditioned on this input
    • The random noise is interpreted by the model as a latent code, i.e. a point on the manifold

SLIDE 23

Problem setup

Generator: trained to get better and better at fooling the discriminator (making fake data look real)
Discriminator: trained to get better and better at distinguishing real data from fake data

SLIDE 24

Formalizing the generator/discriminator

Generator: G(z; θ^(G)). A differentiable function G (here having parameters θ^(G)), mapping from the latent space, ℝ^L, to the data space, ℝ^M

Discriminator: D(x; θ^(D)). A differentiable function D (here having parameters θ^(D)), mapping from the data space, ℝ^M, to a scalar between 0 and 1 representing the probability that the data is real

SLIDE 25

Simplifying notation

Generator: G(z). For simplicity of notation, we write G(z) without θ^(G). Typically G is a neural network, but it doesn't have to be. Note that z can go into any layer of the network, not just the first.

Discriminator: D(x), D(G(z)). Note that the discriminator can also take the output of the generator as input. Typically D is a neural network, but it doesn't have to be.

SLIDE 26

An artist's rendition

z → G(z) (or x) → D(G(z)) (or D(x))

SLIDE 27

The game (theory)

  • The generator and discriminator are adversaries in a game
  • The generator controls only its own parameters
  • The discriminator controls only its own parameters
  • Each seeks to maximize its own success and minimize the success of the other: related to minimax theory

SLIDE 28

Nash equilibrium

  • In game theory, a local optimum in this system is called a Nash equilibrium:
    • The generator loss, J^(G), is at a local minimum with respect to θ^(G)
    • The discriminator loss, J^(D), is at a local minimum with respect to θ^(D)
SLIDE 29

Basic training procedure

  • Initialize θ^(G), θ^(D)
  • For t = 1 : b : T
    • Initialize Δθ^(D) = 0
    • For i = t : t + b − 1
      • Sample z_i ~ p(z_i)
      • Compute D(G(z_i)), D(x_i)
      • Δθ_i^(D) ← gradient of the discriminator loss, J^(D)(θ^(G), θ^(D))
      • Δθ^(D) ← Δθ^(D) + Δθ_i^(D)
    • Update θ^(D)
    • Initialize Δθ^(G) = 0
    • For j = t : t + b − 1
      • Sample z_j ~ p(z_j)
      • Compute D(G(z_j)), D(x_j)
      • Δθ_j^(G) ← gradient of the generator loss, J^(G)(θ^(G), θ^(D))
      • Δθ^(G) ← Δθ^(G) + Δθ_j^(G)
    • Update θ^(G)

Can also run k minibatches of the discriminator update before updating the generator, but Goodfellow finds k = 1 tends to work best

SLIDE 30

Basic training procedure (continued)

(Same procedure as the previous slide.)

Notice: the only explicit probability distribution we have is the random noise distribution, the prior. The loss causes the data distribution to be learned implicitly.

SLIDE 31

Simplified training procedure

  • Initialize θ^(G), θ^(D)
  • For t = 1 : b : T
    • Initialize Δθ^(G) = Δθ^(D) = 0
    • For i = t : t + b − 1
      • Sample z_i ~ p(z_i)
      • Compute D(G(z_i)), D(x_i)
      • Δθ_i^(D) ← ∂_{θ^(D)} J^(D)(θ^(G), θ^(D))
      • Δθ_i^(G) ← ∂_{θ^(G)} J^(G)(θ^(G), θ^(D))
      • Δθ^(D) ← Δθ^(D) + Δθ_i^(D)
      • Δθ^(G) ← Δθ^(G) + Δθ_i^(G)
    • Update θ^(D), θ^(G)

Update the discriminator and generator from the same pair of mini-batches

SLIDE 32

Discriminator (D)'s loss function

  J^(D)(θ^(D), θ^(G)) = −(1/2) E_{x∼p_data}[log D(x)] − (1/2) E_{z∼p_z}[log(1 − D(G(z)))]

  • Binary cross-entropy (almost)
  • The first term is for real data (positive classification)
  • The second term is for fake data (negative classification)
  • Differs from cross-entropy only in what we take the expectation over
  • A supervised loss on data with no labels
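This loss can be estimated by Monte Carlo from minibatches. A minimal sketch in pure Python, with a hand-picked toy discriminator and generator on 1-D data (all functions and distributions here are illustrative, not from the slides):

```python
import math
import random

random.seed(1)

def D(x):
    # A fixed, hand-picked "discriminator": sigmoid of a linear score
    return 1.0 / (1.0 + math.exp(-2.0 * (x - 1.0)))

def G(z):
    # A fixed toy "generator" mapping noise to data space
    return 0.5 * z

n = 10_000
real = [random.gauss(2.0, 1.0) for _ in range(n)]   # x ~ p_data (toy)
noise = [random.gauss(0.0, 1.0) for _ in range(n)]  # z ~ p_z

# J(D) = -1/2 E_data[log D(x)] - 1/2 E_z[log(1 - D(G(z)))]
j_d = (-0.5 * sum(math.log(D(x)) for x in real) / n
       - 0.5 * sum(math.log(1.0 - D(G(z))) for z in noise) / n)

# Both terms are negative log-probabilities, so the loss is positive
assert j_d > 0.0
```

Note that no label is ever supplied: the "positive" and "negative" classes come from which distribution each sample was drawn from, which is the sense in which this is a supervised loss on unlabeled data.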
SLIDE 33

Generator (G)'s loss function

  • Take the negative of the discriminator's loss:

      J^(G)(θ^(D), θ^(G)) = −J^(D)(θ^(D), θ^(G))

  • With this loss, we have a value function describing a zero-sum game:

      min_G max_D −J^(D)(θ^(D), θ^(G))

  • Attractive to analyze with game theory
  • There is a problem with this loss for gradient descent (we'll come back to this)

SLIDE 34

Rewriting J^(D)

  J^(D)(θ^(D), θ^(G))
    = −(1/2) E_{x∼p_data}[log D(x)] − (1/2) E_z[log(1 − D(G(z)))]
    = −(1/2) [ ∫_x p_data(x) log D(x) dx + ∫_z p_z(z) log(1 − D(G(z))) dz ]
    = −(1/2) ∫_x [ p_data(x) log D(x) + p_G(x) log(1 − D(x)) ] dx

SLIDE 35

Optimal discriminator

  J^(D)(θ^(D), θ^(G)) = −(1/2) ∫_x [ p_data(x) log D(x) + p_G(x) log(1 − D(x)) ] dx

Take the functional derivative with respect to D(x) and set it to 0, analogous to:

  ∂/∂y [ p_data(x) log y + p_G(x) log(1 − y) ] = 0
  p_data(x)/y − p_G(x)/(1 − y) = 0
  y = p_data(x) / (p_data(x) + p_G(x))
  → D*(x) = p_data(x) / (p_data(x) + p_G(x))

  • We are assuming that p_data(x), p_G(x) are non-zero everywhere
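The pointwise optimum can be checked numerically: for fixed densities a = p_data(x) and b = p_G(x) at some x (made-up values below), a grid search over the objective recovers y* = a/(a + b).

```python
import math

a, b = 0.7, 0.2   # illustrative values of p_data(x) and p_G(x) at one x

def f(y):
    # Negative of the pointwise integrand the discriminator maximizes
    return -(a * math.log(y) + b * math.log(1.0 - y))

grid = [i / 10_000 for i in range(1, 10_000)]
y_best = min(grid, key=f)

# The grid minimizer matches D*(x) = p_data(x) / (p_data(x) + p_G(x))
assert abs(y_best - a / (a + b)) < 1e-3
```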
SLIDE 36

Optimal discriminator

  • The best strategy for the discriminator is to learn the ratio of the probabilities of x under the data distribution and the generator distribution:

      D*(x) = p_data(x) / (p_data(x) + p_G(x)) = p(data | x)

[Figure: curves of p_data(x), p_G(x), a learned D(x), and the optimal D*(x).]

SLIDE 37

Discriminator intuition

  J^(D)(θ^(D), θ^(G)) = −(1/2) E_{x∼p_data}[log D(x)] − (1/2) E_z[log(1 − D(G(z)))]

  • With this loss, the discriminator approximates the ratio p_data(x)/p_G(x) via supervised learning

SLIDE 38

Optimal generator

  • With a few more steps, we can show that the global optimum of this game is achieved if and only if p_G(x) = p_data(x)
  • We are, in theory, minimizing the Jensen-Shannon divergence between the generator distribution and the true data distribution!
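The Jensen-Shannon divergence mentioned above is easy to compute for discrete distributions (the pmfs below are made-up): it is zero exactly when p_G = p_data, and bounded by log 2.

```python
import math

def kl(p, q):
    # KL(p || q) for discrete distributions on the same support
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    # JSD(p, q) = 1/2 KL(p || m) + 1/2 KL(q || m), m the mixture
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = [0.5, 0.3, 0.2]
p_g_bad = [0.1, 0.1, 0.8]       # a poor generator distribution
p_g_perfect = [0.5, 0.3, 0.2]   # p_G = p_data: the global optimum

assert jsd(p_data, p_g_perfect) == 0.0
assert jsd(p_data, p_g_bad) > 0.0
assert jsd(p_data, p_g_bad) <= math.log(2)   # JSD is bounded by log 2
```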

SLIDE 39

Getting to the optimum

  • For models that have enough capacity, if we use J^(G) = −J^(D), and if D is set to its global optimum given G at every iteration while G improves the criterion at every iteration, then alternating optimization will get us to the global optimum
  • In practice:
    • D, G may not have enough capacity
    • We do not get to find the global optimum for D at each iteration
  • Theory tells us we want the discriminator to always be strong (in practice, there may be reasons to weaken it)

SLIDE 40

More gaps between theory and practice

  • The theory assumes we can reach a global optimum
  • We have functions which are non-convex in the parameters we are optimizing: J^(D)(θ^(D), θ^(G)), J^(G)(θ^(D), θ^(G))
  • The theory assumes that p_G(x), p_data(x) are non-zero everywhere. This may not hold, especially if we have data lying on a manifold. Even when it holds, the ratio can be numerically unstable
  • The theory assumes that the optimal discriminator is unique. In practice other discriminators can do nearly as good a job: i.e. the discriminator can overfit the data

SLIDE 41

Theory summary

  • The theory gives us some insight into what GANs are doing
  • Many of the assumptions in the theory do not hold
  • We cannot get to the global optimum
  • It can be difficult to even get to a local optimum
  • Optimizing GANs is an active area of research (and the subject of much of today)

SLIDE 42

A problem with J^(G) = −J^(D)

  • Setting J^(G) = −J^(D), we have:

      J^(G)(θ^(D), θ^(G)) = (1/2) E_{x∼p_data}[log D(x)] + (1/2) E_z[log(1 − D(G(z)))]

  • What happens to the second term when the discriminator is much better than the generator?

      D(G(z)) → 0, so (1/2) E_z[log(1 − D(G(z)))] → 0

  • There is no gradient signal to help the generator improve
SLIDE 43

Generator (G)'s loss function

  • Instead of negating J^(D), swap the classes:

      J^(G)(θ^(D), θ^(G)) = −(1/2) E_{x∼p_data}[log(1 − D(x))] − (1/2) E_z[log D(G(z))]

  • The first term can be dropped, since θ^(G) does not influence it:

      J^(G)(θ^(D), θ^(G)) = −(1/2) E_z[log D(G(z))]

  • Now when D(G(z)) → 0, −(1/2) E_z[log D(G(z))] → ∞
  • The gradient gets bigger when the discriminator gets better
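The contrast between the two losses shows up directly in their derivatives with respect to d = D(G(z)). A one-line check (d = 1e-4 is an illustrative "discriminator is winning" value):

```python
# Derivatives of each generator loss with respect to d = D(G(z)):
#   saturating:     J = (1/2) log(1 - d)  ->  dJ/dd = -1 / (2 (1 - d))
#   non-saturating: J = -(1/2) log d      ->  dJ/dd = -1 / (2 d)
d = 1e-4   # discriminator confidently rejects the fake sample

grad_saturating = abs(-1.0 / (2.0 * (1.0 - d)))
grad_nonsaturating = abs(-1.0 / (2.0 * d))

# When the discriminator wins, the swapped-class loss still provides a
# strong learning signal, while the original loss's gradient stays ~1/2
assert grad_saturating < 1.0
assert grad_nonsaturating > 1000.0
```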
SLIDE 44

Making GANs approximate maximum likelihood

  • Using a different choice of J^(G), we can make GANs do maximum likelihood estimation
  • Not typically used, but of theoretical interest:

      J^(G)(θ^(D), θ^(G)) = −(1/2) E_z[exp(σ^{−1}(D(G(z))))]

  • Where σ is the sigmoid function
  • It can be shown that this is equivalent to minimizing the KL divergence between the data distribution and the model distribution, under certain assumptions

SLIDE 45

Comparing G’s loss functions

SLIDE 46

Generator (G)'s loss function

  • Because of the gradient, the original paper uses:

      J^(G)(θ^(D), θ^(G)) = −(1/2) E_z[log D(G(z))]

  • This function was later shown to give the same stationary point (under some assumptions) as J^(G) = −J^(D)

SLIDE 47

Other options in the loss

  • The Energy-Based GAN (EBGAN) uses an "energy-based" discriminator function with a hinge loss (for example, the L2 loss of an autoencoder on real vs. fake examples):

      J^(D)(θ^(D), θ^(G)) = D(x) + max(m − D(G(z)), 0)
      J^(G)(θ^(D), θ^(G)) = D(G(z))

  • The authors prove that this and many other choices mean that at a Nash equilibrium, p_G(x) = p_data(x) almost everywhere
  • The paper suggests several advantages, including more efficient training
  • J^(G), J^(D) can both be modified (though not arbitrarily): the game is what guides the learning
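To make the hinge behavior concrete, a minimal evaluation of these losses with made-up energy values and margin m = 1 (D here outputs an energy, low for real-looking data):

```python
def j_d(energy_real, energy_fake, m):
    # EBGAN discriminator loss: push real energy down and fake energy
    # up, but only until the fake reaches the margin m
    return energy_real + max(m - energy_fake, 0.0)

def j_g(energy_fake):
    # EBGAN generator loss: push the energy of fakes down
    return energy_fake

m = 1.0
# A fake with energy below the margin still incurs a hinge penalty
assert abs(j_d(0.1, 0.3, m) - 0.8) < 1e-9
# Once the fake's energy exceeds the margin, the hinge term vanishes
assert abs(j_d(0.1, 1.5, m) - 0.1) < 1e-9
# The generator is rewarded for producing low-energy (realistic) samples
assert j_g(0.3) < j_g(1.5)
```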

SLIDE 48

Different losses

  • Choices of the loss function are further explored in Nowozin and colleagues' f-GAN paper. They show a family of loss functions and how each corresponds to an f-divergence between the probability distributions we are trying to learn
  • Arjovsky and colleagues' Wasserstein GAN (WGAN) discusses the choice of divergence (and proposes using an approximation to the Earth Mover's distance)

SLIDE 49

WGAN

  • If our data lie on a low-dimensional manifold of a high-dimensional space, the model's manifold and the true data manifold can have a negligible intersection in practice
    • The KL divergence is undefined or infinite
    • The loss function and gradients may not be continuous and well behaved
  • The Earth Mover's Distance is well defined: the minimum transportation cost for making one pile of dirt (pdf/pmf) look like the other
SLIDE 50

WGAN

  J^(D)(θ^(D), θ^(G)) = −E_{x∼p_data}[D(x)] + E_z[D(G(z))]
  J^(G)(θ^(D), θ^(G)) = −E_z[D(G(z))]

  • Importantly, the discriminator (critic) is trained for many steps before the generator is updated
  • Weight clipping is used in the discriminator to ensure D has the Lipschitz continuity required by the theory
  • The authors argue that this solves many training issues, including mode collapse
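For 1-D discrete distributions on a common grid, the Earth Mover's distance that WGAN approximates reduces to the sum of absolute CDF differences. A toy illustration with made-up pmfs, showing why it behaves better than KL on disjoint supports:

```python
def emd_1d(p, q):
    # W1 between two pmfs on the same 1-D integer grid:
    # sum of absolute CDF differences (times the unit bin width)
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q)
    return total

p_data = [0.0, 1.0, 0.0, 0.0]   # all mass at bin 1
p_near = [0.0, 0.0, 1.0, 0.0]   # all mass at bin 2 (one step away)
p_far  = [0.0, 0.0, 0.0, 1.0]   # all mass at bin 3 (two steps away)

# Unlike KL (infinite for these disjoint supports), the EMD is finite
# and grows smoothly with how far the mass must be moved
assert emd_1d(p_data, p_data) == 0.0
assert emd_1d(p_data, p_near) == 1.0
assert emd_1d(p_data, p_far) == 2.0
```

This smooth growth is exactly the "well-behaved gradient" property that motivates the WGAN loss on the previous slide.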

SLIDE 51

WGAN behavior

SLIDE 52

Loss function summary

  • There are many choices of loss function
  • Some choices have much better behavior during training
  • Some choices will modify the latent space
SLIDE 53

An optimization issue: Mode collapse

  • What prevents the generator from just picking the same example all the time?
  • The top row finds all the modes; the bottom row finds just one mode

https://arxiv.org/pdf/1611.02163.pdf

SLIDE 54

Mode collapse

  • Thought experiment: optimize the generator without changing the discriminator. What will happen?

https://arxiv.org/pdf/1611.02163.pdf

SLIDE 55

Mode collapse mitigation 1: minibatch features (Salimans and colleagues, Improved Techniques for Training GANs)

  • Let the discriminator make a decision by comparing an example to a whole minibatch of fake/real examples
  • The discriminator can now consider diversity

https://arxiv.org/pdf/1611.02163.pdf

SLIDE 56

Mode collapse mitigation 2: unrolling (Metz and colleagues, Unrolled Generative Adversarial Networks)

  • Similar to backpropagation through time, but now we backpropagate through optimization steps
  • We let the generator see where the discriminator would be after k steps before making its update
  • The discriminator will react to the generator putting more mass somewhere by putting less mass there: this discourages the generator from concentrating mass

https://arxiv.org/pdf/1611.02163.pdf

SLIDE 57

Does gradient descent make sense?

  • Does using gradient descent to find a Nash equilibrium make sense?
  • This is not what gradient descent was designed for
  • Each player moving down means the other moves up: we can get stuck
  • Classic example: V(x, y) = −xy
  • Mescheder and colleagues, The Numerics of GANs: consensus optimization

http://www.inference.vc/my-notes-on-the-numerics-of-gans/
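The failure of naive gradient descent on this game can be simulated in a few lines (step size and starting point are arbitrary): on V(x, y) = −xy, simultaneous gradient steps spiral away from the unique Nash equilibrium at (0, 0) instead of converging to it.

```python
# Simultaneous gradient descent on the bilinear game V(x, y) = -x*y:
# player 1 minimizes V over x, player 2 minimizes -V over y.
eta = 0.1
x, y = 1.0, 1.0
r0 = (x * x + y * y) ** 0.5   # initial distance from the equilibrium

for _ in range(100):
    dx = -y          # dV/dx for V = -x*y
    dy = x           # d(-V)/dy
    x, y = x - eta * dx, y - eta * dy

r_final = (x * x + y * y) ** 0.5
# Each simultaneous step multiplies the distance from (0, 0) by
# sqrt(1 + eta^2) > 1, so the iterates diverge rather than converge
assert r_final > r0
```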

SLIDE 58

Story so far

  • GANs provide a flexible framework for implicitly minimizing the divergence between the model and true probability distributions
  • There are many choices of divergence
    • Some of these divergences are ill-defined for realistic settings
    • They can be poorly behaved
  • Even when the divergence is well behaved, algorithms for finding a Nash equilibrium are not that good
    • Gradient descent is used, but the dynamics can prevent convergence
  • One interesting study: Li and colleagues, Towards Understanding the Dynamics of Generative Adversarial Networks
  • Active research in training GANs: lots of papers with "Towards" in the title
SLIDE 59

Evaluation

  • Another issue with GANs is quantitative comparison
  • There is no explicit likelihood to calculate
  • Post hoc density estimation can be used, but is inaccurate
  • Subjective evaluation by humans is currently the best method
SLIDE 60

Practical advice: DCGAN

  • All-convolutional network: no pooling layers; strided (transpose) convolutions
  • Adam optimization
  • Batch normalization
    • Not in the last layer of G or the first layer of D: lets the model learn the mean/scale of the data
    • The two minibatches (real and fake) for the discriminator are normalized separately
SLIDE 61

Practical advice: DCGAN

  • Why does this work? Purely empirical: the authors tried a bunch of architectures
  • This architecture seems to somehow constrain the model distribution so that many of the training problems are mitigated

SLIDE 62

Practical advice: One-sided label smoothing

  • If using the original

      J^(D)(θ^(D), θ^(G)) = −(1/2) E_{x∼p_data}[log D(x)] − (1/2) E_z[log(1 − D(G(z)))]

  • It can be helpful to decrease the confidence of the discriminator by setting the target of the real examples to e.g. 0.9 instead of 1 (but keeping the target of the model samples at exactly 0)
  • Keeps the logits at smaller values and mitigates "extrapolation" to new data (overfitting)
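A minimal sketch of the smoothed loss (the D outputs below are made-up numbers): with a 0.9 target for real data, a near-certain D(x) is penalized relative to a calibrated one, which is the intended softening.

```python
import math

def d_loss(d_real, d_fake, real_target=1.0):
    # Cross-entropy discriminator loss with a configurable target for
    # real data; one-sided: the fake target stays at exactly 0
    real_term = -(real_target * math.log(d_real)
                  + (1.0 - real_target) * math.log(1.0 - d_real))
    fake_term = -math.log(1.0 - d_fake)
    return 0.5 * real_term + 0.5 * fake_term

# With a 0.9 target, maximal confidence on real data is no longer optimal:
confident = d_loss(0.999, 0.01, real_target=0.9)
calibrated = d_loss(0.9, 0.01, real_target=0.9)
assert calibrated < confident
```

Cross-entropy is minimized when the prediction equals the target, so the discriminator is pushed toward D(x) = 0.9 on real data, keeping its logits bounded.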

SLIDE 63

Practical advice: add noise

  • For a similar reason, it can be useful to add noise to the data
  • This helps prevent discriminator overfitting, and also helps with the problem of non-overlapping support between the model and data distributions

SLIDE 64

Practical advice: virtual batch normalization

https://arxiv.org/pdf/1701.00160.pdf

  • Batch normalization causes generated samples within a batch to become correlated
  • Use a reference batch to do batch normalization (use the statistics from the reference)
  • Or use a reference batch combined with the current batch (compute statistics from the combined batch)
  • Batch renormalization is another option
SLIDE 65

Practical advice: use labels if available

  • GANs can be used in a supervised or semi-supervised setting
  • One way to do this is to give both the discriminator and the generator the label, making them class-conditional
  • Another way is to change the discriminator to predict n + 1 classes, where a class is added for fake data
  • Using labels dramatically improves the sample quality
SLIDE 66

Relationship to Reinforcement Learning

  • We'll see reinforcement learning later in the course
  • Similar to GANs in the sense that the actions taken by a player are rewarded, and the reward function governs learning
  • Squinting our eyes, there are similarities
  • But in GANs:
    • The reward function changes in response to changes in the generator (there are two players responding to each other)
    • The generator gets to observe gradients of the reward, not just the reward
  • GANs can be formally related to inverse reinforcement learning
SLIDE 67

Summary

  • The GAN framework is a powerful way to do unsupervised learning
  • The samples from GAN models are state of the art (FVBN models are competitive though)
  • Training GANs is very difficult for fundamental reasons, and this is an area of active research
  • Very popular, with many variants. Some add encoders (BiGAN), some make the latent code more interpretable (InfoGAN), and there are many others
  • https://github.com/hindupuravinash/the-gan-zoo