Unsupervised Learning


1. Unsupervised Learning
• There is no direct ground truth for the quantity of interest
• Autoencoders
• Variational Autoencoders (VAEs)
• Generative Adversarial Networks (GANs)

2. Autoencoders
Goal: meaningful features that capture the main factors of variation in the dataset
• These are good for classification, clustering, exploration, generation, …
• We have no ground truth for the features
[Diagram: input data → encoder → features]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

3. Autoencoders
Goal: meaningful features that capture the main factors of variation, i.e. features that can be used to reconstruct the input
• L2 loss function: ‖x − x̂‖²
[Diagram: input data → encoder → features (latent variables) → decoder → reconstruction]
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

4. Autoencoders
• A linear encoder and decoder give results close to PCA
• Deeper networks give better reconstructions, since the basis can be non-linear
[Figure: original images vs. autoencoder vs. PCA reconstructions]
Image Credit: Reducing the Dimensionality of Data with Neural Networks, Hinton and Salakhutdinov
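The linear-autoencoder/PCA connection above can be checked numerically. The sketch below (illustrative, not from the slides) uses the fact that a rank-k truncated SVD is the optimal rank-k linear encode/decode in the L2 sense (Eckart-Young), which is what a linear autoencoder converges to, and compares it against an arbitrary random linear code of the same size:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X = X - X.mean(axis=0)           # center the data, as PCA assumes

k = 3                            # size of the latent code
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k]                       # top-k principal directions = optimal linear "encoder"

Z = X @ W.T                      # encode: 10-D data -> 3-D features
X_pca = Z @ W                    # decode: 3-D features -> 10-D reconstruction
err_pca = np.mean((X - X_pca) ** 2)

# Any other rank-k linear encoder/decoder cannot beat this (Eckart-Young):
W_rand = rng.normal(size=(k, 10))
W_rand /= np.linalg.norm(W_rand, axis=1, keepdims=True)
X_rand = X @ W_rand.T @ W_rand
err_rand = np.mean((X - W_rand.T.T @ X_rand.T).T ** 2) if False else np.mean((X - X_rand) ** 2)

print("PCA reconstruction error:   ", err_pca)
print("random-code reconstruction: ", err_rand)
```

A deeper, non-linear autoencoder can do better than `err_pca` because its decoder is not restricted to a linear basis, which is exactly the point the slide makes.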

5. Example: Document Word Probabilities → 2D Code
[Figure: 2D codes produced by PCA vs. by an autoencoder]
Image Credit: Reducing the Dimensionality of Data with Neural Networks, Hinton and Salakhutdinov

6. Example: Semi-Supervised Classification
• Many images, but few ground truth labels
• Start unsupervised: train an autoencoder on the many unlabeled images (L2 reconstruction loss on encoder features and latent variables)
• Supervised fine-tuning: reuse the trained encoder, add a classifier on top of its features, and train the classification network on the labeled images with a classification loss (softmax, etc.) against the ground truth labels
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

7. Autoencoder: geometry.cs.ucl.ac.uk/creativeai

8. Generative Models
• Assumption: the dataset {x₁, …, xₙ} consists of samples from an unknown distribution p(x)
• Goal: create a new sample from p(x) that is not in the dataset
[Figure: dataset images …, and a generated face not in the dataset]
Image credit: Progressive Growing of GANs for Improved Quality, Stability, and Variation, Karras et al.


10. Generative Models
• A generator G with parameters θ maps latent samples z to generated samples x = G(z)
• The latent distribution p(z) is known and easy to sample from

11. Generative Models
How to measure the similarity of the data distribution p(x) and the generated distribution q(x)?
1) Likelihood of the data in q(x): Variational Autoencoders (VAEs)
2) Adversarial game: a discriminator tries to distinguish real samples from generated ones, while the generator tries to make them hard to distinguish: Generative Adversarial Networks (GANs)

12. Autoencoders as Generative Models?
• A trained decoder transforms some features z into approximate samples from p(x)
• What happens if we pick a random z? Decoder = generator?
• We do not know the distribution of features z that decode to likely samples
[Figure: a random point in the feature space / latent space and its decoded output]
Image Credit: Reducing the Dimensionality of Data with Neural Networks, Hinton and Salakhutdinov

13. Variational Autoencoders (VAEs)
• Pick a parametric distribution p(z) for the features
• The generator with parameters θ maps z to an image distribution p_θ(x|z)
• Train the generator to maximize the likelihood of the data samples in p_θ(x)
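The likelihood formulas dropped from this slide can be reconstructed in standard VAE notation (a sketch using the usual symbols: θ for the generator parameters, z for the latent features):

```latex
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz,
\qquad
\theta^\ast = \arg\max_\theta \sum_i \log p_\theta(x_i)
```

The integral marginalizes the latent variable out of the generator's conditional distribution; the training objective is the log-likelihood of the dataset under that marginal.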

14. Outputting a Distribution
• The generator with parameters θ does not output a single image; it outputs the parameters of a distribution p_θ(x|z), e.g. a Bernoulli distribution or a Normal distribution, from which a sample can then be drawn


16. Variational Autoencoders (VAEs): Naïve Sampling (Monte-Carlo)
• Maximum likelihood of the data in the generated distribution: max_θ Σ_i log p_θ(x_i)
• Approximate the integral p_θ(x) = ∫ p_θ(x|z) p(z) dz with Monte-Carlo samples of z in each iteration
• SGD approximates the sum over the data

17. Variational Autoencoders (VAEs): Naïve Sampling (Monte-Carlo)
• Approximate the integral with Monte-Carlo samples z_j ~ p(z) in each iteration
• SGD approximates the expectation over the data
• Loss function: −log( (1/k) Σ_j p_θ(x|z_j) ), for a random x from the dataset

18. Variational Autoencoders (VAEs): Naïve Sampling (Monte-Carlo)
• Only a few z map close to a given x, i.e. have non-zero p_θ(x|z)
• The estimate is very expensive or very inaccurate, depending on the sample count
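The cost/accuracy problem above is easy to demonstrate in one dimension. The sketch below (illustrative, not from the slides) replaces the learned generator with a fixed toy map g(z) = 2z + 1 and estimates p(x) = ∫ p(x|z) p(z) dz by plain Monte-Carlo, comparing few-sample and many-sample estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1  # noise scale of p(x|z) = N(g(z), SIGMA^2)

def p_x_given_z(x, z):
    g = 2.0 * z + 1.0  # stand-in "generator" g(z) = 2z + 1 (not a trained network)
    return np.exp(-0.5 * ((x - g) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

def mc_likelihood(x, k):
    # p(x) = ∫ p(x|z) p(z) dz  ≈  (1/k) Σ_j p(x|z_j),  z_j ~ p(z) = N(0, 1)
    return p_x_given_z(x, rng.normal(size=k)).mean()

x = 1.0  # analytically x ~ N(1, 4 + SIGMA^2), so p(x=1) ≈ 0.199
small = np.array([mc_likelihood(x, 10) for _ in range(200)])
big = np.array([mc_likelihood(x, 10_000) for _ in range(200)])

# Only z near 0 give non-zero p(x|z), so few-sample estimates are wildly noisy.
print("k=10     estimates:", small.mean(), "+/-", small.std())
print("k=10000  estimates:", big.mean(), "+/-", big.std())
```

Because only a narrow band of z values contributes, the k=10 estimator's spread is an order of magnitude larger than the k=10000 one, which is exactly the "very expensive or very inaccurate" trade-off on the slide.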

19. Variational Autoencoders (VAEs): The Encoder
• During training, another network (the encoder, with parameters φ) can learn a distribution q_φ(z|x) of good z for a given x
• q_φ(z|x) should be much narrower than p(z)
• Then a single sample z ~ q_φ(z|x) is good enough

20. Variational Autoencoders (VAEs): The Encoder
• Can we still easily sample a new x?
• Need to make sure q_φ(z|x) stays close to p(z): regularize with the KL-divergence KL(q_φ(z|x) ‖ p(z))
• The negative loss can be shown to be a lower bound for the log-likelihood, and equal to it if q_φ(z|x) matches the true posterior
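The bound mentioned on this slide is, in standard notation, the evidence lower bound (ELBO); a reconstruction of the dropped formula:

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{z \sim q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction term}}
\;-\;
\underbrace{D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{regularizer}}
```

Maximizing the right-hand side is the VAE training objective; the inequality becomes an equality exactly when q_φ(z|x) equals the true posterior p_θ(z|x).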

21. Reparameterization Trick
• Sampling z ~ q_φ(z|x) directly gives no gradient with respect to the encoder parameters φ
• Example when q_φ(z|x) = N(μ_φ(x), σ_φ(x)²): write z = μ_φ(x) + σ_φ(x) · ε, where ε ~ N(0, I)
• The sample ε does not depend on the parameters, so backprop can flow through μ_φ and σ_φ
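A minimal numeric sketch of the trick (the μ and σ values below are hypothetical encoder outputs for one input, not a trained network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs mu_phi(x), sigma_phi(x) for a single input x:
mu, sigma = 0.5, 2.0

eps = rng.normal(size=100_000)   # ε ~ N(0, 1): sampling that does NOT touch the parameters
z = mu + sigma * eps             # z ~ N(mu, sigma^2), differentiable in mu and sigma

# dz/dmu = 1 and dz/dsigma = eps, so gradients can reach the encoder parameters,
# which is impossible if z is drawn from N(mu, sigma^2) directly.
print("sample mean:", z.mean(), " sample std:", z.std())
```

The printed statistics confirm z is distributed as N(μ, σ²) even though all randomness lives in the parameter-free ε.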

22. Feature Space of Autoencoders vs. VAEs
[Figure: latent space of a plain autoencoder vs. that of a VAE]
SIGGRAPH Asia Course CreativeAI: Deep Learning for Graphics

23. Generating Data
• Sample z ~ p(z), then sample x from the generator distribution p_θ(x|z)
[Figure: generated samples on MNIST and Frey Faces]
Image Credit: Auto-Encoding Variational Bayes, Kingma and Welling

24. VAE on MNIST
https://www.siarez.com/projects/variational-autoencoder

25. Variational Autoencoder: geometry.cs.ucl.ac.uk/creativeai

26. Generative Adversarial Networks
• Player 1, the generator: scores if the discriminator can't distinguish its output from real images from the dataset
• Player 2, the discriminator: classifies inputs as real or fake, and scores if it can distinguish generated output from real images


28. Why Adversarial?
• If the discriminator directly approximated the data density p(x): the x at the maximum of p(x) would have the lowest generator loss
• The optimal generator would then collapse to a single mode at that maximum, with small variance
Image Credit: How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?, Ferenc Huszár

29. Why Adversarial?
• For GANs, the discriminator instead approximates the ratio p(x) / (p(x) + q(x)), which depends on the current generator distribution q
Image Credit: How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?, Ferenc Huszár

30. Why Adversarial?
• VAEs: maximize the likelihood of the data samples under the generated distribution
• GANs: the adversarial game approximately maximizes the likelihood of generator samples under the data distribution
Image Credit: How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?, Ferenc Huszár


32. GAN Objective
• The discriminator D(x) outputs the probability that x is not fake
• Both players use the fake/real binary cross-entropy (BCE) classification loss
• Discriminator objective: maximize E_x[log D(x)] + E_z[log(1 − D(G(z)))]
• Generator objective: minimize E_z[log(1 − D(G(z)))]
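Combining the two objectives above gives the usual minimax formulation; a reconstruction of the formula the slide dropped:

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator's inner maximization is a standard binary classifier; the generator's outer minimization pushes generated samples toward regions the discriminator calls real.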

  33. Non-saturating Heuristic Generator loss is negative binary cross-entropy: poor convergence Negative BCE 33 Image Credit: NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow

34. Non-saturating Heuristic
• The generator loss log(1 − D(G(z))) is a negative binary cross-entropy; it saturates when the discriminator confidently rejects fakes: poor convergence
• Flip the target class instead of flipping the sign: the generator maximizes log D(G(z)): good convergence, behaves like BCE
Image Credit: NIPS 2016 Tutorial: Generative Adversarial Networks, Ian Goodfellow
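The difference is visible directly in the gradient magnitudes. The sketch below (illustrative numbers, not from the slides) evaluates both generator losses where it matters, i.e. early in training when D(G(z)) is close to 0:

```python
import numpy as np

# D(G(z)) early in training: the discriminator confidently labels samples as fake.
d = np.array([0.001, 0.01, 0.1])

# Saturating loss  L = log(1 - d):  |dL/dd| = 1 / (1 - d)   -> stays near 1 (weak signal)
grad_saturating = 1.0 / (1.0 - d)
# Non-saturating   L = -log(d):     |dL/dd| = 1 / d         -> huge when d is small
grad_nonsat = 1.0 / d

print("saturating gradients:    ", grad_saturating)
print("non-saturating gradients:", grad_nonsat)
```

Flipping the target class keeps the loss shape of BCE but moves the strong-gradient region to exactly where the generator starts out, which is the heuristic's point.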

35. GAN Training
• Discriminator training step: update the discriminator with its loss on real images from the dataset and on generated samples
• Generator training step: update the generator with its loss, backpropagated through the discriminator's output on generated samples
• Interleave the two updates in each training step

36. DCGAN
• First paper to successfully use CNNs with GANs
• Due to using components that were novel at the time, like batch normalization, ReLUs, etc.
Image Credit: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford et al.

37. Generative Adversarial Network: geometry.cs.ucl.ac.uk/creativeai

38. Conditional GANs (CGANs)
• ≈ learn a mapping between images from example pairs
• Approximate sampling from a conditional distribution
Image Credit: Image-to-Image Translation with Conditional Adversarial Nets, Isola et al.
