
Unsupervised Learning

Shan-Hung Wu

shwu@cs.nthu.edu.tw

Department of Computer Science, National Tsing Hua University, Taiwan

Machine Learning


Outline

1. Unsupervised Learning
2. Predictive Learning
3. Autoencoders & Manifold Learning
4. Generative Adversarial Networks


Unsupervised Learning

Dataset: $\mathbb{X} = \{x^{(i)}\}_i$

No supervision

What can we learn?

Clustering I

Goal: to group similar $x^{(i)}$'s

Clustering II

K-means algorithm

Hierarchical clustering
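The slides name k-means without spelling out the steps; as a reference, here is a minimal NumPy sketch of the standard alternating assignment/update iterations. The toy two-blob data at the end is purely illustrative:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups (Lloyd's iterations)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its members
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)  # two well-separated blobs
```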

Factorization and Recommendation

Goal: to uncover the factors behind data (the rating matrix)

Commonly used in recommender systems

Non-negative matrix factorization (NMF) [9, 10]
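As a concrete sketch, the multiplicative update rules of [10] factor a rating matrix $R \approx WH$ with non-negative factors. The tiny dense matrix below is an assumption for illustration; real recommender data would be sparse, with missing ratings masked out:

```python
import numpy as np

def nmf(R, k, n_iters=500, eps=1e-9):
    """Factor a non-negative matrix R (users x items) into W @ H."""
    rng = np.random.default_rng(0)
    n, m = R.shape
    W = rng.random((n, k))  # user factors
    H = rng.random((k, m))  # item factors
    for _ in range(n_iters):
        # Multiplicative updates keep W and H non-negative (Lee & Seung [10])
        H *= (W.T @ R) / (W.T @ W @ H + eps)
        W *= (R @ H.T) / (W @ H @ H.T + eps)
    return W, H

R = np.array([[5, 4, 0, 1], [4, 5, 1, 0], [1, 0, 5, 4], [0, 1, 4, 5]], float)
W, H = nmf(R, k=2)
print(np.round(W @ H, 1))  # the rank-2 reconstruction uncovers two taste "factors"
```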

Dimension Reduction

Goal: to reduce the dimension of each $x^{(i)}$

E.g., PCA

Predictive learning: learn to “fill in the blanks”

Manifold learning: learn the tangent vectors of a given point
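For reference, a minimal PCA-by-SVD sketch in NumPy; the data shape and target dimension are assumptions for illustration:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    Xc = X - X.mean(axis=0)                  # center the data first
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]             # low-dim codes, principal directions

X = np.random.randn(200, 10)
Z, components = pca(X, k=3)                  # each x^(i) reduced from 10-D to 3-D
```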

Data Generation I

Goal: to generate new data points/samples

Generative adversarial networks (GANs)

Data Generation II

Text to image based on conditional GANs: “This bird is completely red with black wings and pointy beak.”


Predictive Learning

I.e., blank filling

E.g., word2vec [13, 12]: “... the cat sat on ...”
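To make the "fill in the blanks" idea concrete, here is a toy skip-gram sketch with a full softmax; real word2vec [13, 12] uses negative sampling or a hierarchical softmax on large corpora, so the tiny corpus, window size, and learning rate here are all illustrative assumptions:

```python
import numpy as np

corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                         # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (V, D))          # input (word) embeddings
W_out = rng.normal(0.0, 0.1, (D, V))         # output (context) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Skip-gram: each word learns to predict its neighbors in a +/-1 window
for _ in range(500):
    for t, w in enumerate(corpus):
        for c in corpus[max(0, t - 1):t] + corpus[t + 1:t + 2]:
            h = W_in[idx[w]]                 # center-word embedding
            p = softmax(h @ W_out)           # predicted context distribution
            grad_z = p.copy()
            grad_z[idx[c]] -= 1.0            # dL/dz for the cross-entropy loss
            grad_h = W_out @ grad_z          # dL/dh (computed before updating W_out)
            W_out -= 0.1 * np.outer(h, grad_z)
            W_in[idx[w]] -= 0.1 * grad_h

print(W_in[idx["cat"]])                      # the learned word vector for "cat"
```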

Doc2Vec

How to encode a document?

Bag of words (TF-IDF), average word2vec, etc.

These do not capture the semantic meaning of a doc: “I like final project” ≠ “Final project likes me”

Predictive learning for docs? Doc2vec [7]: to capture the context not explained by words

Filling Images

How? PixelRNN [19]

More

Predicting the future by watching unlabeled videos [6, 21]


Autoencoders I

Encoder: to learn a low-dimensional representation c (called the code) of input x

Decoder: to reconstruct x from c

Cost function: $\arg\min_{\Theta} -\log P(\mathbb{X} \mid \Theta) = \arg\min_{\Theta} -\sum_n \log P(x^{(n)} \mid \Theta)$

Sigmoid output units $a_j^{(L)} = \hat{\rho}_j$ for $x_j \sim \mathrm{Bernoulli}(\rho_j)$:

$P(x_j^{(n)} \mid \Theta) = (a_j^{(L)})^{x_j^{(n)}} (1 - a_j^{(L)})^{1 - x_j^{(n)}}$

Linear output units $a^{(L)} = z^{(L)} = \hat{\mu}$ for $x \sim \mathcal{N}(\mu, \Sigma)$:

$-\log P(x^{(n)} \mid \Theta) \propto \|x^{(n)} - z^{(L)}\|^2$
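A minimal fully-connected autoencoder sketch in tf.keras matching the Bernoulli case above: binary cross-entropy is exactly the negative Bernoulli log-likelihood. The layer sizes, 32-dimensional code, and the random stand-in data are assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Encoder maps x down to a code c; decoder reconstructs x from c.
autoencoder = tf.keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(784,)),
    layers.Dense(32, activation="relu", name="code"),   # the code c
    layers.Dense(256, activation="relu"),
    layers.Dense(784, activation="sigmoid"),            # Bernoulli parameters rho_hat
])
# Binary cross-entropy = negative Bernoulli log-likelihood per pixel
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(1000, 784).astype("float32")   # stand-in for MNIST pixels
autoencoder.fit(X, X, epochs=3, batch_size=64)    # the target is the input itself
```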

Autoencoders II

A 32-bit code can roughly represent a 32×32 MNIST image

Convolutional Autoencoders

Convolution + deconvolution

How to train the deconvolution layer? Treat it as a convolution layer
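A sketch of the convolution + deconvolution layout in tf.keras, where Conv2DTranspose plays the role of the deconvolution layer and is trained like any other convolution; the filter counts and 28×28 input shape are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

conv_ae = tf.keras.Sequential([
    # Encoder: strided convolutions shrink the spatial resolution
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu",
                  input_shape=(28, 28, 1)),                      # 28x28 -> 14x14
    layers.Conv2D(8, 3, strides=2, padding="same", activation="relu"),  # -> 7x7 code
    # Decoder: transposed ("de-") convolutions grow it back
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),
])
conv_ae.compile(optimizer="adam", loss="binary_crossentropy")
```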

Manifolds I

In many applications, data concentrate around one or more low-dimensional manifolds

A manifold is a topological space that is locally linear

Manifolds II

For each point x on a manifold, we have its tangent space spanned by tangent vectors

These local directions specify how one can change x infinitesimally while staying on the manifold

Learning Manifolds I

How to learn manifolds with autoencoders?

Contractive autoencoder [16]: regularize the code c so that it is invariant to local changes of x:

$\Omega(c) = \sum_n \left\| \frac{\partial c^{(n)}}{\partial x^{(n)}} \right\|_F^2$

where $\partial c^{(n)} / \partial x^{(n)}$ is a Jacobian matrix

The encoder preserves local structures in the code space
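A sketch of this penalty using TensorFlow's batch Jacobian; the `encoder` model and the weighting factor are assumptions, and in practice [16] evaluates the Jacobian analytically for a sigmoid encoder layer rather than by generic autodiff:

```python
import tensorflow as tf

def contractive_penalty(encoder, x):
    """Omega(c) = sum over the batch of ||dc/dx||_F^2."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        c = encoder(x)                 # codes, shape (batch, code_dim)
    J = tape.batch_jacobian(c, x)      # Jacobians, shape (batch, code_dim, input_dim)
    return tf.reduce_sum(tf.square(J))

# Hypothetical usage inside a training loop:
# total_loss = reconstruction_loss + lam * contractive_penalty(encoder, x)
```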

Learning Manifolds II

In practice, it is easier to train a denoising autoencoder [20]:

Encoder: to encode x corrupted with random noise

Decoder: to reconstruct x without the noise
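The only change from the plain autoencoder sketched earlier is that the inputs are corrupted while the reconstruction target stays clean. This reuses the hypothetical `autoencoder` model from the earlier block; the noise level is an assumption:

```python
import numpy as np

X = np.random.rand(1000, 784).astype("float32")            # stand-in clean data
noise = 0.3 * np.random.randn(*X.shape).astype("float32")
X_noisy = np.clip(X + noise, 0.0, 1.0)                     # corrupted inputs
autoencoder.fit(X_noisy, X, epochs=3, batch_size=64)       # denoise: fit(noisy, clean)
```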

Tangent Vectors I

The code c represents a coordinate on a low-dimensional manifold (e.g., the blue line)

How to get the tangent vectors of a given point?

Tangent Vectors II

Given a point x, let c be the code of x and $J(x) = \frac{\partial c}{\partial x}$ be the Jacobian matrix of c at x

J(x) summarizes how c changes in terms of x

Directions in the input space that change c the most should be the tangent vectors:

1. Decompose J(x) using SVD such that $J(x) = U D V^\top$
2. Let the tangent vectors be the rows of $V^\top$ (the right singular vectors) corresponding to the largest singular values in D
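These two steps are easy to state in NumPy; the stand-in Jacobian shape below is an assumption for illustration (in practice J would come from differentiating the trained encoder at x):

```python
import numpy as np

def tangent_vectors(J, k):
    """Top-k right singular vectors of the encoder Jacobian J = dc/dx."""
    U, S, Vt = np.linalg.svd(J, full_matrices=False)  # singular values sorted descending
    return Vt[:k]                  # rows of V^T: input-space directions that change c most

J = np.random.randn(32, 784)       # stand-in Jacobian: 32-D code, 784-D input
v = tangent_vectors(J, k=2)        # two tangent vectors at this point
```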

Tangent Vectors III

In practice, J(x) usually has only a few large singular values

Tangent vectors found by contractive/denoising autoencoders can be used by Tangent Prop [18]:

Let $\{v^{(i,j)}\}_j$ be the tangent vectors of each example $x^{(i)}$

Train an NN classifier f with the cost penalty $\Omega[f] = \sum_{i,j} \left( \nabla_x f(x^{(i)})^\top v^{(i,j)} \right)^2$

Points on the same manifold share the same label


Decoder as Data Generator

The decoder can generate data points even from synthetic codes

However, the decoder just “remembers” the samples it has seen: generated data are just combinations of training examples

Generative Adversarial Networks (GANs)

Generative adversarial network (GAN) [4]:

Generator: to generate data points from random codes

Discriminator: to separate generated points from true ones

Sigmoid output unit $a^{(L)} = \hat{\rho}$ for $y \sim \mathrm{Bernoulli}(\rho)$, where $\rho = P(y = \text{true point} \mid x)$

Weights for $x$ and $\hat{x}$ are tied

Cost Function

In normal binary classification, we have the log-likelihood $\log P(\mathbb{X} \mid \Theta) = \sum_n \log P(y^{(n)} \mid x^{(n)}, \Theta) = \sum_n \log \left[ (\hat{\rho}^{(n)})^{y^{(n)}} (1 - \hat{\rho}^{(n)})^{1 - y^{(n)}} \right]$

Cost function (N true points, M generated points): $\arg\min_{\Theta_{\mathrm{gen}}} \max_{\Theta_{\mathrm{dis}}} \sum_n \log \hat{\rho}^{(n)} + \sum_m \log(1 - \hat{\rho}^{(m)})$

$\hat{\rho}^{(n)}$ depends on $\Theta_{\mathrm{dis}}$ only; $\hat{\rho}^{(m)}$ depends on both $\Theta_{\mathrm{dis}}$ and $\Theta_{\mathrm{gen}}$

Convolutional Generators

DC-GAN [14] for generating images

Training: Alternating SGD

$\arg\min_{\Theta_{\mathrm{gen}}} \max_{\Theta_{\mathrm{dis}}} \sum_n \log \hat{\rho}^{(n)} + \sum_m \log(1 - \hat{\rho}^{(m)})$

Alternating SGD:

Fix $\Theta_{\mathrm{gen}}$ and apply a stochastic gradient ascent step on $\Theta_{\mathrm{dis}}$ ($\nabla_{\Theta_{\mathrm{dis}}} C$ involves both the first and the second term)

Fix $\Theta_{\mathrm{dis}}$ and apply a stochastic gradient descent step on $\Theta_{\mathrm{gen}}$ ($\nabla_{\Theta_{\mathrm{gen}}} C$ involves only the second term)
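A minimal sketch of one round of alternating SGD in tf.keras, assuming `generator` and `discriminator` are models built elsewhere; the losses directly mirror the minimax cost C above, and the non-saturating generator loss common in practice is noted in a comment:

```python
import tensorflow as tf

opt_dis = tf.keras.optimizers.SGD(0.01)   # plain SGD; momentum is discouraged here
opt_gen = tf.keras.optimizers.SGD(0.01)
eps = 1e-8                                # numerical safety inside the logs

def train_step(generator, discriminator, x_real, code_dim=64):
    z = tf.random.normal((tf.shape(x_real)[0], code_dim))
    # (1) Fix the generator; ascend on the discriminator.
    #     C = sum_n log rho_hat(n) + sum_m log(1 - rho_hat(m)); we minimize -C.
    with tf.GradientTape() as tape:
        p_real = discriminator(x_real)              # rho_hat for true points
        p_fake = discriminator(generator(z))        # rho_hat for generated points
        d_loss = -tf.reduce_mean(tf.math.log(p_real + eps)
                                 + tf.math.log(1.0 - p_fake + eps))
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    opt_dis.apply_gradients(zip(grads, discriminator.trainable_variables))
    # (2) Fix the discriminator; descend on the generator.
    #     Only the second term of C depends on Theta_gen.
    with tf.GradientTape() as tape:
        p_fake = discriminator(generator(z))
        g_loss = tf.reduce_mean(tf.math.log(1.0 - p_fake + eps))
        # In practice, maximizing log(p_fake) instead avoids early saturation.
    grads = tape.gradient(g_loss, generator.trainable_variables)
    opt_gen.apply_gradients(zip(grads, generator.trainable_variables))
```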

Training: Challenges I

The goal is to find a saddle point of the minimax cost above; however, most optimization algorithms are designed to find minima

Avoid momentum in the training algorithm

Training: Challenges II

Alternating SGD does not distinguish between $\min_{\Theta_{\mathrm{gen}}} \max_{\Theta_{\mathrm{dis}}}$ and $\max_{\Theta_{\mathrm{dis}}} \min_{\Theta_{\mathrm{gen}}}$

Mode collapse: in $\max_{\Theta_{\mathrm{dis}}} \min_{\Theta_{\mathrm{gen}}}$, the generator maps every code to a single point that the discriminator believes is most likely to be real

Then the discriminator can spot fake points by excluding that point

The discriminator and generator cancel each other's progress during training

Unrolled GANs

SGD ignores the max operation when computing $\nabla_{\Theta_{\mathrm{gen}}} C$

Unrolled GANs [11]: back-propagate through several max (discriminator-update) steps when computing $\nabla_{\Theta_{\mathrm{gen}}} C$

Minibatch Discrimination

In $\max_{\Theta_{\mathrm{dis}}} \min_{\Theta_{\mathrm{gen}}}$, the generator collapses because $\nabla_{\Theta_{\mathrm{dis}}} C$ is computed independently for each example

The generator cannot know whether the discriminator is excluding a single region, so it cannot learn to generate dissimilar points

Minibatch discrimination [17]: let the discriminator look at multiple points when making predictions

Without vs. with minibatch discrimination (figure)
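A sketch of the minibatch-feature statistic from [17] as I read the paper: each example's intermediate features are projected through a learned tensor and compared against every other example in the batch, so the discriminator can detect a collapsed generator producing near-identical points. The shape names A, B, C follow the paper; the feature layer `f` and all sizes are assumptions:

```python
import tensorflow as tf

def minibatch_features(f, T):
    """Minibatch discrimination statistics (Salimans et al. [17]).

    f: (batch, A) features from an intermediate discriminator layer.
    T: (A, B, C) learned tensor.
    Returns (batch, B) closeness scores against the rest of the batch.
    """
    M = tf.einsum("na,abc->nbc", f, T)           # (batch, B, C) projections
    diff = M[:, None] - M[None, :]               # pairwise differences
    l1 = tf.reduce_sum(tf.abs(diff), axis=-1)    # (batch, batch, B) L1 norms
    return tf.reduce_sum(tf.exp(-l1), axis=1)    # similarity to the whole batch

f = tf.random.normal((16, 32))                   # 16 examples, A = 32
T = tf.Variable(tf.random.normal((32, 8, 4)))    # B = 8 statistics, C = 4 dims
o = minibatch_features(f, T)                     # concatenate o with f downstream
```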

Other Difficulties

Counting, perspective, and global structure

These are still open research problems

Code Space Arithmetic

DC-GAN [14] can learn to use codes in meaningful ways

Finding codes for images with constraints [22, 1]

Learn to Encode in GANs

Adversarial feature learning [2, 3]

Conditional GAN I

How? “This bird is completely red with black wings and pointy beak.”

Text to image [15]

More GANs I

Super resolution [8]

More GANs II

Image-to-image translation [5]

References

[1] Andrew Brock, Theodore Lim, J.M. Ritchie, and Nick Weston. Neural photo editing with introspective adversarial networks. arXiv preprint arXiv:1609.07093, 2016.

[2] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.

[3] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.

[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[5] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.

[6] Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, and Koray Kavukcuoglu. Video pixel networks. arXiv preprint arXiv:1610.00527, 2016.

[7] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. In ICML, volume 14, pages 1188–1196, 2014.

[8] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.

[9] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.

[10] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, pages 556–562, 2001.

[11] Luke Metz, Ben Poole, David Pfau, and Jascha Sohl-Dickstein. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163, 2016.

[12] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.

[14] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

[15] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.

[16] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 833–840, 2011.

[17] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.

[18] Patrice Simard, Bernard Victorri, Yann LeCun, and John S. Denker. Tangent prop: a formalism for specifying selected invariances in an adaptive network. In NIPS, volume 91, pages 895–903, 1991.

[19] Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, pages 4790–4798, 2016.

[20] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM, 2008.

[21] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. Anticipating the future by watching unlabeled video. arXiv preprint arXiv:1504.08023, 2015.

[22] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613. Springer, 2016.