
Unsupervised Learning

Shan-Hung Wu

shwu@cs.nthu.edu.tw

Department of Computer Science, National Tsing Hua University, Taiwan

Machine Learning


Outline

1. Unsupervised Learning
2. Predictive Learning
3. Autoencoders & Manifold Learning
4. Generative Adversarial Networks


Unsupervised Learning

Dataset: $\mathbb{X} = \{x^{(i)}\}_i$

No supervision

What can we learn?

Clustering I

Goal: to group similar $x^{(i)}$'s

Clustering II

K-means algorithm

Hierarchical clustering
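The slides name k-means without spelling out the steps; as a reference, here is a minimal NumPy sketch of the standard alternating assignment/update iterations. The toy two-blob data at the end is purely illustrative:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups (Lloyd's iterations)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init from data
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its members
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)  # two well-separated blobs
```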

Factorization and Recommendation

Goal: to uncover the factors behind data (the rating matrix)

Commonly used in recommender systems

Non-negative matrix factorization (NMF) [9, 10]
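As a concrete sketch, the multiplicative update rules of [10] factor a rating matrix $R \approx WH$ with non-negative factors. The tiny dense matrix below is an assumption for illustration; real recommender data would be sparse, with missing ratings masked out:

```python
import numpy as np

def nmf(R, k, n_iters=500, eps=1e-9):
    """Factor a non-negative matrix R (users x items) into W @ H."""
    rng = np.random.default_rng(0)
    n, m = R.shape
    W = rng.random((n, k))  # user factors
    H = rng.random((k, m))  # item factors
    for _ in range(n_iters):
        # Multiplicative updates keep W and H non-negative (Lee & Seung [10])
        H *= (W.T @ R) / (W.T @ W @ H + eps)
        W *= (R @ H.T) / (W @ H @ H.T + eps)
    return W, H

R = np.array([[5, 4, 0, 1], [4, 5, 1, 0], [1, 0, 5, 4], [0, 1, 4, 5]], float)
W, H = nmf(R, k=2)
print(np.round(W @ H, 1))  # the rank-2 reconstruction uncovers two taste "factors"
```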

Dimension Reduction

Goal: to reduce the dimension of each $x^{(i)}$

E.g., PCA

Predictive learning: learn to “fill in the blanks”

Manifold learning: learn the tangent vectors of a given point
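For reference, a minimal PCA-by-SVD sketch in NumPy; the data shape and target dimension are assumptions for illustration:

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k directions of largest variance."""
    Xc = X - X.mean(axis=0)                  # center the data first
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]             # low-dim codes, principal directions

X = np.random.randn(200, 10)
Z, components = pca(X, k=3)                  # each x^(i) reduced from 10-D to 3-D
```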

Data Generation I

Goal: to generate new data points/samples

Generative adversarial networks (GANs)

Data Generation II

Text to image based on conditional GANs: “This bird is completely red with black wings and pointy beak.”


Predictive Learning

I.e., blank filling

E.g., word2vec [13, 12]: “... the cat sat on ...”
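To make the "fill in the blanks" idea concrete, here is a toy skip-gram sketch with a full softmax; real word2vec [13, 12] uses negative sampling or a hierarchical softmax on large corpora, so the tiny corpus, window size, and learning rate here are all illustrative assumptions:

```python
import numpy as np

corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                         # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (V, D))          # input (word) embeddings
W_out = rng.normal(0.0, 0.1, (D, V))         # output (context) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Skip-gram: each word learns to predict its neighbors in a +/-1 window
for _ in range(500):
    for t, w in enumerate(corpus):
        for c in corpus[max(0, t - 1):t] + corpus[t + 1:t + 2]:
            h = W_in[idx[w]]                 # center-word embedding
            p = softmax(h @ W_out)           # predicted context distribution
            grad_z = p.copy()
            grad_z[idx[c]] -= 1.0            # dL/dz for the cross-entropy loss
            grad_h = W_out @ grad_z          # dL/dh (computed before updating W_out)
            W_out -= 0.1 * np.outer(h, grad_z)
            W_in[idx[w]] -= 0.1 * grad_h

print(W_in[idx["cat"]])                      # the learned word vector for "cat"
```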

Doc2Vec

How to encode a document?

Bag of words (TF-IDF), average word2vec, etc.

These do not capture the semantic meaning of a doc: “I like final project” ≠ “Final project likes me”

Predictive learning for docs? Doc2vec [7]: to capture the context not explained by words

Filling Images

How? PixelRNN [19]

More

Predicting the future by watching unlabeled videos [6, 21]


Autoencoders I

Encoder: to learn a low-dimensional representation c (called the code) of input x

Decoder: to reconstruct x from c

Cost function: $\arg\min_{\Theta} -\log P(\mathbb{X} \mid \Theta) = \arg\min_{\Theta} -\sum_n \log P(x^{(n)} \mid \Theta)$

Sigmoid output units $a_j^{(L)} = \hat{\rho}_j$ for $x_j \sim \mathrm{Bernoulli}(\rho_j)$:

$P(x_j^{(n)} \mid \Theta) = (a_j^{(L)})^{x_j^{(n)}} (1 - a_j^{(L)})^{1 - x_j^{(n)}}$

Linear output units $a^{(L)} = z^{(L)} = \hat{\mu}$ for $x \sim \mathcal{N}(\mu, \Sigma)$:

$-\log P(x^{(n)} \mid \Theta) \propto \|x^{(n)} - z^{(L)}\|^2$
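A minimal fully-connected autoencoder sketch in tf.keras matching the Bernoulli case above: binary cross-entropy is exactly the negative Bernoulli log-likelihood. The layer sizes, 32-dimensional code, and the random stand-in data are assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Encoder maps x down to a code c; decoder reconstructs x from c.
autoencoder = tf.keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(784,)),
    layers.Dense(32, activation="relu", name="code"),   # the code c
    layers.Dense(256, activation="relu"),
    layers.Dense(784, activation="sigmoid"),            # Bernoulli parameters rho_hat
])
# Binary cross-entropy = negative Bernoulli log-likelihood per pixel
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(1000, 784).astype("float32")   # stand-in for MNIST pixels
autoencoder.fit(X, X, epochs=3, batch_size=64)    # the target is the input itself
```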

Autoencoders II

A 32-bit code can roughly represent a 32×32 MNIST image

Convolutional Autoencoders

Convolution + deconvolution

How to train the deconvolution layer? Treat it as a convolution layer
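A sketch of the convolution + deconvolution layout in tf.keras, where Conv2DTranspose plays the role of the deconvolution layer and is trained like any other convolution; the filter counts and 28×28 input shape are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

conv_ae = tf.keras.Sequential([
    # Encoder: strided convolutions shrink the spatial resolution
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu",
                  input_shape=(28, 28, 1)),                      # 28x28 -> 14x14
    layers.Conv2D(8, 3, strides=2, padding="same", activation="relu"),  # -> 7x7 code
    # Decoder: transposed ("de-") convolutions grow it back
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),
])
conv_ae.compile(optimizer="adam", loss="binary_crossentropy")
```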

Manifolds I

In many applications, data concentrate around one or more low-dimensional manifolds

A manifold is a topological space that is locally linear

Manifolds II

For each point x on a manifold, we have its tangent space spanned by tangent vectors

These local directions specify how one can change x infinitesimally while staying on the manifold

Learning Manifolds I

How to learn manifolds with autoencoders?

Contractive autoencoder [16]: regularize the code c so that it is invariant to local changes of x:

$\Omega(c) = \sum_n \left\| \frac{\partial c^{(n)}}{\partial x^{(n)}} \right\|_F^2$

where $\partial c^{(n)} / \partial x^{(n)}$ is a Jacobian matrix

The encoder preserves local structures in the code space
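A sketch of this penalty using TensorFlow's batch Jacobian; the `encoder` model and the weighting factor are assumptions, and in practice [16] evaluates the Jacobian analytically for a sigmoid encoder layer rather than by generic autodiff:

```python
import tensorflow as tf

def contractive_penalty(encoder, x):
    """Omega(c) = sum over the batch of ||dc/dx||_F^2."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        c = encoder(x)                 # codes, shape (batch, code_dim)
    J = tape.batch_jacobian(c, x)      # Jacobians, shape (batch, code_dim, input_dim)
    return tf.reduce_sum(tf.square(J))

# Hypothetical usage inside a training loop:
# total_loss = reconstruction_loss + lam * contractive_penalty(encoder, x)
```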

Learning Manifolds II

In practice, it is easier to train a denoising autoencoder [20]:

Encoder: to encode x corrupted with random noise

Decoder: to reconstruct x without the noise
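The only change from the plain autoencoder sketched earlier is that the inputs are corrupted while the reconstruction target stays clean. This reuses the hypothetical `autoencoder` model from the earlier block; the noise level is an assumption:

```python
import numpy as np

X = np.random.rand(1000, 784).astype("float32")            # stand-in clean data
noise = 0.3 * np.random.randn(*X.shape).astype("float32")
X_noisy = np.clip(X + noise, 0.0, 1.0)                     # corrupted inputs
autoencoder.fit(X_noisy, X, epochs=3, batch_size=64)       # denoise: fit(noisy, clean)
```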

Tangent Vectors I

The code c represents a coordinate on a low-dimensional manifold (e.g., the blue line)

How to get the tangent vectors of a given point?

Tangent Vectors II

Given a point x, let c be the code of x and $J(x) = \frac{\partial c}{\partial x}$ be the Jacobian matrix of c at x

J(x) summarizes how c changes in terms of x

Directions in the input space that change c the most should be the tangent vectors:

1. Decompose J(x) using SVD such that $J(x) = U D V^\top$
2. Let the tangent vectors be the rows of $V^\top$ (the right singular vectors) corresponding to the largest singular values in D
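These two steps are easy to state in NumPy; the stand-in Jacobian shape below is an assumption for illustration (in practice J would come from differentiating the trained encoder at x):

```python
import numpy as np

def tangent_vectors(J, k):
    """Top-k right singular vectors of the encoder Jacobian J = dc/dx."""
    U, S, Vt = np.linalg.svd(J, full_matrices=False)  # singular values sorted descending
    return Vt[:k]                  # rows of V^T: input-space directions that change c most

J = np.random.randn(32, 784)       # stand-in Jacobian: 32-D code, 784-D input
v = tangent_vectors(J, k=2)        # two tangent vectors at this point
```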

Tangent Vectors III

In practice, J(x) usually has only a few large singular values

Tangent vectors found by contractive/denoising autoencoders can be used by Tangent Prop [18]:

Let $\{v^{(i,j)}\}_j$ be the tangent vectors of each example $x^{(i)}$

Train an NN classifier f with the cost penalty $\Omega[f] = \sum_{i,j} \left( \nabla_x f(x^{(i)})^\top v^{(i,j)} \right)^2$

Points on the same manifold share the same label


Decoder as Data Generator

The decoder can generate data points even from synthetic codes

However, the decoder just “remembers” the samples it has seen: generated data are just combinations of training examples

Generative Adversarial Networks (GANs)

Generative adversarial network (GAN) [4]:

Generator: to generate data points from random codes

Discriminator: to separate generated points from true ones

Sigmoid output unit $a^{(L)} = \hat{\rho}$ for $y \sim \mathrm{Bernoulli}(\rho)$, where $\rho = P(y = \text{true point} \mid x)$

Weights for $x$ and $\hat{x}$ are tied

Cost Function

In normal binary classification, we have the log-likelihood $\log P(\mathbb{X} \mid \Theta) = \sum_n \log P(y^{(n)} \mid x^{(n)}, \Theta) = \sum_n \log \left[ (\hat{\rho}^{(n)})^{y^{(n)}} (1 - \hat{\rho}^{(n)})^{1 - y^{(n)}} \right]$

Cost function (N true points, M generated points): $\arg\min_{\Theta_{\mathrm{gen}}} \max_{\Theta_{\mathrm{dis}}} \sum_n \log \hat{\rho}^{(n)} + \sum_m \log(1 - \hat{\rho}^{(m)})$

$\hat{\rho}^{(n)}$ depends on $\Theta_{\mathrm{dis}}$ only; $\hat{\rho}^{(m)}$ depends on both $\Theta_{\mathrm{dis}}$ and $\Theta_{\mathrm{gen}}$

Convolutional Generators

DC-GAN [14] for generating images

Training: Alternating SGD

$\arg\min_{\Theta_{\mathrm{gen}}} \max_{\Theta_{\mathrm{dis}}} \sum_n \log \hat{\rho}^{(n)} + \sum_m \log(1 - \hat{\rho}^{(m)})$

Alternating SGD:

Fix $\Theta_{\mathrm{gen}}$ and apply a stochastic gradient ascent step on $\Theta_{\mathrm{dis}}$ ($\nabla_{\Theta_{\mathrm{dis}}} C$ involves both the first and the second term)

Fix $\Theta_{\mathrm{dis}}$ and apply a stochastic gradient descent step on $\Theta_{\mathrm{gen}}$ ($\nabla_{\Theta_{\mathrm{gen}}} C$ involves only the second term)
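A minimal sketch of one round of alternating SGD in tf.keras, assuming `generator` and `discriminator` are models built elsewhere; the losses directly mirror the minimax cost C above, and the non-saturating generator loss common in practice is noted in a comment:

```python
import tensorflow as tf

opt_dis = tf.keras.optimizers.SGD(0.01)   # plain SGD; momentum is discouraged here
opt_gen = tf.keras.optimizers.SGD(0.01)
eps = 1e-8                                # numerical safety inside the logs

def train_step(generator, discriminator, x_real, code_dim=64):
    z = tf.random.normal((tf.shape(x_real)[0], code_dim))
    # (1) Fix the generator; ascend on the discriminator.
    #     C = sum_n log rho_hat(n) + sum_m log(1 - rho_hat(m)); we minimize -C.
    with tf.GradientTape() as tape:
        p_real = discriminator(x_real)              # rho_hat for true points
        p_fake = discriminator(generator(z))        # rho_hat for generated points
        d_loss = -tf.reduce_mean(tf.math.log(p_real + eps)
                                 + tf.math.log(1.0 - p_fake + eps))
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    opt_dis.apply_gradients(zip(grads, discriminator.trainable_variables))
    # (2) Fix the discriminator; descend on the generator.
    #     Only the second term of C depends on Theta_gen.
    with tf.GradientTape() as tape:
        p_fake = discriminator(generator(z))
        g_loss = tf.reduce_mean(tf.math.log(1.0 - p_fake + eps))
        # In practice, maximizing log(p_fake) instead avoids early saturation.
    grads = tape.gradient(g_loss, generator.trainable_variables)
    opt_gen.apply_gradients(zip(grads, generator.trainable_variables))
```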

Training: Challenges I

The goal is to find a saddle point of the minimax cost above; however, most optimization algorithms are designed to find minima

Avoid momentum in the training algorithm

Training: Challenges II

Alternating SGD does not distinguish between $\min_{\Theta_{\mathrm{gen}}} \max_{\Theta_{\mathrm{dis}}}$ and $\max_{\Theta_{\mathrm{dis}}} \min_{\Theta_{\mathrm{gen}}}$

Mode collapse: in $\max_{\Theta_{\mathrm{dis}}} \min_{\Theta_{\mathrm{gen}}}$, the generator maps every code to a single point that the discriminator believes is most likely to be real

Then the discriminator can spot fake points by excluding that point

The discriminator and generator cancel each other's progress during training

Unrolled GANs

SGD ignores the max operation when computing $\nabla_{\Theta_{\mathrm{gen}}} C$

Unrolled GANs [11]: back-propagate through several max (discriminator-update) steps when computing $\nabla_{\Theta_{\mathrm{gen}}} C$

Minibatch Discrimination

In $\max_{\Theta_{\mathrm{dis}}} \min_{\Theta_{\mathrm{gen}}}$, the generator collapses because $\nabla_{\Theta_{\mathrm{dis}}} C$ is computed independently for each example

The generator cannot know whether the discriminator is excluding a single region, so it cannot learn to generate dissimilar points

Minibatch discrimination [17]: let the discriminator look at multiple points when making predictions

Without vs. with minibatch discrimination (figure)
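A sketch of the minibatch-feature statistic from [17] as I read the paper: each example's intermediate features are projected through a learned tensor and compared against every other example in the batch, so the discriminator can detect a collapsed generator producing near-identical points. The shape names A, B, C follow the paper; the feature layer `f` and all sizes are assumptions:

```python
import tensorflow as tf

def minibatch_features(f, T):
    """Minibatch discrimination statistics (Salimans et al. [17]).

    f: (batch, A) features from an intermediate discriminator layer.
    T: (A, B, C) learned tensor.
    Returns (batch, B) closeness scores against the rest of the batch.
    """
    M = tf.einsum("na,abc->nbc", f, T)           # (batch, B, C) projections
    diff = M[:, None] - M[None, :]               # pairwise differences
    l1 = tf.reduce_sum(tf.abs(diff), axis=-1)    # (batch, batch, B) L1 norms
    return tf.reduce_sum(tf.exp(-l1), axis=1)    # similarity to the whole batch

f = tf.random.normal((16, 32))                   # 16 examples, A = 32
T = tf.Variable(tf.random.normal((32, 8, 4)))    # B = 8 statistics, C = 4 dims
o = minibatch_features(f, T)                     # concatenate o with f downstream
```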

Other Difficulties

Counting, perspective, and global structure

These are still open research problems

Code Space Arithmetic

DC-GAN [14] can learn to use codes in meaningful ways

Finding codes for images with constraints [22, 1]

Learn to Encode in GANs

Adversarial feature learning [2, 3]

Conditional GAN I

How? “This bird is completely red with black wings and pointy beak.”

Text to image [15]

More GANs I

Super resolution [8]

More GANs II

Image-to-image translation [5]

References

[1] Andrew Brock, Theodore Lim, J.M. Ritchie, and Nick Weston. Neural photo editing with introspective adversarial networks. arXiv preprint arXiv:1609.07093, 2016.

[2] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.

[3] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.

[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[5] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.

[6] Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, and Koray Kavukcuoglu. Video pixel networks. arXiv preprint arXiv:1610.00527, 2016.

[7] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. In ICML, volume 14, pages 1188–1196, 2014.

[8] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.

[9] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.

[10] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, pages 556–562, 2001.

[11] Luke Metz, Ben Poole, David Pfau, and Jascha Sohl-Dickstein. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163, 2016.

[12] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.

[14] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

[15] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.

[16] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, and Yoshua Bengio. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 833–840, 2011.

[17] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.

[18] Patrice Simard, Bernard Victorri, Yann LeCun, and John S. Denker. Tangent prop: a formalism for specifying selected invariances in an adaptive network. In NIPS, volume 91, pages 895–903, 1991.

[19] Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, pages 4790–4798, 2016.

[20] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM, 2008.

[21] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. Anticipating the future by watching unlabeled video. arXiv preprint arXiv:1504.08023, 2015.

[22] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613. Springer, 2016.