SLIDE 1

UTLC Unsupervised Transfer Learning Challenge

Grégoire Mesnil¹,², Yann Dauphin¹, Xavier Glorot¹, Salah Rifai¹, Yoshua Bengio¹ et al.

¹ LISA, Université de Montréal, Canada
² LITIS, Université de Rouen, France

July 2nd 2011


SLIDE 2

Plan

1. Introduction
2. Deep Architecture: Preprocessing, Feature Extraction, Postprocessing
3. Results
4. Summary

SLIDE 3

UTL Challenge

Presentation

Dates:
- Phase 1: Unsupervised Learning; start: January 3, end: March 4.
- Phase 2: Transfer Learning; start: March 4, end: April 15.

Five different data sets:

Data set    Domain              # samples   Dimension   Sparsity
AVICENNA    Arabic manuscripts  150205      120         0 %
HARRY       Human actions       69652       5000        98 %
RITA        CIFAR-10            111808      7200        1 %
SYLVESTER   Ecology             572820      100         0 %
TERRY       NLP                 217034      47236       99 %

SLIDE 4

UTL Challenge

Evaluation

ALC: Area under the Learning Curve, computed with 1 to 64 samples per class.
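As a rough illustration only (not the organizers' scoring code), the ALC can be approximated by training a simple classifier on an increasing number of labeled examples per class and integrating the resulting learning curve over a log2 x-axis. The classifier, the per-point accuracy metric, and the normalization below are assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def approx_alc(features, labels, sizes=(1, 2, 4, 8, 16, 32, 64), seed=0):
    """Rough Area-under-Learning-Curve estimate.

    For each n in `sizes`, trains a linear classifier on n examples per class
    (assumes every class has at least max(sizes) examples), scores it on the
    remaining data, then integrates the curve over log2(n) and normalizes by
    the x-range. The challenge servers use their own per-point metric.
    """
    rng = np.random.RandomState(seed)
    scores = []
    for n in sizes:
        train_idx = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            train_idx.extend(rng.choice(idx, size=n, replace=False))
        train_idx = np.array(train_idx)
        test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features[train_idx], labels[train_idx])
        scores.append(clf.score(features[test_idx], labels[test_idx]))
    x = np.log2(np.array(sizes, dtype=float))
    return np.trapz(scores, x) / (x[-1] - x[0])   # normalized area under the curve
```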


SLIDE 10

UTL Challenge

Performance

How do we evaluate the performance of a model without any labels or prior knowledge of the training set? Proxies:
- ALC on Valid versus Test (Phase 1)
- validation ALC returned by the competition servers (Phases 1 & 2)
- ALC with the given labels (Phase 2)

From Phase 1 to Phase 2, we over-explored the hyperparameters of the next models to grab 1st place.

SLIDE 11

Deep Architecture

Stack different blocks

We used this template (a minimal sketch of the pipeline follows the list):

1. Pre-processing: PCA with/without whitening, Contrast Normalization, Uniformization
2. Feature Extraction: Rectifiers, DAE, CAE, µ-ss-RBM
3. Post-processing: Transductive PCA
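A minimal sketch of how the three stages can be composed. The scikit-learn pipeline, the untrained rectifier weights, and all sizes are illustrative assumptions; the actual feature extractors were trained models written in Theano.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
W = rng.randn(200, 50) * 0.1          # untrained weights, only to make the sketch runnable

pipeline = Pipeline([
    # 1) pre-processing: PCA with whitening
    ("preprocess", PCA(n_components=50, whiten=True)),
    # 2) feature extraction: stand-in rectifier layer h = max(0, Wx)
    ("features", FunctionTransformer(lambda X: np.maximum(0.0, X @ W.T))),
    # 3) post-processing: PCA keeping a few dominant components
    ("postprocess", PCA(n_components=8)),
])

X_train = rng.randn(1000, 100)         # toy data in place of a challenge data set
Z = pipeline.fit_transform(X_train)    # representation submitted to the evaluation servers
```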



SLIDE 16

Preprocessing

Given a training set D = {x^(j)}, j = 1…n, where x^(j) ∈ R^d:
- Uniformization (t-IDF): rank all the x_i^(j) and map them to [0, 1].
- Contrast Normalization: for each x^(j), compute its mean µ^(j) = (1/d) Σ_{i=1}^d x_i^(j) and its deviation σ^(j), then x^(j) ← (x^(j) − µ^(j)) / σ^(j).
- Principal Component Analysis, with or without whitening, i.e. dividing by the square root of the eigenvalues or not.

A minimal numpy sketch of these three steps follows.
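A minimal numpy sketch of the three preprocessing steps, assuming uniformization ranks values within each feature column; the challenge code may differ in such details.

```python
import numpy as np

def uniformize(X):
    """Map each feature value to [0, 1] according to its rank within the column."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return ranks / max(X.shape[0] - 1, 1)

def contrast_normalize(X, eps=1e-8):
    """Per-example normalization: x <- (x - mu) / sigma."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / (sigma + eps)

def pca(X, n_components, whiten=False, eps=1e-8):
    """PCA on centered data; whitening divides by the square root of the eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    Z = Xc @ eigvecs[:, order]
    if whiten:
        Z /= np.sqrt(eigvals[order] + eps)
    return Z

# toy usage
X = np.random.randn(200, 30)
Z = pca(contrast_normalize(X), n_components=10, whiten=True)
```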



SLIDE 22

Feature Extraction

µ-ss-RBM

µ-Spike & Slab Restricted Boltzmann Machine modelizes the interac- tion between three random vectors :

1

visible vector v representing the observed data

2

binary “spike” variables h

3

real-valued “slab” variables s It is defined by the energy function :

E(v, s, h) = − Σᵢ vᵀWᵢsᵢhᵢ + ½ vᵀ(Λ + Σᵢ Φᵢhᵢ)v + Σᵢ ½ sᵢᵀαᵢsᵢ − Σᵢ µᵢᵀαᵢsᵢhᵢ − Σᵢ bᵢhᵢ + Σᵢ µᵢᵀαᵢµᵢhᵢ,

where each sum runs over i = 1, …, N.
In training, we use Persistent Contrastive Divergence with a Gibbs Sampling procedure.


SLIDE 23

Feature Extraction

µ-ss-RBM

More details in A. Courville, J. Bergstra and Y. Bengio, "Unsupervised Models of Images by Spike-and-Slab RBMs", ICML 2011.

[Figure: pools of filters learned on CIFAR-10]

SLIDE 25
Feature Extraction

Denoising Autoencoders

A Denoising Autoencoder is an autoencoder trained to denoise artifi- cially corrupted training samples. Corruption e.g ˜ x = x + ǫ where ǫ ∼ N(0, σ2) Encoder : h(˜ x) = s(W ˜ x + b) where s is the sigmoid function. Decoder : r(˜ x) = W T h(˜ x) + b

′ (tied weights).

Different loss functions to be minimized using stochastic gradient de- scent : r(˜ x) − x2

2 (linear reconstruction and MSE)

s(r(˜ x)) − x2

2 (non-linear reconstruction)

i xi log r(˜

xi) − (1 − xi) log(1 − r(˜ xi)) (cross-entropy)

UTL Challenge, ICML Workshop 12/ 25
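A minimal numpy sketch of one DAE training step with Gaussian corruption, tied weights, a linear decoder and the MSE loss; the learning rate and layer sizes are illustrative assumptions, not the settings used in the challenge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_step(X, W, b, b_prime, noise_std=0.1, lr=0.01, rng=np.random):
    """One SGD step of a tied-weight denoising autoencoder (linear decoder + MSE)."""
    X_tilde = X + rng.randn(*X.shape) * noise_std    # corruption: x~ = x + eps
    H = sigmoid(X_tilde @ W.T + b)                   # encoder h(x~) = s(W x~ + b)
    R = H @ W + b_prime                              # decoder r(x~) = W^T h(x~) + b'
    err = R - X                                      # gradient of 0.5 * ||r(x~) - x||^2
    dH = (err @ W.T) * H * (1 - H)                   # backprop through the encoder
    dW = dH.T @ X_tilde + H.T @ err                  # tied weights: encoder + decoder paths
    W -= lr * dW / X.shape[0]
    b -= lr * dH.mean(axis=0)
    b_prime -= lr * err.mean(axis=0)
    return 0.5 * np.mean(np.sum(err ** 2, axis=1))   # average reconstruction loss

# toy usage
rng = np.random.RandomState(0)
X = rng.randn(128, 20)
W = rng.randn(50, 20) * 0.1
b, b_prime = np.zeros(50), np.zeros(20)
for _ in range(10):
    loss = dae_step(X, W, b, b_prime, rng=rng)
```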

SLIDE 27

Feature Extraction

Contractive Autoencoders

A Contractive Autoencoder encourages invariance of the representation by penalizing the sensitivity of its encoder to the training inputs, measured by the squared Frobenius norm of the encoder's Jacobian:

‖J_f(x)‖²_F = Σ_{ij} (∂h_j(x) / ∂x_i)²

To avoid useless constant representations, this term is counterbalanced by a reconstruction error, using tied weights (decoder and encoder share the same weights):

‖s(r(x)) − x‖² + λ‖J_f(x)‖²_F

where λ controls the trade-off between both penalties. For a sigmoid encoder the Jacobian penalty has a simple closed form, sketched below.
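A minimal numpy sketch of the CAE penalty for a sigmoid encoder h = s(Wx + b), using the standard identity ∂h_j/∂x_i = h_j(1 − h_j)W_ji; this is a textbook reformulation, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cae_penalty(X, W, b):
    """||J_f(x)||_F^2 averaged over a batch, for a sigmoid encoder h = s(Wx + b).

    With dh_j/dx_i = h_j (1 - h_j) W_ji, the norm factorizes as
    sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
    """
    H = sigmoid(X @ W.T + b)                 # (batch, hidden)
    row_norms = np.sum(W ** 2, axis=1)       # sum_i W_ji^2 for each hidden unit j
    return np.mean(np.sum((H * (1 - H)) ** 2 * row_norms, axis=1))

# toy usage: this penalty is added to the reconstruction error with weight lambda
rng = np.random.RandomState(0)
X = rng.randn(64, 20)
W, b = rng.randn(50, 20) * 0.1, np.zeros(50)
penalty = cae_penalty(X, W, b)
```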

SLIDE 28

Feature Extraction

Contractive Autoencoders

More details in S. Rifai, P. Vincent, X. Muller, X. Glorot and Y. Bengio, "Contractive Auto-Encoders: Explicit Invariance During Feature Extraction", ICML 2011.

[Figure: random selection of 4000 filters learned on CIFAR-10]

SLIDE 30

Feature Extraction

Rectifiers

Rectifiers use the activation function max(0, Wx + b) and therefore create sparse representations with true zeros. They are typically trained as Denoising Autoencoders. More details in X. Glorot, A. Bordes and Y. Bengio, "Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach", ICML 2011.

For huge sparse inputs this is expensive: e.g. with an input dimension of 50,000 and an embedding dimension of 1,000, decoding requires 50,000,000 operations.

SLIDE 32

Feature Extraction

Reconstruction Sampling

Reconstruction sampling: reconstruct all the non-zero elements and only a small random subset of the zero elements, which speeds up training. More details in Y. Dauphin, X. Glorot and Y. Bengio, "Large-Scale Learning of Embeddings with Reconstruction Sampling", ICML 2011.

A minimal sketch of the sampled reconstruction mask is shown below.
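A minimal numpy sketch of a sampled reconstruction mask. Up-weighting the sampled zeros by the inverse sampling probability keeps the expected loss equal to the full one; this reweighting is an assumption of the sketch, not necessarily the paper's exact scheme.

```python
import numpy as np

def reconstruction_mask(x, zero_sample_prob=0.05, rng=np.random):
    """Mask selecting all non-zeros plus a random subset of the zeros.

    The reconstruction loss is computed only where mask > 0; sampled zeros
    are up-weighted by 1 / zero_sample_prob so the expected loss matches
    the full reconstruction loss.
    """
    nonzero = x != 0
    sampled_zeros = (~nonzero) & (rng.rand(*x.shape) < zero_sample_prob)
    return nonzero.astype(float) + sampled_zeros / zero_sample_prob

# toy usage with a squared reconstruction error
rng = np.random.RandomState(0)
x = np.zeros(47236)
x[rng.choice(x.size, size=400, replace=False)] = 1.0   # ~99% sparse input
r = rng.rand(x.size) * 0.01                             # hypothetical reconstruction
mask = reconstruction_mask(x, rng=rng)
loss = np.sum(mask * (r - x) ** 2) / x.size             # only masked entries contribute
```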


SLIDE 34

Postprocessing

Transductive PCA

Feature extraction is performed on the training set, while a Transductive PCA is a PCA trained not on the training set but on the valid (or test) set:
- it is trained on the representation produced by the feature-extraction stage;
- it retains only the dominant variations of the test or validation set;
- the number of components is validated on the valid set (assuming the test and valid sets have the same number of classes).

A minimal sketch appears below.
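A minimal sketch of the transductive step, using scikit-learn's PCA as a stand-in; the `encode` function and its random weights are hypothetical placeholders for the trained feature extractor.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
W_enc = rng.randn(100, 64) * 0.1                  # stands in for learned encoder weights

def encode(X):
    """Hypothetical trained feature extractor (DAE/CAE/rectifier stack)."""
    return np.tanh(X @ W_enc)

X_valid = rng.randn(4096, 100)                    # evaluation set (no labels needed)
H_valid = encode(X_valid)

# Transductive PCA: fitted on the evaluation representations themselves,
# keeping only a few dominant directions of variation.
n_components = 4                                  # would be validated via the server-returned valid ALC
Z_valid = PCA(n_components=n_components).fit_transform(H_valid)
```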


SLIDE 40

Computation

How much time?

From preprocessing to postprocessing, training takes at most 12 hours for every model... once you have found good hyperparameters, and there are a lot of them.

Software: Theano (Python library). Hardware: GPU (GeForce GTX 580). http://deeplearning.net/

SLIDE 41

Harry

Best model

Input dimension: 5,000 (98% sparse). Domain: human actions.

SLIDE 42

Terry

Best model

Input dimension: 47,236 (99% sparse). Domain: natural language processing.

SLIDE 46

Sylvester

Best model

Input dimension: 100 (no sparsity). Domain: ecology.
Stacking effect: PCA-8 // CAE-6 // CAE-6 // PCA-1, compared to raw data.

SLIDE 48

Overall

Best models

ALC computed at each stage on the five data sets.

[Figures: VALID ALC and TEST ALC by data set (AVICENNA, SYLVESTER, RITA, HARRY, TERRY) and by step (Raw, Preproc, Feat. Extr., Postproc)]

SLIDE 49

Summary

We proposed a successful deep approach decomposed into three steps:

1. Preprocessing
2. Feature Extraction
3. Postprocessing

We ranked 4th in Phase 1 and 1st in Phase 2. More details in our JMLR paper: G. Mesnil, Y. Dauphin, X. Glorot, Y. Bengio, et al., "Unsupervised and Transfer Learning Challenge: a Deep Learning Approach" (to appear).

SLIDE 50

UTLC Unsupervised Transfer Learning Challenge

Grégoire Mesnil¹,², Yann Dauphin¹, Xavier Glorot¹, Salah Rifai¹, Yoshua Bengio¹ et al.

¹ LISA, Université de Montréal, Canada
² LITIS, Université de Rouen, France

Thanks for your attention. Questions?
