SLIDE 1

UTLC Unsupervised Transfer Learning Challenge

Grégoire Mesnil¹,², Yann Dauphin¹, Xavier Glorot¹, Salah Rifai¹, Yoshua Bengio¹ et al.

¹ LISA, Université de Montréal, Canada
² LITIS, Université de Rouen, France

July 2nd 2011


SLIDE 2

Plan

1. Introduction
2. Deep Architecture: Preprocessing, Feature Extraction, Postprocessing
3. Results
4. Summary

SLIDE 3

UTL Challenge

Presentation

Dates:
- Phase 1: Unsupervised Learning; start: January 3, end: March 4.
- Phase 2: Transfer Learning; start: March 4, end: April 15.

Five different data sets:

Data set    Domain              # samples   Dimension   Sparsity
AVICENNA    Arabic manuscripts  150205      120         0 %
HARRY       Human actions       69652       5000        98 %
RITA        CIFAR-10            111808      7200        1 %
SYLVESTER   Ecology             572820      100         0 %
TERRY       NLP                 217034      47236       99 %

SLIDE 4

UTL Challenge

Evaluation

ALC: Area under the Learning Curve, computed with 1 to 64 samples per class.
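As a rough illustration only (not the organizers' scoring code), the ALC can be approximated by training a simple classifier on an increasing number of labeled examples per class and integrating the resulting learning curve over a log2 x-axis. The classifier, the per-point accuracy metric, and the normalization below are assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def approx_alc(features, labels, sizes=(1, 2, 4, 8, 16, 32, 64), seed=0):
    """Rough Area-under-Learning-Curve estimate.

    For each n in `sizes`, trains a linear classifier on n examples per class
    (assumes every class has at least max(sizes) examples), scores it on the
    remaining data, then integrates the curve over log2(n) and normalizes by
    the x-range. The challenge servers use their own per-point metric.
    """
    rng = np.random.RandomState(seed)
    scores = []
    for n in sizes:
        train_idx = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            train_idx.extend(rng.choice(idx, size=n, replace=False))
        train_idx = np.array(train_idx)
        test_idx = np.setdiff1d(np.arange(len(labels)), train_idx)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(features[train_idx], labels[train_idx])
        scores.append(clf.score(features[test_idx], labels[test_idx]))
    x = np.log2(np.array(sizes, dtype=float))
    return np.trapz(scores, x) / (x[-1] - x[0])   # normalized area under the curve
```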


SLIDE 10

UTL Challenge

Performance

How do we evaluate the performance of a model without any labels or prior knowledge of the training set? Proxies:
- ALC on Valid versus Test (Phase 1)
- validation ALC returned by the competition servers (Phases 1 & 2)
- ALC with the given labels (Phase 2)

From Phase 1 to Phase 2, we over-explored the hyperparameters of the next models to grab 1st place.

SLIDE 11

Deep Architecture

Stack different blocks

We used this template (a minimal sketch of the pipeline follows the list):

1. Pre-processing: PCA with/without whitening, Contrast Normalization, Uniformization
2. Feature Extraction: Rectifiers, DAE, CAE, µ-ss-RBM
3. Post-processing: Transductive PCA
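A minimal sketch of how the three stages can be composed. The scikit-learn pipeline, the untrained rectifier weights, and all sizes are illustrative assumptions; the actual feature extractors were trained models written in Theano.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
W = rng.randn(200, 50) * 0.1          # untrained weights, only to make the sketch runnable

pipeline = Pipeline([
    # 1) pre-processing: PCA with whitening
    ("preprocess", PCA(n_components=50, whiten=True)),
    # 2) feature extraction: stand-in rectifier layer h = max(0, Wx)
    ("features", FunctionTransformer(lambda X: np.maximum(0.0, X @ W.T))),
    # 3) post-processing: PCA keeping a few dominant components
    ("postprocess", PCA(n_components=8)),
])

X_train = rng.randn(1000, 100)         # toy data in place of a challenge data set
Z = pipeline.fit_transform(X_train)    # representation submitted to the evaluation servers
```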



SLIDE 16

Preprocessing

Given a training set D = {x^(j)}, j = 1…n, where x^(j) ∈ R^d:
- Uniformization (t-IDF): rank all the x_i^(j) and map them to [0, 1].
- Contrast Normalization: for each x^(j), compute its mean µ^(j) = (1/d) Σ_{i=1}^d x_i^(j) and its deviation σ^(j), then x^(j) ← (x^(j) − µ^(j)) / σ^(j).
- Principal Component Analysis, with or without whitening, i.e. dividing by the square root of the eigenvalues or not.

A minimal numpy sketch of these three steps follows.
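A minimal numpy sketch of the three preprocessing steps, assuming uniformization ranks values within each feature column; the challenge code may differ in such details.

```python
import numpy as np

def uniformize(X):
    """Map each feature value to [0, 1] according to its rank within the column."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return ranks / max(X.shape[0] - 1, 1)

def contrast_normalize(X, eps=1e-8):
    """Per-example normalization: x <- (x - mu) / sigma."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / (sigma + eps)

def pca(X, n_components, whiten=False, eps=1e-8):
    """PCA on centered data; whitening divides by the square root of the eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    Z = Xc @ eigvecs[:, order]
    if whiten:
        Z /= np.sqrt(eigvals[order] + eps)
    return Z

# toy usage
X = np.random.randn(200, 30)
Z = pca(contrast_normalize(X), n_components=10, whiten=True)
```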



SLIDE 22

Feature Extraction

µ-ss-RBM

µ-Spike & Slab Restricted Boltzmann Machine modelizes the interac- tion between three random vectors :

1

visible vector v representing the observed data

2

binary “spike” variables h

3

real-valued “slab” variables s It is defined by the energy function :

E(v, s, h) = − Σᵢ vᵀWᵢsᵢhᵢ + ½ vᵀ(Λ + Σᵢ Φᵢhᵢ)v + Σᵢ ½ sᵢᵀαᵢsᵢ − Σᵢ µᵢᵀαᵢsᵢhᵢ − Σᵢ bᵢhᵢ + Σᵢ µᵢᵀαᵢµᵢhᵢ,

where each sum runs over i = 1, …, N.
In training, we use Persistent Contrastive Divergence with a Gibbs Sampling procedure.


SLIDE 23

Feature Extraction

µ-ss-RBM

More details in A. Courville, J. Bergstra and Y. Bengio, "Unsupervised Models of Images by Spike-and-Slab RBMs", ICML 2011.

[Figure: pools of filters learned on CIFAR-10]

SLIDE 25
Feature Extraction

Denoising Autoencoders

A Denoising Autoencoder is an autoencoder trained to denoise artifi- cially corrupted training samples. Corruption e.g ˜ x = x + ǫ where ǫ ∼ N(0, σ2) Encoder : h(˜ x) = s(W ˜ x + b) where s is the sigmoid function. Decoder : r(˜ x) = W T h(˜ x) + b

′ (tied weights).

Different loss functions to be minimized using stochastic gradient de- scent : r(˜ x) − x2

2 (linear reconstruction and MSE)

s(r(˜ x)) − x2

2 (non-linear reconstruction)

i xi log r(˜

xi) − (1 − xi) log(1 − r(˜ xi)) (cross-entropy)

UTL Challenge, ICML Workshop 12/ 25
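A minimal numpy sketch of one DAE training step with Gaussian corruption, tied weights, a linear decoder and the MSE loss; the learning rate and layer sizes are illustrative assumptions, not the settings used in the challenge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_step(X, W, b, b_prime, noise_std=0.1, lr=0.01, rng=np.random):
    """One SGD step of a tied-weight denoising autoencoder (linear decoder + MSE)."""
    X_tilde = X + rng.randn(*X.shape) * noise_std    # corruption: x~ = x + eps
    H = sigmoid(X_tilde @ W.T + b)                   # encoder h(x~) = s(W x~ + b)
    R = H @ W + b_prime                              # decoder r(x~) = W^T h(x~) + b'
    err = R - X                                      # gradient of 0.5 * ||r(x~) - x||^2
    dH = (err @ W.T) * H * (1 - H)                   # backprop through the encoder
    dW = dH.T @ X_tilde + H.T @ err                  # tied weights: encoder + decoder paths
    W -= lr * dW / X.shape[0]
    b -= lr * dH.mean(axis=0)
    b_prime -= lr * err.mean(axis=0)
    return 0.5 * np.mean(np.sum(err ** 2, axis=1))   # average reconstruction loss

# toy usage
rng = np.random.RandomState(0)
X = rng.randn(128, 20)
W = rng.randn(50, 20) * 0.1
b, b_prime = np.zeros(50), np.zeros(20)
for _ in range(10):
    loss = dae_step(X, W, b, b_prime, rng=rng)
```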

SLIDE 27

Feature Extraction

Contractive Autoencoders

A Contractive Autoencoder encourages invariance of the representation by penalizing the sensitivity of its encoder to the training inputs, measured by the squared Frobenius norm of the encoder's Jacobian:

‖J_f(x)‖²_F = Σ_{ij} (∂h_j(x) / ∂x_i)²

To avoid useless constant representations, this term is counterbalanced by a reconstruction error, using tied weights (decoder and encoder share the same weights):

‖s(r(x)) − x‖² + λ‖J_f(x)‖²_F

where λ controls the trade-off between both penalties. For a sigmoid encoder the Jacobian penalty has a simple closed form, sketched below.
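A minimal numpy sketch of the CAE penalty for a sigmoid encoder h = s(Wx + b), using the standard identity ∂h_j/∂x_i = h_j(1 − h_j)W_ji; this is a textbook reformulation, not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cae_penalty(X, W, b):
    """||J_f(x)||_F^2 averaged over a batch, for a sigmoid encoder h = s(Wx + b).

    With dh_j/dx_i = h_j (1 - h_j) W_ji, the norm factorizes as
    sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
    """
    H = sigmoid(X @ W.T + b)                 # (batch, hidden)
    row_norms = np.sum(W ** 2, axis=1)       # sum_i W_ji^2 for each hidden unit j
    return np.mean(np.sum((H * (1 - H)) ** 2 * row_norms, axis=1))

# toy usage: this penalty is added to the reconstruction error with weight lambda
rng = np.random.RandomState(0)
X = rng.randn(64, 20)
W, b = rng.randn(50, 20) * 0.1, np.zeros(50)
penalty = cae_penalty(X, W, b)
```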

SLIDE 28

Feature Extraction

Contractive Autoencoders

More details in S. Rifai, P. Vincent, X. Muller, X. Glorot and Y. Bengio, "Contractive Auto-Encoders: Explicit Invariance During Feature Extraction", ICML 2011.

[Figure: random selection of 4000 filters learned on CIFAR-10]

SLIDE 30

Feature Extraction

Rectifiers

Rectifiers use the activation function max(0, Wx + b) and therefore create sparse representations with true zeros. They are typically trained as Denoising Autoencoders. More details in X. Glorot, A. Bordes and Y. Bengio, "Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach", ICML 2011.

For huge sparse inputs this is expensive: e.g. with an input dimension of 50,000 and an embedding dimension of 1,000, decoding requires 50,000,000 operations.

SLIDE 32

Feature Extraction

Reconstruction Sampling

Reconstruction sampling: reconstruct all the non-zero elements and only a small random subset of the zero elements, which speeds up training. More details in Y. Dauphin, X. Glorot and Y. Bengio, "Large-Scale Learning of Embeddings with Reconstruction Sampling", ICML 2011.

A minimal sketch of the sampled reconstruction mask is shown below.
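A minimal numpy sketch of a sampled reconstruction mask. Up-weighting the sampled zeros by the inverse sampling probability keeps the expected loss equal to the full one; this reweighting is an assumption of the sketch, not necessarily the paper's exact scheme.

```python
import numpy as np

def reconstruction_mask(x, zero_sample_prob=0.05, rng=np.random):
    """Mask selecting all non-zeros plus a random subset of the zeros.

    The reconstruction loss is computed only where mask > 0; sampled zeros
    are up-weighted by 1 / zero_sample_prob so the expected loss matches
    the full reconstruction loss.
    """
    nonzero = x != 0
    sampled_zeros = (~nonzero) & (rng.rand(*x.shape) < zero_sample_prob)
    return nonzero.astype(float) + sampled_zeros / zero_sample_prob

# toy usage with a squared reconstruction error
rng = np.random.RandomState(0)
x = np.zeros(47236)
x[rng.choice(x.size, size=400, replace=False)] = 1.0   # ~99% sparse input
r = rng.rand(x.size) * 0.01                             # hypothetical reconstruction
mask = reconstruction_mask(x, rng=rng)
loss = np.sum(mask * (r - x) ** 2) / x.size             # only masked entries contribute
```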


SLIDE 34

Postprocessing

Transductive PCA

Feature extraction is performed on the training set, while a Transductive PCA is a PCA trained not on the training set but on the valid (or test) set:
- it is trained on the representation produced by the feature-extraction stage;
- it retains only the dominant variations of the test or validation set;
- the number of components is validated on the valid set (assuming the test and valid sets have the same number of classes).

A minimal sketch appears below.
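A minimal sketch of the transductive step, using scikit-learn's PCA as a stand-in; the `encode` function and its random weights are hypothetical placeholders for the trained feature extractor.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
W_enc = rng.randn(100, 64) * 0.1                  # stands in for learned encoder weights

def encode(X):
    """Hypothetical trained feature extractor (DAE/CAE/rectifier stack)."""
    return np.tanh(X @ W_enc)

X_valid = rng.randn(4096, 100)                    # evaluation set (no labels needed)
H_valid = encode(X_valid)

# Transductive PCA: fitted on the evaluation representations themselves,
# keeping only a few dominant directions of variation.
n_components = 4                                  # would be validated via the server-returned valid ALC
Z_valid = PCA(n_components=n_components).fit_transform(H_valid)
```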


SLIDE 40

Computation

How much time?

From preprocessing to postprocessing, training takes at most 12 hours for every model... once you have found good hyperparameters, and there are a lot of them.

Software: Theano (Python library). Hardware: GPU (GeForce GTX 580). http://deeplearning.net/

SLIDE 41

Harry

Best model

Input dimension: 5,000 (98% sparse). Domain: human actions.

SLIDE 42

Terry

Best model

Input dimension: 47,236 (99% sparse). Domain: natural language processing.

SLIDE 46

Sylvester

Best model

Input dimension: 100 (no sparsity). Domain: ecology.
Stacking effect: PCA-8 // CAE-6 // CAE-6 // PCA-1, compared to raw data.

SLIDE 48

Overall

Best models

ALC computed at each stage on the five data sets.

[Figures: VALID ALC and TEST ALC by data set (AVICENNA, SYLVESTER, RITA, HARRY, TERRY) and by step (Raw, Preproc, Feat. Extr., Postproc)]

SLIDE 49

Summary

We proposed a successful deep approach decomposed into three steps:

1. Preprocessing
2. Feature Extraction
3. Postprocessing

We ranked 4th in Phase 1 and 1st in Phase 2. More details in our JMLR paper: G. Mesnil, Y. Dauphin, X. Glorot, Y. Bengio, et al., "Unsupervised and Transfer Learning Challenge: a Deep Learning Approach" (to appear).

SLIDE 50

UTLC Unsupervised Transfer Learning Challenge

Grégoire Mesnil¹,², Yann Dauphin¹, Xavier Glorot¹, Salah Rifai¹, Yoshua Bengio¹ et al.

¹ LISA, Université de Montréal, Canada
² LITIS, Université de Rouen, France

Thanks for your attention. Questions?
