A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle - PowerPoint PPT Presentation

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Outline Introduction 1 Generative models 2 Low density separation 3 Graph based methods 4 Unsupervised learning 5 Conclusions 6

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions The semi-supervised learning (SSL) paradigm We consider here the problem of binary classification. Definition (Supervised learning) Given a training set { ( x i , y i ) } estimate a decision function (or more generally a probability P ( y | x )). Definition (Semi-supervised learning) Same goal as supervised learning, but in addition a set of unlabeled points { x ′ i } is available. Typically, much more unlabeled data than labeled data. Note: differs from the related notion of transduction.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Are unlabeled data useful ?

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Are unlabeled data useful ? No

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Are unlabeled data useful ?

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Are unlabeled data useful ? Yes !

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Are unlabeled data useful ? Well, not sure.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions The cluster assumption Need for assumption Standard supervised assumption Two points which are near are likely to be of the same label. Cluster assumption Two points which are in the same cluster (i.e. which are linked by a high density path) are likely to be of the same label. Equivalently, Low density separation assumption The decision boundary should lie in a low density region.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions The cluster assumption This assumption seems sensible for a lot of real world datasets. It is used in nearly all SSL algorithms, but most of the time implicitly. No equivalent formulation for regression. It seems that SSL is not very useful for regression.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Infinite amount of unlabeled data A core fundamental question that an SSL algorithm should tackle is What should I do if I knew exactly the marginal distribution P ( x ) ? Semi-supervised algorithms should be seen as a special case of this limiting case. Unfortunately, lack of research in this direction. Probably due to historical reasons: for supervised learning, when P ( x , y ) is known, classification is trivial.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Generative vs discriminative learning Generative learning 1 For each y , learn the class conditional density P ( x | y , θ ) (and also the class prior P ( y | θ )). 2 For a test point x , compute P ( y | x , θ ) ∝ P ( x | y , θ ) P ( y | θ ). [Bayes rule] Discriminative learning Learn directly P ( y | x ) (or a decision function). Generative learning was popular in the 70s. Main advantage of discriminative learning: it avoids the difficult step of modeling class conditional densities. Nowadays, discriminative classifiers are usually preferred.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Outline Introduction 1 Generative models 2 Low density separation 3 Graph based methods 4 Unsupervised learning 5 Conclusions 6

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Generative models It is straightforward to use unlabeled data in a generative model: Find the model parameters θ maximizing the log-likelihood of the labeled and unlabeled data, � � � P ( x ′ log( P ( x i | y i , θ ) P ( y i | θ ) ) + log( i | y , θ ) P ( y | θ ) ) . i i y � �� P ( x i , y i | θ ) P ( x ′ i | θ ) Simplest example: each class has a Gaussian distribution. This is a missing value problem. − → Can be learned with the Expectation-Maximization (EM) algorithm.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Generative learning - EM EM is used to maximize the likelihood of model with hidden variables. EM algorithm for SSL E-step: compute q i ( y ) = P ( y | x ′ i , θ ) M-step: maximize over θ , � � � q i ( y ) log( P ( x ′ log( P ( x i | y i , θ ) P ( y i | θ ))+ i | y , θ ) P ( y | θ )) y i i Nice interpretation and relation to self-learning: E-step: estimate the labels according to the current decision function. M-step: estimate the decision function with the current labels.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Toy example Class conditional density is Gaussian. Demo EM

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Experiments on text classification Nigam et al, Text Classification from Labeled and Unlabeled Documents Using EM , Machine Learning, 2000 Bag of words representation 100% 10000 unlabeled documents No unlabeled documents 90% Multinomial distribution 80% 70% � θ x w 60% P ( x | , y , θ ) = Accuracy w | y 50% words 40% 30% − → Naive Bayes classifier 20% 10% Several components per class 0% 10 20 50 100 200 500 1000 2000 5000 Number of Labeled Documents 20 Newsgroups dataset Intuition: SSL detects words co-occurrence.

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions Analysis of generative methods Advantages Easy to use Unlabeled data are very useful. − → In the limit, they determine the decision boundary (labeled points are only useful for the direction). Drawback Usually, the model is misspecified. − → There is no θ such that P ( x ) ≡ P ( x | θ ). Unlabeled data can be misleading since Maximum Likelihood tries to model P ( x ) rather than P ( y | x ). Note: the cluster assumption is not explicitly stated, but implied by standard models such as mixture of Gaussians.

A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle - PowerPoint PPT Presentation

Introduction Generative models Low density separation Graph based methods Unsupervised learning Conclusions A Taxonomy of Semi-Supervised Learning Algorithms Olivier Chapelle Max Planck Institute for Biological Cybernetics December 2005

NCTracks Taxonomy Presentation Agenda Taxonomy Code Information Using Taxonomy Codes in

Margin-based Semi-supervised Learning Using Apollonius circle MONA EMADI AND JAFAR TANHA T TC S

Semi-Supervised Learning Maria-Florina Balcan 03/30/2015 Readings: Semi-Supervised Learning.

Unsupervised and Semi-supervised Learning of Structure Graham Neubig Site

Unsupervised and Semi-supervised Learning of Structure Graham Neubig Site

Introduction to Plant Taxonomy Introduction to Plant Taxonomy (See P. 1169) (See P. 1169)

Taxonomy Jrg Cassens Data and Process Visualization SoSe 2017 SoSe 2017 Jrg Cassens

Support Vector Machines (SVMs). Semi-Supervised Learning. Semi-Supervised SVMs.

Parallelizing Semi- ReDAS Lab Supervised Learning Algorithms with MapReduce Nick Gauthier

CS330 Paper Presentation: October 16th, 2019 Supervised Classification Semi-Supervised

Semi-Supervised Kernel Mean Shift Clustering A Semi-Supervised Clustering Approach Motivation:

Semi-Supervised Local Fisher Semi-Supervised Local Fisher Discriminant Analysis Discriminant

Iterative Hybrid Algorithm for Semi-supervised Classification Martin SAVESKI Supervised by

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms Avital Oliver* Augustus Odena*

PCA CS 446 Supervised learning So far, weve done supervised learning: Given (( x i , y i )) ,

5 Semi-Supervised Learning BVM Tutorial: Advanced Deep Learning Methods David Zimmerer, Division

Structural Improvements Committee Report of Activities Ray Plzak Committee Chair Members

Structural Improvements Committee Report of Activities Ray Plzak Committee Chair Members

Success Criteria I can say the names of objects in a pencil case. I can use the infinitive

Nurse Leadership on Climate Change Katie Huffling, MS '06, RN, CNM, FAAN Cara Cook, MS '15, RN,

DISTRICT OF SQUAMISH Minutes of the Meeting of the Committee of the Whole held Tuesday, April 18,

I NTERNET GOVERNANCE DI SCUSSI ON AND W SI S Bertrand de LA CHAPELLE Director

THE FUTURE OF URBAN LIVING New forms of work, planning for the unknown in Amsterdam The

Safeguarding urban areas confronting climate trends and extreme weather by means of a