Semi-Supervised Learning
Maria-Florina Balcan, 03/30/2015

Readings:
• Semi-Supervised Learning. Encyclopedia of Machine Learning. Jerry Zhu, 2010.
• Combining Labeled and Unlabeled Data with Co-Training. Avrim Blum, Tom Mitchell. COLT 1998.

Fully Supervised Learning
Data Source: distribution $D$ on $X$. Expert / Oracle: target $c^* : X \to Y$.
Labeled examples $(x_1, c^*(x_1)), \dots, (x_m, c^*(x_m))$ go to the Learning Algorithm, which outputs $h : X \to Y$.
[Figure: an example hypothesis drawn as a decision tree with tests $x_1 > 5$ and $x_6 > 2$ and leaves labeled +1 / -1.]
$S_l = \{(x_1, y_1), \dots, (x_{m_l}, y_{m_l})\}$, with $x_i$ drawn i.i.d. from $D$ and $y_i = c^*(x_i)$.
Goal: $h$ has small error over $D$, where $\mathrm{err}_D(h) = \Pr_{x \sim D}(h(x) \neq c^*(x))$.
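The error $\mathrm{err}_D(h)$ defined above can be estimated on a sample drawn from $D$. The following is a minimal sketch; the 1-D target, the hypothesis, and the uniform distribution are hypothetical choices made only for illustration and do not come from the slides.

```python
import random

def empirical_error(h, c_star, sample):
    """Fraction of points in `sample` on which h disagrees with c_star."""
    return sum(h(x) != c_star(x) for x in sample) / len(sample)

# Hypothetical 1-D setting: D = Uniform[0, 10].
# The target labels x > 5 as +1; the hypothesis uses the threshold 6 instead.
c_star = lambda x: 1 if x > 5 else -1
h = lambda x: 1 if x > 6 else -1

rng = random.Random(0)
sample = [rng.uniform(0, 10) for _ in range(10_000)]

# h and c_star disagree exactly on (5, 6], so the true err_D(h) is 0.1.
err_hat = empirical_error(h, c_star, sample)
```

With enough samples the estimate concentrates around the true error; how fast it does so is what the generalization bounds discussed in these slides quantify.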

Two Core Aspects of Supervised Learning
Computation (Algorithm Design): how to optimize? Automatically generate rules that do well on observed data.
• E.g.: Naïve Bayes, logistic regression, SVM, AdaBoost, etc.
(Labeled) Data (Confidence Bounds, Generalization): confidence for rule effectiveness on future data.
• VC-dimension, Rademacher complexity, margin-based bounds, etc.

Classic Paradigm Insufficient Nowadays
Modern applications: massive amounts of raw data. Only a tiny fraction can be annotated by human experts.
Examples: protein sequences, billions of webpages, images.

Modern ML: New Learning Approaches
Modern applications: massive amounts of raw data.
Techniques that best utilize data, minimizing the need for expert/human intervention.
Paradigms where there has been great progress:
• Semi-supervised Learning, (Inter)active Learning.

Semi-Supervised Learning
[Diagram: the Data Source supplies unlabeled examples both to the Expert / Oracle, who returns labeled examples, and directly to the Learning Algorithm. The algorithm outputs a classifier.]
$S_l = \{(x_1, y_1), \dots, (x_{m_l}, y_{m_l})\}$, with $x_i$ drawn i.i.d. from $D$ and $y_i = c^*(x_i)$.
$S_u = \{x_1, \dots, x_{m_u}\}$, drawn i.i.d. from $D$.
Goal: $h$ has small error over $D$, where $\mathrm{err}_D(h) = \Pr_{x \sim D}(h(x) \neq c^*(x))$.

Semi-supervised Learning
• Major topic of research in ML.
• Several methods have been developed to try to use unlabeled data to improve performance, e.g.:
– Transductive SVM [Joachims ’99]
– Co-training [Blum & Mitchell ’98]
– Graph-based methods [B&C01], [ZGL03]
(Transductive SVM and Co-training: test of time awards at ICML!)
Workshops: [ICML ’03, ICML ’05, …]
Books:
• Semi-Supervised Learning, MIT 2006. O. Chapelle, B. Scholkopf and A. Zien (eds).
• Introduction to Semi-Supervised Learning, Morgan & Claypool, 2009. Zhu & Goldberg.
Both widespread applications and solid foundational understanding!
Today: discuss these methods. They all exploit unlabeled data in different, very interesting and creative ways.

Semi-supervised learning: no querying. Just have lots of additional unlabeled data.
A bit puzzling: it is unclear what unlabeled data can do for us. It is missing the most important information (the labels). How can it help us in substantial ways?
Key Insight: unlabeled data is useful if we have beliefs not only about the form of the target, but also about its relationship with the underlying distribution.

Semi-supervised SVM [Joachims ’99]

Margin-based regularity
Target goes through low-density regions (large margin).
• Assume we are looking for a linear separator.
• Belief: there should exist one with large separation.
[Figure: the same points separated by an SVM using labeled data only, and by a Transductive SVM.]

Transductive Support Vector Machines [Joachims ’99]
Optimize for the separator with large margin with respect to labeled and unlabeled data.
Input: $S_l = \{(x_1, y_1), \dots, (x_{m_l}, y_{m_l})\}$, $S_u = \{x_1, \dots, x_{m_u}\}$
$\operatorname{argmin}_w \|w\|^2$ s.t.:
• $y_i \, w \cdot x_i \geq 1$, for all $i \in \{1, \dots, m_l\}$
• $y_u \, w \cdot x_u \geq 1$, for all $u \in \{1, \dots, m_u\}$
• $y_u \in \{-1, 1\}$, for all $u \in \{1, \dots, m_u\}$
Find a labeling of the unlabeled sample and $w$ s.t. $w$ separates both labeled and unlabeled data with maximum margin.
[Figure: labeled points (+, -) and unlabeled points (0), with a separator $w'$ and the margin lines $w' \cdot x = -1$ and $w' \cdot x = 1$.]

With slack variables:
$\operatorname{argmin}_w \|w\|^2 + C \sum_i \xi_i + C \sum_u \xi_u$ s.t.:
• $y_i \, w \cdot x_i \geq 1 - \xi_i$, for all $i \in \{1, \dots, m_l\}$
• $y_u \, w \cdot x_u \geq 1 - \xi_u$, for all $u \in \{1, \dots, m_u\}$
• $y_u \in \{-1, 1\}$, for all $u \in \{1, \dots, m_u\}$
NP-hard. The problem is convex only after you have guessed the labels, and there are too many possible guesses.
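To see where the non-convexity comes from, it can help to evaluate the slack-variable objective for a fixed $w$. The sketch below assumes the usual constraint $\xi \geq 0$ (not stated on the slide), so the smallest feasible slack for a labeled point is $\max(0, 1 - y_i \, w \cdot x_i)$; for an unlabeled point the best label is the sign of $w \cdot x_u$, which gives slack $\max(0, 1 - |w \cdot x_u|)$. The function names and the data are hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def tsvm_objective(w, labeled, unlabeled, C=1.0):
    """||w||^2 + C * sum(xi_i) + C * sum(xi_u) with the smallest feasible slacks.

    labeled:   list of (x, y) pairs with y in {-1, +1}
    unlabeled: list of x
    Assumes slacks are constrained to be >= 0 and no bias term is used.
    """
    reg = dot(w, w)
    slack_labeled = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in labeled)
    # Choosing y_u = sign(w . x_u) minimizes the slack of each unlabeled point.
    slack_unlabeled = sum(max(0.0, 1.0 - abs(dot(w, x))) for x in unlabeled)
    return reg + C * slack_labeled + C * slack_unlabeled

# Hypothetical toy data.
labeled = [((2.0, 0.0), 1), ((-2.0, 0.0), -1)]
unlabeled = [(0.5, 1.0), (-3.0, 0.0)]
value = tsvm_objective((1.0, 0.0), labeled, unlabeled)
```

The term $\max(0, 1 - |w \cdot x_u|)$ is not convex in $w$: it penalizes unlabeled points that fall inside the margin on either side, which is what pushes the separator toward low-density regions.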

Transductive Support Vector Machines
Optimize for the separator with large margin with respect to labeled and unlabeled data.
Heuristic (Joachims), high-level idea:
• First maximize the margin over the labeled points.
• Use this separator to give initial labels to the unlabeled points.
• Try flipping labels of unlabeled points to see if doing so can increase the margin.
Keep going until no more improvements. Finds a locally optimal solution.
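The three steps above can be sketched as a small local search. This is a simplified illustration under stated assumptions, not Joachims' actual procedure: it uses a plain subgradient-descent solver for the soft-margin objective without a bias term, flips one label at a time, and refits after every trial flip. All function names, step sizes, and data are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def objective(w, points, C):
    """||w||^2 + C * sum of hinge slacks, for fully labeled `points`."""
    return dot(w, w) + C * sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in points)

def fit(points, C=1.0, eta=0.01, steps=500):
    """Approximately minimize `objective` by subgradient descent (no bias term)."""
    w = [0.0] * len(points[0][0])
    for _ in range(steps):
        grad = [2.0 * wi for wi in w]
        for x, y in points:
            if 1.0 - y * dot(w, x) > 0.0:  # margin violated: hinge term is active
                grad = [g - C * y * xi for g, xi in zip(grad, x)]
        w = [wi - eta * g for wi, g in zip(w, grad)]
    return w

def tsvm_local_search(labeled, unlabeled, C=1.0):
    # Step 1: maximize the margin over the labeled points only.
    w = fit(labeled, C)
    # Step 2: use that separator to give initial labels to the unlabeled points.
    y_u = [1 if dot(w, x) >= 0 else -1 for x in unlabeled]

    def score(labels):
        points = labeled + list(zip(unlabeled, labels))
        w_fit = fit(points, C)
        return objective(w_fit, points, C), w_fit

    best, w = score(y_u)
    # Step 3: try flipping one label at a time; keep a flip only if it helps.
    improved = True
    while improved:
        improved = False
        for u in range(len(unlabeled)):
            trial = y_u[:u] + [-y_u[u]] + y_u[u + 1:]
            value, w_trial = score(trial)
            if value < best - 1e-6:
                y_u, best, w, improved = trial, value, w_trial, True
    return w, y_u

# Hypothetical toy data: two labeled points and four unlabeled points.
labeled = [((2.0, 1.0), 1), ((-2.0, -1.0), -1)]
unlabeled = [(3.0, 0.5), (2.5, -0.5), (-3.0, -0.5), (-2.5, 0.5)]
w, y_u = tsvm_local_search(labeled, unlabeled)
```

The search stops when no single flip lowers the objective, so the result is only locally optimal, in line with the slide. Refitting after every trial flip is affordable here only because the example is tiny.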

Experiments [Joachims99]

Transductive Support Vector Machines
Helpful distribution: highly compatible.
Non-helpful distributions: $1/\gamma^2$ clusters, all partitions separable by large margin.
[Figure: example distributions, with panels labeled "Margin satisfied" and "Margin not satisfied".]

Co-training [Blum & Mitchell ’98] Different type of underlying regularity assumption: Consistency or Agreement Between Parts

Co-training: Self-consistency
Agreement between two parts: co-training [Blum-Mitchell98].
- examples contain two sufficient sets of features, $x = \langle x_1, x_2 \rangle$
- belief: the parts are consistent, i.e. $\exists\, c_1, c_2$ s.t. $c_1(x_1) = c_2(x_2) = c^*(x)$
For example, if we want to classify web pages $x = \langle x_1, x_2 \rangle$ as faculty member homepage or not:
• $x$: link info & text info
• $x_1$: text info
• $x_2$: link info
[Figure: the page "Prof. Avrim Blum" and a link labeled "My Advisor" pointing to it.]

Iterative Co-Training
Idea: use a small labeled sample to learn initial rules.
• E.g., “my advisor” pointing to a page is a good indicator it is a faculty home page.
• E.g., “I am teaching” on a page is a good indicator it is a faculty home page.
Idea: use unlabeled data to propagate learned information.
Look for unlabeled examples $\langle x_1, x_2 \rangle$ where one rule is confident and the other is not. Have it label the example for the other.
Training 2 classifiers, one on each type of info. Using each to help train the other.

Iterative Co-Training
Works by using unlabeled data to propagate learned information.
• Have learning algorithms $A_1$, $A_2$ on each of the two views.
• Use labeled data to learn two initial hypotheses $h_1$, $h_2$.
Repeat:
• Look through unlabeled data to find examples where one of $h_i$ is confident but the other is not.
• Have the confident $h_i$ label it for algorithm $A_{3-i}$.
[Figure: the two views $X_1$ and $X_2$, with hypothesis $h_1$ and positive examples.]
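The loop above can be sketched with a deliberately simple learner on each view. Everything concrete here is an assumption made for illustration: each view is a single categorical feature, the toy learner `learn` memorizes which values it has seen with each label and abstains otherwise, and the web-page phrases are invented examples.

```python
def learn(examples):
    """Toy learner A_i: remember which feature values were seen with each label.

    Returns a hypothesis h(v) that outputs +1 or -1 when confident, else None.
    """
    pos = {v for v, y in examples if y == 1}
    neg = {v for v, y in examples if y == -1}

    def h(v):
        if v in pos and v not in neg:
            return 1
        if v in neg and v not in pos:
            return -1
        return None  # not confident

    return h

def co_train(labeled, unlabeled, rounds=10):
    """labeled: list of ((x1, x2), y); unlabeled: list of (x1, x2)."""
    # Separate training sets for the two views.
    train = [[(x[0], y) for x, y in labeled], [(x[1], y) for x, y in labeled]]
    pool = list(unlabeled)
    for _ in range(rounds):
        h = [learn(train[0]), learn(train[1])]
        remaining, progress = [], False
        for x in pool:
            p = [h[0](x[0]), h[1](x[1])]
            if p[0] is not None and p[1] is None:
                train[1].append((x[1], p[0]))  # h_1 labels the example for A_2
                progress = True
            elif p[1] is not None and p[0] is None:
                train[0].append((x[0], p[1]))  # h_2 labels the example for A_1
                progress = True
            else:
                remaining.append(x)
        pool = remaining
        if not progress:
            break
    return learn(train[0]), learn(train[1])

# Hypothetical data: view 1 is the anchor text of a link, view 2 is page text.
labeled = [(("my advisor", "I am teaching"), 1),
           (("buy now", "free shipping"), -1)]
unlabeled = [("my advisor", "office hours"),
             ("professor page", "office hours"),
             ("buy now", "add to cart")]
h1, h2 = co_train(labeled, unlabeled)
```

In this run the link rule first teaches the text view that "office hours" indicates a faculty page, and in the next round the text view teaches the link view about "professor page". Neither phrase appeared in the labeled data.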

Original Application: Webpage classification 12 labeled examples, 1000 unlabeled (sample run)

Iterative Co-Training. A Simple Example: Learning Intervals
Use labeled data to learn $h_1^1$ and $h_2^1$.
Use unlabeled data to bootstrap: $h_1^2$, $h_2^2$.
[Figure: labeled and unlabeled examples plotted over the two views, with target intervals $c_1$ and $c_2$ and the hypotheses $h_1^1$, $h_2^1$, $h_1^2$, $h_2^2$.]
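A minimal sketch of this bootstrapping for intervals follows. It assumes each view is a real number, each hypothesis is the smallest interval containing the positive values seen so far in that view, and a hypothesis is "confident" exactly when a value falls inside its interval. The function names and the numbers are hypothetical.

```python
def extend(interval, v):
    """Smallest interval containing both `interval` and the value v."""
    lo, hi = interval
    return (min(lo, v), max(hi, v))

def co_train_intervals(positives, unlabeled):
    """positives, unlabeled: lists of (x1, x2) pairs of real numbers."""
    xs1 = [x1 for x1, _ in positives]
    xs2 = [x2 for _, x2 in positives]
    # Initial hypotheses: tightest intervals around the labeled positives.
    h1 = (min(xs1), max(xs1))
    h2 = (min(xs2), max(xs2))
    changed = True
    while changed:
        changed = False
        for x1, x2 in unlabeled:
            in1 = h1[0] <= x1 <= h1[1]
            in2 = h2[0] <= x2 <= h2[1]
            if in1 and not in2:
                h2 = extend(h2, x2)  # view 1 is confident: grow h2
                changed = True
            elif in2 and not in1:
                h1 = extend(h1, x1)  # view 2 is confident: grow h1
                changed = True
    return h1, h2

# Hypothetical data: two labeled positives and four unlabeled pairs.
positives = [(0.4, 0.5), (0.5, 0.6)]
unlabeled = [(0.45, 0.8), (0.2, 0.7), (0.9, 0.1), (0.3, 0.3)]
h1, h2 = co_train_intervals(positives, unlabeled)
```

Under the consistency belief $c_1(x_1) = c_2(x_2)$, every extension adds a truly positive value, so the hypotheses grow while staying inside the targets. The pair (0.9, 0.1) is never absorbed because neither view is confident about it.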

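Expansion, Examples: Learning Intervals
Consistency: zero probability mass in the regions where $c_1$ and $c_2$ disagree.
[Figure: two panels over the targets $c_1$ and $c_2$ and the positive distribution $D^+$: an expanding distribution, and a non-expanding (non-helpful) distribution. Sets $S_1$ and $S_2$ are marked.]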
