SLIDE 1
Co-Training
Based on “Combining Labeled and Unlabeled Data with Co-Training” by A. Blum & T. Mitchell, 1998
SLIDE 2
Problem:
Learning to classify data (e.g., web pages) when the description of each example can be partitioned into two distinct views.
Assumption: Either view of the example would be sufficient for learning if we had enough labeled data, but labeled examples are scarce or expensive.
Goal: Use both views to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
Idea: Two learning algorithms are trained separately on each view; then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other.
Empirical result on real data: The use of unlabeled examples can lead to significant improvement of hypotheses in practice.
SLIDE 3
Not presented here (see the paper):
Theoretical goal: Provide a PAC-style analysis for this setting.
More generally: Provide a PAC-style framework for the general problem of learning from both labeled and unlabeled data.
SLIDE 4
Example
Classify web pages from CS departments at several universities as belonging or not to faculty members.
Views:
- 1. the text appearing on the page itself
- 2. the anchor text attached to hyperlinks pointing to this page from other pages on the web
Use weak predictors, like
- 1. the phrase "research interests" (in the page text)
- 2. the phrase "my advisor" (in the anchor text)
Pages pointed to by links containing the phrase "my advisor" can be used as 'probably positive' examples to further train a learning algorithm based on the words of the page text, and vice versa.
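A minimal Python sketch of two such phrase-based weak predictors (the function names and the plain substring test are illustrative assumptions, not from the paper):

def page_text_is_faculty(page_text: str) -> bool:
    # View 1: the page's own text mentions "research interests".
    return "research interests" in page_text.lower()

def anchor_text_is_faculty(anchor_text: str) -> bool:
    # View 2: an inbound link's anchor text contains "my advisor".
    return "my advisor" in anchor_text.lower()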
3.
SLIDE 5
Co-training Algorithm
Input:
  L, a set of labeled training examples
  U, a set of unlabeled examples
Create a pool U' of examples by choosing u examples at random from U.
Loop for k iterations:
  use L to train a classifier h1 that considers only the x1 view of x
  use L to train a classifier h2 that considers only the x2 view of x
  select from U' the p examples most confidently labeled positive by h1
  select from U' the n examples most confidently labeled negative by h1
  select from U' the p examples most confidently labeled positive by h2
  select from U' the n examples most confidently labeled negative by h2
  add these self-labeled examples to L
  randomly choose 2p + 2n examples from U to replenish U'
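A minimal Python sketch of this loop. The user-supplied train callable and its predict_proba confidence interface are assumptions of the sketch, not fixed by the paper; the defaults u=75, k=30, p=1, n=3 are the values used in the paper's experiment.

import random

def co_train(L, U, train, u=75, k=30, p=1, n=3):
    # L : list of ((x1, x2), label) labeled examples
    # U : list of (x1, x2) unlabeled examples
    # train(view, L) -> classifier for that view, with
    #   .predict_proba(x_view) returning P(positive | x_view)
    U = list(U)
    random.shuffle(U)
    pool = [U.pop() for _ in range(u)]        # U': u examples drawn at random
    for _ in range(k):
        h1 = train(0, L)                      # classifier on the x1 view only
        h2 = train(1, L)                      # classifier on the x2 view only
        for view, h in ((0, h1), (1, h2)):
            # Rank the pool by this view's confidence that the example is positive.
            ranked = sorted(pool, key=lambda x: h.predict_proba(x[view]))
            for x in ranked[-p:]:             # p most confidently positive
                L.append((x, 1)); pool.remove(x)
            for x in ranked[:n]:              # n most confidently negative
                L.append((x, 0)); pool.remove(x)
        # Replenish U' with 2p + 2n fresh examples drawn from U.
        pool.extend(U.pop() for _ in range(min(2 * p + 2 * n, len(U))))
    return train(0, L), train(1, L)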
SLIDE 6
Working example
Classify course home pages
1051 web pages collected from the CS departments of four universities (Cornell, Washington, Wisconsin, and Texas); 22% are course pages.
263 pages (25%) were first selected as a test set; from the remaining data, L, the set of labeled examples, was generated by selecting at random 3 positive and 9 negative examples; the remaining examples form U, the set of unlabeled examples.
Use a Naive Bayes classifier for each of the two views.
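A sketch of the per-view setup in Python, using scikit-learn's multinomial Naive Bayes as a stand-in for the paper's own implementation (the variable names and data containers are hypothetical):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def make_view_classifier():
    # Bag-of-words features + multinomial Naive Bayes, one instance per view.
    return make_pipeline(CountVectorizer(), MultinomialNB())

page_clf = make_view_classifier()    # view x1: words on the page itself
anchor_clf = make_view_classifier()  # view x2: anchor text of inbound links

# page_texts, anchor_texts, labels: the labeled set L
# (3 positive and 9 negative examples, as above)
# page_clf.fit(page_texts, labels)
# anchor_clf.fit(anchor_texts, labels)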
SLIDE 7
Results
Test-set error rates (percent):

                      page-based   hyperlink-based   combined
                      classifier   classifier        classifier
supervised training   12.9         12.4              11.1
co-training            6.2         11.6               5.0

Explanation: The combined classifier uses the naive independence assumption:
P(Y | h1 ∧ h2) = P(Y | h1) P(Y | h2)
Conclusion: The co-trained classifier outperforms the classifier formed by supervised training.
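A minimal sketch of this combination rule, reusing the two fitted per-view classifiers from the earlier sketch; it follows the slide's independence assumption, multiplying the per-view class probabilities and renormalizing:

def combined_proba(page_clf, anchor_clf, page_text, anchor_text):
    # P(Y | h1 ∧ h2) ∝ P(Y | h1) * P(Y | h2), renormalized to sum to 1.
    p1 = page_clf.predict_proba([page_text])[0]
    p2 = anchor_clf.predict_proba([anchor_text])[0]
    joint = p1 * p2
    return joint / joint.sum()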
SLIDE 8
Another suggested practical application
Classifying segments of TV broadcasts, for instance learning to identify televised segments containing the US president.
Views: X1 – video images; X2 – audio signals.
Weakly predictive recognizers:
- 1. one that spots full frontal images of the president’s face
- 2. one that spots his voice when no background noise is present.
Use co-training to improve the accuracy of both classifiers.
SLIDE 9
Another suggested practical application
Robot training: recognizing an open doorway using a collection of vision (X1), sonar (X2), and laser range (X3) sensors.