Multi-view Active Learning
Ion Muslea
University of Southern California
Outline
– Multi-view active learning
– Robust multi-view learning
– View validation as meta-learning
– Related work
– Contributions
– Future work
– supervised learning: algorithms that learn concepts from labeled examples
– active learning: detect & ask the user to label only the most informative examples
– multiple views: disjoint sets of features that are each sufficient for learning
– previous multi-view learners are semi-supervised rather than active
– The intuition
– The Co-Testing family of algorithms
– Empirical evaluation
Two views:
– salary
– office number
[Figures: the running example plotted in the two views, Office and Salary; a few labeled examples (e.g., Office 300, Salary 50K) and many unlabeled examples marked "?".]
– learn one hypothesis in each view
– query one of the contention points (CPs): unlabeled examples on which the two hypotheses disagree
– output hypothesis: winner-takes-all, majority/weighted vote
– query selection strategy: an equal-confidence CP (conservative) or a maximum-confidence CP (aggressive); Naïve Co-Testing queries a random CP
– assumption: compatible views
Evaluation domains (with base learners):
– Ad (IB) [Kushmerick '99]
– Parse (C4.5) [Marcu et al. '00]: converts a Japanese discourse tree into an equivalent English one
– Courses (Naïve Bayes) [Blum+Mitchell '98]: homepages and other pages
– Wrapper (Stalker) [Kushmerick '00]: extracts data from Web pages
[Table: Random Sampling, Uncertainty Sampling, Query-by-Committee, Query-by-Boosting, Query-by-Bagging, Naïve Co-Testing, Conservative Co-Testing, and Aggressive Co-Testing on the four domains; legend: wins / works / cannot be applied.]
Wrapper induction: two views for extracting the phone number
– sample documents:
  … Hilton <p> Phone: <b> (211) 111-1111 </b> Fax: <b> (211) …
  … Motel 6 <p> Phone : <b> (311) 101-1110 </b> Fax: <b> (311) …
  … Phone (toll free) : <i> (800) 171-1771 </i> Fax: <b> (111) …
– forward view (scan forward from the start of the page), e.g. SkipTo( Phone : <b> ) or SkipTo(Phone) SkipTo(Html)
– backward view (scan backward from the end of the page), e.g. BackTo( Fax ) BackTo( ( Nmb ) )
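As an illustration, here is a toy interpreter for such forward and backward rules; the landmark-list encoding and helper names are assumptions for this sketch, not the actual Stalker rule language:

```python
def skip_to(doc, landmarks, start=0):
    """Forward view: scan left-to-right, skipping to each landmark in
    turn; return the index just after the last landmark."""
    pos = start
    for lm in landmarks:
        i = doc.find(lm, pos)
        if i < 0:
            return None  # rule does not match this document
        pos = i + len(lm)
    return pos

def back_to(doc, landmarks, end=None):
    """Backward view: scan right-to-left, backing up to each landmark
    in turn; return the index where the last landmark begins."""
    pos = len(doc) if end is None else end
    for lm in landmarks:
        i = doc.rfind(lm, 0, pos)
        if i < 0:
            return None
        pos = i
    return pos

doc = "... Hilton <p> Phone: <b> (211) 111-1111 </b> Fax: <b> (211) ..."
start = skip_to(doc, ["Phone", "<b>"])  # forward rule: SkipTo(Phone) SkipTo(<b>)
end = back_to(doc, ["Fax", "</b>"])     # backward rule: BackTo(Fax) BackTo(</b>)
print(doc[start:end].strip())           # prints "(211) 111-1111"
```

Because one rule scans from the beginning of the page and the other from the end, the two views are genuinely independent ways to locate the same field, which is what makes wrapper induction a natural multi-view task.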
[Charts: queries until 100% accuracy (capped at 18+) on the wrapper induction tasks, comparing Random Sampling, Naïve Co-Testing, Aggressive Co-Testing, and Query-by-Bagging.]
– motivation
– Co-EMT = active + semi-supervised learning
– robustness to assumption violations
Active learning:
– queries only the most informative examples
– ignores all the remaining (unlabeled) examples
Semi-supervised learning:
– few labeled + many unlabeled examples
Given views V1 & V2 and the sets L & U of labeled & unlabeled examples,
REPEAT
– use the labeled examples in L to learn h1 and h2
– query a contention point u, i.e. an unlabeled example with h1(u) ≠ h2(u)
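A schematic sketch of this interleaving, with placeholder `train` and `oracle` helpers; real Co-EM uses probabilistic labels and real Co-EMT ranks contention points by confidence, both of which are simplified away here:

```python
def co_emt(view1, view2, L, U, oracle, train, n_queries=5, n_em_iters=3):
    """Interleave Co-EM-style semi-supervised training (each view labels
    the unlabeled pool for the other view) with Co-Testing queries."""
    for _ in range(n_queries):
        # Co-EM phase: bootstrap the two views from each other on U
        h1 = train([(view1(x), y) for x, y in L])
        for _ in range(n_em_iters):
            # view 1 labels U; view 2 trains on L plus those labels
            h2 = train([(view2(x), y) for x, y in L] +
                       [(view2(x), h1(view1(x))) for x in U])
            # view 2 labels U; view 1 trains on L plus those labels
            h1 = train([(view1(x), y) for x, y in L] +
                       [(view1(x), h2(view2(x))) for x in U])
        # Co-Testing phase: query a contention point of h1 and h2
        cps = [x for x in U if h1(view1(x)) != h2(view2(x))]
        if not cps:
            break
        q = cps[0]  # naive choice for this sketch
        L.append((q, oracle(q)))
        U.remove(q)
    return L
```

The active component supplies informative labeled examples, while the semi-supervised component squeezes information out of the examples that were never queried.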
Semi-supervised MVL: the two views bootstrap each other by labeling unlabeled examples for each other
Why combine the two?
– stand-alone Co-EM uses randomly chosen examples; Co-Testing provides more informative examples
– stand-alone Co-Testing uses only the labeled examples; Co-EM also exploits the unlabeled examples
[Charts: error rate (%) on ADS (about 4–9%) and COURSES (about 3.5–5.5%) for Co-EMT, Co-Testing, Co-EM, Co-Training, and semi-supervised EM.]
Clumps example (pages within a class form clusters, or "clumps"):
– page fragments: "… Spring teaching …", "… favorite class …", "… my favorite class …", "… neural nets …", "Neural nets papers: …", "CS-511: Neural Nets"
– clumps: Theory, A.I., Systems, Faculty, Admin, Students
[Charts: error rate vs. incompatibility (0–40%) for 1, 2, and 4 clumps per class, comparing EM, Co-Training, Co-EM, and Co-EMT.]
– Motivation
– Adaptive view validation
– Empirical results
[Chart: queries until 100% accuracy (capped at 18+) for Aggressive Co-Testing across the domains.]
Example:
– views: sound vs. lip motion
– one view may be only 53% accurate
Question: is MVL adequate for a new, unseen task?
– given labeled tasks [Task1, L1], [Task2, L2], …, [Taskn, Ln]
– for each task, generate a view validation example
– learn a decision tree from these examples
– for each new, unseen task, use the learned decision tree to predict whether MVL is appropriate
– F1: agreement of h1 & h2 on the unlabeled examples
– F2: min( TrainError(h1), TrainError(h2) )
– F3: max( TrainError(h1), TrainError(h2) )
– F4: F3 – F2
– F5: min( Complexity(h1), Complexity(h2) )
– F6: max( Complexity(h1), Complexity(h2) )
– F7: F6 – F5
IF h1 & h2 agree on at least 62% of the unlabeled examples
AND |TrainError(h1) – TrainError(h2)| < 10%
THEN the task's views are adequate for MVL
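A minimal sketch of building one view-validation example and applying the sample rule above; the agreement, training-error, and complexity statistics are passed in as plain numbers, and the helper names are mine, not from the talk:

```python
def view_validation_features(agreement, err1, err2, cplx1, cplx2):
    """Build the meta-learning example (features F1..F7) for one task."""
    return {
        "F1": agreement,                        # h1/h2 agreement on unlabeled examples
        "F2": min(err1, err2),                  # smaller training error
        "F3": max(err1, err2),                  # larger training error
        "F4": max(err1, err2) - min(err1, err2),
        "F5": min(cplx1, cplx2),                # simpler hypothesis
        "F6": max(cplx1, cplx2),                # more complex hypothesis
        "F7": max(cplx1, cplx2) - min(cplx1, cplx2),
    }

def sample_rule(f):
    """The sample decision-tree rule from the slide: views are adequate
    iff agreement >= 62% and training errors within 10% of each other."""
    return f["F1"] >= 0.62 and f["F4"] < 0.10

f = view_validation_features(agreement=0.80, err1=0.05, err2=0.08,
                             cplx1=12, cplx2=30)
print(sample_rule(f))  # prints True: high agreement, similar errors
```

In the full approach these feature vectors, one per solved task, are fed to a decision-tree learner, which induces rules of the kind shown above rather than this single hand-written one.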
[Chart: error rate (%) vs. percentage of tasks used for training (16%, 33%, 66%) for ViewValid and the baseline, on the WI and TC domains.]
– uncertainty reduction [Lewis 94; Schohn 01; Thompson 99]
– version space reduction [Seung 92; Cohn 94; Abe 98]
– expected-error minimization [Lindenbaum 99; Tong 00; Roy 01]
– multi-view vs. single-view active learning
– "domain" oriented vs. "base learner" oriented
– [Blum+Mitchell 98]: formalization of multi-view learning
– [Dasgupta+ 01]: Co-Training's proof of convergence
– [Abney 02]: allowing (some) view correlation
– algorithmic: [Collins 99] [Nigam 00] [Pierce 01] [Ghani 02]
– applicability: [Nigam 00] [Goldman 00] [Raskutti 02]
– all other MVL algorithms are "passive" & semi-supervised
– general features [Aha 92][Brazdil+ 95][Todorovski+ 99]
– classifier-based [Bensusan 99]: max-depth & shape of DT, …
– landmarking [Pfahringer 00]: accuracies of simple, fast learners
– single- vs. multi-view learning
– few labeled + many unlabeled examples
– landmarking (training error) + classifier-based (complexity)
to predict whether or not MVL is appropriate for a new, unseen task.
– propose feature split into views
– myopic vs. look-ahead queries
– Co-Testing for regression & semi-supervised clustering
– “general purpose” vs. “per multi-view problem”