
Active Learning for Supervised Classification - Maria-Florina Balcan (PowerPoint presentation)



  1. Active Learning for Supervised Classification. Maria-Florina Balcan, Carnegie Mellon University.

  2. Active Learning of Linear Separators. Maria-Florina Balcan, Carnegie Mellon University.

  3. Two Minute Version
  Modern applications: massive amounts of raw data (protein sequences, billions of webpages, images). Only a tiny fraction can be annotated by human experts.
  Active learning: utilize the data, minimize expert intervention.

  4. Two Minute Version
  Active learning: a technique for best utilizing data while minimizing the need for human intervention.
  This talk: active learning for classification; label-efficient, noise-tolerant, polynomial-time algorithms for learning linear separators. [Balcan-Long COLT'13] [Awasthi-Balcan-Long STOC'14] [Awasthi-Balcan-Haghtalab-Urner COLT'15]
  • Much better noise tolerance than previously known for classic passive learning via poly-time algorithms. [KKMS'05] [KLS'09]
  • Solve an adaptive sequence of convex optimization problems on smaller and smaller bands around the current guess for the target.
  • Exploit structural properties of log-concave distributions.

  5. Passive and Active Learning

  6. Supervised Learning
  • E.g., which emails are spam and which are important (spam vs. not spam).
  • E.g., classify objects as chairs vs. non-chairs.

  7. Statistical / PAC Learning Model
  Data source: distribution D on X; an expert/oracle labels according to a target c*: X → {0,1}.
  • The algorithm sees (x_1, c*(x_1)), …, (x_m, c*(x_m)), with the x_i drawn i.i.d. from D.
  • It does optimization over the sample S and finds a hypothesis h ∈ C, h: X → {0,1}.
  • Goal: h has small error, err(h) = Pr_{x ~ D}[h(x) ≠ c*(x)].
  • c* ∈ C: realizable case; otherwise: agnostic case.

  8. Two Main Aspects in Classic Machine Learning
  Algorithm design: how to optimize? Automatically generate rules that do well on observed data, e.g., boosting, SVMs, etc.
  Generalization guarantees / sample complexity: confidence that the rule is effective on future data. Labeled examples needed: O((1/ε)(VCdim(C) log(1/ε) + log(1/δ))).
  For C = linear separators in R^d: O((1/ε)(d log(1/ε) + log(1/δ))).

  9. Two Main Aspects in Classic Machine Learning
  Algorithm design: how to optimize? Automatically generate rules that do well on observed data. Running time: poly(d, 1/ε, 1/δ).
  Generalization guarantees / sample complexity: confidence that the rule is effective on future data. Labeled examples needed: O((1/ε)(VCdim(C) log(1/ε) + log(1/δ))).
  For C = linear separators in R^d: O((1/ε)(d log(1/ε) + log(1/δ))). (A numeric example follows.)
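For concreteness, here is a small numeric evaluation of this sample-complexity bound (a sketch only: the constant c and the function name are my own stand-ins for the constant hidden by the big-O):

```python
import math

def pac_sample_bound(d, eps, delta, c=1.0):
    """Illustrative evaluation of O((1/eps) * (d*log(1/eps) + log(1/delta)))
    for linear separators in R^d; c stands in for the hidden constant."""
    return c / eps * (d * math.log(1 / eps) + math.log(1 / delta))

print(pac_sample_bound(d=100, eps=0.01, delta=0.05))  # roughly 4.6e4 * c labeled examples
```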

  10. Modern ML: New Learning Approaches
  Modern applications: massive amounts of raw data (protein sequences, billions of webpages, images). Only a tiny fraction can be annotated by human experts.

  11. Active Learning
  The data source provides unlabeled examples. The learning algorithm repeatedly requests the label of an example and receives a label for that example from the expert; at the end, the algorithm outputs a classifier.
  • The learner can choose specific examples to be labeled.
  • Goal: use fewer labeled examples [pick informative examples to be labeled].

  12. Active Learning in Practice
  • Text classification: active SVM (Tong & Koller, ICML 2000), e.g., request the label of the example closest to the current separator (sketched below).
  • Video segmentation (Fathi-Balcan-Ren-Rehg, BMVC 2011).
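A minimal sketch of that query rule in plain NumPy (the function name next_query, the pool setup, and the ask_expert placeholder are my own; a real active-SVM system retrains the separator after every label):

```python
import numpy as np

def next_query(w, X_pool):
    """Index of the pooled unlabeled example closest to the separator w,
    i.e., the one with the smallest |w . x| (most uncertain for a linear model)."""
    return int(np.argmin(np.abs(X_pool @ w)))

# usage sketch: i = next_query(w, X_pool); y_i = ask_expert(X_pool[i]); retrain on (X_pool[i], y_i)
```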

  13. Can adaptive querying help? [CAL92, Dasgupta04]
  • Threshold functions on the real line: h_w(x) = 1(x ≥ w), C = {h_w : w ∈ R}.
  Active algorithm:
  • Get N = O(1/ε) unlabeled examples.
  • How can we recover the correct labels with ≪ N queries? Do binary search! Just need O(log N) labels (a sketch follows).
  • Output a classifier consistent with the N inferred labels.
  Passive supervised learning: Ω(1/ε) labels to find an ε-accurate threshold. Active: only O(log(1/ε)) labels. Exponential improvement.
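A minimal runnable sketch of the binary-search strategy (the function names and the noiseless oracle are my own illustration of the realizable case):

```python
import numpy as np

def active_threshold(xs, query_label):
    """Learn a 1-D threshold from N unlabeled points with O(log N) label queries.
    query_label(x) is the expert oracle returning 1(x >= w*) (noiseless)."""
    xs = np.sort(xs)
    if query_label(xs[0]) == 1:
        return xs[0]                    # every point is positive
    if query_label(xs[-1]) == 0:
        return np.inf                   # every point is negative
    lo, hi = 0, len(xs) - 1             # invariant: xs[lo] is negative, xs[hi] is positive
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query_label(xs[mid]) == 1:
            hi = mid
        else:
            lo = mid
    return xs[hi]                       # threshold consistent with all N inferred labels

rng = np.random.default_rng(0)
xs, w_star, queries = rng.uniform(size=1000), 0.37, []
def oracle(x):
    queries.append(x)
    return int(x >= w_star)
print(active_threshold(xs, oracle), len(queries), "label queries for N =", len(xs))
```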

  14. Active learning, provable guarantees
  Lots of exciting results on sample complexity. E.g., "disagreement-based" algorithms: pick a few points at random from the current region of disagreement (uncertainty) among the surviving classifiers, query their labels, and throw out a hypothesis once you are statistically confident it is suboptimal.
  [BalcanBeygelzimerLangford'06, Hanneke'07, DasguptaHsuMonteleoni'07, Wang'09, Fridman'09, Koltchinskii'10, BHW'08, BeygelzimerHsuLangfordZhang'10, Hsu'10, Ailon'12, …]
  Generic (any class), adversarial label noise; but:
  • suboptimal in label complexity,
  • computationally prohibitive.

  15. Poly-Time, Noise-Tolerant/Agnostic, Label-Optimal Active Learning Algorithms

  16. Margin-Based Active Learning
  A margin-based algorithm for learning linear separators:
  • Realizable case: exponential improvement, only O(d log(1/ε)) labels to find w with error ε when D is log-concave. [Balcan-Long COLT 2013]
  • Agnostic & malicious noise: a poly-time active learning algorithm outputs w with err(w) = O(η), where η = err(best linear separator). [Awasthi-Balcan-Long STOC 2014]
  • First poly-time active learning algorithm in noisy scenarios!
  • First for malicious noise [Val85] (features corrupted too).
  • Improves on the noise tolerance of the previous best passive algorithms too! [KKMS'05] [KLS'09]

  17. Margin-Based Active Learning, Realizable Case
  Draw m_1 unlabeled examples, label them, and add them to W(1).
  Iterate k = 2, …, s:
  • find a hypothesis w_{k-1} consistent with W(k-1);
  • set W(k) = W(k-1);
  • sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1};
  • label them and add them to W(k).
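A minimal runnable sketch of this loop (my own simplification, not the paper's implementation): it assumes a standard Gaussian distribution (which is log-concave), a target through the origin, a perceptron as the find-a-consistent-hypothesis step, rejection sampling for the band, and an illustrative band schedule with γ_k proportional to 2^{-k}:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)              # hidden target halfspace through the origin

def expert_label(X):
    """Labeling oracle: sign(w* . x)."""
    return np.sign(X @ w_star)

def consistent_halfspace(X, y, max_epochs=1000):
    """Run the perceptron until it is consistent with the working set (or max_epochs);
    the data are realizable, hence linearly separable."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, yi in zip(X, y):
            if yi * (w @ x) <= 0:
                w += yi * x
                mistakes += 1
        if mistakes == 0:
            break
    return w / np.linalg.norm(w)

def sample_band(w, gamma, m):
    """Rejection-sample m Gaussian points from the band |w . x| <= gamma
    (slow for very small gamma; a sketch, not the paper's sampler)."""
    kept = []
    while len(kept) < m:
        X = rng.standard_normal((8 * m, d))
        kept.extend(X[np.abs(X @ w) <= gamma])
    return np.array(kept[:m])

# Round 1: a plain labeled sample W(1).
m, s = 200, 6
X = rng.standard_normal((m, d))
y = expert_label(X)
w = consistent_halfspace(X, y)

# Rounds 2..s: query labels only inside a shrinking band around the current guess.
for k in range(2, s + 1):
    gamma = 2.0 ** (-(k - 1))                 # illustrative schedule, gamma_k ~ 2^{-k}
    X_band = sample_band(w, gamma, m)         # m_k unlabeled points with |w_{k-1} . x| <= gamma_{k-1}
    X = np.vstack([X, X_band])                # W(k) = W(k-1) plus the newly labeled points
    y = np.concatenate([y, expert_label(X_band)])
    w = consistent_halfspace(X, y)            # hypothesis consistent with W(k)

print("angle(w, w*) in radians:", float(np.arccos(np.clip(w @ w_star, -1.0, 1.0))))
```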

  18. Margin-Based Active Learning, Realizable Case
  Log-concave distributions: the log of the density function is concave.
  • Wide class: uniform distribution over any convex set, Gaussian, etc.
  Theorem: D log-concave in R^d. If the band widths shrink geometrically (γ_k = Θ(1/2^k)) and enough examples are drawn in each band, then err(w_s) ≤ ε after s = O(log(1/ε)) rounds, using Õ(d) label requests per round.
  Active learning: O(d log(1/ε)) label requests overall. Passive learning: Ω(d/ε) label requests. Unlabeled examples: poly(d, 1/ε).

  19. Analysis: Aggressive Localization
  Induction: all w consistent with W(k) have err(w) ≤ 1/2^k. [Figure: candidate w and target w*.]

  20. Analysis: Aggressive Localization
  Induction: all w consistent with W(k) have err(w) ≤ 1/2^k. [Figure: a suboptimal candidate w, the current hypothesis w_{k-1}, the target w*, and the band of width γ_{k-1} around w_{k-1}.]

  21. Analysis: Aggressive Localization
  Induction: all w consistent with W(k) have err(w) ≤ 1/2^k; goal for the next round: err(w) ≤ 1/2^{k+1}. [Figure: w, w_{k-1}, w*, and the band of width γ_{k-1}.]

  22. Analysis: Aggressive Localization
  Induction: all w consistent with W(k) have err(w) ≤ 1/2^k; goal: err(w) ≤ 1/2^{k+1} for all w consistent with W(k+1).
  Enough to ensure that the error of w conditioned on the band around w_{k-1} is at most a small constant; the disagreement mass outside the band is already small by induction and log-concavity (a sketch of this decomposition follows).
  Need only Õ(d) labels in round k.
  Key point: localize aggressively, while maintaining correctness.
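A hedged sketch of the decomposition behind these two claims (my reconstruction of the standard argument; constants are suppressed and C denotes the constant from log-concavity):

```latex
% By induction, any w consistent with W(k) and the target w^* are both within
% angle O(2^{-k}) of w_{k-1}, so their disagreement region is a thin wedge.
\begin{align*}
\operatorname{err}(w)
 &= \Pr_{x\sim D}\!\left[h_w(x)\neq h_{w^*}(x),\ |w_{k-1}\cdot x| > \gamma_{k-1}\right]
  + \Pr_{x\sim D}\!\left[h_w(x)\neq h_{w^*}(x),\ |w_{k-1}\cdot x| \le \gamma_{k-1}\right]\\
 &\le \underbrace{2^{-(k+2)}}_{\text{log-concavity: wedge mass outside the band}}
  + \underbrace{C\,\gamma_{k-1}}_{\text{band mass}}\cdot
    \Pr\!\left[h_w(x)\neq h_{w^*}(x)\ \middle|\ |w_{k-1}\cdot x|\le\gamma_{k-1}\right]
  \;\le\; 2^{-(k+1)},
\end{align*}
% provided the conditional error inside the band is at most a small constant,
% which is a constant-accuracy learning problem and so needs only O(d) labels
% (up to log factors) in round k.
```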

  23. Margin-Based Active Learning, Agnostic Case
  Draw m_1 unlabeled examples, label them, and add them to W.
  Iterate k = 2, …, s:
  • find w_{k-1} of small τ_{k-1}-hinge loss w.r.t. W within a ball of radius r_{k-1} around the previous hypothesis [localization in concept space; a sketch of this step appears below];
  • clear the working set W;
  • sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1} [localization in instance space];
  • label them and add them to W.
  End iterate.
  Analysis, key idea:
  • Pick τ_k ≈ γ_k.
  • Localization and a variance analysis control the gap between the hinge loss and the 0/1 loss (only a constant factor).
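A minimal sketch of the per-round convex step, assuming projected subgradient descent as the solver (the function name, step size, and iteration count are my own; the paper's choices of τ_k and r_k and its optimization routine differ):

```python
import numpy as np

def localized_hinge_step(X, y, w_prev, tau, r, lr=0.05, iters=500):
    """One agnostic round (sketch): approximately minimize the tau-rescaled hinge loss
    (1/n) * sum_i max(0, 1 - y_i * (w . x_i) / tau) over the labeled working set,
    restricted to the ball B(w_prev, r), via projected subgradient descent."""
    w = w_prev.copy()
    n = len(y)
    for _ in range(iters):
        margins = y * (X @ w) / tau
        active = margins < 1                          # examples with nonzero hinge loss
        grad = -(y[active, None] * X[active]).sum(axis=0) / (n * tau)
        w = w - lr * grad
        diff = w - w_prev                             # project back onto B(w_prev, r):
        norm = np.linalg.norm(diff)                   # localization in concept space
        if norm > r:
            w = w_prev + diff * (r / norm)
    return w                                          # classify with sign(w . x)
```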

  24. Improves over Passive Learning too!
  Passive learning:
  • Malicious noise: prior work [KLS'09]: err(w) = O(η^{1/3} log^{2/3}(d/η)); our work [Awasthi-Balcan-Long'14]: err(w) = O(η log^2(1/η)).
  • Agnostic noise: prior work [KLS'09]: err(w) = O(η^{1/3} log^{1/3}(1/η)); our work [Awasthi-Balcan-Long'14]: err(w) = O(η log^2(1/η)).
  • Bounded noise (|P(Y = 1 | x) − P(Y = −1 | x)| bounded below by a constant): prior work: NA; our work [Awasthi-Balcan-Haghtalab-Urner'15]: err(w) = η + ε.
  Active learning: prior work: NA; our work: same guarantees as above (agnostic / malicious / bounded). [Awasthi-Balcan-Long'14]

  25. Improves over Passive Learning too!
  Passive learning:
  • Malicious noise: prior work: err(w) = O(η d^{1/4}) [KKMS'05], err(w) = O(√(η log(d/η))) [KLS'09]; our work [Awasthi-Balcan-Long'14]: err(w) = O(η), information-theoretically optimal.
  • Agnostic noise: prior work [KKMS'05]: err(w) = O(η √(log(1/η))); our work [Awasthi-Balcan-Long'14]: err(w) = O(η).
  • Bounded noise (|P(Y = 1 | x) − P(Y = −1 | x)| bounded below by a constant): prior work: NA; our work [Awasthi-Balcan-Haghtalab-Urner'15]: err(w) = η + ε.
  Active learning: prior work: NA; our work: same guarantees as above (agnostic / malicious / bounded), information-theoretically optimal. [Awasthi-Balcan-Long'14]
  Slightly better results for the uniform distribution case.

  26. Localization: both an algorithmic tool and an analysis tool! Useful for active and passive learning!

  27. Discussion, Open Directions
  • Active learning: an important modern learning paradigm.
  • First poly-time, label-efficient active learning algorithm for agnostic learning in high-dimensional cases.
  • Also leads to much better noise-tolerant algorithms for passive learning of linear separators!
  Open directions:
  • More general distributions, other concept spaces.
  • Exploit localization insights in other settings (e.g., online convex optimization with adversarial noise).
