Efficient Nonmyopic Active Search
Jiang, Malkomes, Converse, Shofner, Moseley, Garnett. ICML 2017
Active Learning
[Diagram: an active learner selects informative points from the unlabeled data and queries an expert/oracle for their labels.]
Supervised Learning
[Diagram: the learner is trained on a random sample of the unlabeled data, labeled by an expert/oracle.]
Why does active learning matter?
▪ Collecting data is much cheaper than annotating it, so we have large-scale unlabeled data
▪ Labeling data is difficult, time-consuming, or expensive
▪ Active learning helps the model learn more efficiently than random sampling
Uncertainty Sampling [Lewis & Gale, SIGIR'94]
▪ Query the examples that the learner is most uncertain about (i.e., instances near the decision boundary of the model)
▪ Binary case: query the instance whose posterior probability of being positive is nearest 0.5
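The binary rule above is a one-liner. A minimal sketch, assuming the model's posterior probabilities are already available as an array:

```python
import numpy as np

def uncertainty_query(probs):
    """Binary uncertainty sampling: return the index of the unlabeled
    instance whose posterior probability of being positive is nearest 0.5."""
    probs = np.asarray(probs, dtype=float)
    return int(np.argmin(np.abs(probs - 0.5)))

# Example: posterior p(y=1 | x) for four unlabeled instances.
print(uncertainty_query([0.9, 0.48, 0.1, 0.7]))  # index 1 is nearest 0.5
```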
Uncertainty Sampling (multiclass)
▪ least confidence: query the x with the smallest maximum class probability max_y p(y | x)
▪ margin sampling: query the x with the smallest gap between its two most probable classes
▪ entropy: query the x whose predictive distribution p(y | x) has the highest entropy
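The three multiclass uncertainty measures can be sketched directly from their definitions (each function takes a predictive distribution over classes for one instance):

```python
import numpy as np

def least_confidence(p):
    """1 - max_y p(y | x); higher means more uncertain."""
    return 1.0 - float(np.max(p))

def margin(p):
    """Gap between the two most probable classes; smaller means more uncertain."""
    top2 = np.sort(np.asarray(p, dtype=float))[-2:]
    return float(top2[1] - top2[0])

def entropy(p):
    """Shannon entropy of p(y | x); higher means more uncertain."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())
```

Note the sign conventions: least confidence and entropy pick the instance with the *largest* score, margin sampling picks the instance with the *smallest*.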
Other Query Strategies
▪ Query-By-Committee (QBC): maintain a committee of models that vote on query candidates; query where they disagree most
▪ Expected Model Change: query the instance that would impart the greatest change to the current model
▪ Expected Error Reduction: query the instance that is likely to reduce generalization error the most
▪ Variance Reduction: query the instance that minimizes the model's output variance
▪ Density-Weighted Methods: weight informativeness by the input distribution to pick instances that are both uncertain and representative
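As one concrete instance of QBC, committee disagreement is often measured by vote entropy. A minimal sketch (the function and its arguments are illustrative, not from the slides):

```python
import numpy as np

def vote_entropy(votes, n_classes):
    """QBC disagreement for one candidate instance.
    votes: the class label predicted by each committee member."""
    counts = np.bincount(votes, minlength=n_classes).astype(float)
    freqs = counts / counts.sum()
    nz = freqs[freqs > 0]
    return float(-(nz * np.log(nz)).sum())

# Full agreement -> zero disagreement; an even split -> maximal disagreement.
print(vote_entropy([1, 1, 1], n_classes=2))  # 0.0
```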
Active Search
Sequentially inspecting data to discover members of a rare, desired class.
What is the best policy for selecting data points so that we find as many members of the target class as possible in a given number of queries?
Active Search (setup)
▪ Given a finite domain of n elements, each with a hidden binary label y ∈ {0, 1}; the target set is the set of points with y = 1
▪ Budget: t queries
▪ Goal: maximize the utility u(D) = Σ_{(x,y)∈D} y, i.e., the number of targets found within the budget
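The setup above can be sketched as a query loop. This is only an illustrative skeleton: `oracle` (the labeler) and `model_prob` (the posterior p(y = 1 | x, D)) are hypothetical callables, and the selection rule shown is the simple greedy one:

```python
def utility(D):
    """u(D) = sum of observed labels: the number of targets found."""
    return sum(y for _, y in D)

def active_search(candidates, oracle, model_prob, budget):
    """Greedy (one-step) active search sketch.
    oracle(x) -> true label in {0, 1}; model_prob(x, D) -> p(y = 1 | x, D)."""
    D = []
    unlabeled = list(candidates)
    for _ in range(budget):
        # Query the point the model currently thinks is most likely a target.
        x = max(unlabeled, key=lambda z: model_prob(z, D))
        unlabeled.remove(x)
        D.append((x, oracle(x)))
    return D, utility(D)
```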
Optimal Bayesian Policy
▪ Assume we have a probabilistic classification model that provides the posterior p(y = 1 | x, D) for every unlabeled point x, given the observed data D
▪ The optimal policy queries the point that maximizes the expected utility of the entire remaining search
▪ How do we solve this equation?
Optimal Bayesian Policy
Optimal policy for the last query (i = t):
▪ Intuition: there is no need to explore, so the optimal decision should be greedy
▪ Solving the Bayesian policy equation confirms this: query the point with the highest posterior probability of being a target
[Diagram: decision tree at time step i = t; each of the n - (t-1) remaining unlabeled nodes branches on y = 1 / y = 0.]
Optimal Bayesian Policy (Example)
Last query for our example: [figure]
Optimal Bayesian Policy
Optimal policy when two queries are left (i = t-1):
▪ The policy is not as trivial: the probability model changes after the first choice
▪ Solving the Bayesian policy equation splits the score into two terms:
▪ Exploitation: the probability that the queried point is itself a target
▪ Exploration: the expected value of the best remaining query, averaged over both outcomes of the first
▪ Cost: (n-(t-2)) · 2 · (n-(t-1)) · 2 computations
[Diagram: two-level decision tree at time step i = t-1; each of the n - (t-2) remaining unlabeled nodes branches on y = 1 / y = 0.]
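The exploitation + exploration decomposition can be written out directly. A sketch under the same assumption as before, that a hypothetical `model_prob(x, D)` returns the posterior p(y = 1 | x, D):

```python
def two_step_score(x, unlabeled, D, model_prob):
    """Two-step expected utility of querying x.
    Exploitation: the immediate probability that x is a target.
    Exploration: the expected value of the best follow-up query,
    averaged over both possible outcomes y in {1, 0} for x."""
    p = model_prob(x, D)
    rest = [z for z in unlabeled if z != x]
    future = 0.0
    for y, w in ((1, p), (0, 1.0 - p)):
        D_next = D + [(x, y)]
        future += w * max(model_prob(z, D_next) for z in rest)
    return p + future  # exploitation + exploration
```

Evaluating this score for every candidate x touches every (candidate, outcome, follow-up) triple, which is exactly the (n-(t-2)) · 2 · (n-(t-1)) · 2 count on the slide.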
Optimal Bayesian Policy (Example)
Two queries are left: [figures showing the first and second choices]
Optimal Bayesian Policy
Bayesian policy equation (general form): look ahead ℓ steps, averaging over both outcomes at every step.
Time complexity: O((2n)^ℓ)
▪ where ℓ is the lookahead
▪ n is the total number of unlabeled points
Hardness of Approximation
There is no polynomial-time active search policy with a constant-factor approximation ratio for optimizing the expected utility.
Myopic Approach
▪ 1-step-ahead myopic
▪ 2-step-ahead myopic
Toy Example
▪ Target: all points within a given Euclidean distance from either the center or any corner of the domain
[Figure: behavior of the 1-step optimal policy vs. uncertainty sampling on the toy example.]
Experiments (Active Search)
▪ Dataset: CiteSeer citation network (38,079 nodes)
▪ Target: papers appearing in NeurIPS (2,198 in total, 5.2%)
▪ Features: extracted by PCA
Results (targets found):
▪ 1-step: 167 targets
▪ 2-step: 180 targets
▪ 3-step: 187 targets
▪ 6.5 times better than random search
Search-space Pruning
▪ Pruning improves the search efficiency
▪ But the policy is still exponential in the lookahead
Approximating the Bayesian Optimal Policy
Reminder: the Bayesian optimal policy looks ahead over every sequence of future queries.
Approximation: assume that all remaining points in our budget will be selected simultaneously, in one big batch.
We call this policy efficient nonmyopic search (ENS). Time complexity: O(n log n), dominated by sorting the updated posterior probabilities.
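The batch assumption makes the future-value term cheap: instead of recursing, the expected utility of the remaining batch is just the sum of the largest posterior probabilities. A sketch, again with a hypothetical `model_prob(x, D)` posterior:

```python
def ens_score(x, unlabeled, D, model_prob, remaining_budget):
    """Efficient nonmyopic search (ENS) score of querying x.
    Pretend the remaining budget - 1 queries are all spent at once:
    their expected utility is the sum of the top posterior probabilities
    after conditioning on each possible outcome for x."""
    p = model_prob(x, D)
    rest = [z for z in unlabeled if z != x]
    k = remaining_budget - 1
    future = 0.0
    for y, w in ((1, p), (0, 1.0 - p)):
        D_next = D + [(x, y)]
        probs = sorted((model_prob(z, D_next) for z in rest), reverse=True)
        future += w * sum(probs[:k])
    return p + future  # exploitation + batched exploration
```

The sort over the remaining probabilities is where the O(n log n) cost per score comes from, replacing the exponential lookahead of the exact policy.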
ENS (Example)
At each query (with the remaining nodes still unlabeled), evaluate the ENS score of each candidate until we find the one with maximum expected utility.

Efficient Nonmyopic Search (ENS)
When does ENS become the exact Bayesian optimal policy?
▪ If, after observing the chosen point, the labels of all remaining unlabeled points are conditionally independent
Nonmyopic Behavior
▪ Target: all points within a given Euclidean distance from either the center or any corner of the domain
▪ Budget: 200
[Figure: 2-step lookahead vs. ENS, distinguishing the first 100 points queried from the last 100.]
Experiment
[Figures: experimental results, with a zoomed-in view.]
Limitations
▪ The Bayesian optimal policy and myopic methods (with large lookahead) are computationally inefficient
▪ ENS assumes conditional independence of the unlabeled data
▪ Limited performance when the budget is very small
▪ Cannot handle continuous search spaces
▪ Difficult to generalize to other, more general settings: Bayesian Optimization, multi-armed bandits, Reinforcement Learning
Takeaways
▪ The optimal Bayesian policy is intractable
▪ Myopic approaches approximate the optimal policy; less-myopic approximations perform better
▪ Efficient nonmyopic search (ENS) improves search efficiency but relies on strong assumptions
Related Work
▪ ENS in batch mode (query a batch of points at a time): improves efficiency, with a theoretical guarantee that performance is not much worse than querying one point at a time (Jiang et al., 2018)
▪ Bayesian Optimization (BO): active search can be seen as a special case of BO with binary observations and cumulative reward; nonmyopic BO policies exist in the regression setting (Ling et al., 2016), and ENS is similar to the GLASSES algorithm (González et al., 2016)
▪ Multi-armed bandits: selecting an item can be understood as "pulling an arm", except that items are correlated and cannot be played twice; ENS is similar to the knowledge-gradient policy (Frazier et al., 2008)
References
[1] Settles, B. (2009). Active Learning Literature Survey. University of Wisconsin-Madison, Department of Computer Sciences.
[2] Garnett, R., Krishnamurthy, Y., Xiong, X., Schneider, J., & Mann, R. (2012). Bayesian optimal active search and surveying. arXiv preprint arXiv:1206.6406.
[3] Jiang, S., Malkomes, G., Converse, G., Shofner, A., Moseley, B., & Garnett, R. (2017). Efficient nonmyopic active search. In ICML 2017.
[4] Jiang, S., Malkomes, G., Converse, G., Shofner, A., Moseley, B., & Garnett, R. (2018). Efficient nonmyopic batch active search. In NeurIPS 2018.
[5] González, J., Osborne, M., & Lawrence, N. (2016). GLASSES: Relieving the myopia of Bayesian optimisation. In Artificial Intelligence and Statistics.
[6] Hasan, Z., & Hidru, D. Slides for Efficient nonmyopic batch active search. https://bayesopt.github.io/slides/2016/ContributedGarnett.pdf
[7] Jiang, S. Slides for Efficient nonmyopic batch active search. https://bayesopt.github.io/slides/2016/ContributedGarnett.pdf
Q & A
Appendix: Myopic Approach
Simple greedy one-step policy vs. two-step lookahead:
▪ one-step: equation (1)
▪ two-step (left) and two-step (right): equation (2)
[Equations (1) and (2) shown on the original slides.]