slide-1
SLIDE 1

Efficient Nonmyopic Active Search

Jiang, Malkomes, Converse, Shofner, Moseley, Garnett. ICML 2017

slide-2
SLIDE 2

Active Learning

[Figure: active learning diagram; labels: unlabeled data, active learner, expert / oracle]

slide-3
SLIDE 3

Supervised Learning

[Figure: supervised learning diagram; labels: unlabeled data, random sample, expert / oracle, active learner]

slide-4
SLIDE 4

Why does active learning matter?

▪ Collecting data is much cheaper than annotating it

we have large-scale unlabeled data

▪ Labeling data is difficult, time-consuming, or expensive

Active learning helps a model learn more efficiently than random sampling

slide-5
SLIDE 5

Uncertainty Sampling

[Lewis & Gale, SIGIR’94]

▪ Query the examples that the learner is most uncertain about

(i.e., instances near the decision boundary of the model)

Binary: query the instance whose posterior probability of being positive is nearest 0.5
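
The binary rule above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `uncertainty_query` and the probabilities in `pool_probs` are made up for the example, and any model that outputs P(y=1) per pool point could supply them.

```python
def uncertainty_query(prob_positive):
    """Return the index of the pool point whose P(y=1) is closest to 0.5."""
    return min(range(len(prob_positive)),
               key=lambda i: abs(prob_positive[i] - 0.5))

# Hypothetical posterior probabilities for four unlabeled points.
pool_probs = [0.05, 0.48, 0.90, 0.65]
query_index = uncertainty_query(pool_probs)
print(query_index)
```

The point with probability 0.48 is selected because it lies closest to the decision boundary.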

slide-6
SLIDE 6

Uncertainty Sampling

▪ For multiclass problems:

least confidence

margin sampling

entropy
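
The three multiclass criteria can be compared on a small example. This is a sketch under the usual textbook definitions (least confidence = 1 minus the top class probability, margin = gap between the two most probable classes, entropy = Shannon entropy of the class distribution); the function names and the probabilities in `pool` are illustrative assumptions.

```python
import math

def least_confidence(p):
    # Higher means more uncertain.
    return 1.0 - max(p)

def margin(p):
    # Gap between the two most probable classes; smaller means more uncertain.
    top, second = sorted(p, reverse=True)[:2]
    return top - second

def entropy(p):
    # Shannon entropy in nats; higher means more uncertain.
    return -sum(q * math.log(q) for q in p if q > 0)

# Hypothetical class posteriors for three unlabeled points.
pool = [[0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.50, 0.49, 0.01]]

indices = range(len(pool))
lc_pick = max(indices, key=lambda i: least_confidence(pool[i]))
margin_pick = min(indices, key=lambda i: margin(pool[i]))
entropy_pick = max(indices, key=lambda i: entropy(pool[i]))
print(lc_pick, margin_pick, entropy_pick)
```

On this pool the criteria do not agree: least confidence and entropy prefer the second point, while margin sampling prefers the third, whose top two classes are nearly tied.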

slide-7
SLIDE 7

Other Query Strategies

▪ Query-By-Committee (QBC)

maintain a committee of models that vote on query candidates

▪ Expected Model Change

query the instance that would impart the greatest change to the current model

▪ Expected Error Reduction

query the instance expected to reduce the generalization error the most

▪ Variance Reduction

query the instance that minimizes output variance

▪ Density-Weighted Methods

account for the input distribution and pick informative instances (uncertain and representative)
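
As one concrete instance of the strategies listed, Query-By-Committee is often scored with vote entropy: the candidate on which the committee's votes are most evenly split is queried. The sketch below assumes that scoring rule; `vote_entropy` and the votes in `committee_votes` are hypothetical and not taken from the talk.

```python
import math
from collections import Counter

def vote_entropy(votes_for_point):
    """Entropy (in nats) of the committee's label votes for one candidate."""
    n = len(votes_for_point)
    counts = Counter(votes_for_point)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Rows are candidates; columns are labels predicted by four committee members.
committee_votes = [[0, 0, 0, 0],   # full agreement
                   [1, 1, 1, 0],   # mild disagreement
                   [1, 0, 1, 0]]   # even split

disagreement = [vote_entropy(v) for v in committee_votes]
query_index = max(range(len(disagreement)), key=disagreement.__getitem__)
print(query_index)
```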

slide-8
SLIDE 8

Active Search

Sequentially inspecting data to discover members of a rare, desired class.

slide-9
SLIDE 9

Active Search

Sequentially inspecting data to discover members of a rare, desired class.

What is the best policy for selecting data points so that we find as many members of the target class as possible within a given number of queries?

slide-10
SLIDE 10

Active Search

▪ Given a finite domain of elements
▪ target set
▪ budget

Goal: maximize the utility function within the budget
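
In the notation of the ICML 2017 paper [3], the problem setup can be written as follows. The symbols $\mathcal{X}$, $\mathcal{R}$, $t$, $\mathcal{D}$ and $u$ are assumptions carried over from that paper rather than from the slide text.

```latex
\begin{align*}
\mathcal{X} &= \{x_i\}                         && \text{finite domain of elements} \\
\mathcal{R} &\subset \mathcal{X}               && \text{target set, with } y = \mathbb{1}\{x \in \mathcal{R}\} \\
t           &                                  && \text{budget (number of queries allowed)} \\
u(\mathcal{D}) &= \sum_{y_i \in \mathcal{D}} y_i && \text{utility: number of targets among the observed points } \mathcal{D}
\end{align*}
```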

slide-11
SLIDE 11

Optimal Bayesian Policy

▪ Assume we have a probabilistic classification model that provides the posterior probability that a point is a target, given the observations so far
▪ The optimal policy
▪ How do we solve the above equation?
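
A hedged reconstruction of the two quantities named on this slide, following the notation of [3]: $\mathcal{D}_{i}$ denotes the observations after $i$ queries, $t$ the budget, and $u$ the number of targets found. These symbols are assumptions taken from the paper.

```latex
\begin{align*}
\text{model:}\quad  & \Pr(y = 1 \mid x, \mathcal{D}) \\
\text{policy:}\quad & x_i^{*} = \operatorname*{arg\,max}_{x_i \in \mathcal{X} \setminus \mathcal{D}_{i-1}}
    \mathbb{E}\bigl[\, u(\mathcal{D}_t \setminus \mathcal{D}_{i-1}) \bigm| x_i, \mathcal{D}_{i-1} \bigr]
\end{align*}
```

In words, at step $i$ the policy picks the point that maximizes the expected number of targets found over all remaining queries, not just the current one.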

slide-12
SLIDE 12

Optimal Bayesian Policy

Optimal policy for the last query (i = t):

▪ Intuition
▪ There is no need to explore
▪ The optimal decision should be greedy

Time step i = t: [n - (t-1)] nodes are unlabeled

[Figure: tree of candidate nodes, each branching into y=1 and y=0]

slide-13
SLIDE 13

Optimal Bayesian Policy

Optimal policy for the last query (i = t):

▪ Intuition
▪ There is no need to explore
▪ The optimal decision should be greedy
▪ Solving the Bayesian policy equation confirms this

Time step i = t: [n - (t-1)] nodes are unlabeled

[Figure: tree of candidate nodes, each branching into y=1 and y=0]
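
A short derivation of why the last query is greedy, written in the notation of [3] (an assumption: $\mathcal{D}_{t-1}$ is the data observed before the last query and $u$ counts targets found). With one query left, the only utility still to be gained is the label of the chosen point:

```latex
\begin{align*}
\mathbb{E}\bigl[\, u(\mathcal{D}_t \setminus \mathcal{D}_{t-1}) \bigm| x_t, \mathcal{D}_{t-1} \bigr]
  &= 1 \cdot \Pr(y_t = 1 \mid x_t, \mathcal{D}_{t-1}) + 0 \cdot \Pr(y_t = 0 \mid x_t, \mathcal{D}_{t-1}) \\
  &= \Pr(y_t = 1 \mid x_t, \mathcal{D}_{t-1})
\end{align*}
```

Maximizing this over $x_t$ selects the point most likely to be a target, which is the greedy choice.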

slide-14
SLIDE 14

Optimal Bayesian Policy (Example)

Last query for our example:

slide-16
SLIDE 16

Optimal Bayesian Policy

Optimal policy when two queries are left (i = t-1):

▪ The policy is not as trivial
▪ The probability model changes after the first choice

Time step i = t-1: [n - (t-2)] nodes are unlabeled

[Figure: two-level tree of candidate nodes, each branching into y=1 and y=0]

slide-17
SLIDE 17

Optimal Bayesian Policy

Optimal policy when two queries are left (i = t-1):

▪ The policy is not as trivial
▪ The probability model changes after the first choice

Solving the Bayesian policy equation:

(n-(t-2)) * 2 * (n-(t-1)) * 2 computations

Time step i = t-1: [n - (t-2)] nodes are unlabeled

[Figure: two-level tree of candidate nodes, each branching into y=1 and y=0]
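
A hedged reconstruction of the two-queries-left case in the notation of [3] (assumed symbols: $\mathcal{D}_{t-2}$ is the data before the current query, $\mathcal{D}_{t-1}$ the data after it):

```latex
\begin{align*}
\mathbb{E}\bigl[\, u(\mathcal{D}_t \setminus \mathcal{D}_{t-2}) \bigm| x_{t-1}, \mathcal{D}_{t-2} \bigr]
  = \underbrace{\Pr(y_{t-1} = 1 \mid x_{t-1}, \mathcal{D}_{t-2})}_{\text{exploitation}}
  + \underbrace{\mathbb{E}_{y_{t-1}}\Bigl[\, \max_{x_t} \Pr(y_t = 1 \mid x_t, \mathcal{D}_{t-1}) \Bigr]}_{\text{exploration}}
\end{align*}
```

The first term rewards immediately finding a target. The second term is the expected value of the best final query once the label $y_{t-1}$ has been observed, which is why each of the $n-(t-2)$ candidates must be evaluated under both label outcomes, with a maximization over the $n-(t-1)$ remaining points each time.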

slide-18
SLIDE 18

Optimal Bayesian Policy

Optimal policy when two queries are left (i = t-1):

▪ The policy is not as trivial
▪ The probability model changes after the first choice

Solving the Bayesian policy equation:

Exploitation

(n-(t-2)) * 2 * (n-(t-1)) * 2 computations

Time step i = t-1: [n - (t-2)] nodes are unlabeled

[Figure: two-level tree of candidate nodes, each branching into y=1 and y=0]

slide-19
SLIDE 19

Optimal Bayesian Policy

Optimal policy when two queries are left (i = t-1):

▪ The policy is not as trivial
▪ The probability model changes after the first choice

Solving the Bayesian policy equation:

Exploration

(n-(t-2)) * 2 * (n-(t-1)) * 2 computations

Time step i = t-1: [n - (t-2)] nodes are unlabeled

[Figure: two-level tree of candidate nodes, each branching into y=1 and y=0]

slide-20
SLIDE 20

Optimal Bayesian Policy (Example)

Two queries are left:

slide-21
SLIDE 21

Optimal Bayesian Policy (Example)

Two queries are left:

First step: choosing this

slide-22
SLIDE 22

Optimal Bayesian Policy (Example)

Two queries are left:

Second step: choosing this

slide-23
SLIDE 23

Optimal Bayesian Policy

Bayesian policy equation (general form)

Time complexity: O((2n)^ℓ)

▪ where ℓ is the lookahead
▪ n is the total number of unlabeled points
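
A hedged reconstruction of the general recursion, again in the notation of [3] (assumed symbols: $\mathcal{D}_{i-1}$ is the data before query $i$, $\mathcal{D}_i$ the data after it, $t$ the budget):

```latex
\begin{align*}
\mathbb{E}\bigl[\, u(\mathcal{D}_t \setminus \mathcal{D}_{i-1}) \bigm| x_i, \mathcal{D}_{i-1} \bigr]
  = \Pr(y_i = 1 \mid x_i, \mathcal{D}_{i-1})
  + \mathbb{E}_{y_i}\Bigl[\, \max_{x'} \mathbb{E}\bigl[\, u(\mathcal{D}_t \setminus \mathcal{D}_i) \bigm| x', \mathcal{D}_i \bigr] \Bigr]
\end{align*}
```

Each level of the recursion branches over up to $n$ candidate points and 2 possible labels, so the cost grows exponentially with the number of remaining queries.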

slide-25
SLIDE 25

Hardness of Approximation

There is no polynomial-time active search policy with a constant-factor approximation ratio for optimizing the expected utility.

slide-26
SLIDE 26

Myopic Approach

▪ 1-step-ahead myopic
▪ 2-step-ahead myopic
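
The difference between the two myopic policies can be seen on a five-point toy problem. This is a minimal sketch, not the authors' implementation: the weighted-neighbour `posterior` model, the prior weight `GAMMA`, and the `weights` matrix are all assumptions chosen for illustration.

```python
GAMMA = 0.1  # assumed prior weight of the toy model

def posterior(weights, labels, gamma=GAMMA):
    """P(y=1) for every point under a toy weighted-neighbour model.

    labels maps observed point index -> 0 or 1.
    """
    probs = []
    for row in weights:
        positive = sum(row[j] * y for j, y in labels.items())
        total = sum(row[j] for j in labels)
        probs.append((gamma + positive) / (1.0 + total))
    return probs

def one_step_scores(weights, labels):
    p = posterior(weights, labels)
    return {i: p[i] for i in range(len(p)) if i not in labels}

def two_step_scores(weights, labels):
    p = posterior(weights, labels)
    scores = {}
    for i in range(len(p)):
        if i in labels:
            continue
        future = 0.0
        for y, prob_y in ((1, p[i]), (0, 1.0 - p[i])):
            hypo = {**labels, i: y}  # pretend label y was observed at point i
            p_next = posterior(weights, hypo)
            future += prob_y * max(p_next[j] for j in range(len(p)) if j not in hypo)
        scores[i] = p[i] + future
    return scores

# Point 0 is a known target, point 1 is its weak neighbour,
# and points 2-4 form a tight cluster that has not been explored.
weights = [[0.0, 0.5, 0.0, 0.0, 0.0],
           [0.5, 0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0, 1.0],
           [0.0, 0.0, 1.0, 0.0, 1.0],
           [0.0, 0.0, 1.0, 1.0, 0.0]]
labels = {0: 1}

s1 = one_step_scores(weights, labels)
s2 = two_step_scores(weights, labels)
one_step_pick = max(s1, key=s1.get)
two_step_pick = max(s2, key=s2.get)
print(one_step_pick, two_step_pick)
```

The one-step policy picks point 1, the most probable target right now. The two-step policy instead picks a point in the unexplored cluster, because a positive label there would sharply raise the probability of its neighbours for the following query.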

slide-27
SLIDE 27

Toy Example

▪ Target: all points within a fixed Euclidean distance of either the center or any corner of the domain

[Figure: 1-step optimal uncertainty sampling]

slide-28
SLIDE 28

Experiments (Active Search)

▪ Dataset: CiteSeer citation network (38079 nodes)
▪ Target: papers appearing in NeurIPS (2198 in total, 5.2%)
▪ Features: extracted by PCA

▪ 1-step: 167 targets
▪ 2-step: 180 targets
▪ 3-step: 187 targets
▪ 6.5 times better than random search

slide-29
SLIDE 29

Search-space pruning

▪ Pruning improves the search efficiency
▪ Still exponential

slide-30
SLIDE 30

Approximating Bayesian Optimal Policy

Reminder: Bayesian optimal policy

slide-31
SLIDE 31

Approximating Bayesian Optimal Policy

Assume that all remaining points in our budget will be selected simultaneously in one big batch.

slide-32
SLIDE 32

Approximating Bayesian Optimal Policy

We call this policy efficient nonmyopic search (ENS).

Time complexity:
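
The batch assumption turns the intractable future term into a sum of the largest posterior probabilities among the remaining points. Below is a minimal sketch of that score, not the authors' implementation: the weighted-neighbour `posterior` model, `GAMMA`, the `weights` matrix, and the function name `ens_scores` are illustrative assumptions.

```python
GAMMA = 0.1  # assumed prior weight of the toy model

def posterior(weights, labels, gamma=GAMMA):
    """P(y=1) for every point under a toy weighted-neighbour model."""
    probs = []
    for row in weights:
        positive = sum(row[j] * y for j, y in labels.items())
        total = sum(row[j] for j in labels)
        probs.append((gamma + positive) / (1.0 + total))
    return probs

def ens_scores(weights, labels, queries_left):
    """ENS-style score per unlabeled point; queries_left includes the current query."""
    p = posterior(weights, labels)
    batch_size = queries_left - 1  # points assumed to be picked in one final batch
    scores = {}
    for i in range(len(p)):
        if i in labels:
            continue
        future = 0.0
        for y, prob_y in ((1, p[i]), (0, 1.0 - p[i])):
            hypo = {**labels, i: y}  # pretend label y was observed at point i
            p_next = posterior(weights, hypo)
            rest = sorted((p_next[j] for j in range(len(p)) if j not in hypo),
                          reverse=True)
            # Expected targets in the batch = sum of its top probabilities.
            future += prob_y * sum(rest[:batch_size])
        scores[i] = p[i] + future
    return scores

# Point 0 is a known target, point 1 its weak neighbour, points 2-4 a tight cluster.
weights = [[0.0, 0.5, 0.0, 0.0, 0.0],
           [0.5, 0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0, 1.0],
           [0.0, 0.0, 1.0, 0.0, 1.0],
           [0.0, 0.0, 1.0, 1.0, 0.0]]
labels = {0: 1}

last = ens_scores(weights, labels, queries_left=1)
early = ens_scores(weights, labels, queries_left=3)
last_pick = max(last, key=last.get)
early_pick = max(early, key=early.get)
print(last_pick, early_pick)
```

With one query left the batch is empty and the score reduces to the greedy choice (point 1). With three queries left the score favours the unexplored cluster. In this naive version each candidate requires a posterior update and a sort for both label outcomes, so one scoring pass is roughly quadratic in the number of points, up to a logarithmic factor.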

slide-33
SLIDE 33

ENS (Example)

at query ( nodes are left to be labelled)

slide-34
SLIDE 34

ENS (Example)

at query ( nodes are left to be labelled)

Until we find the with maximum utility...

slide-35
SLIDE 35

Efficient nonmyopic search (ENS)

When does ENS become the exact Bayesian optimal policy?

slide-36
SLIDE 36

Efficient nonmyopic search (ENS)

When does ENS become the exact Bayesian optimal policy?

▪ If, after observing the outcome of the current query, the labels of all remaining unlabeled points are conditionally independent

slide-37
SLIDE 37

Nonmyopic Behavior

▪ Target: all points within a fixed Euclidean distance of either the center or any corner of the domain
▪ Budget: 200

[Figure: points queried by 2-step lookahead and by ENS; panels: first 100 points, last 100 points]

slide-38
SLIDE 38

Experiment

slide-39
SLIDE 39

Zoom

slide-40
SLIDE 40

Experiment

slide-41
SLIDE 41

Limitations

▪ The Bayesian optimal policy and myopic methods (when the lookahead is large) are sample inefficient
▪ ENS assumes conditional independence of the unlabelled data
▪ Limited performance when the budget is very small
▪ Cannot deal with a continuous search space
▪ Difficult to generalize to more general settings: Bayesian optimization, multi-armed bandits, reinforcement learning

slide-42
SLIDE 42

Takeaways

▪ Optimal Bayesian policy (intractable)
▪ Myopic approach for approximating the optimal policy

▪ Less-myopic approximations perform better

▪ Efficient nonmyopic search (ENS) improves search efficiency but relies on strong assumptions

slide-43
SLIDE 43

Related Work

ENS in batch mode (query a batch of points at a time)

▪ Efficiency improvement
▪ Theoretical guarantee on performance: not much worse than querying one point at a time (Jiang et al., 2018)

Bayesian Optimization (BO)

▪ AS can be seen as a special case of BO, with binary observations and cumulative reward
▪ Nonmyopic policies for BO in the regression setting (Ling et al., 2016)
▪ ENS is similar to the GLASSES algorithm (González et al., 2016)

Multi-armed bandit

▪ Selecting an item can be understood as “pulling an arm”
▪ Items are correlated and cannot be played twice
▪ ENS is similar to the knowledge gradient policy (Frazier et al., 2008)

slide-44
SLIDE 44

References

[1] Settles, B. (2009). Active learning literature survey. University of Wisconsin-Madison, Department of Computer Sciences.
[2] Garnett, R., Krishnamurthy, Y., Xiong, X., Schneider, J., & Mann, R. (2012). Bayesian optimal active search and surveying. arXiv preprint arXiv:1206.6406.
[3] Jiang, S., Malkomes, G., Converse, G., Shofner, A., Moseley, B., & Garnett, R. (2017, August). Efficient nonmyopic active search. In ICML 2017.
[4] Jiang, S., Malkomes, G., Converse, G., Shofner, A., Moseley, B., & Garnett, R. Efficient nonmyopic batch active search. In NeurIPS 2018.
[5] González, J., Osborne, M., & Lawrence, N. (2016, May). GLASSES: Relieving the myopia of Bayesian optimisation. In Artificial Intelligence and Statistics.
[6] Hasan, Z., & Hidru, D. Slide for Efficient nonmyopic batch active search. https://bayesopt.github.io/slides/2016/ContributedGarnett.pdf
[7] Jiang, S. Slide for Efficient nonmyopic batch active search. https://bayesopt.github.io/slides/2016/ContributedGarnett.pdf

slide-45
SLIDE 45

Q & A

slide-46
SLIDE 46

Appendix: Myopic Approach

Simple greedy one-step policy vs. two-step lookahead:

one-step:

(1) (2)

slide-47
SLIDE 47

Appendix: Myopic Approach

Simple greedy one-step policy vs. two-step lookahead:

one-step:

two-step (left):

(1) (2)

slide-48
SLIDE 48

Appendix: Myopic Approach

Simple greedy one-step policy vs. two-step lookahead:

one-step:

two-step (left): two-step (right):

(1) (2)

slide-49
SLIDE 49

Appendix: Myopic Approach

Simple greedy one-step policy vs. two-step lookahead:

one-step:

two-step (left): two-step (right):

(1) (2)