Active Learning with Model Selection – PowerPoint PPT Presentation


SLIDE 1

Active Learning with Model Selection

Neil Rubens

Sugiyama Lab / Tokyo Institute of Technology

SLIDE 2

Active Learning (NLP Motivation)

  • NLP (common scenario)

– Large amounts of unlabeled data
– Labeling data is expensive

  • Active Learning (Optimal Experimental Design)

– Allows selecting the most informative examples

SLIDE 3

Supervised Learning as Function Approximation

Target function $f(x)$; learned function $\hat{f}(x)$.

Training samples $\{(x_i, y_i)\}_{i=1}^{n}$, with outputs $y_1, \dots, y_n$ observed at inputs $x_1, \dots, x_n$ and predictions $\hat{f}(x_1), \dots, \hat{f}(x_n)$.

Goal: from the training samples, obtain $\hat{f}$ that minimizes the generalization error $G$, measured over the test input density $q(x)$:

$$G = \int \bigl(\hat{f}(x) - f(x)\bigr)^2 q(x)\,dx$$
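To make the setting concrete, here is a minimal sketch (my own illustration, not code from the slides) of least-squares learning of a linear-in-parameters model, with G estimated by Monte Carlo sampling from an assumed uniform test input density q(x); the target function, noise level, and basis are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                          # target function (assumed for illustration)
    return np.sin(np.pi * x)

def design(x, degree=3):           # polynomial basis -> model linear in parameters
    return np.vander(x, degree + 1, increasing=True)

# training samples: inputs x_i, noisy outputs y_i = f(x_i) + noise
x_train = rng.uniform(-1, 1, 20)
y_train = f(x_train) + rng.normal(0, 0.1, x_train.size)

# least-squares learning: theta minimizing ||X theta - y||^2
X = design(x_train)
theta, *_ = np.linalg.lstsq(X, y_train, rcond=None)

# Monte Carlo estimate of G under q(x) = Uniform(-1, 1)
x_test = rng.uniform(-1, 1, 10_000)
G = np.mean((design(x_test) @ theta - f(x_test)) ** 2)
print(f"estimated G: {G:.4f}")
```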

SLIDE 4

Design Cycle (Common)

Collect Data (Active Learning) → Model Selection → Parameter Learning → Evaluation

There is a problem with this flow

SLIDE 5

Active Learning (AL)

[Figure: target function vs. learned function for good and poor input points]

  • Choice of training input points can significantly

affect the learned function.

  • Active Learning – choose training input points so that the generalization error is minimized

SLIDE 6

Setting

  • Linear Model
  • Least-squares Learning
  • In AL, training output values cannot be used for estimating the generalization error

SLIDE 7

Orthogonal Decomposition


SLIDE 8

Bias Variance Decomposition

The generalization error decomposes into model error C, bias B, and variance V.
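The bias and variance terms can be illustrated numerically (my own sketch, with an invented target function): refit the same model on many freshly drawn training sets and measure the squared bias and the variance of the resulting predictions. An intentionally too-simple model makes the bias term dominate.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                                    # target (assumed for illustration)
    return np.sin(np.pi * x)

def fit_predict(x_tr, y_tr, x_te, degree):
    X = np.vander(x_tr, degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, y_tr, rcond=None)
    return np.vander(x_te, degree + 1, increasing=True) @ theta

x_test = np.linspace(-1, 1, 200)
preds = []
for _ in range(300):                         # many independent training sets
    x_tr = rng.uniform(-1, 1, 20)
    y_tr = f(x_tr) + rng.normal(0, 0.1, 20)
    preds.append(fit_predict(x_tr, y_tr, x_test, degree=1))
preds = np.array(preds)

bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)   # squared bias B
variance = np.mean(preds.var(axis=0))                    # variance V
print(f"B ~ {bias2:.3f}, V ~ {variance:.4f}")
```

With a degree-1 model fitting a sinusoid, the squared bias dwarfs the variance; a richer model would shift the balance the other way.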

SLIDE 9

Active Learning

  • Nothing can be done about model error C
  • Bias → 0 (least-squares is unbiased)
  • Minimize variance → minimize error

SLIDE 10

Variance AL (assuming zero bias)

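The slides' exact variance criterion is not recoverable from this extraction; as a stand-in, the sketch below uses a classical variance-only design criterion for least squares, trace((XᵀX)⁻¹), which is proportional to the summed variance of the parameter estimates (A-optimality). Under the zero-bias assumption, input sets with a smaller value yield a smaller generalization error.

```python
import numpy as np

def design(x, degree=1):
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def variance_criterion(x_points, degree=1):
    """trace((X^T X)^{-1}): proportional to the summed variance of the
    least-squares parameter estimates (smaller is better)."""
    X = design(x_points, degree)
    return np.trace(np.linalg.inv(X.T @ X))

# spread-out inputs give a better-conditioned design than clustered ones
spread = variance_criterion([-1.0, -0.5, 0.5, 1.0])
clustered = variance_criterion([0.10, 0.11, 0.20, 0.21])
print(spread < clustered)   # True
```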

SLIDE 11

Active Learning - Approximation

  • In general, simultaneously optimizing n points is not tractable

  • Approximation approaches:

– Optimize points one by one (greedy)
– Optimize the probability distribution from which the points are drawn
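The greedy route can be sketched as follows (my own illustration; the criterion is the assumed trace((XᵀX)⁻¹) variance proxy, not necessarily the slides' exact one): points are chosen one by one from a candidate pool, each time adding the candidate that most reduces the criterion.

```python
import numpy as np

def design(x, degree=1):
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def criterion(x_points, degree=1, ridge=1e-6):
    X = design(x_points, degree)
    # a small ridge keeps X^T X invertible before enough points are chosen
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.trace(np.linalg.inv(A))

def greedy_select(candidates, n, degree=1):
    """Greedy approximation: optimize input points one by one instead of
    all n simultaneously."""
    chosen, pool = [], list(candidates)
    for _ in range(n):
        best = min(pool, key=lambda c: criterion(chosen + [c], degree))
        chosen.append(best)
        pool.remove(best)
    return chosen

points = greedy_select(np.linspace(-1, 1, 21), 4)
print(sorted(points))   # the interval's extremes are selected early
```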

SLIDE 12

Bias / Variance (no unbiasedness guarantee)

[Figure: bias and variance of learned functions relative to the target function f]

SLIDE 13

Bias / Variance

Best fit “min error”


SLIDE 14

Model Selection (MS)

  • Model – could be represented by the number and type of basis functions

  • Model Selection – select an appropriate model M

[Figure: learned functions that are too simple, appropriate, and too complex relative to the target function]

SLIDE 15

Model Selection

  • Cross-validation: measure generalization accuracy by testing on data unused during training

  • Regularization: penalize complex models: E′ = error on data + λ · model complexity; e.g. Akaike’s information criterion (AIC), Bayesian information criterion (BIC)

  • Minimum description length (MDL): Kolmogorov complexity, the shortest description of the data

  • Structural risk minimization (SRM)
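As one concrete instance of the first bullet, cross-validation can drive the model choice (here, a polynomial degree); this is my own minimal sketch with an invented target function, not code from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)

def cv_error(x, y, degree, k=5):
    """k-fold cross-validation MSE for a polynomial model of the given degree."""
    idx = rng.permutation(x.size)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # data unused for this fold's fit
        X = np.vander(x[train], degree + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(X, y[train], rcond=None)
        pred = np.vander(x[fold], degree + 1, increasing=True) @ theta
        errs.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errs))

x = rng.uniform(-1, 1, 60)
y = np.sin(np.pi * x) + rng.normal(0, 0.1, 60)

# model selection: pick the degree with the smallest cross-validation error
best_degree = min(range(1, 9), key=lambda d: cv_error(x, y, d))
print(best_degree)
```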

SLIDE 16

Active Learning with Model Selection

Active Learning and Model Selection share the same goal: minimizing the generalization error. Possible approaches to Active Learning with Model Selection: naïve, sequential, batch.

SLIDE 17

Naïve Approach

  • Naïve Approach – combine existing AL and MS methods

The naïve approach is not possible, due to the ALMS dilemma:

– Active Learning – the model should already be fixed (MS already performed) [Fedorov 78, MacKay 92, Kanamori and Shimodaira 04]

– Model Selection – the training points should already be fixed (AL already performed) [Akaike 78, Rissanen 78, Schwarz 78]

[Figure: target vs. learned functions for good and poor inputs; models that are too simple, appropriate, too complex]

SLIDE 18

Sequential Approach

[Figure: alternating Model Selection → Active Learning → Learning, with b (complexity) plotted against n (number of samples)]

Optimal points depend on the model.

Has a risk of large error (due to overfitting to a different model).

SLIDE 19

Batch Approach

[Figure: Initial Model Selection → Active Learning → Final Model Selection, with b (complexity) plotted against n (number of samples)]

Initial MS is not reliable.

Has a risk of large error (due to overfitting to a different model).

SLIDE 20

Motivation – Hedge the Risk of Large Error

Active Learning with Model Selection

  • Naïve – impossible
  • Batch, Sequential – risk of large error (due to overfitting to a different model)

Goal: hedge the risk of large error (minimize the risk of overfitting to a different model)

SLIDE 21

Ensemble Active Learning Approach (Proposed)


  • Hedge the risk of overfitting by designing

input points for all of the models.

[Figure: C-EAL combines the per-model criteria G1, G2 into an ensemble criterion; D-EAL combines the training-point locations X1, X2 into an ensemble design]
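In the spirit of the proposed approach, a minimal sketch (my own construction, using the assumed trace-based variance proxy rather than the slides' exact G criteria): the per-model criteria are averaged, so the selected inputs hedge across all candidate models instead of committing to one.

```python
import numpy as np

def design(x, degree):
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def model_criterion(x_points, degree, ridge=1e-6):
    X = design(x_points, degree)
    A = X.T @ X + ridge * np.eye(X.shape[1])   # ridge keeps A invertible early on
    return np.trace(np.linalg.inv(A))

def ensemble_criterion(x_points, degrees=(1, 2, 3)):
    """Average the variance criterion over all candidate models, so the
    design stays reasonable whichever model is later selected."""
    return float(np.mean([model_criterion(x_points, d) for d in degrees]))

def ensemble_select(candidates, n, degrees=(1, 2, 3)):
    chosen, pool = [], list(candidates)
    for _ in range(n):
        best = min(pool, key=lambda c: ensemble_criterion(chosen + [c], degrees))
        chosen.append(best)
        pool.remove(best)
    return chosen

points = ensemble_select(np.linspace(-1, 1, 21), 6)
print(sorted(points))
```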

SLIDE 22

Evaluation

Legend: D – D-EAL (proposed), C – C-EAL (proposed), B – Batch, S – Sequential, P – Passive

  • Compares favorably with existing methods
  • Minimized worst-case error (in most cases)
  • Surprisingly, improved average performance (in some cases)

SLIDE 23

Current / Future Work

  • Improving AL by utilizing existing data
  • My work mostly deals with theoretical aspects.
  • I am also looking for practical applications.

– If you have any problems that involve active learning, I would be very glad to help.

SLIDE 24

References

  • Sugiyama. Active learning in approximately linear regression based on conditional expectation of generalization error. JMLR 2006
  • Bishop. Pattern Recognition and Machine Learning
  • Alpaydin. Introduction to Machine Learning
  • Rubens, Sugiyama. Coping with the active learning with model selection dilemma: Minimizing expected generalization error. IBIS 2006
