

SLIDE 1

Learning with Marginalized Corrupted Features

L. van der Maaten, M. Chen, S. Tyree, K. Weinberger
ICML 2013

Tea talk by Jan Gasthaus, April 11, 2013


SLIDE 10

Data Augmentation

“Secret 4: lots of jittering, mirroring, and color perturbation of the original images generated on the fly to increase the size of the training set.”
  — Yann LeCun on Google+, about Alex Krizhevsky’s ImageNet results

SLIDES 11–13

Main Idea

  • Old idea: create artificial additional training data by corrupting it with “noise”
  • One easy way to incorporate domain knowledge (e.g. possible transformations)
  • But: additional training data ⇒ additional computation
  • Idea: corrupt with known ExpFam noise and integrate it out

SLIDE 14

Explicit vs. Implicit Corruption

Explicit corruption: take the training set $D = \{(x_n, y_n)\}_{n=1}^{N}$ and corrupt it $M$ times:

$$\mathcal{L}(\tilde{D}, \Theta) = \sum_{n=1}^{N} \frac{1}{M} \sum_{m=1}^{M} L(\tilde{x}_{nm}, y_n, \Theta), \qquad \text{with } \tilde{x}_{nm} \sim p(\tilde{x}_{nm} \mid x_n).$$
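As a concrete illustration, here is a minimal NumPy sketch of this explicit Monte Carlo scheme; all names (explicit_corruption_loss, blankout, quadratic_loss, q, M) are hypothetical, not the paper's, and blankout noise is just one possible choice of $p(\tilde{x} \mid x)$:

    import numpy as np

    def explicit_corruption_loss(X, y, w, loss_fn, corrupt_fn, M=10, rng=None):
        # Monte Carlo version of the corrupted loss above:
        # sum_n (1/M) sum_m L(x~_nm, y_n, Theta), with x~_nm ~ p(x~ | x_n).
        rng = np.random.default_rng() if rng is None else rng
        total = 0.0
        for x_n, y_n in zip(X, y):
            total += np.mean([loss_fn(corrupt_fn(x_n, rng), y_n, w) for _ in range(M)])
        return total

    def blankout(x, rng, q=0.5):
        # Unbiased blankout ("dropout") corruption: zero each feature with
        # probability q, rescale survivors by 1/(1-q) so that E[x~] = x.
        keep = rng.random(x.shape) >= q
        return keep * x / (1.0 - q)

    def quadratic_loss(x, y, w):
        # Plain squared loss on a single example.
        return (y - w @ x) ** 2

Driving $M \to \infty$ turns this into the implicit loss on the next slide, but at $M$ times the training cost, which is exactly the overhead the marginalized approach removes.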

SLIDE 15

Explicit vs. Implicit Corruption

Implicit corruption: minimize the expected value of the loss under $p(\tilde{x}_n \mid x_n)$:

$$\mathcal{L}(D, \Theta) = \sum_{n=1}^{N} \mathbb{E}_{p(\tilde{x}_n \mid x_n)}\!\left[ L(\tilde{x}_n, y_n, \Theta) \right],$$

i.e. replace the empirical average with the expectation (the $M \to \infty$ limit of the explicit scheme).
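The slides do not show why this expectation is computable; the key assumption, made in the MCF paper, is that the corruption factorizes over features:

$$p(\tilde{x}_n \mid x_n) = \prod_{d=1}^{D} p(\tilde{x}_{nd} \mid x_{nd}),$$

so a loss that is polynomial in $\tilde{x}_n$ needs only low-order per-feature moments, and a loss that is exponential in $\tilde{x}_n$ needs only per-feature moment-generating functions. The following slides instantiate exactly these cases.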

SLIDES 16–18

Wait a second . . .

This is so obvious that it must have been done before . . .

  • Vicinal Risk Minimization: Chapelle, Weston, Bottou, & Vapnik, NIPS 2000
  • They explicitly only consider the case of Gaussian noise distributions

SLIDES 19–20

Quadratic Loss
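The derivation on these slides was not captured. Reconstructing the result from the paper (under the factorized corruption above; the exact notation is my best guess):

$$\mathbb{E}\!\left[(w^{\top}\tilde{x}_n - y_n)^2\right] = \left(w^{\top}\mathbb{E}[\tilde{x}_n] - y_n\right)^2 + w^{\top}\,\mathrm{diag}\!\left(\mathrm{Var}[\tilde{x}_{nd}]\right)\,w,$$

which follows from $\mathbb{E}[z^2] = \mathbb{E}[z]^2 + \mathrm{Var}[z]$ with $z = w^{\top}\tilde{x}_n - y_n$. Only the per-feature means and variances of the corrupting distribution enter, and the variance term behaves like a data-dependent ridge-style regularizer.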

SLIDE 21

Exponential Loss
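Likewise reconstructing the exponential-loss case: with $L(\tilde{x}_n, y_n, \Theta) = \exp(-y_n w^{\top}\tilde{x}_n)$, feature-wise independence factorizes the expectation into per-feature moment-generating functions, which is what connects this slide to the MGF slide below:

$$\mathbb{E}\!\left[e^{-y_n w^{\top}\tilde{x}_n}\right] = \prod_{d=1}^{D} \mathbb{E}\!\left[e^{-y_n w_d \tilde{x}_{nd}}\right] = \prod_{d=1}^{D} M_{\tilde{x}_{nd}}\!\left(-y_n w_d\right),$$

where $M_{\tilde{x}_{nd}}(s) = \mathbb{E}[e^{s \tilde{x}_{nd}}]$ is the MGF of the corrupting distribution.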

SLIDE 22

Logistic Loss
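The logistic loss $\log(1 + e^{-y_n w^{\top}\tilde{x}_n})$ has no closed-form expectation; if I recall the paper correctly, it is handled by minimizing a Jensen upper bound, which again reduces to the MGF product:

$$\mathbb{E}\!\left[\log\!\left(1 + e^{-y_n w^{\top}\tilde{x}_n}\right)\right] \le \log\!\left(1 + \prod_{d=1}^{D} M_{\tilde{x}_{nd}}\!\left(-y_n w_d\right)\right),$$

using concavity of $\log$ ($\mathbb{E}[\log Z] \le \log \mathbb{E}[Z]$) together with the factorization from the previous slide.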

SLIDE 23

MGFs
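The table on this slide was not extracted. For reference, the standard MGFs of corruption models of the kind the paper considers (the blankout parameterization is the unbiased one used above; that the slide listed exactly these is my assumption):

  • Blankout: $\tilde{x}_d = 0$ w.p. $q$, else $x_d/(1-q)$: $M(s) = q + (1-q)\,e^{s x_d/(1-q)}$
  • Gaussian: $\tilde{x}_d \sim \mathcal{N}(x_d, \sigma^2)$: $M(s) = e^{s x_d + \sigma^2 s^2/2}$
  • Poisson: $\tilde{x}_d \sim \mathrm{Pois}(x_d)$: $M(s) = e^{x_d(e^s - 1)}$

Putting the pieces together, a sketch (hypothetical names, not the authors' code) of the fully marginalized exponential loss under blankout noise, computed with no sampling at all:

    import numpy as np

    def blankout_mgf(s, x, q=0.5):
        # Elementwise MGF of unbiased blankout noise:
        # E[exp(s * x~_d)] = q + (1 - q) * exp(s * x_d / (1 - q)).
        return q + (1.0 - q) * np.exp(s * x / (1.0 - q))

    def marginalized_exp_loss(X, y, w, q=0.5):
        # E[exp(-y_n w^T x~_n)] = prod_d M_{x~_nd}(-y_n * w_d), summed over n.
        total = 0.0
        for x_n, y_n in zip(X, y):
            total += np.prod(blankout_mgf(-y_n * w, x_n, q))
        return total

In practice one would accumulate sums of $\log M(\cdot)$ terms rather than products, for numerical stability.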

SLIDES 24–28

Results

(figure-only slides; the result plots were not captured in this transcript)