A Kernel Theory of Modern Data Augmentation (presentation slides)


ICML Oral, 06/11/2019 | Poster #227, Tue Jun 11th, 6:30-9:00 PM

A Kernel Theory of Modern Data Augmentation

Tri Dao, Albert Gu, Alex Ratner, Virginia Smith, Chris De Sa, Chris Ré


Data augmentation is important to accuracy…


  • 3.7 pt. average accuracy gain across the top ten CIFAR-10 models
  • 13.9 pt. average accuracy gain on CIFAR-100
  • A form of weak supervision: it expresses domain knowledge (invariance)


… but is not well understood


How does data augmentation affect the model?

  • Learning process
  • Parameters and decision surface



Augmentation as sequence modeling


Model augmentation as a Markov chain

  • TANDA [Ratner et al., 2017]
  • AutoAugment [Cubuk et al., 2018]
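Modeling augmentation as a Markov chain means each augmented point is produced from the previous one by a randomly chosen transformation. A minimal sketch, assuming a hypothetical menu of transformations on a 1-D vector (circular shift, reversal, small Gaussian jitter); the names `markov_chain_augment` and `TRANSFORMS` are illustrative and not taken from the paper.

```python
import numpy as np

def shift(x, rng):
    # Hypothetical transformation: circularly shift the vector by one position.
    return np.roll(x, rng.choice([-1, 1]))

def flip(x, rng):
    # Hypothetical transformation: reverse the vector (rng unused, kept for a uniform signature).
    return x[::-1].copy()

def jitter(x, rng):
    # Hypothetical transformation: add small Gaussian noise.
    return x + 0.01 * rng.standard_normal(x.shape)

TRANSFORMS = [shift, flip, jitter]

def markov_chain_augment(x, n_steps, rng):
    """Return the states of an augmentation Markov chain started at x.

    Each next state depends only on the current one: pick a transformation
    uniformly at random and apply it to the current state.
    """
    states = [x]
    for _ in range(n_steps):
        t = TRANSFORMS[rng.integers(len(TRANSFORMS))]
        states.append(t(states[-1], rng))
    return np.stack(states)

rng = np.random.default_rng(0)
x0 = np.arange(5, dtype=float)
chain = markov_chain_augment(x0, n_steps=10, rng=rng)
print(chain.shape)  # (11, 5): the start point plus 10 augmented states
```

Because each step depends only on the current state, the sequence of augmented points is a Markov chain whose transition probabilities are set by the transformation menu.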

Augmentation as kernels

Base classifier (k-nearest neighbors) + data augmentation = asymptotic kernel classifier
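The classifier being analyzed can be sketched as follows, assuming Euclidean k-NN and a hypothetical `augment` function (Gaussian jitter, chosen only for illustration): the training set is enlarged with augmented copies that inherit their source labels, and k-NN runs on the enlarged set. This shows the setup only, not the asymptotic kernel derivation.

```python
import numpy as np

def augment(x, n_copies, rng, scale=0.3):
    # Hypothetical augmentation: Gaussian jitter around x (illustration only).
    return x + scale * rng.standard_normal((n_copies, x.shape[0]))

def knn_predict(X_train, y_train, X_test, k=1):
    # Squared Euclidean distances between every test point and every training point.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    # Majority vote among the k nearest labels (binary labels 0/1).
    votes = y_train[nearest].mean(axis=1)
    return (votes > 0.5).astype(int)

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [3.0, 3.0]])
y = np.array([0, 1])

# Enlarge the training set: each augmented copy keeps its source label.
X_aug = np.concatenate([augment(x, 50, rng) for x in X])
y_aug = np.repeat(y, 50)

X_test = np.array([[0.2, -0.1], [2.8, 3.1]])
pred = knn_predict(np.vstack([X, X_aug]), np.concatenate([y, y_aug]), X_test, k=5)
print(pred)
```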


Effects of data augmentation on kernel classifiers


  • Invariance
  • Regularization

Practical utility:

  • Speeding up training
  • As a diagnostic



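Model of data augmentation: kernel classifier

Non-augmented:

$$\min_{w} \frac{1}{n}\sum_{i=1}^{n} \ell\big(w^\top \phi(x_i)\big)$$

where $\phi$ is the feature map and $\ell$ is the loss function.

Augmented:

$$\min_{w} \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{z_i \sim T(x_i)}\, \ell\big(w^\top \phi(z_i)\big)$$

where $z_i \sim T(x_i)$ ranges over transformed versions of the data point $x_i$.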
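The two objectives differ only in where the loss is evaluated. A Monte Carlo sketch, assuming an identity feature map, the logistic loss with labels $y_i \in \{-1, +1\}$ entering through the loss, and a hypothetical Gaussian-jitter transformation $T$; the expectation over $z_i \sim T(x_i)$ is replaced by an average over sampled transformations.

```python
import numpy as np

def phi(x):
    # Assumed feature map: identity.
    return x

def logistic_loss(scores, y):
    # ell(s) = log(1 + exp(-y * s)); logaddexp keeps it numerically stable.
    return np.logaddexp(0.0, -y * scores)

def risk(w, X, y):
    # Non-augmented objective: (1/n) sum_i ell(w^T phi(x_i)).
    return logistic_loss(phi(X) @ w, y).mean()

def augmented_risk(w, X, y, rng, n_samples=200, scale=0.5):
    # Augmented objective: (1/n) sum_i E_{z_i ~ T(x_i)} ell(w^T phi(z_i)),
    # estimated by sampling a hypothetical jitter T: z = x + scale * noise.
    noise = rng.standard_normal((X.shape[0], n_samples, X.shape[1]))
    Z = X[:, None, :] + scale * noise
    per_point = logistic_loss(phi(Z) @ w, y[:, None]).mean(axis=1)
    return per_point.mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = np.where(X[:, 0] >= 0, 1.0, -1.0)
w = np.array([2.0, 0.0])

r_plain = risk(w, X, y)
r_aug = augmented_risk(w, X, y, rng)
print(r_plain, r_aug)
```

Since this jitter is zero-mean and the logistic loss is convex, Jensen's inequality implies the augmented risk is at least the non-augmented one at any fixed $w$; the gap is the regularization-like term that augmentation adds.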


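Data augmentation effects

$$\frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{z_i \sim T(x_i)}\, \ell\big(w^\top \phi(z_i)\big) \approx \frac{1}{n}\sum_{i=1}^{n} \ell\big(w^\top \mathbb{E}_{z_i \sim T(x_i)}[\phi(z_i)]\big)$$

where $\mathbb{E}_{z_i \sim T(x_i)}[\phi(z_i)]$ is the average of the augmented features (i.e., the kernel mean embedding).

  • 1st-order effect: induces invariance by feature averaging.
  • 2nd-order effect: reduces model complexity via a data-dependent regularization.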
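One way to see both effects is a second-order Taylor expansion of $\ell$ around the averaged feature $\psi(x_i) = \mathbb{E}_{z_i \sim T(x_i)}[\phi(z_i)]$, assuming $\ell$ is twice differentiable. The first-order term has zero mean, which leaves

$$\mathbb{E}_{z_i \sim T(x_i)}\, \ell\big(w^\top \phi(z_i)\big) \approx \ell\big(w^\top \psi(x_i)\big) + \tfrac{1}{2}\,\ell''\big(w^\top \psi(x_i)\big)\, w^\top \operatorname{Cov}_{z_i \sim T(x_i)}\big[\phi(z_i)\big]\, w$$

so the leading term is the feature-averaged loss and the correction penalizes directions $w$ along which the transformations make the features vary. The numerical check below is a toy illustration of this expansion, not the paper's derivation; it assumes an identity feature map, the logistic loss with the label fixed to $+1$, and a hypothetical Gaussian transformation $z \sim \mathcal{N}(x, \Sigma)$.

```python
import numpy as np

def logistic_loss(s):
    # ell(s) = log(1 + exp(-s)), label folded in as y = +1.
    return np.logaddexp(0.0, -s)

def logistic_loss_dd(s):
    # Second derivative of ell: sigmoid(s) * (1 - sigmoid(s)).
    p = 1.0 / (1.0 + np.exp(-s))
    return p * (1.0 - p)

rng = np.random.default_rng(0)
x = np.array([0.5, -0.2])       # one data point; phi is the identity (assumption)
w = np.array([1.5, 1.0])
cov = 0.04 * np.eye(2)          # hypothetical T: z ~ N(x, cov)

Z = rng.multivariate_normal(x, cov, size=200_000)
exact = logistic_loss(Z @ w).mean()        # Monte Carlo estimate of E_z ell(w^T z)

psi = x                                    # averaged feature E_z[z] equals x for this T
first_order = logistic_loss(w @ psi)       # feature-averaged loss only
second_order = first_order + 0.5 * logistic_loss_dd(w @ psi) * (w @ cov @ w)

print(exact, first_order, second_order)
```

Adding the variance term moves the approximation much closer to the Monte Carlo value than the feature-averaged loss alone.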


A diagnostic: kernel alignment metric

Averaged features:

$$\psi(x) = \mathbb{E}_{z \sim T(x)}[\phi(z)]$$

Kernel target alignment [Cristianini et al., 2002]: measures how well separated the features from different classes are.
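Kernel target alignment compares a Gram matrix $K$ with the ideal label kernel $yy^\top$ for labels $y_i \in \{-1, +1\}$:

$$A(K, yy^\top) = \frac{\langle K, yy^\top \rangle_F}{\sqrt{\langle K, K \rangle_F\,\langle yy^\top, yy^\top \rangle_F}}$$

A minimal sketch of the diagnostic, assuming a linear kernel on averaged features $\psi(x)$, a hypothetical transformation set (all cyclic shifts of a 1-D signal, applied uniformly), and toy data in which each class is a signed spike at a random position. It shows how alignment can be compared with and without augmentation, not the protocol behind the MNIST figure.

```python
import numpy as np

def alignment(K, y):
    # A(K, yy^T) = <K, yy^T>_F / sqrt(<K, K>_F * <yy^T, yy^T>_F), y in {-1, +1}.
    Y = np.outer(y, y)
    return (K * Y).sum() / np.sqrt((K * K).sum() * (Y * Y).sum())

def averaged_features(X, phi, transforms):
    # psi(x) = E_{z ~ T(x)} phi(z), with T uniform over a finite set of transforms.
    return np.mean([phi(t(X)) for t in transforms], axis=0)

def phi(A):
    # Assumed feature map: identity.
    return A

rng = np.random.default_rng(0)
n, d = 40, 16
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
X = 0.05 * rng.standard_normal((n, d))
X[np.arange(n), rng.integers(d, size=n)] += y   # one signed spike at a random position

# Hypothetical transformation set: every cyclic shift of the signal.
shifts = [lambda A, s=s: np.roll(A, s, axis=1) for s in range(d)]

a_raw = alignment(phi(X) @ phi(X).T, y)
Psi = averaged_features(X, phi, shifts)
a_aug = alignment(Psi @ Psi.T, y)
print(round(a_raw, 2), round(a_aug, 2))
```

Averaging over shifts makes the features shift-invariant, so the class-dependent sign of the spike dominates the averaged kernel and alignment rises; higher alignment indicates better-separated classes.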


[Figure: MNIST results plotted against kernel alignment]

Kernel alignment correlates with accuracy.


Summary

  • Data augmentation + k-NN = asymptotic kernel classifier.
  • Data augmentation induces invariance and regularizes.
  • Applications: speeding up training and serving as a diagnostic.

Poster #227 on Tuesday, Jun 11th at 6:30 PM

Tri Dao trid@stanford.edu