


SLIDE 1 / 10

  • I. Sato, et al., Breaking Inter-Layer Co-Adaptation by Classifier Anonymization, ICML 2019


Breaking Inter-Layer Co-Adaptation by Classifier Anonymization

Ikuro Sato¹  Kohta Ishikawa¹  Guoqing Liu¹  Masayuki Tanaka²

¹ Denso IT Laboratory, Inc., Japan
² National Institute of Advanced Industrial Science and Technology, Japan

ICML 2019

SLIDE 2 / 10


Summary first

  • About what? Breaking co-adaptation between the feature extractor and the classifier.
  • How? By a classifier anonymization technique.
  • Theory? Proved: features form a simple point-like distribution.
  • In reality? The point-like property is largely confirmed on real datasets.

SLIDE 3 / 10


E2E optimization scheme flourishes. Is it always good?

The standard end-to-end (E2E) objective optimizes the feature extractor and the classifier jointly:

ϱ⋆, ι⋆ = arg min_{ϱ, ι} (1/|S|) Σ_{(x,t)∈S} ℓ(D_ι(G_ϱ(x)), t)

where G_ϱ is the feature extractor, D_ι is the classifier (together forming the DNN), x is the input, t is the target, and ℓ is the loss.

[Figure: 2-D feature space (dim-1 vs. dim-2), colored by the D_ι value for classes '+1' and '-1'.]

Toy example) 2-class regression: the features may form an excessively complex distribution.

  • Disjointed
  • Split

Under E2E optimization, the feature extractor G_ϱ⋆ adapts to one particular classifier D_ι.
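The joint update behind the E2E objective can be sketched with a toy linear model; the data, model shapes, and names (`rho`, `iota`) are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Minimal sketch of E2E optimization: feature extractor G_rho and classifier
# D_iota are trained jointly on the same loss, so G_rho can co-adapt to this
# one particular D_iota. Toy 2-class regression with targets in {+1, -1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
t = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

rho = rng.normal(size=(2, 2)) * 0.1   # G_rho parameters: features z = X @ rho
iota = rng.normal(size=2) * 0.1       # D_iota parameters: output y = z @ iota

def loss(rho, iota):
    return np.mean((X @ rho @ iota - t) ** 2)

loss_before = loss(rho, iota)
lr = 0.05
for _ in range(300):
    z = X @ rho
    g = 2 * (z @ iota - t) / len(X)           # dL/dy
    iota = iota - lr * z.T @ g                # classifier step
    rho = rho - lr * X.T @ np.outer(g, iota)  # feature-extractor step (joint)
loss_after = loss(rho, iota)
```

Because both parameter sets descend the same loss, the learned features are only required to work for this single classifier, which is the co-adaptation the slide warns about.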

SLIDE 4 / 10


FOCA: Feature-extractor Optimization through Classifier Anonymization

ϱ⋆ = arg min_ϱ (1/|S|) Σ_{(x,t)∈S} 𝔼_{ι∼Θ_ϱ}[ ℓ(D_ι(G_ϱ(x)), t) ]

FOCA optimizes the feature extractor against a random weak classifier ι ∼ Θ_ϱ.

Want to know more about Θ_ϱ? Please come to the poster!

Under some conditions, the features form a simple point-like distribution per class.

The feature extractor G_ϱ⋆ adapts to a set of weak classifiers {D_ι}.

[Figure: 2-D feature space (dim-1 vs. dim-2).]
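One way to picture the FOCA objective is the sketch below: only the feature extractor is updated, and at every step the classifier is re-drawn as a weak one, here fitted with a few gradient steps on a small random batch of current features. The weak-classifier construction, sizes, and all names are illustrative assumptions, not the paper's Θ_ϱ.

```python
import numpy as np

# Hedged FOCA-style sketch: the feature extractor G_rho never sees one fixed
# classifier; its gradient is averaged over freshly sampled weak classifiers,
# approximating the expectation over iota ~ Theta_rho.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = np.where(X[:, 0] - X[:, 1] > 0, 1.0, -1.0)
rho = np.eye(2) * 0.3                     # feature extractor: z = X @ rho

def weak_classifier(z_b, t_b, steps=20, lr=0.05):
    """A deliberately weak linear classifier: few steps, small batch."""
    iota = np.zeros(z_b.shape[1])
    for _ in range(steps):
        iota -= lr * 2 * z_b.T @ (z_b @ iota - t_b) / len(t_b)
    return iota

def eval_loss(rho):
    """Loss under a fixed weak classifier fitted on the first 32 points."""
    z = X @ rho
    iota = weak_classifier(z[:32], t[:32])
    return np.mean((z @ iota - t) ** 2)

loss_before = eval_loss(rho)
lr = 0.02
for _ in range(300):
    z = X @ rho
    grad = np.zeros_like(rho)
    for _ in range(4):                    # Monte-Carlo average over classifiers
        idx = rng.choice(len(X), size=32, replace=False)
        iota = weak_classifier(z[idx], t[idx])
        g = 2 * (z @ iota - t) / len(X)
        grad += X.T @ np.outer(g, iota)
    rho -= lr * grad / 4                  # only the feature extractor moves
loss_after = eval_loss(rho)
```

Since no single classifier is available to co-adapt to, the extractor is pushed toward features that any weak classifier from the family can use, which is the mechanism behind the point-like distribution.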

SLIDE 5 / 10


Proposition about the point-like property

Please see the paper for the proof.

In words: if the feature extractor has enough representation ability, then under certain conditions all input data of the same class are projected to a single point in the feature space, in a class-separable way.
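The property in the proposition can be quantified directly: if same-class inputs collapse to one point, the within-class spread of features is negligible next to the distance between class centroids. The metric name and the toy features below are assumptions for illustration only.

```python
import numpy as np

# Illustrative measure of the point-like property for a 2-class feature set:
# mean within-class standard deviation divided by the distance between the
# two class centroids. Zero means each class is literally a single point.
def point_likeness(features, labels):
    classes = np.unique(labels)
    means = np.array([features[labels == c].mean(axis=0) for c in classes])
    within = np.mean([features[labels == c].std(axis=0).mean() for c in classes])
    between = np.linalg.norm(means[0] - means[1])
    return within / between

y = np.repeat([0, 1], 50)
f_point = np.repeat([[0.0, 0.0], [5.0, 5.0]], 50, axis=0)       # point-like
rng = np.random.default_rng(0)
f_spread = f_point + rng.normal(scale=2.0, size=f_point.shape)  # scattered

print(point_likeness(f_point, y))   # 0.0: each class collapses to one point
print(point_likeness(f_spread, y))  # clearly larger than zero
```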

SLIDE 6 / 10


Toy problem demonstration

[Figure: trajectories from start to end in the input plane (x-axis vs. y-axis) and the feature plane (feature dim. #1 vs. #2); highlighted points are the data used to generate the classifier decision boundary.]

Small perturbations lead to a point-like distribution. A small-batch classifier works as a weak classifier with respect to the entire dataset.

SLIDE 7 / 10


Experiment #1: partial-dataset training

What we wish to confirm: given a fixed feature extractor G_ϱ⋆, do a full-dataset classifier and a partial-dataset classifier perform similarly?
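The logic of this experiment can be sketched on toy features: freeze the extractor, train one classifier on all features and another on a small subset, and compare error rates. The helper names and the fabricated point-like features are illustrative assumptions, not the CIFAR-10 setup.

```python
import numpy as np

# Experiment-#1-style comparison on frozen features: if features are
# point-like, a classifier fitted on a handful of samples should match one
# fitted on the full set, so the performance gap should be tiny.
rng = np.random.default_rng(0)

def fit_linear(z, t):
    """Least-squares linear classifier (with bias) on fixed features."""
    zb = np.hstack([z, np.ones((len(z), 1))])
    w, *_ = np.linalg.lstsq(zb, t, rcond=None)
    return w

def error_rate(w, z, t):
    zb = np.hstack([z, np.ones((len(z), 1))])
    return np.mean(np.sign(zb @ w) != t)

# Point-like features: each class forms one tight cluster.
t = np.repeat([-1.0, 1.0], 100)
z = t[:, None] * 2.0 + rng.normal(scale=0.05, size=(200, 2))

w_full = fit_linear(z, t)               # "full-dataset" classifier
w_small = fit_linear(z[::20], t[::20])  # "partial-dataset" classifier (10 samples)
gap = abs(error_rate(w_small, z, t) - error_rate(w_full, z, t))
```

On the deck's real CIFAR-10 experiment this gap is what separates FOCA (small gap) from the baseline methods (large gap).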

SLIDE 8 / 10


Experiment #1: partial-dataset training

CIFAR-10 test error rates, comparing a classifier trained with a large dataset against one trained with a small dataset: the performance gap is much smaller for FOCA and large for the other methods.

One indication of the point-like property.

(The same, fixed feature extractor is used within each method.)

SLIDE 9 / 10


More experiments …

including:

  • Approximate geodesic distance measurements between large- and small-dataset solutions
  • Low-dimensional analyses

to further study the point-like property.

SLIDE 10 / 10

  • What? Breaking co-adaptation between the feature extractor and the classifier.
  • How? By classifier anonymization.
  • Theory? Proved: features form a simple point-like distribution.
  • Reality? The point-like property is largely confirmed on real datasets.

Poster #28 tonight