ICML 2019
Breaking Inter-Layer Co-Adaptation by Classifier Anonymization
Ikuro Sato¹, Kohta Ishikawa¹, Guoqing Liu¹, Masayuki Tanaka²
¹Denso IT Laboratory, Inc., Japan
²National Institute of Advanced Industrial Science and Technology, Japan
Summary first
About what? Breaking co-adaptation between the feature extractor and the classifier.
How? By a classifier anonymization technique.
Theory? Proved: features form a simple point-like distribution.
In reality? The point-like property is largely confirmed on real datasets.
The E2E optimization scheme flourishes. Is it always good?
E2E opt.: $(\theta^*, \phi^*) = \arg\min_{\theta,\phi} \frac{1}{|D|} \sum_{(x,t) \in D} \ell(C_\phi(F_\theta(x)), t)$
Pipeline: input $x$ → DNN feature extractor $F_\theta(x)$ → classifier $C_\phi(F_\theta(x))$ → loss $\ell(C_\phi(F_\theta(x)), t)$ w/ target $t$.
The feature extractor $F_{\theta^*}$ adapts to one particular classifier $C_\phi$.
[Toy example, 2-class regression: scatter plot of feature dim-1 vs. feature dim-2, color = $C_\phi$ value ($+1$/$-1$). Features may form an excessively complex distribution: disjointed, split.]
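As a concrete reference point, here is a minimal PyTorch sketch of the E2E scheme the slide describes, where $\theta$ and $\phi$ are updated jointly from a single loss. The module shapes, the 2-D feature space, and all hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of standard E2E training: theta and phi updated jointly.
# Shapes and hyperparameters are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 2))
classifier = nn.Linear(2, 10)

# One optimizer over both parameter sets: F_theta is free to co-adapt to
# this one particular classifier C_phi.
optimizer = torch.optim.SGD(
    list(feature_extractor.parameters()) + list(classifier.parameters()), lr=0.1
)

def e2e_step(x, t):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(classifier(feature_extractor(x)), t)
    loss.backward()
    optimizer.step()
    return loss.item()
```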
FOCA: Feature-extractor Optimization through Classifier Anonymization
FOCA: $\theta^* = \arg\min_{\theta} \frac{1}{|D|} \sum_{(x,t) \in D} \mathbb{E}_{\phi \sim \Xi_\theta}\!\left[\ell(C_\phi(F_\theta(x)), t)\right]$
Random weak classifier: $\phi \sim \Xi_\theta$. Want to know more about $\Xi_\theta$? Please come to the poster!
The feature extractor $F_{\theta^*}$ adapts to a set of weak classifiers $C_\phi$.
Features form a simple point-like distribution per class under some conditions. [Figure: feature dim-1 vs. feature dim-2.]
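A hedged sketch of a FOCA-style update in the same assumed notation: only $\theta$ is optimized, against a Monte-Carlo average over freshly sampled weak classifiers. Realizing $\Xi_\theta$ as a randomly re-initialized linear head fitted briefly on a small batch of fixed features is an assumption for illustration; the paper's exact weak-classifier construction may differ (see the poster).

```python
# Sketch of a FOCA-style update: theta alone is trained against an average
# over freshly sampled weak classifiers phi ~ Xi_theta. The weak-classifier
# construction below (short SGD runs of a re-initialized linear head on a
# small batch of fixed features) is an illustrative assumption.
import torch
import torch.nn as nn

def sample_weak_classifier(feature_extractor, x_small, t_small, steps=5):
    """Draw one phi ~ Xi_theta: a weak head fitted briefly on a small batch."""
    head = nn.Linear(2, 10)                       # random re-initialization
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    with torch.no_grad():
        z = feature_extractor(x_small)            # features held fixed here
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(head(z), t_small).backward()
        opt.step()
    return head

def foca_step(feature_extractor, fe_opt, x, t, x_small, t_small, n_heads=4):
    """Update theta on the loss averaged over sampled weak classifiers."""
    fe_opt.zero_grad()
    loss = 0.0
    for _ in range(n_heads):                      # Monte-Carlo estimate of E_phi
        head = sample_weak_classifier(feature_extractor, x_small, t_small)
        loss = loss + nn.functional.cross_entropy(head(feature_extractor(x)), t)
    (loss / n_heads).backward()
    fe_opt.step()                                 # only theta is stepped; the
                                                  # sampled heads are discarded
```

Because every update sees different, disposable classifiers, the extractor cannot exploit the idiosyncrasies of any single one, which is the anonymization idea the slide describes.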
Proposition about the point-like property
In words: if the feature extractor has enough representational ability, all input data of the same class are projected to a single point in the feature space, in a class-separable way, under certain conditions. Please see the paper for the proof.
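For readers who want the claim in symbols, here is one informal restatement of the slide's wording, with assumed notation ($F_{\theta^*}$ for the optimized feature extractor, $\mu_c$ for the per-class point); it is a paraphrase, not the paper's exact proposition.

```latex
% Informal restatement of the point-like property (notation assumed).
% Under the stated conditions there exist per-class points mu_1, ..., mu_K
% such that every training input of class c lands exactly on mu_c:
\exists\, \mu_1, \dots, \mu_K \in \mathbb{R}^d :\quad
  F_{\theta^*}(x) = \mu_c \quad \text{for all } (x, c) \in D,
% with the points remaining class-separable, so in particular
% mu_c \neq mu_{c'} whenever c \neq c'.
```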
Toy problem demonstration
[Animation: x-axis = feature dim. #1, y-axis = feature dim. #2; shows the data used to generate the classifier and its decision boundary, from start to end of training.]
A small-batch classifier works as a weak classifier for the entire dataset. Small perturbations lead to a point-like distribution.
Experiment #1: partial-dataset training
What we wish to confirm: do a full-dataset classifier and a partial-dataset classifier perform similarly for a given fixed $F_{\theta^*}$?
Experiment #1: partial-dataset training
[Plot: CIFAR-10 test error rates of classifiers trained with a large dataset vs. a small dataset; the same, fixed feature extractor is used within each method.]
The performance gap is large for other methods but much smaller for FOCA: one indication of the point-like property.
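To make the slide's protocol concrete, here is a hedged sketch of the comparison: freeze one trained feature extractor, fit one classifier head on the full training set and one on a small subset, and compare their test error rates. The linear head, the 10% subset size, and full-batch training are illustrative assumptions, not the paper's settings.

```python
# Sketch of the Experiment-#1 protocol as described on the slide: one fixed
# feature extractor, two classifier heads (full vs. partial dataset), then
# compare test error rates. Head type and subset size are assumptions.
import torch
import torch.nn as nn

def train_head(features, labels, epochs=50, lr=0.1):
    head = nn.Linear(features.shape[1], 10)
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(head(features), labels).backward()
        opt.step()
    return head

def error_rate(head, features, labels):
    with torch.no_grad():
        return (head(features).argmax(1) != labels).float().mean().item()

def partial_dataset_gap(feature_extractor, x_train, t_train, x_test, t_test):
    with torch.no_grad():                         # feature extractor is fixed
        z_train = feature_extractor(x_train)
        z_test = feature_extractor(x_test)
    full_head = train_head(z_train, t_train)      # classifier on full dataset
    k = len(z_train) // 10                        # assumed 10% partial subset
    idx = torch.randperm(len(z_train))[:k]
    part_head = train_head(z_train[idx], t_train[idx])
    # A small gap between the two errors indicates the point-like property.
    return error_rate(full_head, z_test, t_test), error_rate(part_head, z_test, t_test)
```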
More experiments, including:
• Approximate geodesic distance measurements between large- and small-dataset solutions
• Low-dimensional analyses to further study the point-like property
Poster #28 tonight
What? Breaking co-adaptation between the feature extractor and the classifier.
How? By classifier anonymization.
Theory? Proved: features form a simple point-like distribution.
Reality? The point-like property is largely confirmed on real datasets.