 
Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification
Yuting Zhang, Kibok Lee, Honglak Lee
University of Michigan, Ann Arbor
Unsupervised and supervised deep learning
- Deep feature representations can be learned in supervised and unsupervised manners.
  - Supervised objectives learn from the correspondence between the data and label spaces.
  - Unsupervised objectives learn from the data space itself.
- Supervised deep learning
  - Deep neural networks, convolutional neural networks, recurrent neural networks, …
  - Task-specific; requires large amounts of supervision.
- Unsupervised deep learning
  - Stacked autoencoders, deep belief networks, deep Boltzmann machines, …
  - Preserves input information and can leverage large amounts of unlabeled data, but may be suboptimal for supervised tasks.
Yuting Zhang, Kibok Lee, Honglak Lee
Unsupervised and supervised deep learning
- Historically, unsupervised learning (e.g., stacked autoencoders, SAE) was used as a pretraining step for improving, and even enabling, the supervised learning of deep networks.
- However, such pretraining became unnecessary once deep neural networks were initialized properly and large amounts of labeled data were available.
  - E.g., large-scale convolutional neural networks: AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan and Zisserman, 2015), GoogLeNet (Szegedy et al., 2015), etc.
- As a result, unsupervised deep learning has been overshadowed by supervised methods.
Revisiting the importance of unsupervised learning
- Pretraining: Unsupervised → Supervised
Revisiting the importance of unsupervised learning
- Combination: Unsupervised + Supervised (reconstruction + classification)
- Previous work:
  - Autoencoders: Ranzato & Szummer (2008); Larochelle et al. (2009)
  - (Restricted) Boltzmann machines: Larochelle & Bengio (2008); Goodfellow et al. (2013); Sohn et al. (2013)
  - Dictionary learning: Boureau et al. (2010); Mairal et al. (2010)
  - Ladder network: Rasmus et al. (2015), using layer-wise skip links and pathway combinators
  - Stacked “what-where” AE (SWWAE): Zhao et al. (2015), using unpooling switches (Zeiler and Fergus, 2009)
- These approaches are promising for improving classification performance, but have not been shown to be beneficial for large-scale supervised deep neural nets.
Outline
1. The invertibility of large-scale image classification networks
2. Large-scale image classification networks with stronger invertibility
Invertibility of deep convolutional neural networks
A typical classification network (VGGNet)
[Figure: VGGNet encoder, image → pool1 → pool2 → pool3 → pool4 → pool5 → fc6 → fc7 → probability, trained with a softmax loss against the one-hot label. Each macro-layer consists of one or more convolutional layers plus a max-pooling layer, e.g., pool2 → conv3_1 → conv3_2 → conv3_3 → pool3.]
Inducing an autoencoder from a classification network (VGGNet, pool5)
[Figure: the VGGNet encoder (conv layers, image → pool1 → … → pool5, followed by inner-product layers fc6 and fc7 and a softmax loss) is augmented with a decoding pathway of deconv layers (dec:pool4 → dec:pool3 → dec:pool2 → dec:pool1 → dec:image) and an L2 reconstruction loss. Panels in the underlying figure: (a) SAE-first (stacked architecture; reconstruction loss at the first layer); (b) SAE-all (stacked architecture; reconstruction loss at all layers); (c) SAE-layerwise (layer-wise architecture).]
Training a decoding pathway for a classification network (VGGNet, pool5)
[Figure: the decoding pathway (deconv layers, dec:pool4 → … → dec:image, with an L2 reconstruction loss) is marked as learnable; the encoding pathway (conv layers, image → pool1 → … → pool5) is marked as fixed.]
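The idea of fitting a decoder to a frozen encoder can be sketched on a toy problem. The snippet below is only an illustration under strong assumptions: a small linear encoder and linear decoder stand in for the conv and deconv pathways, and the names `train_decoder`, `W_ENC`, and `DATA` are hypothetical, not taken from the slides. It shows that only the decoder weights are updated to reduce the L2 reconstruction loss.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def l2_loss(x, x_hat):
    """Squared L2 reconstruction error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def train_decoder(W_enc, data, epochs=200, lr=0.01, seed=0):
    """Fit a linear decoder by SGD while the encoder W_enc stays fixed."""
    rng = random.Random(seed)
    n_hid, n_in = len(W_enc), len(W_enc[0])
    W_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_in)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = matvec(W_enc, x)        # encoder: fixed, never updated
            x_hat = matvec(W_dec, h)    # decoder: learnable
            total += l2_loss(x, x_hat)
            for i in range(n_in):
                grad_out = 2.0 * (x_hat[i] - x[i])  # d(loss)/d(x_hat[i])
                for j in range(n_hid):
                    W_dec[i][j] -= lr * grad_out * h[j]
        history.append(total / len(data))
    return W_dec, history

# Hypothetical toy encoder (3 inputs -> 2 features) and toy data.
W_ENC = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
DATA = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.5, -1.0, 1.0]]
W_DEC, HISTORY = train_decoder(W_ENC, DATA)
print("first-epoch loss:", HISTORY[0], "last-epoch loss:", HISTORY[-1])
```

Because the toy encoder maps three inputs to two features, the loss cannot reach zero; the remaining error is the part of the input that the encoder discards, which is the quantity the reconstruction slides visualize.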
Micro-architectures for decoders
- Use “unpooling” to approximately invert the pooling operation.
[Figure: encoder macro-layer pool2 → conv3_1 → conv3_2 → conv3_3 → pool3, mirrored by the decoder layers dec:pool3, dec:conv3_3, dec:conv3_2, dec:conv3_1.]
Micro-architectures for decoders (unpooling with fixed switches, ordinary SAE)
- One can use the ordinary stacked autoencoder (SAE).
  - Related work: Dosovitskiy, A. and Brox, T., “Inverting visual representations with convolutional networks”, CVPR 2016.
[Figure: encoder macro-layer (pool2, conv3_1, conv3_2, conv3_3, pool3) and mirrored decoder, with unpooling with fixed switches (upsampling) at dec:pool3. In the example, each pooled value is written to the top-left cell of its 2×2 block, giving the rows 4 0 6 0 / 0 0 0 0 / 5 0 1 0 / 0 0 0 0.]
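Unpooling with fixed switches can be sketched in a few lines. This is a minimal illustration, assuming 2×2 pooling blocks and a fixed top-left position; the function name `unpool_fixed` is hypothetical, and the pooled values 4, 6, 5, 1 are chosen to reproduce the example grid on this slide.

```python
def unpool_fixed(pooled, size=2):
    """Upsample by writing each pooled value to the top-left cell of its block."""
    rows, cols = len(pooled), len(pooled[0])
    out = [[0] * (cols * size) for _ in range(rows * size)]
    for r in range(rows):
        for c in range(cols):
            out[r * size][c * size] = pooled[r][c]  # position is the same for every block
    return out

for row in unpool_fixed([[4, 6], [5, 1]]):
    print(row)
```

The output position ignores where the maximum was located in the encoder, so the spatial detail inside each block is lost.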
Micro-architectures for decoders (unpooling with known switches, SWWAE)
- We can also use stacked “what-where” autoencoders (SWWAE).
  - Unpooling with the known switches transferred from the encoder.
  - More accurate inversion, since spatial details are recovered better.
[Figure: encoder macro-layer (pool2, conv3_1, conv3_2, conv3_3, pool3) and mirrored decoder, with pooling switches (SWWAE only) passed from pool3 to dec:pool3 for unpooling with known switches. In the example, each pooled value is written back to the position of its maximum, giving the rows 0 0 6 0 / 0 4 0 0 / 0 5 0 0 / 0 0 0 1.]
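The switch mechanism can be sketched as follows. This is a minimal illustration assuming 2×2 non-overlapping max-pooling; `max_pool_with_switches` and `unpool_known` are hypothetical names, and the input grid is made up so that the result matches the example grid on this slide.

```python
def max_pool_with_switches(x, size=2):
    """Max-pool x and record, per block, the (row, col) offset of the maximum."""
    rows, cols = len(x) // size, len(x[0]) // size
    pooled = [[0] * cols for _ in range(rows)]
    switches = [[(0, 0)] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cells = [(x[r * size + i][c * size + j], (i, j))
                     for i in range(size) for j in range(size)]
            value, pos = max(cells, key=lambda cell: cell[0])
            pooled[r][c] = value      # the "what"
            switches[r][c] = pos      # the "where"
    return pooled, switches

def unpool_known(pooled, switches, size=2):
    """Write each pooled value back to the position stored in its switch."""
    rows, cols = len(pooled), len(pooled[0])
    out = [[0] * (cols * size) for _ in range(rows * size)]
    for r in range(rows):
        for c in range(cols):
            i, j = switches[r][c]
            out[r * size + i][c * size + j] = pooled[r][c]
    return out

# Made-up 4x4 feature map for illustration.
x = [[1, 2, 6, 3],
     [0, 4, 2, 1],
     [3, 5, 0, 0],
     [2, 1, 0, 1]]
pooled, switches = max_pool_with_switches(x)
restored = unpool_known(pooled, switches)
for row in restored:
    print(row)
```

The restored map keeps every maximum at its original location, which is why this variant recovers spatial detail that a fixed placement cannot.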
Reconstruction from different layers (AlexNet)
[Figure series: the input image compared with reconstructions by SAE, SWWAE, and Dosovitskiy & Brox (2015), one slide per layer: pool1, pool2, pool3, pool4, pool5, fc6, fc7, fc8.]
Reconstruction via SAE decoders
[Figure: reconstructions from each layer (image, pool1, pool2, conv3, conv4, pool5, fc6, fc7, fc8) for SAE, Dosovitskiy & Brox (2016), and SWWAE-first (known unpooling switches).]
- The network is less invertible at higher layers, so deeper representations preserve less input information.
- Two possible sources of information loss:
  - Convolutional filters and non-linearity (transformation)
  - Max-pooling (spatial invariance)
- They are mixed in the SAE reconstruction results.
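The two sources of information loss can be illustrated separately on tiny vectors. This is a toy sketch, not the slides' experiment: ReLU is assumed as the non-linearity, pooling is done in 1-D for brevity, and all values are made up.

```python
def relu(v):
    """Element-wise ReLU: negative values are clipped to zero."""
    return [max(0.0, a) for a in v]

def max_pool_1d(v, size=2):
    """Non-overlapping 1-D max-pooling."""
    return [max(v[i:i + size]) for i in range(0, len(v), size)]

# Transformation loss: different negative values collapse to the same output.
a = [1.0, -2.0, 3.0, -0.5]
b = [1.0, -7.0, 3.0, -0.1]
print(relu(a) == relu(b))

# Spatial-invariance loss: the location of the maximum inside each window is discarded.
c = [5.0, 0.0, 0.0, 2.0]
d = [0.0, 5.0, 2.0, 0.0]
print(max_pool_1d(c) == max_pool_1d(d))
```

In both cases two distinct inputs map to the same output, so no decoder can tell them apart from that output alone. Passing pooling switches to the decoder removes the second ambiguity, which is one way to separate the two effects.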