Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification
Yuting Zhang, Kibok Lee, Honglak Lee
University of Michigan, Ann Arbor
Unsupervised and supervised deep learning
§ Supervised objectives learn from the correspondence between the data and label spaces. § Unsupervised objectives learn from the data space itself.
§ Supervised models: deep neural networks, convolutional neural networks, recurrent neural networks, … Task-specific, and require large amounts of supervision.
§ Unsupervised models: stacked autoencoders, deep belief networks, deep Boltzmann machines, … Preserve input information and can leverage large amounts of unlabeled data, but may be suboptimal for supervised tasks.
§ Unsupervised learning was historically important for improving and even enabling the supervised learning of deep networks.
§ It has become less necessary where networks can be initialized properly and large amounts of labeled data are available.
§ E.g., large-scale convolutional neural networks: AlexNet (Krizhevsky et al., 2012), VGGNet (Simonyan and Zisserman, 2015), GoogLeNet (Szegedy et al., 2015), etc.
§ Question: can unsupervised objectives still benefit these state-of-the-art supervised methods?
Unsupervised → Supervised
Combining unsupervised (reconstruction) and supervised (classification) objectives
§ Autoencoders: Ranzato & Szummer (2008); Larochelle et al. (2009)
§ (Restricted) Boltzmann machines: Larochelle & Bengio (2008); Goodfellow et al.
§ Dictionary learning: Boureau et al. (2010); Mairal et al. (2010)
§ Ladder network: Rasmus et al. (2015), with layer-wise skip links & pathway combinators
§ Stacked “what-where” AE (SWWAE): Zhao et al. (2015), using unpooling switches (Zeiler and Fergus, 2009)
Promising for improving classification performance, but not yet shown to be beneficial for large-scale supervised deep neural nets.
[Figure: encoder-decoder architectures. Encoder: image → pool1 → … → pool5 → fc6 → fc7 → class probabilities (softmax loss); decoder: deconvolutional layers producing dec:pool4 … dec:pool1 and dec:image, trained with L2 reconstruction losses.
(a) SAE-first (stacked architecture; reconstruction loss at the first layer)
(b) SAE-all (stacked architecture; reconstruction loss at all layers)
(c) SAE-layerwise (layer-wise architecture)]
[Figure: one encoder macro-layer (pool2 → conv3_1 → conv3_2 → conv3_3 → pool3): one or more convolutional layers followed by a max-pooling layer]
[Figure: the corresponding decoder macro-layer mirrors the encoder: pool3 → dec:conv3_3 → dec:conv3_2 → dec:conv3_1, reconstructing pool2]
§ Related work: Dosovitskiy, A. and Brox, T., “Inverting visual representations with convolutional networks”, CVPR 2016.
[Figure: unpooling with fixed switches (upsampling): each pooled value is placed at a fixed position in the enlarged map]
§ Unpooling with the known switches transferred from the encoder (SWWAE only).
§ More accurate inversion, since spatial details are recovered better.
[Figure: unpooling with known switches: each pooled value is restored to the position recorded by the encoder's max-pooling]
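To make the distinction concrete, here is a minimal NumPy sketch of the two unpooling variants (helper names are illustrative, not the authors' code): max-pooling records the argmax "switch" in each window; SWWAE-style unpooling restores each value to that position, while fixed-switch upsampling always uses the same corner.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k-by-k max-pooling that also records the argmax ("switch") per window."""
    H, W = x.shape
    out = np.zeros((H // k, W // k))
    sw = np.zeros((H // k, W // k), dtype=int)  # flat index within each window
    for i in range(H // k):
        for j in range(W // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            sw[i, j] = win.argmax()
            out[i, j] = win.max()
    return out, sw

def unpool(y, switches=None, k=2):
    """Place each pooled value back into a k-by-k window: at its recorded
    switch position (SWWAE), or at a fixed position when switches is None
    (plain upsampling with fixed switches)."""
    H, W = y.shape
    x = np.zeros((H * k, W * k))
    for i in range(H):
        for j in range(W):
            s = switches[i, j] if switches is not None else 0
            x[i*k + s // k, j*k + s % k] = y[i, j]
    return x
```

With known switches, each maximum goes back to where it came from, which is why the inversion recovers spatial detail better than fixed upsampling.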
[Figure: input images and their reconstructions from deep representations, comparing SAE, Dosovitskiy & Brox (2016), and SWWAE]
[Plot: reconstruction quality vs. encoding layer (image, pool1, pool2, conv3, conv4, pool5, fc6, fc7, fc8) for SAE, SWWAE, SWWAE-first with known unpooling switches, and Dosovitskiy & Brox (2016)]
The network is less invertible for higher layers, so deeper representations preserve less input information.
§ Convolutional filters and non-linearity (transformation)
§ Max-pooling (spatial invariance)
§ With known unpooling switches, the spatial information discarded due to max-pooling can be better recovered.
§ This suggests the “convolutional filters + ReLU” cause very minor information losses.
[Plot: SAE vs. SWWAE reconstruction quality across layers (image, pool1–pool5)]
§ We take the 16-layer VGGNet as the baseline model
§ The decoder is attached starting from the last convolutional layer (pool5 in VGGNet).
§ For a very deep network, it is hard to train directly from random initialization.
§ The reconstruction loss is only at the “first” layer.
Mini-batch SGD for all steps
§ Decoder layers can better correspond to encoder layers.
§ Intermediate layers can get more training signals.
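The combined training objective can be sketched as follows (a minimal NumPy illustration with hypothetical function names; the weighting lam and its value are illustrative, not taken from the paper): the network minimizes the softmax classification loss plus weighted L2 reconstruction losses at the decoded layers.

```python
import numpy as np

def softmax_xent(logits, label):
    """Softmax cross-entropy classification loss for one example."""
    z = logits - logits.max()            # shift for numerical stability
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def joint_loss(logits, label, recons, targets, lam=1e-3):
    """SAE/SWWAE-all style objective: classification loss plus weighted L2
    reconstruction losses at every decoded layer (SAE-first would use a
    single reconstruction term)."""
    recon = sum(0.5 * np.sum((r - t) ** 2) for r, t in zip(recons, targets))
    return softmax_xent(logits, label) + lam * recon
```

Setting lam = 0 recovers the purely supervised baseline, so the reconstruction terms act as an auxiliary objective on top of ordinary training.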
§ Rescaling the shorter edge to 256px.
§ “Single crop” scheme: 224x224 patch in the center.
§ “Convolution” scheme: whole VGGNet as a convolutional operator
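The “convolution” scheme works because a fully-connected layer applied to a fixed-size feature map is equivalent to a convolution with reshaped weights, so the whole network can slide over larger inputs. A minimal NumPy check of that equivalence (dimensions chosen to match VGGNet's pool5; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.standard_normal((512, 7, 7))     # a pool5-sized feature map
W = rng.standard_normal((8, 512 * 7 * 7))   # fc weights (8 outputs for brevity)

# Fully-connected view: flatten the feature map and multiply.
fc_out = W @ feat.ravel()

# Convolutional view: the same weights seen as 8 kernels of shape 512x7x7,
# applied at the single valid position of a 7x7 input.
K = W.reshape(8, 512, 7, 7)
conv_out = np.einsum('ochw,chw->o', K, feat)

assert np.allclose(fc_out, conv_out)
```

On a larger input, the reshaped kernels simply produce one output per valid position, i.e. dense predictions over the whole image.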
ImageNet validation errors (%):

Model               Single crop       Convolution
                    Top-1   Top-5     Top-1   Top-5
VGGNet              29.05   10.07     26.97   8.94
+ SAE-first         27.70    9.28     26.09   8.30
+ SAE-all           27.54    9.17     26.10   8.21
+ SAE-layerwise     27.60    9.19     26.06   8.17
+ SWWAE-first       27.60    9.23     25.87   8.14
+ SWWAE-all         27.39    9.06     25.79   8.13
+ SWWAE-layerwise   27.53    9.10     25.97   8.20

§ All augmented models get lower errors than the VGGNet baseline.
§ Layer-wise reconstruction loss is helpful.
§ SWWAE performs slightly better than ordinary SAE.
ImageNet training errors (%), single-crop sampling:

Model               Top-1   Top-5
VGGNet              17.43   4.02
+ SAE-first         15.36   3.13
+ SAE-all           15.64   3.23
+ SAE-layerwise     16.20   3.42
+ SWWAE-first       15.10   3.08
+ SWWAE-all         15.67   3.24
+ SWWAE-layerwise   15.42   3.32

§ The augmented models also get lower training errors: the unsupervised objective does not conflict with the supervised objectives.
Training vs. validation errors (%):

Model           Training (single crop)   Validation (convolution)
                Top-1   Top-5            Top-1   Top-5
+ SAE-first     15.36   3.13             26.09   8.30
+ SAE-all       15.64   3.23             26.10   8.21
+ SWWAE-first   15.10   3.08             25.87   8.14
+ SWWAE-all     15.67   3.24             25.79   8.13

§ Compared to SAE/SWWAE-first, SAE/SWWAE-all has higher training errors but lower validation errors.
§ Layer-wise reconstruction loss has regularization effects.
§ We incorporated unsupervised (reconstruction) objectives into large-scale classification network learning.
§ The augmented networks can reconstruct images with high quality from deep representations.
§ Our method improves VGGNet, a strong baseline model, by a noticeable margin.
§ The results demonstrate the usefulness of unsupervised learning in a large-scale setting.