Convolutional Neural Networks II Milan Straka April 01, 2019

NPFL114, Lecture 5
Convolutional Neural Networks II
Milan Straka
April 01, 2019
Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

Designing and Training Neural Networks
Designing and training a neural network is not a one-shot action, but an iterative procedure.
When choosing hyperparameters, it is important to verify that the model neither underfits nor overfits.
- Underfitting can be checked by increasing model capacity or training longer.
- Overfitting can be tested by observing the train/dev difference and by trying stronger regularization.
Specifically, this implies that:
- We need to set the number of training epochs so that the training loss no longer decreases (the training performance no longer improves) at the end of training.
- Generally, we want to use a large batch size that does not slow us down too much (GPUs sometimes allow larger batches without slowing down training). However, with increasing batch size we need to increase the learning rate, which is possible only to some extent. Also, a small batch size sometimes works as regularization (especially for the vanilla SGD algorithm).
NPFL114, Lecture 5 Howto ResNet ResNet Modifications CNN Regularization Image Detection Segmentation 2/51
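The remark about raising the learning rate together with the batch size can be illustrated with a linear scaling heuristic. The slide does not name a specific rule, so the linear rule itself, the function name scaled_learning_rate, and the max_lr cap (standing in for "possible only to some extent") are all assumptions made for this sketch.

```python
def scaled_learning_rate(base_lr, base_batch_size, batch_size, max_lr=None):
    """Scale the learning rate linearly with the batch size.

    This is an assumed heuristic, not a rule stated in the lecture.
    `max_lr` is a hypothetical cap beyond which we stop increasing the rate.
    """
    lr = base_lr * batch_size / base_batch_size
    if max_lr is not None:
        lr = min(lr, max_lr)
    return lr


# Reference setting: learning rate 0.01 tuned for batch size 32.
for batch_size in (32, 64, 256, 1024):
    lr = scaled_learning_rate(0.01, 32, batch_size, max_lr=0.1)
    print(f"batch size {batch_size:5d} -> learning rate {lr:.3f}")
```

In practice the scaled value is only a starting point; the train/dev checks described above still decide whether the setting is acceptable.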

Loading and Saving Models
Using tf.keras.Model.save, both the architecture and model weights are saved. But saving the architecture is currently quite brittle:
- tf.keras.layers.InputLayer does not work correctly
- object losses (inherited from tf.losses.Loss) cannot be loaded
- TensorFlow specific functions (not in tf.keras.layers) work only sometimes
- …
Of course, the bugs are being fixed.
Using tf.keras.Model.save_weights, only the weights of the model are saved. If the model is constructed again by the script (which usually requires specifying the same hyperparameters as during model training), weights can be loaded using tf.keras.Model.load_weights.
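The weights-only workflow could look roughly as follows. This is a minimal sketch, assuming TensorFlow 2 is installed; build_model, its hidden_size hyperparameter, and the file name are hypothetical. The ".weights.h5" suffix is chosen only because some newer Keras releases expect it.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model(hidden_size):
    """Hypothetical model constructor; both scripts must call it identically."""
    inputs = tf.keras.Input(shape=(4,))
    hidden = tf.keras.layers.Dense(hidden_size, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
    return tf.keras.Model(inputs=inputs, outputs=outputs)


# "Training" side: build the model and store only its weights.
trained = build_model(hidden_size=8)
weights_path = os.path.join(tempfile.mkdtemp(), "model.weights.h5")
trained.save_weights(weights_path)

# "Prediction" side: rebuild with the same hyperparameters, then load the weights.
restored = build_model(hidden_size=8)
restored.load_weights(weights_path)

# Both models should now produce the same outputs.
x = np.ones((2, 4), dtype=np.float32)
same = np.allclose(trained(x).numpy(), restored(x).numpy())
print("identical predictions:", same)
```

Calling build_model with a different hidden_size before load_weights would fail or load mismatched shapes, which is why the hyperparameters have to be specified again.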

Main Takeaways From Previous Lecture
- Convolutions can provide
  - local interactions in spatial/temporal dimensions
  - shift invariance
  - far fewer parameters than a fully connected layer
- Usually repeated 3 × 3 convolutions are enough, no need for larger filter sizes.
- When pooling is performed, double the number of channels.
- Final fully connected layers are not needed, global average pooling is usually enough.
- Batch normalization is a great regularization method for CNNs.
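Two of these takeaways can be checked with simple arithmetic: the parameter savings of a convolution over a fully connected layer, and why stacked 3 × 3 convolutions can replace a larger filter. The layer sizes below (a 32 × 32 RGB input, 64 channels) are assumptions picked for illustration, and biases are ignored.

```python
def conv_params(kernel, in_channels, out_channels):
    """Number of weights of a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


def dense_params(in_units, out_units):
    """Number of weights of a fully connected layer (biases ignored)."""
    return in_units * out_units


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


# Convolution versus fully connected layer mapping 32x32x3 to 32x32x64.
height, width = 32, 32
conv = conv_params(3, 3, 64)
dense = dense_params(height * width * 3, height * width * 64)
print("3x3 conv weights:", conv)
print("dense weights:   ", dense)

# Two stacked 3x3 convolutions versus a single 5x5 one, 64 channels throughout.
channels = 64
two_3x3 = 2 * conv_params(3, channels, channels)
one_5x5 = conv_params(5, channels, channels)
print("receptive field of two 3x3 convs:", stacked_receptive_field(3, 2))
print("weights, two 3x3 vs one 5x5:", two_3x3, one_5x5)
```

The two stacked 3 × 3 layers cover the same 5 × 5 window with fewer weights and one extra nonlinearity in between.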

ResNet – 2015 (3.6% error)
Figure 1 of paper "Deep Residual Learning for Image Recognition", https://arxiv.org/abs/1512.03385.

ResNet – 2015 (3.6% error)
Table 1 of paper "Deep Residual Learning for Image Recognition", https://arxiv.org/abs/1512.03385.

ResNet – 2015 (3.6% error)
The residual connections cannot be applied directly when the number of channels increases.
The authors considered several alternatives, and chose the one where, when the number of channels increases, a 1 × 1 convolution is used as a projection on the shortcut to match the required number of channels.
[Figure: layer-by-layer comparison of VGG-19, a 34-layer plain network, and a 34-layer residual network.]
Figure 3 of paper "Deep Residual Learning for Image Recognition", https://arxiv.org/abs/1512.03385.
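The projection shortcut can be sketched in NumPy: a 1 × 1 convolution is a matrix product applied independently at every spatial position. This is an illustrative sketch and not the authors' implementation. The function names are hypothetical, the random arrays stand in for real feature maps and for the output of the block's 3 × 3 convolutions, and stride 2 follows the "/2" markers in the figure.

```python
import numpy as np


def conv1x1(x, weights, stride=1):
    """1x1 convolution on an [H, W, C_in] map: a per-position matrix product."""
    return x[::stride, ::stride, :] @ weights


def add_shortcut(x, residual, projection=None, stride=1):
    """Add the block input to the block output, projecting it when shapes differ."""
    shortcut = x if projection is None else conv1x1(x, projection, stride)
    return residual + shortcut


rng = np.random.default_rng(0)
x = rng.normal(size=(56, 56, 64))

# Same number of channels: the identity shortcut can be added directly.
residual_same = rng.normal(size=(56, 56, 64))  # stand-in for the conv branch
same = add_shortcut(x, residual_same)

# Channels grow from 64 to 128 and resolution halves: project the shortcut.
residual_wide = rng.normal(size=(28, 28, 128))  # stand-in for the conv branch
projection = rng.normal(size=(64, 128))
wider = add_shortcut(x, residual_wide, projection=projection, stride=2)

print(same.shape, wider.shape)
```

Without the projection, adding the 56 × 56 × 64 input to a 28 × 28 × 128 output would fail because the shapes do not match.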

ResNet – 2015 (3.6% error)
Figure 1 of paper "Visualizing the Loss Landscape of Neural Nets", https://arxiv.org/abs/1712.09913.

ResNet – 2015 (3.6% error)
Table 4. Error rates (%) of single-model results on the ImageNet validation set (except † reported on the test set).
method                        top-1 err.   top-5 err.
VGG [41] (ILSVRC’14)          -            8.43†
GoogLeNet [44] (ILSVRC’14)    -            7.89
VGG [41] (v5)                 24.4         7.1
PReLU-net [13]                21.59        5.71
BN-inception [16]             21.99        5.81
ResNet-34 B                   21.84        5.71
ResNet-34 C                   21.53        5.60
ResNet-50                     20.74        5.25
ResNet-101                    19.87        4.60
ResNet-152                    19.38        4.49
Table 4 of paper "Deep Residual Learning for Image Recognition", https://arxiv.org/abs/1512.03385.
Table 5. Error rates (%) of ensembles. The top-5 error is on the test set of ImageNet and reported by the test server.
method                        top-5 err. (test)
VGG [41] (ILSVRC’14)          7.32
GoogLeNet [44] (ILSVRC’14)    6.66
VGG [41] (v5)                 6.8
PReLU-net [13]                4.94
BN-inception [16]             4.82
ResNet (ILSVRC’15)            3.57
Table 5 of paper "Deep Residual Learning for Image Recognition", https://arxiv.org/abs/1512.03385.

WideNet
Figure 1 of paper "Wide Residual Networks", https://arxiv.org/abs/1605.07146

WideNet
Authors do not consider bottleneck blocks. Instead, they experiment with different block types, e.g., B(1, 3, 1) or B(3, 3).
group name   output size   block type = B(3, 3)
conv1        32 × 32       [3 × 3, 16]
conv2        32 × 32       [3 × 3, 16 × k; 3 × 3, 16 × k] × N
conv3        16 × 16       [3 × 3, 32 × k; 3 × 3, 32 × k] × N
conv4        8 × 8         [3 × 3, 64 × k; 3 × 3, 64 × k] × N
avg-pool     1 × 1         [8 × 8]
Table 1 of paper "Wide Residual Networks", https://arxiv.org/abs/1605.07146
block type    depth   # params   time,s   CIFAR-10
B(1, 3, 1)    40      1.4M       85.8     6.06
B(3, 1)       40      1.2M       67.5     5.78
B(1, 3)       40      1.3M       72.2     6.42
B(3, 1, 1)    40      1.3M       82.2     5.86
B(3, 3)       28      1.5M       67.5     5.73
B(3, 1, 3)    22      1.1M       59.9     5.78
Table 2 of paper "Wide Residual Networks", https://arxiv.org/abs/1605.07146

WideNet
Authors evaluate various widening factors k.
Table 1 of paper "Wide Residual Networks", https://arxiv.org/abs/1605.07146 (the same architecture table as on the previous slide).
depth   k    # params   CIFAR-10   CIFAR-100
40      1    0.6M       6.85       30.89
40      2    2.2M       5.33       26.04
40      4    8.9M       4.97       22.89
40      8    35.7M      4.66       -
28      10   36.5M      4.17       20.50
28      12   52.5M      4.33       20.43
22      8    17.2M      4.38       21.22
22      10   26.8M      4.44       20.75
16      8    11.0M      4.81       22.07
16      10   17.1M      4.56       21.59
Table 4 of paper "Wide Residual Networks", https://arxiv.org/abs/1605.07146
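How depth and the widening factor k drive the model size can be estimated by counting convolution weights in the B(3, 3) architecture of Table 1. The sketch below makes several assumptions: depth = 6N + 4, a 3-channel input, and a 1 × 1 projection on the shortcut whenever the channel count changes. Biases, batch normalization parameters, and the final classifier are ignored. The function name wrn_conv_params is hypothetical.

```python
def wrn_conv_params(depth, k):
    """Approximate number of convolution weights in a wide ResNet with B(3, 3) blocks.

    Assumes depth = 6 * N + 4, where N is the number of blocks per group.
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6 * N + 4")
    n_blocks = (depth - 4) // 6

    total = 3 * 3 * 3 * 16  # conv1: 3x3 convolution from 3 to 16 channels
    in_channels = 16
    for width in (16 * k, 32 * k, 64 * k):  # groups conv2, conv3, conv4
        for _ in range(n_blocks):
            # Two 3x3 convolutions per block.
            total += 3 * 3 * in_channels * width
            total += 3 * 3 * width * width
            # Assumed 1x1 projection on the shortcut when channels change.
            if in_channels != width:
                total += in_channels * width
            in_channels = width
    return total


for depth, k in ((40, 1), (40, 2), (28, 10)):
    millions = wrn_conv_params(depth, k) / 1e6
    print(f"depth {depth}, k {k}: about {millions:.2f}M convolution weights")
```

Under these assumptions the estimates land close to the "# params" column above, for example roughly 36.5M for depth 28 with k = 10 and roughly 0.56M for depth 40 with k = 1. The count grows quadratically in k but only linearly in depth.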
