Introduction to High-Performance Machine Learning: Convolutional Neural Networks www.prace-ri.eu 1
Introduction to High-Performance Machine Learning: Convolutional Neural Networks
SURFsara
Valeriu Codreanu
History:
- 1971: Founded by the VU, UvA, and CWI
- 2013: SARA (Stichting Academisch Rekencentrum A’dam) becomes part of SURF
Cartesius (Bull supercomputer):
- 40,960 Ivy Bridge / Haswell cores: 1327 TFLOPS
- 56 Gbit/s InfiniBand
- 64 nodes with 2 K40m GPUs each: 210 TFLOPS
- Broadwell & KNL extension (Nov 2016): 177 BDW and 18 KNL nodes: 284 TFLOPS
- 7.7 PB Lustre parallel file system
- Top500 position: #45 (2014/11), #142 (2017/11)
Increasing number of deep learning projects!
Today’s lecture:
CV experts:
1. Select / develop features: SURF, HoG, SIFT, RIFT, …
2. Add machine learning on top of this for multi-class recognition and train a classifier
Pipeline: Feature extraction (SIFT, HoG, ...) -> Detection, Classification, Recognition
Classical CV feature definition is domain-specific and time-consuming
Deep Learning:
DL experts: define NN topology and train NN
Pipeline: Deep NN -> Detection, Classification, Recognition
Deep Learning promise: learn good features automatically, using the same method for different domains
Figure credit: Dai, He, and Sun, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, CVPR 2016
Figure credit: Cao et al, “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields”, arXiv 2016
Figure credit: Ledig et al, “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”, arXiv 2016
Figure credits:
Gatys, Ecker, and Bethge, “Image Style Transfer using Convolutional Neural Networks”, CVPR 2016 (left)
Mordvintsev, Olah, and Tyka, “Inceptionism: Going Deeper into Neural Networks” (upper right)
Johnson, Alahi, and Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution”, ECCV 2016 (bottom left)
Based on slide from Andrew Ng
Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017
4 easy steps:
get loss
Fully connected network (tensor shapes in brackets):
x [C1] -> Matrix Multiply with w1 [C1×C2] -> s [C2] -> Nonlinearity -> a [C2] -> Matrix Multiply with w2 [C2×C3] -> ŷ [C3]
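The data flow above can be sketched in plain Python. This is only an illustration, not code from the lecture: the sizes C1 = 4, C2 = 3, C3 = 2, the choice of ReLU as the nonlinearity, and the helper names `vecmat`, `relu` and `forward` are all assumptions made for the example.

```python
import random

def vecmat(x, w):
    """Row vector x [n] times matrix w [n x m] -> vector [m]."""
    n, m = len(w), len(w[0])
    assert len(x) == n, "inner dimensions must match"
    return [sum(x[i] * w[i][j] for i in range(n)) for j in range(m)]

def relu(v):
    """Element-wise nonlinearity (ReLU is an assumption; the slide only says 'Nonlinearity')."""
    return [max(0.0, t) for t in v]

def forward(x, w1, w2):
    s = vecmat(x, w1)      # [C1] x [C1 x C2] -> [C2]
    a = relu(s)            # [C2]
    y_hat = vecmat(a, w2)  # [C2] x [C2 x C3] -> [C3]
    return s, a, y_hat

# Hypothetical sizes, chosen only to make the shapes visible.
C1, C2, C3 = 4, 3, 2
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(C1)]
w1 = [[rng.uniform(-1, 1) for _ in range(C2)] for _ in range(C1)]
w2 = [[rng.uniform(-1, 1) for _ in range(C3)] for _ in range(C2)]

s, a, y_hat = forward(x, w1, w2)
print(len(s), len(a), len(y_hat))  # 3 3 2
```

Both matrix multiplies dominate the cost, which is why dense linear algebra performance matters so much for this kind of network.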
Convolutional network (tensor shapes in brackets):
x [C1×H×W] -> Convolution with w1 [C2×C1×k×k] -> s [C2×H×W] -> Nonlinearity -> a [C2×H×W] -> Pooling -> p [C2×H/2×W/2] -> Fully Connected with w2 [C2HW/4×C3] -> ŷ [C3]
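The shapes in this pipeline can be checked with a small pure-Python sketch. Several details are assumptions, since the slide gives only shapes: stride-1 convolution with zero padding (implied by s keeping the H×W extent), ReLU as the nonlinearity, and 2×2 max pooling with stride 2 (implied by H/2×W/2). All sizes and helper names are hypothetical.

```python
import random

def conv2d_same(x, w):
    """x: [C1][H][W], w: [C2][C1][k][k] with odd k -> [C2][H][W].
    Stride 1, zero padding. Like most deep learning frameworks, this is a
    cross-correlation (the kernel is not flipped)."""
    c1, h, wd = len(x), len(x[0]), len(x[0][0])
    c2, k = len(w), len(w[0][0])
    pad = k // 2
    out = [[[0.0] * wd for _ in range(h)] for _ in range(c2)]
    for o in range(c2):
        for i in range(h):
            for j in range(wd):
                acc = 0.0
                for c in range(c1):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - pad, j + dj - pad
                            if 0 <= ii < h and 0 <= jj < wd:
                                acc += w[o][c][di][dj] * x[c][ii][jj]
                out[o][i][j] = acc
    return out

def relu(t):
    return [[[max(0.0, v) for v in row] for row in ch] for ch in t]

def maxpool2x2(t):
    """2x2 max pooling with stride 2; H and W are assumed even."""
    return [[[max(ch[i][j], ch[i][j + 1], ch[i + 1][j], ch[i + 1][j + 1])
              for j in range(0, len(ch[0]), 2)]
             for i in range(0, len(ch), 2)]
            for ch in t]

def fully_connected(v, w):
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]

# Hypothetical sizes.
C1, C2, C3, H, W, K = 2, 3, 5, 4, 4, 3
rng = random.Random(0)

def rand():
    return rng.uniform(-1, 1)

x = [[[rand() for _ in range(W)] for _ in range(H)] for _ in range(C1)]
w1 = [[[[rand() for _ in range(K)] for _ in range(K)] for _ in range(C1)] for _ in range(C2)]
w2 = [[rand() for _ in range(C3)] for _ in range(C2 * H * W // 4)]

s = conv2d_same(x, w1)                              # [C2 x H x W]
a = relu(s)                                         # [C2 x H x W]
p = maxpool2x2(a)                                   # [C2 x H/2 x W/2]
flat = [v for ch in p for row in ch for v in row]   # [C2*H*W/4]
y_hat = fully_connected(flat, w2)                   # [C3]

print(len(s), len(s[0]), len(s[0][0]))  # 3 4 4
print(len(p), len(p[0]), len(p[0][0]))  # 3 2 2
print(len(flat), len(y_hat))            # 12 5
```

The six nested loops in `conv2d_same` are the part that real implementations hand to optimized GPU or CPU libraries.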
Sobel operator:
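As a concrete illustration of a hand-designed convolution filter, the standard 3×3 Sobel kernels can be applied to a tiny synthetic image with a vertical edge. The image, the helper name `filter2d`, and the use of a "valid" (unpadded) sliding window are assumptions made for this sketch; the window is applied without flipping the kernel, as CNN layers do.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]

def filter2d(img, kernel):
    """Slide the kernel over the image without padding ('valid' output)."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]

# Synthetic 5x6 image: dark left half, bright right half (a vertical edge).
img = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

gx = filter2d(img, SOBEL_X)
gy = filter2d(img, SOBEL_Y)
print(gx[0])  # [0, 4, 4, 0]  strong response at the vertical edge
print(gy[0])  # [0, 0, 0, 0]  no horizontal edges in this image
```

A convolutional layer has the same structure, except that the kernel values are learned during training instead of being fixed by hand.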
Figure credit: Stanford class CS231n: Convolutional Neural Networks for Visual Recognition
Very important factor motivating early GPU usage for neural network training!
- Subsampling pixels will not change the object (figure: bird -> Subsampling -> bird)
5 convolutional layers, 3 fully connected layers + softmax
650K neurons, 60M weights
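The 60M figure can be reproduced with a short parameter count. The slide does not name the network; the numbers match the published AlexNet architecture (Krizhevsky et al., 2012), so the layer shapes below are taken from that paper as an assumption, including its two-GPU split, which halves the input channels seen by each filter in conv2, conv4 and conv5.

```python
# (name, output channels, input channels seen by each filter, kernel size)
# Shapes assumed from the AlexNet paper; not stated on the slide.
CONV_LAYERS = [
    ("conv1", 96, 3, 11),
    ("conv2", 256, 48, 5),    # grouped: each filter sees 96 / 2 channels
    ("conv3", 384, 256, 3),
    ("conv4", 384, 192, 3),   # grouped: 384 / 2 channels
    ("conv5", 256, 192, 3),   # grouped: 384 / 2 channels
]
# (name, inputs, outputs)
FC_LAYERS = [
    ("fc6", 256 * 6 * 6, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]

def conv_params(out_ch, in_ch, k):
    return out_ch * in_ch * k * k + out_ch   # weights + biases

def fc_params(n_in, n_out):
    return n_in * n_out + n_out              # weights + biases

conv_total = sum(conv_params(o, i, k) for _, o, i, k in CONV_LAYERS)
fc_total = sum(fc_params(i, o) for _, i, o in FC_LAYERS)
total = conv_total + fc_total

print(conv_total)  # 2334080
print(fc_total)    # 58631144
print(total)       # 60965224
```

Under these assumptions the three fully connected layers hold about 96% of the weights, while the convolutional layers hold under 4%.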
Rob Fergus, NIPS 2013
1.5 GFLOPs, 19.6 GFLOPs, 3.6-11.6 GFLOPs
Training VGG for 50 epochs on ImageNet uses more than 1 ExaFLOP
True HPC distributed training is needed
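A back-of-the-envelope check of the ExaFLOP claim, under stated assumptions: 19.6 GFLOPs is taken to be the VGG forward-pass cost per image (the slide does not label the three numbers), the ImageNet-1k training set has 1,281,167 images, and the backward pass is assumed to cost roughly twice the forward pass, which is a common rule of thumb rather than a measured value.

```python
FORWARD_FLOPS_PER_IMAGE = 19.6e9   # assumed: VGG forward pass, from the slide's 19.6 GFLOPs
IMAGES_PER_EPOCH = 1_281_167       # ImageNet-1k training set size
EPOCHS = 50                        # from the slide
BACKWARD_TO_FORWARD = 2.0          # assumed rule of thumb

forward_total = FORWARD_FLOPS_PER_IMAGE * IMAGES_PER_EPOCH * EPOCHS
training_total = forward_total * (1.0 + BACKWARD_TO_FORWARD)

EXA = 1e18
print(f"forward passes only: {forward_total / EXA:.2f} ExaFLOP")
print(f"forward + backward:  {training_total / EXA:.2f} ExaFLOP")
```

Even the forward passes alone exceed 1 ExaFLOP under these assumptions, which is consistent with the statement on the slide.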
Network capacity is crucial!
But computation also scales!
Hardware to the rescue
Good libraries and framework integration are key!
Training time (days)*:
- ResNet-50: 2
- ResNet-101: 4
- ResNeXt-101-32x4d: 7
- IN5k-ResNeXt-101-64x4d: 14
- Internet-scale data / videos: months?
*measured on 8 NVIDIA M40 GPUs
Image courtesy of Jim Dowling. Chen et al.: Revisiting Distributed Synchronous SGD
Courtesy NVIDIA
Plot: training error (%) vs. epochs
- kn = 256, η = 0.1: 23.60% ± 0.12
- kn = 8k, η = 0.1: 41.78% ± 0.10
8192 = 256 × 32 (#GPUs × per-GPU batch)
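The batch-size arithmetic can be written out directly. The second helper shows the linear scaling rule, one commonly cited heuristic for choosing the learning rate at large batch sizes; it is included here as an assumption for illustration and is not something the slide itself states. Function names are hypothetical.

```python
def global_batch(num_gpus, per_gpu_batch):
    """Effective minibatch size kn in synchronous data-parallel SGD."""
    return num_gpus * per_gpu_batch

def linearly_scaled_lr(base_lr, base_batch, batch):
    """Linear scaling heuristic (assumption): scale the learning rate
    in proportion to the minibatch size."""
    return base_lr * batch / base_batch

kn = global_batch(num_gpus=256, per_gpu_batch=32)
print(kn)                                # 8192

# The plot keeps the learning rate at 0.1 for kn = 8k; the heuristic would
# instead suggest a learning rate 32 times larger than the kn = 256 baseline.
print(linearly_scaled_lr(0.1, 256, kn))  # 3.2
```

In practice such a large learning rate is usually combined with a warm-up period at the start of training, which is a further assumption beyond what is shown here.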
Optimization difficulty
Histogram: number of buffers (y axis) vs. parameter size as a power of 2 (x axis, 6 to 23)
Bar labels: 21, 24, 48, 33, 22, 12, 1, 7, 1, 10, 2, 15, 2, 11, 2, 3
N = 214 buffers to synchronize
Small buffers: latency bound
Large buffers: bandwidth bound
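Why small buffers are latency bound and large ones bandwidth bound can be seen with a simple latency-plus-bandwidth cost model, t(n) = α + n/B. The latency α, the bandwidth B, and the float32 parameter size below are hypothetical values chosen for illustration, not measurements of any particular interconnect, and the model describes a single message rather than a full AllReduce.

```python
ALPHA_S = 5e-6          # assumed per-message latency in seconds (hypothetical)
BANDWIDTH_BPS = 10e9    # assumed link bandwidth in bytes per second (hypothetical)
BYTES_PER_PARAM = 4     # assuming float32 parameters

def wire_time(num_params):
    """Time spent actually moving the bytes."""
    return num_params * BYTES_PER_PARAM / BANDWIDTH_BPS

def transfer_time(num_params):
    """Total time for one message: fixed latency plus wire time."""
    return ALPHA_S + wire_time(num_params)

def regime(num_params):
    """Which term dominates the transfer time."""
    return "latency bound" if ALPHA_S > wire_time(num_params) else "bandwidth bound"

for power in (6, 12, 18, 23):
    n = 2 ** power
    print(f"2^{power:<2} params: {transfer_time(n) * 1e6:10.1f} us  {regime(n)}")
```

With these assumed values the crossover lies between 2^13 and 2^14 parameters, so many of the 214 buffers would pay mostly latency. This is one motivation for fusing small gradient buffers into larger messages before synchronizing them.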
Layers: LAYER N, LAYER (N-1), LAYER (N-2), ..., LAYER 3, LAYER 2, LAYER 1
Gradients: gradient 1, gradient 2, gradient 3, ..., each followed by an AllReduce
AllReduce: G1, G2, ..., G8 -> every GPU receives G1 + G2 + ... + G8
Levels: intra-machine and inter-machine (Node-1, Node-2, Node-3, ..., Node-K)
AllReduce is important to scale efficiently
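What AllReduce computes can be shown with a small single-process simulation. The slide does not say which algorithm is used; the sketch below simulates ring all-reduce, one widely used variant, as an assumption. Workers are plain Python lists, "sending" is a list copy, and the function name is hypothetical.

```python
def ring_allreduce(buffers):
    """Simulate a ring all-reduce (sum) over equal-length per-worker buffers.
    Returns one reduced buffer per worker; all of them end up identical."""
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]                  # each worker's local copy
    bounds = [c * length // n for c in range(n + 1)]   # chunk c = [bounds[c], bounds[c+1])

    def chunk(worker, c):
        return data[worker][bounds[c]:bounds[c + 1]]

    # Phase 1, reduce-scatter: after n-1 steps worker r holds the fully
    # summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all messages first so that sends happen "simultaneously".
        msgs = [(r, (r - step) % n) for r in range(n)]
        payloads = [chunk(r, c) for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            dst = (r + 1) % n
            for k, v in enumerate(payload):
                data[dst][bounds[c] + k] += v

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [chunk(r, c) for r, c in msgs]
        for (r, c), payload in zip(msgs, payloads):
            dst = (r + 1) % n
            data[dst][bounds[c]:bounds[c + 1]] = payload
    return data

# Eight workers with gradients G1 ... G8, as on the slide (values are made up).
grads = [[i] * 8 for i in range(1, 9)]
reduced = ring_allreduce(grads)
print(reduced[0])                             # [36, 36, 36, 36, 36, 36, 36, 36]
print(all(r == reduced[0] for r in reduced))  # True
```

In each step every worker sends only one chunk (1/n of the buffer) to its neighbor, so the per-worker traffic stays roughly constant as workers are added. That property is a large part of why this family of algorithms scales well.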
Plot panels: CIFAR100 training from pretrained model; CIFAR100 training from scratch; ICDAR training from pretrained model; ICDAR training from scratch

Dataset      Model type   #GPUs   Accuracy [%]   Convergence time [min]
CIFAR10      Large        4       95.65          51
CIFAR100     Large        2       80.33          124
Bangla       Small        2       99.15          29
Bangla       Large        2       99.47          7
Bangla-aug   Large        4       99.73          160
MNIST        Small        2       99.44          168
ICDAR        Medium       2       95.28          2344
Faster convergence and better accuracy when fine-tuning!
Top-5 poster GTC2016
(also from science), and also at scale
THANK YOU FOR YOUR ATTENTION
www.prace-ri.eu