SLIDE 1


Transparent parallelization of neural network training

Cyprien Noel, Flickr / Yahoo, GTC 2015

SLIDE 2

SLIDE 3

Outline

▪ Neural Nets at Flickr
▪ Training Fast
▪ Parallel
▪ Distributed
▪ Q&A

SLIDE 4

Tagging Photos

Any photo on Flickr is classified using computer vision

Class      Probability
Flowers    0.98
Outdoors   0.95
Cat        0.001
Grass      0.6

[Diagram: a photo fed through a bank of classifiers producing per-class probabilities]
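To make the table concrete, here is a minimal C++ sketch of turning per-class scores into auto tags. The scores mirror the slide, but the 0.5 cutoff and the code itself are illustrative assumptions, not Flickr's pipeline.

```cpp
// Hypothetical sketch: turning per-class scores into auto tags.
// The scores mirror the slide; the 0.5 cutoff is an assumed threshold.
#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, double> scores = {
        {"Flowers", 0.98}, {"Outdoors", 0.95}, {"Cat", 0.001}, {"Grass", 0.6}};
    const double threshold = 0.5;  // keep only confident classes as tags
    for (const auto& [name, p] : scores)
        if (p >= threshold)
            std::cout << name << " " << p << "\n";  // Flowers, Grass, Outdoors
}
```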

SLIDE 5

Auto Tags Feeding Search


SLIDE 6

Tagging the Flickr corpus

▪ Classify millions of new photos per day
▪ Apply new models to billions of photos
▪ Train new models using Caffe

SLIDE 7

Training new models

▪ Manual experimentation
▪ Hyperparameter search
▪ Limitation is training time → Parallelize Caffe

SLIDE 8

Goals

▪ “Transparent”
▪ Code isolation
▪ Existing models
▪ Globally connected layers
▪ Existing infrastructure

SLIDE 9

Outline

▪ Neural Nets at Flickr
▪ Training Fast
▪ Parallel
▪ Distributed
▪ Q&A

SLIDE 10

GoogLeNet, 2014

SLIDE 11

Ways to Parallelize

▪ Model
  ▪ Caffe team enabling this now
▪ Data
  ▪ Synchronous (toy sketch below)
  ▪ Asynchronous
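A toy C++ sketch of the synchronous data-parallel variant: each worker computes a gradient on its own data shard, all workers hit a barrier, and the averaged gradient is applied once. The single-weight model, the gradients, and the learning rate are stand-ins for illustration, not Caffe's implementation.

```cpp
// Toy synchronous data parallelism: every worker computes a gradient on
// its own data shard, a join acts as the barrier, then the averaged
// gradient is applied once. Model and gradients are stand-ins.
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const int workers = 4;
    double w = 1.0;                         // the whole "model"
    std::vector<double> grads(workers, 0.0);
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i)
        pool.emplace_back([&grads, i] {
            grads[i] = 0.1 * (i + 1);       // stand-in for backprop on shard i
        });
    for (auto& t : pool) t.join();          // barrier: all gradients ready
    double avg = std::accumulate(grads.begin(), grads.end(), 0.0) / workers;
    w -= 0.01 * avg;                        // one synchronized SGD step
    std::printf("w after one sync step: %f\n", w);
}
```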

SLIDE 12

Outline

▪ Neural Nets at Flickr
▪ Training Fast
▪ Parallel
▪ Distributed
▪ Q&A

SLIDE 13

First Approach: CPU

▪ Hogwild! (2011)
▪ Cores read and write a shared parameter buffer
▪ No synchronization
▪ Impact of data races surprisingly low (sketch below)
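A minimal sketch of the Hogwild! idea: several threads apply sparse SGD updates to one shared parameter vector with no locks. Relaxed atomics stand in for the paper's plain racy writes (plain data races are undefined behavior in C++); updates lost between the load and the store are exactly the races Hogwild tolerates. The vector size, update count, and gradient are illustrative.

```cpp
// Sketch of the Hogwild! idea: threads apply sparse SGD updates to one
// shared parameter vector with no locks. Relaxed atomics stand in for the
// paper's plain racy writes; updates lost between load and store are the
// races Hogwild tolerates on sparse problems.
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

int main() {
    std::vector<std::atomic<float>> w(1000);    // shared parameters
    for (auto& x : w) x.store(0.0f, std::memory_order_relaxed);

    auto worker = [&w](unsigned seed) {
        for (int step = 0; step < 100000; ++step) {
            seed = seed * 1664525u + 1013904223u;   // cheap LCG
            std::size_t i = seed % w.size();        // sparse update target
            float g = 0.001f;                       // stand-in gradient
            float old = w[i].load(std::memory_order_relaxed);
            w[i].store(old - g, std::memory_order_relaxed);  // no lock, no CAS
        }
    };
    std::thread a(worker, 1u), b(worker, 2u), c(worker, 3u), d(worker, 4u);
    a.join(); b.join(); c.join(); d.join();
}
```

On sparse problems most updates touch disjoint indices, which is why the lost-update rate stays low in practice.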

SLIDE 14

SLIDE 15

MNIST CPU

SLIDE 16

Hogwild

▪ Plateaus as core count grows
▪ Some potential
  ▪ On a grid
  ▪ With model parallelism

SLIDE 17

But we are at GTC

SLIDE 18

GPU Cluster

▪ A lot of time spent preparing experiments
▪ Code deployment
▪ Data handling
  ▪ On-the-fly datasets for “big data”

SLIDE 19

Outline

▪ Neural Nets at Flickr
▪ Training Fast
▪ Parallel
▪ Distributed
▪ Q&A

SLIDE 20

Second Approach: Lots of Boxes

SLIDE 21

Second Approach: Lots of Boxes

▪ Exchange gradients between nodes
▪ Parameter server setup (toy sketch below)
▪ Easy: move data fast
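A toy in-process stand-in for the parameter-server pattern: workers pull the current weights, compute a gradient, and push it back to a single authority that applies the update. Real deployments put this behind the network; the mutex, the sizes, and the fake gradients here are assumptions for illustration.

```cpp
// Toy in-process parameter server: workers pull weights, compute a
// gradient, and push it back; the server applies updates under a lock.
// Real setups put this behind the network; everything here is a stand-in.
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct ParamServer {
    std::mutex m;
    std::vector<float> w = std::vector<float>(1000, 0.0f);

    void push(const std::vector<float>& grad, float lr) {
        std::lock_guard<std::mutex> lock(m);
        for (std::size_t i = 0; i < w.size(); ++i) w[i] -= lr * grad[i];
    }
    std::vector<float> pull() {
        std::lock_guard<std::mutex> lock(m);
        return w;  // snapshot of the current weights
    }
};

int main() {
    ParamServer ps;
    auto worker = [&ps] {
        for (int step = 0; step < 100; ++step) {
            std::vector<float> local = ps.pull();          // fetch weights
            std::vector<float> grad(local.size(), 0.01f);  // fake backprop
            ps.push(grad, 0.01f);                          // ship gradients
        }
    };
    std::thread a(worker), b(worker);
    a.join(); b.join();
}
```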

SLIDE 22

GPU memory → PCI → Ethernet

SLIDE 23

Second Approach: Lots of Boxes

▪ 230 MB * 2 * N per batch (arithmetic below)
▪ TCP/UDP chokes
▪ Machines unreachable
▪ No InfiniBand or RoCE
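Back-of-the-envelope for the first bullet: each batch ships the ~230 MB parameter set out as gradients and back as updated weights, for each of N peers. The 1 Gb/s Ethernet figure is an assumed link speed, not from the slides.

```cpp
// Back-of-the-envelope for "230 MB * 2 * N per batch": the parameter set
// goes out as gradients and comes back as updated weights, per peer.
// The 1 Gb/s link speed is an assumption for illustration.
#include <cstdio>

int main() {
    const double params_mb = 230.0;   // model size, from the slide
    const double gige_mb_s = 125.0;   // 1 Gb/s Ethernet ~ 125 MB/s
    for (int n = 1; n <= 8; n *= 2) {
        double traffic_mb = params_mb * 2 * n;   // wire traffic per batch
        std::printf("N=%d: %5.0f MB/batch, %5.1f s at 1 GbE\n",
                    n, traffic_mb, traffic_mb / gige_mb_s);
    }
}
```

At N = 4 that is about 1.8 GB per batch, roughly 15 seconds on gigabit Ethernet before any computation happens, which is why the naive approach chokes.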

SLIDE 24

Second Approach: Lots of Boxes

▪ Modify Caffe: chunk parameters (sketch below)
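A sketch of what parameter chunking might look like: walk the flat parameter blob in fixed-size pieces so each network message stays small. The 64 KB chunk size and the send_chunk stub are hypothetical; the slides do not show Caffe's actual modification.

```cpp
// Sketch of parameter chunking: walk the flat parameter blob in pieces
// small enough for one network message. The 64 KB chunk and the
// send_chunk stub are hypothetical, not Caffe's actual change.
#include <algorithm>
#include <cstddef>
#include <vector>

void send_chunk(const float* /*data*/, std::size_t /*count*/) {
    // network stub: in a real system this hands one message to the NIC
}

void send_params(const std::vector<float>& params) {
    const std::size_t chunk = 64 * 1024 / sizeof(float);  // floats per 64 KB
    for (std::size_t off = 0; off < params.size(); off += chunk) {
        std::size_t n = std::min(chunk, params.size() - off);
        send_chunk(params.data() + off, n);  // one message per chunk
    }
}

int main() {
    std::vector<float> params(230 * 1024 * 1024 / sizeof(float));  // ~230 MB
    send_params(params);
}
```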

SLIDE 25

Packet_mmap

[Diagram: with packet_mmap, kernel and application share one buffer instead of copying between kernel and app]
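For reference, a minimal sketch of setting up the Linux PACKET_MMAP transmit ring the diagram refers to: the application mmaps a ring buffer shared with the kernel, so outgoing frames avoid per-packet copies through the socket API. Error handling is omitted, the ring geometry is illustrative, and this is not Flickr's code.

```cpp
// Minimal sketch of a Linux PACKET_MMAP transmit ring. The application
// mmaps a ring buffer shared with the kernel, so outgoing frames avoid
// per-packet copies through the socket API. Requires CAP_NET_RAW.
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    tpacket_req req{};
    req.tp_block_size = 4096;   // one page per block
    req.tp_frame_size = 2048;   // two frames per block
    req.tp_block_nr   = 64;
    req.tp_frame_nr   = 128;    // block_nr * frames-per-block
    setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));

    size_t len = (size_t)req.tp_block_size * req.tp_block_nr;
    void* ring = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // To transmit: write a frame into `ring`, set its status to
    // TP_STATUS_SEND_REQUEST, then flush everything pending with
    // send(fd, nullptr, 0, 0); the kernel reads the frames in place.

    munmap(ring, len);
    close(fd);
}
```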

SLIDE 26

MNIST

SLIDE 27

ImageNet

SLIDE 28

NVIDIA

▪ Large machines
  ▪ 4 or 8 GPUs
  ▪ Root PCI switches
  ▪ InfiniBand

SLIDE 29

Third Approach: CUDA P2P

▪ GPUs on a single machine (sketch below)
▪ Data feeding
▪ Caffe pipeline
▪ Async streams
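A minimal sketch of the CUDA peer-to-peer mechanics behind this approach, compiled with nvcc against the CUDA runtime: enable mutual peer access between two devices, then copy a buffer GPU-to-GPU on an async stream without staging through host memory. The 1 MB buffer and the device numbering are illustrative, and two P2P-capable GPUs are assumed.

```cpp
// Sketch of CUDA peer-to-peer: enable mutual peer access between two
// GPUs, then copy a buffer device-to-device on an async stream without
// staging through host memory. Requires two P2P-capable GPUs.
#include <cuda_runtime.h>

int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);  // can device 0 reach device 1?
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) return 1;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);       // flags must be 0
    float* src = nullptr;
    cudaMalloc(&src, 1 << 20);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    float* dst = nullptr;
    cudaMalloc(&dst, 1 << 20);

    cudaStream_t s;
    cudaStreamCreate(&s);                   // async: copy overlaps compute
    cudaMemcpyPeerAsync(dst, 1, src, 0, 1 << 20, s);
    cudaStreamSynchronize(s);

    cudaStreamDestroy(s);
    return 0;
}
```

The point of the async streams bullet is that these copies overlap with kernels running on both devices, hiding the transfer cost.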

SLIDE 30

State of Things

▪ Async: ~8x speedup, but no momentum
▪ Sync: ~2x speedup
▪ Combining both, plus model parallelism
▪ Working on auto-tuning of parameters (batch size, learning rate)
▪ Different ratios of compute vs. I/O

SLIDE 31

Takeaway

▪ Check out Caffe, including Flickr’s contributions
▪ CUDA + Docker = Love
▪ Small SoC servers might be interesting for ML

SLIDE 32

Thanks!

Flickr vision team
Flickr backend team
Yahoo Labs
cypof@yahoo-inc.com