SLIDE 1

Deep Learning on Massively Parallel Processing Databases

Frank McQuillan Feb 2019

SLIDE 2

SLIDE 3

A Brief Introduction to Deep Learning

SLIDE 4

Artificial Intelligence Landscape

Deep Learning

SLIDE 5

Example Deep Learning Algorithms

  • Multilayer perceptron (MLP)
  • Recurrent neural network (RNN)
  • Convolutional neural network (CNN)

SLIDE 6

Convolutional Neural Networks (CNN)

  • Effective for computer vision
  • Fewer parameters than fully connected networks
  • Translational invariance
  • Classic networks: LeNet-5, AlexNet, VGG

SLIDE 7

Graphics Processing Units (GPUs)

  • Great at performing many simple computations such as matrix operations
  • Well suited to deep learning algorithms

SLIDE 8

Single Node Multi-GPU

Diagram: one host (Node 1) with GPU 1 through GPU N.

SLIDE 9

Greenplum Database and Apache MADlib

SLIDE 10

Greenplum Database

Diagram: Master Host (with Standby Master) connected via the Interconnect to Segment Hosts on Node 1 through Node N.

SLIDE 11

Multi-Node Multi-GPU

Diagram: Master Host (with Standby Master) connected via the Interconnect to Segment Hosts on Node 1 through Node N, each with GPU 1 through GPU N.

In-Database Functions

Machine learning & statistics & math & graph & utilities

Massively Parallel Processing

SLIDE 12

Deep Learning on a Cluster

  1. Distributed deep learning: Train a single model architecture across the cluster. Data distributed (usually randomly) across segments. (this talk)
  2. Data parallel models: Train the same model architecture in parallel on different data groups (e.g., build separate models per country).
  3. Hyperparameter tuning: Train the same model architecture in parallel with different hyperparameter settings and incorporate cross-validation. Same data on each segment.
  4. Neural architecture search: Train different model architectures in parallel. Same data on each segment.

SLIDE 13

Workflow

SLIDE 14

Data Loading and Formatting

SLIDE 15

Iterative Model Execution

Master:

    model = init(…)
    WHILE model not converged
        model = SELECT model.aggregation(…) FROM data table
    ENDWHILE

The stored procedure for the model is broadcast to Segment 1, Segment 2, …, Segment n.

1. Transition Function: operates on tuples or mini-batches to update the transition state (the model)
2. Merge Function: combines transition states
3. Final Function: transforms the transition state into the output value
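The transition/merge/final pattern can be simulated outside the database. The sketch below uses NumPy with a toy least-squares model; the function names, data, and learning rate are illustrative and do not correspond to actual MADlib APIs.

```python
import numpy as np

def transition(state, batch_X, batch_y, lr=0.1):
    """Fold one mini-batch into the per-segment state (weights, batch count)."""
    w, n = state
    grad = batch_X.T @ (batch_X @ w - batch_y) / len(batch_y)
    return (w - lr * grad, n + 1)

def merge(state_a, state_b):
    """Combine two segment states by count-weighted weight averaging."""
    (wa, na), (wb, nb) = state_a, state_b
    total = na + nb
    return ((wa * na + wb * nb) / total, total)

def final(state):
    """Transform the merged transition state into the output model."""
    return state[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 3))
y = X @ np.array([1.0, -2.0, 0.5])

states = []
for seg_rows in np.split(np.arange(90), 3):   # 3 simulated segments
    state = (np.zeros(3), 0)                  # broadcast initial model
    for batch in np.split(seg_rows, 3):       # mini-batches on the segment
        state = transition(state, X[batch], y[batch])
    states.append(state)

merged = states[0]
for s in states[1:]:
    merged = merge(merged, s)
model = final(merged)
print(model.shape)
```

In the database, the outer WHILE loop on the master would rerun this aggregate until the model converges.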

SLIDE 16

Distributed Deep Learning Methods

  • Open area of research*
  • Methods we have investigated so far:

– Simple averaging
– Ensembling
– Elastic averaging stochastic gradient descent (EASGD)

* Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis https://arxiv.org/pdf/1802.09941.pdf
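As a rough sketch of the simple-averaging method on a toy least-squares model (the segment count, data, and learning rate below are illustrative, not the settings used in these experiments): each segment takes a local gradient step from the broadcast model, and the master averages the resulting weights.

```python
import numpy as np

def local_sgd_step(w, X, y, lr=0.05):
    """One gradient step on a segment's local data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
w = np.zeros(2)                          # broadcast initial model

for iteration in range(100):
    local_models = []
    for seg in range(4):                 # 4 simulated segments
        X = rng.normal(size=(64, 2))     # segment-local data
        y = X @ true_w
        local_models.append(local_sgd_step(w.copy(), X, y))
    w = np.mean(local_models, axis=0)    # master averages the weights

print(np.round(w, 2))
```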

SLIDE 17

Some Results

SLIDE 18

Testing Infrastructure

  • Google Cloud Platform (GCP)
  • Type n1-highmem-32 (32 vCPUs, 208 GB memory)
  • NVIDIA Tesla P100 GPUs
  • Greenplum database config

– Tested up to 20-segment (worker node) clusters
– 1 GPU per segment

SLIDE 19

CIFAR-10

  • 60k 32x32 color images in 10 classes, with 6k images per class
  • 50k training images and 10k test images

https://www.cs.toronto.edu/~kriz/cifar.html

SLIDE 20

Places

  • Images comprising ~98% of the types of places in the world
  • Places365-Standard: 1.8M images from 365 scene categories
  • 256x256 color images with 50 images/category in the validation set and 900 images/category in the test set

http://places2.csail.mit.edu/index.html

SLIDE 21

6-layer CNN - Test Set Accuracy (CIFAR-10)

https://blog.plon.io/tutorials/cifar-10-classification-using-keras-tutorial/

Method: Model weight averaging

SLIDE 22

6-layer CNN - Runtime (CIFAR-10)

Method: Model weight averaging

SLIDE 23

1-layer CNN - Test Set Accuracy (CIFAR-10)

Method: Model weight averaging

SLIDE 24

1-layer CNN - Runtime (CIFAR-10)

Method: Model weight averaging

SLIDE 25

VGG-11 (Config A) CNN - Test Set Acc (Places50)

https://arxiv.org/pdf/1409.1556.pdf

Method: Model weight averaging

SLIDE 26

VGG-11 (Config A) CNN - Runtime (Places50)

Method: Model weight averaging

SLIDE 27

Ensemble with Places365

Diagram: Segment 1, Segment 2, …, Segment n each run AlexNet and produce 365 outputs; the 365*n concatenated outputs feed a simple CNN that produces the final 365 outputs.

AlexNet + Simple CNN

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
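The ensemble wiring can be sketched numerically. In the sketch below, a single linear layer plus softmax stands in for the simple CNN combiner, and all weights and per-segment predictions are random placeholders, so this shows only the shapes and data flow, not the trained ensemble.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
n_segments, n_classes = 3, 365

# Per-segment 365-way probability vectors for one image
# (placeholders for each segment's AlexNet predictions)
segment_probs = [softmax(rng.normal(size=n_classes)) for _ in range(n_segments)]
stacked = np.concatenate(segment_probs)           # shape (365 * n,)

# Combiner weights standing in for the "simple CNN"
W = rng.normal(size=(n_classes, n_classes * n_segments)) * 0.01
ensemble_out = softmax(W @ stacked)               # final 365-way prediction
print(ensemble_out.shape)
```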

SLIDE 28

AlexNet+Ensemble CNN - Test Set Acc (Places 365)

Method: Model weight averaging with simple ensemble CNN https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

Chart: increase in test set accuracy from the ensemble after 1 iteration and after 40 iterations (20 segments).

SLIDE 29

1-layer CNN - Test Set Accuracy (Places365)

Method: Elastic averaging stochastic gradient descent (EASGD) https://arxiv.org/pdf/1412.6651.pdf

(20 segments)
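The EASGD update from the paper referenced above (https://arxiv.org/pdf/1412.6651.pdf) can be sketched on a toy problem: each worker takes a gradient step plus an elastic pull toward a center model, and the center moves toward the average of the workers. The learning rate, elastic coefficient, and quadratic loss below are illustrative choices, not the settings used in these experiments.

```python
import numpy as np

def easgd_round(workers, center, grads, lr=0.1, rho=0.5):
    """One synchronous EASGD round over all workers."""
    new_workers = []
    for x, g in zip(workers, grads):
        # worker update: gradient step plus elastic pull toward the center
        new_workers.append(x - lr * (g + rho * (x - center)))
    # center update: pulled toward the average of the workers
    center = center + lr * rho * sum(x - center for x in workers)
    return new_workers, center

rng = np.random.default_rng(3)
center = np.zeros(2)
workers = [rng.normal(size=2) for _ in range(4)]

for _ in range(50):
    # toy quadratic loss 0.5*||x - [1, -1]||^2, so the gradient is x - target
    grads = [x - np.array([1.0, -1.0]) for x in workers]
    workers, center = easgd_round(workers, center, grads)

print(np.round(center, 2))
```

The elastic coefficient rho controls how strongly workers are tied to the center; smaller values let workers explore further from the consensus model.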

SLIDE 30

Lessons Learned and Next Steps

SLIDE 31

Lessons Learned

  • Distributed deep learning can potentially run faster than a single node, to achieve a given accuracy
  • Deep learning in a distributed system is challenging (but fun!)
  • Database architecture imposes some limitations compared to a Linux cluster

SLIDE 32

Infrastructure Lessons Learned

  • Beware the cost of GPUs on public cloud!
  • Memory management can be finicky
    – GPU initialization settings and freeing TensorFlow memory
  • GPU configuration
    – Not all GPUs are available in all regions (e.g., Tesla P100 available in us-east but not us-west on GCP)
    – More GPUs does not necessarily mean better performance
  • Library dependencies are important (e.g., cuDNN, CUDA, and TensorFlow)

SLIDE 33

Future Deep Learning Work*

  • 1.16 (Q1 2019)
    – Initial release of distributed deep learning models using Keras with TensorFlow backend, including GPU support
  • 2.0 (Q2 2019)
    – Model versioning and model management
  • 2.x (2H 2019)
    – More distributed deep learning methods
    – Massively parallel hyperparameter tuning
    – Support for more deep learning frameworks
    – Data parallel models
*Subject to community interest and contribution, and subject to change at any time without notice.

SLIDE 34

Thank you!

SLIDE 35

Backup Slides

SLIDE 36

Apache MADlib Resources

  • Web site
    – http://madlib.apache.org/
  • Wiki
    – https://cwiki.apache.org/confluence/display/MADLIB/Apache+MADlib
  • User docs
    – http://madlib.apache.org/docs/latest/index.html
  • Jupyter notebooks
    – https://github.com/apache/madlib-site/tree/asf-site/community-artifacts
  • Technical docs
    – http://madlib.apache.org/design.pdf
  • Pivotal commercial site
    – http://pivotal.io/madlib
  • Mailing lists and JIRAs
    – https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/
    – http://mail-archives.apache.org/mod_mbox/incubator-madlib-user/
    – https://issues.apache.org/jira/browse/MADLIB
  • PivotalR
    – https://cran.r-project.org/web/packages/PivotalR/index.html
  • Github
    – https://github.com/apache/madlib
    – https://github.com/pivotalsoftware/PivotalR

SLIDE 37

Infrastructure Lessons Learned (Details)

SLIDE 38

SQL Interface

SLIDE 39

Greenplum Integrated Analytics

Data Transformation, Traditional BI, Machine Learning, Graph, Data Science Productivity Tools, Geospatial, Text, Deep Learning

SLIDE 40

Scalable, In-Database Machine Learning

  • Open source

https://github.com/apache/madlib

  • Downloads and docs

http://madlib.apache.org/

  • Wiki

https://cwiki.apache.org/confluence/display/MADLIB/

Apache MADlib: Big Data Machine Learning in SQL

  • Open source, top-level Apache project
  • For PostgreSQL and Greenplum Database
  • Powerful machine learning, graph, statistics, and analytics for data scientists

SLIDE 41

History

MADlib project was initiated in 2011 by EMC/Greenplum architects and Professor Joe Hellerstein from University of California, Berkeley.

UrbanDictionary.com: mad (adj.): an adjective used to enhance a noun. 1- dude, you got skills. 2- dude, you got mad skills.

SLIDE 42

Functions

Data Types and Transformations
  • Array and Matrix Operations
  • Matrix Factorization
    – Low Rank
    – Singular Value Decomposition (SVD)
  • Norms and Distance Functions
  • Sparse Vectors
  • Encoding Categorical Variables
  • Path Functions
  • Pivot
  • Sessionize
  • Stemming

Graph
  • All Pairs Shortest Path (APSP)
  • Breadth-First Search
  • Hyperlink-Induced Topic Search (HITS)
  • Average Path Length
  • Closeness Centrality
  • Graph Diameter
  • In-Out Degree
  • PageRank and Personalized PageRank
  • Single Source Shortest Path (SSSP)
  • Weakly Connected Components

Model Selection
  • Cross Validation
  • Prediction Metrics
  • Train-Test Split

Statistics
  • Descriptive Statistics
    – Cardinality Estimators
    – Correlation and Covariance
    – Summary
  • Inferential Statistics
    – Hypothesis Tests
  • Probability Functions

Supervised Learning
  • Neural Networks
  • Support Vector Machines (SVM)
  • Conditional Random Field (CRF)
  • Regression Models
    – Clustered Variance
    – Cox-Proportional Hazards Regression
    – Elastic Net Regularization
    – Generalized Linear Models
    – Linear Regression
    – Logistic Regression
    – Marginal Effects
    – Multinomial Regression
    – Naïve Bayes
    – Ordinal Regression
    – Robust Variance
  • Tree Methods
    – Decision Tree
    – Random Forest
  • Time Series Analysis
    – ARIMA

Unsupervised Learning
  • Association Rules (Apriori)
  • Clustering (k-Means)
  • Principal Component Analysis (PCA)
  • Topic Modelling (Latent Dirichlet Allocation)

Utility Functions
  • Columns to Vector
  • Conjugate Gradient
  • Linear Solvers
    – Dense Linear Systems
    – Sparse Linear Systems
  • Mini-Batching
  • PMML Export
  • Term Frequency for Text
  • Vector to Columns
  • Nearest Neighbors
    – k-Nearest Neighbors
  • Sampling
    – Balanced
    – Random
    – Stratified

(As of Aug 2018)

SLIDE 43

Execution Flow

Diagram: the client (psql) sends SQL to the Master on the database server; a stored procedure runs with string aggregation across Segment 1, Segment 2, …, Segment n, and the result set is returned to the client.

SLIDE 44

Architecture

Layers (top to bottom):
  • User Interface
  • High-Level Iteration Layer (iteration controller)
  • Functions for Inner Loops (implements ML logic)
  • Low-level Abstraction Layer (array operations, C++ to DB type-bridge, …)
  • RDBMS Built-in Functions
  • C API (Greenplum, PostgreSQL, HAWQ)

Implementation languages: Python, SQL, C++ (with Eigen)