THE GREAT SYNERGY OF BIG DATA TECHNOLOGIES, Louis Capps, NVIDIA (PowerPoint PPT presentation)



SLIDE 1

Louis Capps, NVIDIA Solutions Architect, lcapps@nvidia.com

THE GREAT SYNERGY OF BIG DATA TECHNOLOGIES

SLIDE 2

AGENDA

  • Background
  • Big Data vs Fast Data
  • HPC and Hyperscale
  • Evolving data technologies
  • Future research

SLIDE 3

ACCELERATED COMPUTING REVOLUTION

Tesla Accelerated Computing Platform

GPU Accelerators · CUDA · Servers & Interconnects · Developer Tools

Accelerated Data Center · Accelerated Algorithms

  • HPC: AMBER · GROMACS · NAMD
  • Medical Imaging: MRI · Tomography · Ultrasound
  • Oil & Gas: Signal · Image · Video · RTM · FWI · Elastic
  • Image & Voice Recognition (e.g., person, dog, chair)
  • Deep Learning · Defense

Optimized Applications in the Data Center

Who is NVIDIA?

  • Founded in 1993; Jen-Hsun Huang is co-founder/CEO
  • Joined NASDAQ as NVDA in 1999; FY14: $4.13 billion in revenue
  • >9,000 employees worldwide; headquarters: Santa Clara, CA
  • Created a revolution with the first GPU in 1999; has shipped >1 billion
  • Leader in parallel simulation, visualization, and deep learning
  • Innovation with >7,000 patents, research investment, bleeding-edge technologies

SLIDE 4


THE WORLD LEADER IN VISUAL COMPUTING

GAMING · ENTERPRISE · OEM & IP · HPC & CLOUD · AUTO

SLIDE 5

https://research.nvidia.com/publication/online-detection-and-classification-dynamic-hand-gestures-recurrent-3d-convolutional
https://research.nvidia.com/publication/parallel-spectral-graph-partitioning
https://research.nvidia.com/publication/robust-model-based-3d-head-pose-estimation

SLIDE 6

BIG DATA VS FAST DATA

SLIDE 7

FAST DATA IS THE NEW NORM

“...4,300 percent increase in annual data production by 2020.”

– Forbes Magazine and CSC, April 2016

“'Big Data' Is No Longer Enough: It's Now All About 'Fast Data’”

– Entrepreneur Media, June 2016
SLIDE 8

OPEN DATA SCIENCE

“Open source software is fundamental to big data, says Roman Shaposhnik, who runs the Apache Incubator project ... ‘In a way, open source has won in the enterprise,’ says Shaposhnik, whose day job is director of open source at Pivotal.” – Datanami, Feb 2016

  • And just in the past month:
  • 1. The embrace of stream processing and real-time data access is driving enterprise adoption of Apache Kafka
  • 2. Google open-sources SyntaxNet, a natural-language understanding library for TensorFlow
  • 3. IBM is now letting anyone play with its quantum computer
  • 4. Amazon open-sources its own deep learning software, DSSTNE
  • 5. Facebook details its company-wide machine learning platform, FBLearner Flow
  • 6. Google gives TensorFlow distributed computing support
  • 7. OpenAI launches Gym, a toolkit for testing and comparing reinforcement learning algorithms

SLIDE 9

SYNERGY OF BIG DATA TECHNOLOGIES

Growing research · Enterprise embrace of open tech

Data Storage

  • SSD, FLASH
  • Huge DRAM

Database Scalability / Velocity

  • Hadoop
  • Spark
  • Kafka
  • Multi-system interconnect

Cloud

  • Broad acceptance
  • Production
  • Large shared storage

Machine Intelligence

  • Deep Learning
  • Image, text, speech, sensor

Data Capabilities

  • Unstructured
  • Graphs
  • Frameworks

Compute Engines

  • Large clusters
  • Acceleration
  • Extreme bandwidth

Visualization

  • Real-time point clouds
  • Interactive
  • Precise
  • Remote

Insatiable desire for insight · Geometric data growth · Reducing insight latency · Rise of the data scientist · Intelligent insight

SLIDE 10

REBIRTH OF THE DATA SCIENTIST

– Computerworld 2016
SLIDE 11

HPC AND HYPERSCALE

SLIDE 12

BIG DATA – FROM HPC TO HYPERSCALE TO ...

  • HPC: huge compute, huge storage; small data in, huge data out; creates data; discovery, insight
  • Hyperscale: big compute, big storage; big data in, small data out; processes data; prediction, autonomy

Similar engine?

SLIDE 13

REMOTE VISUALIZATION ON BLUE WATERS

Faster Time to Results

*Mark D. Klein and John E. Stone, “Unlocking the full potential of the Cray XK7 accelerator”, Cray Users Group, Lugano, Switzerland, May 2014

  Local viz cluster: data transfer, 48 days (6 GPUs)
  HPC center: rendering, 1 day (128 GPUs)

Stellar combustion visualized on Blue Waters (26 TB dataset): 48x acceleration with Tesla GPUs in the HPC center

Local Viz Cluster Limitations:

  • Limited GPUs and other hardware resources
  • Long data transfer times

HPC Cluster Advantages:

  • Scales to 100s of GPUs in the cluster
  • Eliminates data transfers

Paul Woodward, U. Minnesota: HVR w/ OpenGL on Blue Waters*
SLIDE 14

PHOTO REALISTIC VR RENDERING

SLIDE 15

DATA TECHNOLOGY

SLIDE 16


AI RACE IS ON

  • IBM Watson achieves breakthrough in natural language processing
  • Facebook launches Big Sur
  • Baidu Deep Speech 2 beats humans
  • Google launches TensorFlow
  • Microsoft & U. Science & Tech, China beat humans on IQ
  • Toyota invests $1B in AI labs

[Chart] ImageNet accuracy rate by year: traditional CV vs. deep learning

SLIDE 17

SLIDE 18

DEEP LEARNING FOR IMAGE ANALYTICS

Detected labels: person, car, helmet, motorcycle, bird, frog, dog, chair, hammer, flower pot, power drill

SLIDE 19

2016: AN AMAZING YEAR FOR SELF-DRIVING CARS

  • Audi, BMW, Daimler buy HERE
  • Tesla Model S Auto-pilot
  • Baidu enters the race
  • Honda, Nissan, Toyota team up
  • GM buys Cruise
  • Uber enters the race
  • Toyota invests $1B in AI lab
  • Volvo Drive Me on public roads in 2017
  • NHTSA: computer counts as driver
  • Tesla Model 3: 300K pre-orders

SLIDE 20

DEEP LEARNING AND KITTI

SLIDE 21

ACCELERATING SIGNAL & VIDEO ANALYTICS

  • Real-time HD video enhancements and analytics – made possible only with GPUs
  • Video surveillance with faster real-time analytics – 12x faster with GPUs
  • Unmanned submarine with accelerated sonar processing – 50-100x speedup over CPU
  • Faster satellite image processing for actionable intelligence – 12x faster with GPUs

SLIDE 22

MISSION PLANNING WITH REAL-TIME LINE OF SIGHT

http://www.luciad.com/

Inputs: video data, image data, signal data. CPU: 1 computation/second, delayed response. GPU: 100 computations/second, real-time response.

World Leader in Geospatial Situational Awareness

SLIDE 23

DEEP LEARNING REVOLUTIONIZING MEDICAL RESEARCH

  • Detecting mitosis in breast cancer cells – IDSIA
  • Molecular activity prediction for drug discovery – Merck
  • Predicting the toxicity of new drugs – Johannes Kepler University
  • Understanding gene mutation to prevent disease – University of Toronto
SLIDE 24

ACCELERATED DATABASE TECHNOLOGY

Big data ISVs moving to the accelerated model

SQL · NoSQL · Graph

Trend toward accelerated computing: a variety of firms offer accelerated databases for big data today, from well-funded start-ups to large, well-known players.

SLIDE 25

MAPD

Lightning-fast analytic SQL database and visualization

  • MapD processing: >40k cores vs. traditional processing: 20 cores
  • 100-1000x faster queries
  • Visualization of billions of data points
  • http://www.mapd.com/demos/tweetmap/

SLIDE 26

NETFLOW GRAPH ANALYTICS USE CASE

Initial graph filter by PageRank / SecureRank using DASL.

  • 140M netflows in real-time
  • Analytics in Scala from Spark
  • Blazegraph DASL
  • Run PageRank
  • Quantity shown by color
SLIDE 27

NETFLOW GRAPH ANALYTICS USE CASE

Interactive visual query session.

A suspicious node is communicating with our internal network – but also with one outside. Looking at the internal nodes, many are communicating with a single outside node.

SLIDE 28

NETFLOW GRAPH ANALYTICS USE CASE

Identification of exfiltrated data traffic.

  • Clicking and exploring attributes
  • The node appears to be scanning the network from outside
  • Possibly pulling out data
SLIDE 29

NVIDIA Research

SLIDE 30

SLIDE 31

GPU GRAPH RESEARCH

NVGRAPH

  • Take analytics and make it linear algebra
  • What Blazegraph and DASL do today
  • PageRank, SSSP, some acceleration on GPU

Standardizing with GraphBLAS; working on glue for GraphX (Spark) + GPUs

  • Offload core ops to GPU
  • Compiler evaluates data flows and fuses ops together into one kernel (Project Tungsten)

Acceleration

  • Project Tungsten: https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing-spark-closer-to-bare-metal/
  • Graph BLAS www.graphblas.org
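The "make it linear algebra" idea above can be sketched concretely: PageRank reduces to repeated matrix-vector products, which is exactly the primitive that maps well to GPUs. A minimal NumPy sketch (dense for clarity; nvGRAPH and GraphBLAS implementations use sparse formats, and this is not their API):

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=100):
    """Power iteration on the column-stochastic transition matrix.

    adj[i, j] = 1 means an edge j -> i. Dangling-node handling is
    simplified for the sketch.
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=0)          # column sums = out-degrees
    out_deg[out_deg == 0] = 1          # avoid division by zero
    M = adj / out_deg                  # column-stochastic matrix
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = damping * (M @ r) + (1 - damping) / n   # one mat-vec per step
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r_new
```

Each iteration is a single matrix-vector product plus a vector update, which is why frameworks that express graph analytics this way can offload the core operation wholesale to an accelerator.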
SLIDE 32

NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Test Image

Process Data · Configure DNN · Monitor Progress · Visualize Layers

developer.nvidia.com/digits github.com/NVIDIA/DIGITS

SLIDE 33

DIGITS FUTURE

Object detection workflows for automotive and defense, targeted at autonomous vehicles and remote sensing

developer.nvidia.com/digits github.com/NVIDIA/DIGITS

SLIDE 34

NEED FOR SPEED

Progress in DNN research depends on compute

Lavin & Gray, “Fast Algorithms for Convolutional Neural Networks”, 2015

SLIDE 35

GOALS OF ACCELERATION

Progress in DNN research depends on compute

Faster performance – inferences/sec

More efficient:
  • Cost – inferences/$
  • Energy – inferences/J

Inference – run an example forward through the network
Training – run the network forward, back-propagate the gradient, update the parameters
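The training loop named above (forward, back-propagate, update) fits in a few lines. A minimal NumPy sketch using a single logistic unit on a toy AND-gate dataset (illustrative only, not NVIDIA code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])          # AND gate
w = rng.normal(size=2) * 0.1
b = 0.0
lr = 0.5

def forward(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid

for step in range(2000):
    p = forward(X, w, b)                  # forward pass (inference)
    grad = p - y                          # dLoss/dlogit for cross-entropy
    w -= lr * (X.T @ grad) / len(X)       # back-propagate + update weights
    b -= lr * grad.mean()                 # update bias

pred = (forward(X, w, b) > 0.5).astype(int)
```

Inference is just the forward pass alone, which is why it is so much cheaper per example than training.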

SLIDE 36

THREE KINDS OF NETWORKS

  • DNN – all fully connected layers (filters, recommendation)
  • CNN – some convolutional layers (image, vision, text)
  • RNN – recurrent neural network, LSTM (semantics, intent)

http://scikit-learn.org/stable/tutorial/machine_learning_map/
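The three layer types can be sketched as forward passes in NumPy (all names and shapes here are made up for illustration; real frameworks add batching, parameters, and training):

```python
import numpy as np

def dense_layer(x, W, b):
    """DNN building block: fully connected layer with ReLU."""
    return np.maximum(0, x @ W + b)

def conv1d_layer(x, k):
    """CNN building block: 1-D 'valid' convolution with kernel k."""
    n, m = len(x), len(k)
    return np.array([x[i:i + m] @ k for i in range(n - m + 1)])

def rnn_step(h, x, Wh, Wx):
    """RNN building block: one recurrent step updating hidden state h."""
    return np.tanh(h @ Wh + x @ Wx)

rng = np.random.default_rng(1)
x = rng.normal(size=8)
out_d = dense_layer(x, rng.normal(size=(8, 4)), np.zeros(4))
out_c = conv1d_layer(x, rng.normal(size=3))
h = rnn_step(np.zeros(4), x, rng.normal(size=(4, 4)), rng.normal(size=(8, 4)))
```

The distinction matters for acceleration: dense layers are matrix multiplies, convolutions reuse small kernels over the input, and RNNs carry state across steps, so they stress the hardware differently.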

SLIDE 37

DATA PARALLEL EXAMPLE (CPU)

Linear speed-up.

NVIDIA Whitepaper “GPU based deep learning inference: A performance and power analysis.”
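Data parallelism means each worker computes gradients on its shard of the batch and the gradients are averaged. With equal shard sizes the averaged result equals the full-batch gradient, which is why the approach scales nearly linearly. A sequential NumPy simulation of four workers (illustrative only):

```python
import numpy as np

def grad_fn(w, X, y):
    # gradient of mean squared error (1/2n) * ||Xw - y||^2
    return X.T @ (X @ w - y) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)

full = grad_fn(w, X, y)                 # single-worker, full-batch gradient
shards = np.split(np.arange(64), 4)     # 4 equal data shards
avg = np.mean([grad_fn(w, X[s], y[s]) for s in shards], axis=0)
```

Because the per-shard work is independent, the only serial cost is the gradient exchange, which is what collectives like all-reduce optimize.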

SLIDE 38

MODEL PARALLEL EXAMPLE (CPU)

Results vary.

Dean et al, Google, “Large scale distributed deep networks.” NIPS 2012
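Model parallelism instead splits the model itself across workers. For a dense layer, splitting the weight matrix column-wise lets each worker compute part of the output; a NumPy sketch with two "workers" (illustrative, not the paper's system):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W = rng.normal(size=(8, 6))

full = x @ W                                        # single-worker result
parts = [x @ Wp for Wp in np.split(W, 2, axis=1)]   # each worker gets 3 columns
combined = np.concatenate(parts)                    # gather partial outputs
```

Results vary in practice because, unlike the data-parallel case, every layer boundary can require communicating activations, so the split must match the network topology.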

SLIDE 39

NVIDIA DGX-1

WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER

  • 170 TFLOPS FP16
  • 8x Tesla P100 16GB
  • NVLink Hybrid Cube Mesh
  • Accelerates major AI frameworks
  • Dual Xeon
  • 7 TB SSD deep learning cache
  • Dual 10GbE, quad IB 100Gb
  • 3RU – 3200W

SLIDE 40

P100 NVLINK SYSTEM

nvidia.com/nvlink
github.com/NVIDIA/nccl
images.nvidia.com/events/sc15/pdfs/NCCL-Woolley.pdf

NVLINK – 40 GB/s

  • Low overhead
  • Global memory
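The collective NCCL runs over links like NVLink is a ring all-reduce: a reduce-scatter phase followed by an all-gather, so each link carries only a chunk per step. A sequential NumPy simulation of the communication pattern (illustrative; real NCCL pipelines the transfers):

```python
import numpy as np

def ring_allreduce(tensors):
    """Sum equal-length vectors across n 'ranks' with a ring pattern."""
    n = len(tensors)
    buf = [np.array_split(np.asarray(t, dtype=float), n) for t in tensors]
    # Reduce-scatter: at step s, rank r sends chunk (r - s) % n to rank r+1,
    # which adds it. After n-1 steps, rank r holds the full sum of one chunk.
    for s in range(n - 1):
        snap = [[c.copy() for c in b] for b in buf]   # start-of-step values
        for r in range(n):
            c = (r - s) % n
            buf[(r + 1) % n][c] += snap[r][c]
    # All-gather: circulate the completed chunks around the ring.
    for s in range(n - 1):
        snap = [[c.copy() for c in b] for b in buf]
        for r in range(n):
            c = (r + 1 - s) % n
            buf[(r + 1) % n][c] = snap[r][c]
    return [np.concatenate(b) for b in buf]
```

The appeal of the ring: total bytes sent per rank is roughly 2x the tensor size regardless of the number of ranks, so bandwidth, not rank count, sets the cost.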
SLIDE 41

github.com/NVIDIA/nccl images.nvidia.com/events/sc15/pdfs/NCCL-Woolley.pdf

SLIDE 42

REDUCED PRECISION FOR DNN

Weight updates and sums are the main operations.

The menu of options.

SLIDE 43

MIXED PRECISION MODEL

Batch normalization is important to “center” the dynamic range.

SLIDE 44

REDUCED PRECISION FOR INFERENCE

16 bits is a good spot

SLIDE 45

REDUCED PRECISION FOR TRAINING

Training diverges without stochastic rounding

  • S. Gupta et al “Deep Learning with Limited Numerical Precision” ICML 2015
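Stochastic rounding, as studied by Gupta et al., rounds up with probability equal to the fractional distance, so the rounding error is zero in expectation and tiny gradient updates are not systematically lost. A NumPy sketch rounding to integers for clarity (the paper rounds to fixed-point steps):

```python
import numpy as np

def stochastic_round(x, rng):
    """Round each element down or up, up with probability frac(x)."""
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

rng = np.random.default_rng(0)
x = np.full(100_000, 0.25)
sr = stochastic_round(x, rng)
# round-to-nearest would give 0 everywhere, silently dropping every update;
# stochastic rounding preserves the mean (~0.25) across many updates
```

That preserved mean is exactly why low-precision training converges with stochastic rounding and diverges without it.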
SLIDE 46

DNN RESEARCH

Deep Compression – compress a deep neural net with pruning, in 3 main stages:

1. Learn the important connections – 9x to 13x reduction
2. Quantize weights for weight sharing – 32 bits down to 5; retrain
3. Apply Huffman coding

  • Total of 35x (AlexNet) to 49x (VGG-16) compression
  • Ability to fit the model into on-chip SRAM
  • Greatly reduced size for use in mobile
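Stage 1 above is magnitude pruning: zero out the smallest weights and keep a mask so retraining only updates the surviving connections. A NumPy sketch (illustrative, not the paper's code):

```python
import numpy as np

def prune(W, fraction):
    """Zero out the smallest-magnitude `fraction` of weights; return mask."""
    thresh = np.quantile(np.abs(W), fraction)
    mask = np.abs(W) >= thresh
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
Wp, mask = prune(W, 0.9)            # drop ~90% of weights (~10x fewer)

# During retraining, gradients are masked so pruned weights stay zero:
grad = rng.normal(size=W.shape)
W_updated = Wp - 0.01 * grad * mask
```

The retraining pass is what recovers the accuracy lost at the pruning step; the mask guarantees the sparsity (and thus the compression) survives it.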

SLIDE 47

PRUNING DNNS

Reducing the size of the network reduces work and storage

Han et al “Learning both Weights and Connections for Efficient Neural Networks” NIPS 2015

SLIDE 48

PRUNING DNNS

Retrain to recover accuracy

Han et al “Learning both Weights and Connections for Efficient Neural Networks” NIPS 2015

Pruned

SLIDE 49

PRUNING ALEXNET

SLIDE 50

PRUNING SPEEDUP ON CPU/GPU

Inference

SLIDE 51

GIE (GPU Inference Engine)

Workflow: MANAGE → TRAIN → DEPLOY

  • Manage/augment data, train, and test with DIGITS
  • Deploy with the GPU INFERENCE ENGINE to data center, automotive, and embedded platforms

developer.nvidia.com/gpu-inference-engine

SLIDE 52

GPU INFERENCE ENGINE

Optimizations

  • Fuse network layers
  • Eliminate concatenation layers
  • Kernel specialization
  • Auto-tuning for target platform
  • Select optimal tensor layout
  • Batch size tuning

TRAINED NEURAL NETWORK

OPTIMIZED INFERENCE RUNTIME

developer.nvidia.com/gpu-inference-engine
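"Fuse network layers" from the list above can be illustrated with a common inference optimization: folding batch normalization into the preceding layer's weights and bias, so one kernel does the work of two. A NumPy sketch treating the layer as a matrix multiply (names and shapes are illustrative, not GIE's actual API):

```python
import numpy as np

def fuse_conv_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the layer's weights and bias."""
    scale = gamma / np.sqrt(var + eps)          # per output channel
    return W * scale[:, None], (b - mean) * scale + beta

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)); b = rng.normal(size=4)        # layer
gamma, beta = rng.normal(size=4), rng.normal(size=4)       # bn parameters
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, 4)   # bn statistics

x = rng.normal(size=8)
unfused = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
Wf, bf = fuse_conv_bn(W, b, gamma, beta, mean, var)
fused = Wf @ x + bf                            # one op instead of two
```

The fused layer is numerically identical but halves the kernel launches and the memory traffic between the two ops, which is where inference runtimes win.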

SLIDE 53

GPU INFERENCE ENGINE

Performance

  Platform     Batch size   Performance      Power efficiency
  Tesla M4     128          1153 images/s    20 images/s/W
  Jetson TX1   2            133 images/s     24 images/s/W

developer.nvidia.com/gpu-inference-engine

SLIDE 54

GPU REST ENGINE (GRE)

GRE is a REST server for low-latency image classification (inference) using NVIDIA GPUs. It is software that lets you build your own accelerated microservices, and it makes use of several technologies:

  • Docker: for bundling all the dependencies of our program and for easier deployment.
  • Go: for its efficient builtin HTTP server.
  • Caffe: because it has good performance and a simple C++ API.
  • cuDNN: for accelerating common deep learning primitives on the GPU.
  • OpenCV: to have a simple C++ API for GPU image processing.

GPU REST Engine

github.com/NVIDIA/gpu-rest-engine
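The shape of the idea (a thin HTTP layer in front of a classifier) can be sketched in Python's standard library. GRE itself uses Go, Caffe, and cuDNN; the model, route, and payload format below are invented for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import numpy as np

LABELS = ["cat", "dog", "car"]
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))                   # stand-in "model" weights

def classify(features):
    """Dummy linear classifier with a softmax over 3 labels."""
    logits = W @ np.asarray(features, dtype=float)
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return LABELS[int(np.argmax(probs))], probs.tolist()

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        label, probs = classify(json.loads(body)["features"])
        out = json.dumps({"label": label, "probs": probs}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

# To serve: HTTPServer(("", 8000), Handler).serve_forever()
```

In a real deployment the `classify` call is the GPU-bound part; batching concurrent requests before invoking the model is what makes the GPU worthwhile behind a REST front end.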

SLIDE 55

DESIGN FOR PRODUCTION DATA

  • Drinking data from a firehose
  • Scalable stream rate: generator -> analytic
  • Extended runs: resource limitations, memory fragmentation, windowed vs. continuous

Firehose Research

http://firehose.sandia.gov http://wsga.sandia.gov/docs/Anderson.firehose_benchmark.pdf

SLIDE 56

NVIDIA END-TO-END AUTONOMOUS DRIVING PLATFORM

NVIDIA DRIVE PX 2 NVIDIA DGX-1 NVIDIA DRIVENET

Localization · Planning · Visualization · Perception

DRIVEWORKS

SLIDE 57

DAVE NET

End to End Learning for Self Driving Cars

SLIDE 58

PX-2 INTERFACES

Sensor Fusion Interfaces:

GMSL Camera, CAN, GbE, BroadR-Reach, FlexRay, LIN, GPIO

Displays and Cockpit Computer Interfaces

HDMI, FPDLink III and GMSL

Development and Debug Interfaces

HDMI, GbE, 10GbE, USB3, USB 2 (UART/debug), JTAG

70 Gigabits per second of I/O

Auto-grade connectors; debug/lab interfaces

SLIDE 59

CALL TO ACTION

  • Review research.nvidia.com and developer.nvidia.com to see emerging technologies
  • Join the LinkedIn big data page to follow real-time trends
  • Try tutorials on Spark, Kafka, MapD, GPUdb, and Blazegraph technologies
  • Visit the OpenAI playground for ideas
  • Try Kaggle competitions for ideas

  • Louis Capps, lcapps@nvidia.com
SLIDE 60

END OF SLIDESHOW