REVOLUTIONIZING RETAIL WITH ARTIFICIAL INTELLIGENCE
Eric Thorsen, Global Retail Business Development
CHALLENGES FACING CONSUMER INDUSTRIES
Demographic Changes
- Millennials outnumber Baby Boomers
- “Digital Natives” demand changing experience
Consumer Demand
- Specific
- Impatient
- Particular
Omnichannel Constraints
- Mobile
- Web
- Stores
Digital Competition
- Emergence of new digital shopping experiences
- Emergence of device proxies
AI FOR RETAIL
- SUPPLY CHAIN
- STORE OPERATIONS
- CORPORATE HEADQUARTERS
AI ONLINE & IN THE STORE
- AR/VR CONSUMER INTERACTION
- SHELF ANALYSIS, CONSUMER ADVICE
- TARGETED RECOMMENDATIONS
RECOMMENDATION ENGINES ON GPU CLOUD
- VIDEO RECOMMENDATIONS
- SONG RECOMMENDATIONS
- TARGETED RECOMMENDATIONS
AI IN SUPPLY CHAIN
- DYNAMIC SUPPLY CHAIN
- REAL-TIME RE-ROUTING
- WAREHOUSE OPTIMIZATION
- COLLABORATIVE PLANNING AND REPLENISHMENT
AI AT CORPORATE HQ
- SINGLE VIEW OF CONSUMER
- DEMAND SIGNAL ANALYSIS
- AD SPEND OPTIMIZATION
- PREDICTIVE ANALYTICS
GPU-ACCELERATED ECOSYSTEM
PLAN: Assortment Planning, CPFR, Seasonal Promotions, Product Design, Open to Buy
BUY (BUILD): Procurement, Vendor Management, Quality Inspection, Manufacturing Automation
MOVE: Inventory & Route Optimization, Telemetry, Autonomous Vehicles and Drones, Demand Driven Supply Network
SELL: Recommendation Logic, Magic Mirror, Clienteling, Path to Purchase, Frictionless Commerce
SERVICE: Reverse Logistics, Returns Management, Call Center Optimization, Upsell / Cross Sell
Pro Viz, Deep Learning, HPC, GRID
- Collaborative Design / Shelf Optimization
- AR/VR Customer Experience
- Windows 10 Acceleration / Knowledge worker enablement
- CSP, NGC, DGX (Training)
- TRT (Inference)
- GPU Accelerated Applications: Space Planning, Optimization, SAP Leonardo, SAP HANA
- Accelerated Analytics: Kinetica, MapD, Graphistry, H2O
- Video Analytics: Loss Prevention, Shopper Tracking, Robotics, Frictionless Commerce
- Quality Inspection
- Consumer Engagement / Recommendation Engine NN
- Asst Planning / Forecast & Replenishment NN
GPUs PROVIDE BETTER DATA CENTER TCO
- 160 CPU servers: 65,000 watts
- 1 NVIDIA HGX with 8 Tesla V100 GPUs: 3,000 watts
1/6th the cost, 1/20th the power, 4 racks in a box
RISE OF GPU-ENABLED COMPUTING
[Figure: performance on a log scale (10^2 to 10^7) vs. year, 1980 to 2020. Single-threaded perf: 1.5X per year, then 1.1X per year. GPU-computing perf: 1.5X per year, 1000X by 2025. Layers shown: Applications, Systems, Algorithms, CUDA, Architecture.]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
NVIDIA DEEP LEARNING EVERYWHERE, EVERY PLATFORM
- TITAN X: PC Development
- DGX-1: AI Supercomputing
- TESLA: Servers in every shape and size
- CLOUD: Everywhere
Optimized Deep Learning Software
PERFORMANCE FROM THE DATA CENTER
Graphics accelerated virtual desktops and applications
- All devices have graphics
- Virtual machines also need a GPU
NVIDIA GPUs EVERYWHERE
120+ servers from more than 30 system vendors
- Industry Standard Servers
- Hyper-Converged Infrastructure
- Blade Servers
- Public Cloud offerings
Inception Partners – AI Startups in Retail
RESOURCES: GTC & DLI
GPU Technology Conference 2018
http://www.gputechconf.com/
- Retail Breakfast to share best practices and lessons learned
- Selective Retail Business tracks highlighting AI success
- Deep dive hands-on sessions to experience AI
- Customer stories showing success using AI, ML, and DL
DLI WORKSHOPS
https://www.nvidia.com/en-us/deep-learning-ai/education/
October 2017
RETAIL CUSTOMER STORIES
USPS delivers more than 150 billion pieces of mail each year, a logistics operation that is at a scale second to none. After experiencing increased delays and instances of fraud, USPS needed a different approach to data analytics. Using Kinetica’s GPU-accelerated solution, USPS achieved near-immediate analysis of data from over 213,000 scanning devices at post offices and processing facilities around the country. Last year, USPS delivered 154 billion pieces of mail, while driving 70 million fewer miles, saving 7 million gallons of fuel and preventing 70,000 tons of carbon emissions.
RETAIL INVENTORY MANAGEMENT
Safety Stock
- Optimization of safety stock for each store/item
- Home-grown algorithm ported from CPU cluster to GPU
- Time required dropped from hundreds of days on a single CPU node to a few hours on a single GPU (x4) node
- Speedup of approximately 700x
Time Forecasting models
- Hundreds of millions of store/item combinations forecast weekly
- Multiple models used to forecast, including Holt-Winters, ARIMA and GLM
- NVIDIA provided a GPU version of Holt-Winters, specified and integrated by the customer
- Comparative tests of 8 million store/items showed a reduction in time from 15 minutes (across approx. 38 servers) to 24 seconds on 1 GPU (x4) node
- GPU version could allow a daily forecast because of its speed and scaling abilities
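For readers unfamiliar with the Holt-Winters model named on this slide, here is a minimal CPU-only sketch of additive Holt-Winters (triple exponential smoothing) in plain Python. It is not the GPU version described above; the function name `holt_winters_additive`, the smoothing parameters, and the toy demand history are assumptions made purely for illustration.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing.

    Hypothetical helper for illustration; needs at least two full seasons.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of history")

    # Initialise level and trend from the first two seasonal averages.
    first = sum(series[:season_len]) / season_len
    second = sum(series[season_len:2 * season_len]) / season_len
    level = first
    trend = (second - first) / season_len
    # Initial seasonal offsets: deviation of each point from the first-season mean.
    seasonal = [series[i] - first for i in range(season_len)]

    for t, y in enumerate(series):
        s = seasonal[t % season_len]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + h * trend + seasonal[(n + h - 1) % season_len]
        for h in range(1, horizon + 1)
    ]


# Toy demand history for one store/item: a repeating 4-period pattern.
history = [10, 20, 30, 40] * 3
forecast = holt_winters_additive(history, season_len=4, alpha=0.5,
                                 beta=0.1, gamma=0.3, horizon=4)
print(forecast)
```

Each store/item series is smoothed independently of every other, which is one plausible reason a workload like this maps well onto many parallel GPU threads.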
AI IMPROVES THE CUSTOMER EXPERIENCE
AI is dramatically changing the online shopping experience with tangible improvements for retailers and consumers. In 2016, British online grocery giant Ocado improved customer service with its AI-enhanced contact center. It is also applying machine learning and NVIDIA GPUs to develop humanoid robotics to assist maintenance technicians, and advanced computer vision for image classification and recognition to replace barcode systems. Computer vision will expedite the picking process and better ensure orders are filled correctly so customers receive exactly what they ordered.
AI-DRIVEN SMART SHOPPING
According to Forrester, e-commerce was a $390B market in 2016 and is expected to double by 2024. E-commerce company Jet.com (acquired by Walmart) partners with multitudes of suppliers with different offerings at different prices. Jet uses GPU-accelerated AI to drive its smart cart solution that fulfills orders at the lowest prices through the smart bundling of supplier offers. The platform finds the ideal merchant and warehouse combination to lower the total order cost. The bigger the shopping cart, the greater the savings that can be generated.
DISCOVER MORE WITH DEEP LEARNING
Online shopping can be convenient but searching through multiple websites can be arduous and time-consuming. Pinterest makes it easy for users to quickly discover things they love. Automatic object detection lets users search for products within a Pin’s image, and Shop the Look lets users buy items seen in fashion and home décor Pins. Scientists on Pinterest’s visual search team use GPU-accelerated deep learning to teach their system to recognize image features using a dataset of billions of Pins and compute similarity scores to identify the best matches. One visual search study reports a 50% improvement in user engagement and traffic.
AI TOOL LETS YOU APPLY BEFORE YOU BUY
Testing different types of makeup can take hours and be a frustrating experience. ModiFace is using GPUs and facial modeling technology to help consumers explore and select the ideal products. ModiFace developed the ‘Sephora Virtual Artist’, an online tool that allows consumers to virtually experiment with new makeup without having to leave their computer screen. With technology on skin analysis and facial visualization, ModiFace and its AI features have introduced a more efficient way to style oneself.
AI PERSONALIZES SKIN CARE
Using the wrong skincare products can be a major cause of customer dissatisfaction so Olay is arming women with the knowledge they need to make informed product purchase decisions. Its Olay Skin Advisor is a GPU-accelerated AI tool that works on any mobile device — users provide a selfie, information about age, skin issues, skin type and product preferences, and the tool advises how to improve trouble areas using a daily regime of recommended Olay products. After four weeks 94% of women who tried the skin advisor continued to use the products it recommended.
REINVENTING RETAIL BY COMBINING ART AND AI
In fashion, styles change quickly but the fundamental customer experience (brick-and-mortar stores and traditional online shopping sites) hasn’t changed much in the past decade. Stitch Fix broke that mold with a fashion styling service that combines the art of personal styling with data analytics insights powered by GPU-accelerated deep learning. Stitch Fix’s 50+ style recommendation algorithms match clothing and accessories to clients based on their unique style preferences. Most recently, Stitch Fix changed the game again with a deep learning image recognition system that locates fashion items for clients based on their shared Pinterest boards.
REDEFINING CYBERSECURITY WITH AI
We depend on a safe cyberspace for just about every aspect of our lives. Cyber attacks can be devastating, and in today’s world mutations have become the rule, not the exception. Cylance leverages GPU-driven deep learning to predict and prevent malicious code execution by identifying indicators of an attack. CylancePROTECT immediately prevented the execution of the May 2017 WannaCry attack on 100% of its customers’ endpoints.
AI TOOL BOOSTS CUSTOMER SERVICE
KLM’s 235 social media service agents engage in 15K conversations a week, 24/7. To contend with the overwhelming volume of messages, KLM uses GPU-accelerated deep learning to predict the best response to an incoming message and shows it to a contact center agent for approval or personalization before sending it to the customer. The resulting time savings for KLM service agents means they can focus on customers with more pressing needs and handle a greater volume of questions while still maintaining a high degree of customer satisfaction.
A NEW WAVE OF AI BUSINESS APPLICATIONS
Many brands rely on sponsoring televised events, yet impact is difficult to track. Manual tracking takes up to six weeks to measure ROI and even longer to adjust expenditures. SAP Brand Impact, powered by NVIDIA deep learning, measures brand attributes in near real-time with superhuman accuracy thanks to deep neural networks trained on NVIDIA DGX-1 and TensorRT to provide video inference analysis. Results are immediate, accurate and auditable, and delivered in a day.
A NEW WAVE OF AI BUSINESS APPLICATIONS
- Brand impact measurement on televised events in real time vs 6 weeks
- Immediate, accurate and auditable, delivered in a day
- Brand Impact, Service Ticketing, Invoice-to-Record applications
BETTER DATA, SMARTER BUILDINGS
According to the EPA, 62% of the U.S.'s electricity is consumed by the commercial and industrial segments. But how much of that consumption is inefficient? Verdigris is on a mission to help businesses eliminate wasteful energy spend with their Smart Building optimization solutions. Verdigris is harnessing the power of data and GPU-driven deep learning to continually audit and analyze electronic signatures of individual devices to learn what's "normal" and identify patterns and instances of energy waste. And with real-time monitoring and alerts, response teams can react to solve problems immediately.
CONSUMER MONITORING
Standard traffic counters measure traffic into and out of the store. Computer vision offers enhancements by providing:
- Unique Identity Detection – integrated into loyalty program where appropriate or available
- Age / Ethnicity segmentation. Detect age groups, including children, seniors.
- Shopping behavior tracking. Groups, couples, individuals
- Traffic patterns. Identify Path to Purchase, hot zones, cold zones, dwell points. Helps retailers make decisions on item placement or promotions
- Can integrate into app-based recommendation logic. Ability to launch targeted promotions based on proximity, past purchase, and consumer profile
Multiple camera signals can be stitched together to detect patterns within the store. Exterior cameras can determine shopper density based on parking. Origin tracking can identify external traffic sources, and/or co-marketing opportunities.
Warehouse / Distribution Center Optimization
Warehouses and DCs are not built for consumer traffic and can pose health and safety challenges for workers. During peak season, shelves fill up and working space is reduced to a minimum, making it harder for humans to navigate safely and accurately measure inventory. IFM uses NVIDIA Jetson technology mounted on a drone to autonomously monitor inventory positions in the DC or warehouse. As an Inception partner, IFM is closely aligned with NVIDIA and is poised to deliver incredible impact on retail and supply chain business processes.
YouTube Link: https://youtu.be/AMDiR61f86Y
Shelf Scanning Robotics
Store Associates are representatives of the brand, and the face of the retail organization. It makes sense to reduce the time spent performing tasks that are not consumer-facing. Performing inventory counts, replacing misplaced items, or scanning for out-of-stock situations are examples of basic, repetitive, and non-impactful operations for store associates. Fellow Robots has created a solution to scan shelves, monitor misplaced items, and act as a wayfinder kiosk for consumers. This allows associates to interact with the shopping public, improving consumer satisfaction and raising revenue through larger shopping baskets. As an Inception partner, Fellow is closely aligned with NVIDIA and is poised to deliver incredible impact on retail business processes.
YouTube Link: https://youtu.be/l7NPmJP462M
TRAFFIC PATTERNS LOSS PREVENTION
Using existing cameras, a retailer can install highly effective computer vision algorithms to detect shopper traffic patterns and prevent loss. In the US, loss prevention (LP) is a $48B problem impacting all retailers. At the same time, investment in LP staff is flat or shrinking. The average cost of a shoplifting incident is doubling to $798, and 30% of inventory shrinkage is an inside job. Computer vision can identify theft, shrinkage, and shoplifting incidents. This new technology offers a fresh approach to a longstanding problem for retail.
COLLABORATIVE DESIGN
- Photorealistic Models
- Interactive Physics
- Design Flow Integration
- Collaboration
ART OF THE POSSIBLE
The State of AI in Retail
Paul Hendricks, Solutions Architect, phendricks@nvidia.com
INTRODUCTION
- Paul Hendricks is a Solutions Architect at NVIDIA, helping enterprise customers with their deep learning and AI initiatives.
- Paul's background is primarily in retail, and he has spent the past 5 years working with many Fortune 500 retail companies to implement data science and AI solutions.
- Prior to joining NVIDIA, Paul worked at Victoria’s Secret as a Data Scientist building models to understand customer propensity to purchase and how to optimize assortment in stores.
- Currently, Paul's research at NVIDIA focuses on using deep learning in intelligent video analytics and recommendation systems.
Intelligent Video Analytics
Object Detection
Problem Background
- Data: Images
- Goal: Identify objects in an image, and output bounding boxes around the objects and their classes
Frictionless Checkout
https://www.standardcognition.com/
Localizing Algorithms
Sliding windows
- If one of the windows only has half of the dog, the activation may not be strong enough
- Using small windows and small strides will be very computationally intensive
Fully convolutional neural network
- Since convolutions are basically sliding windows, we can try replacing the fully connected layers with convolutional layers
- Bounding boxes generated are not very accurate
Region proposals
- Selects blob-like structures and proposes these as the regions to be passed into a CNN
- This concept is similar to sliding window
Single shot detection
- This algorithm predicts the coordinates of the bounding boxes as well as the class of the objects
- Fast since the model looks at the image only once (e.g. YOLO)
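To make the cost remark about sliding windows concrete, the following is a small sketch that enumerates window positions over a frame and counts how many classifier evaluations a naive sliding-window detector would need. The helper names, the 1080p frame size, and the window and stride values are assumptions chosen for illustration, not figures from the talk.

```python
def sliding_windows(height, width, window, stride):
    """Yield (top, left, bottom, right) boxes for square windows over an image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield (top, left, top + window, left + window)


def count_windows(height, width, window, stride):
    """Number of classifier evaluations a sliding-window detector would need."""
    return sum(1 for _ in sliding_windows(height, width, window, stride))


# Hypothetical 1080p frame: shrinking the window and stride multiplies the work.
coarse = count_windows(1080, 1920, window=256, stride=128)
fine = count_windows(1080, 1920, window=64, stride=16)
print(coarse, fine)
```

A single-shot detector avoids this per-window loop by predicting all boxes and classes in one forward pass over the image.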
Getting Started
DLI Courses
- Object Detection with DIGITS - https://nvlabs.qwiklab.com/focuses/4125
Papers
- Fully convolutional networks for semantic segmentation - https://arxiv.org/pdf/1605.06211.pdf
- Rich feature hierarchies for accurate object detection and semantic segmentation - https://arxiv.org/pdf/1311.2524.pdf
- Fast R-CNN - https://arxiv.org/pdf/1504.08083.pdf
- Faster R-CNN: Towards real-time object detection with region proposal networks - https://arxiv.org/pdf/1506.01497.pdf
- YOLO9000: Better, Faster, Stronger - https://arxiv.org/pdf/1612.08242.pdf
Libraries
- https://github.com/pjreddie/darknet
- https://github.com/tensorflow/models/tree/master/research/object_detection
Datasets
- COCO - http://cocodataset.org/
- ImageNet - https://www.kaggle.com/c/imagenet-object-detection-challenge
Anomaly Detection
Anomaly Detection
Problem Background
- Data: Image, sensor data (time series), text data
- Goal: Detect if the data being generated is anomalous
UNSUPERVISED LEARNING
Anomaly detection using deep learning
Deep Autoencoder Network
- Input layer: size of data vector
- Bottleneck layer: summarized representation (‘embedding’)
- Output layer: same dimensionality as input
- Reconstruction error: high errors indicate potential anomaly
Input: X, Output: X̃
Reconstruction error: X − X̃
UNSUPERVISED LEARNING
DL anomaly detection in time series
- Time series signals
- Split into sliding windows: w1, w2, w3, …, wN
- Normalization and preprocessing
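The windowing and normalization steps on this slide can be sketched in a few lines of standard-library Python. The slide does not say which normalization is used; per-window z-scoring is an assumption here, as are the helper names and the toy sensor readings.

```python
from statistics import mean, pstdev


def split_into_windows(signal, window, stride=1):
    """Return overlapping windows w1..wN cut from a 1-D signal."""
    return [
        signal[start:start + window]
        for start in range(0, len(signal) - window + 1, stride)
    ]


def normalize(window):
    """Z-score one window; a flat window maps to all zeros."""
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        return [0.0 for _ in window]
    return [(x - mu) / sigma for x in window]


# Hypothetical sensor readings.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
windows = [normalize(w) for w in split_into_windows(signal, window=4, stride=1)]
print(len(windows), windows[0])
```

Each normalized window would then be fed to the autoencoder as one input vector.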
UNSUPERVISED LEARNING
Detecting anomalies via reconstruction error
- Reconstruction error (RE) as a proxy to outliers
- Whenever RE is high, consider it a red flag
- Threshold can be set using statistical bounds
[Figure panels: Input; Output (Reconstruction); Reconstruction vs Input]
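One way to read "threshold can be set using statistical bounds" is sketched below: compute a reconstruction error per window, then flag windows whose error exceeds the mean plus k standard deviations of errors seen on data believed to be normal. The mean-plus-3-sigma rule, the mean-squared-error metric, the function names, and all numbers are assumptions for illustration; the autoencoder outputs are made up rather than produced by a trained model.

```python
from statistics import mean, pstdev


def reconstruction_error(original, reconstructed):
    """Mean squared difference between an input window and its reconstruction."""
    return mean((x - r) ** 2 for x, r in zip(original, reconstructed))


def fit_threshold(normal_errors, k=3.0):
    """Statistical bound: mean + k standard deviations of errors on normal data."""
    return mean(normal_errors) + k * pstdev(normal_errors)


def flag_anomalies(errors, threshold):
    """Indices whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]


# Hypothetical errors measured on data believed to be normal.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(normal_errors)

# New windows paired with made-up autoencoder outputs.
new_errors = [
    reconstruction_error([1.0, 2.0, 3.0], [1.1, 2.1, 2.9]),   # close match
    reconstruction_error([1.0, 2.0, 3.0], [2.0, 0.5, 4.5]),   # poor match
]
flagged = flag_anomalies(new_errors, threshold)
print(threshold, flagged)
```

Raising k trades fewer false alarms for a higher chance of missing subtle anomalies.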
Getting Started
DLI Courses
- Introduction to Autoencoders
- Anomaly Detection with Variational Autoencoders - https://nvlabs.qwiklab.com/focuses/8362
Papers & Books
- Autoencoders, Minimum Description Length and Helmholtz Free Energy - https://papers.nips.cc/paper/798-autoencoders-minimum-description-length-and-helmholtz-free-energy.pdf
- Deep Learning, Chapter 13 - http://a.co/1vbPNXr
- Hands on Machine Learning with Scikit-Learn & TensorFlow, Chapter 13 - http://a.co/aImsrRT
Datasets
- Fashion MNIST - https://github.com/zalandoresearch/fashion-mnist
- Deep Fashion - http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
- UT Zappos 50k - http://vision.cs.utexas.edu/projects/finegrained/utzap50k/
Recommendation Systems
RECOMMENDATION SYSTEMS
Problem Background
- Data: Matrix R [ rows are users, columns are items, cell values are ratings ]
- Goal: Compute missing values in R – top N unseen items are good recommendation candidates
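The goal stated above (fill in R, then recommend the top N unseen items) can be illustrated with a short sketch of the final selection step. It assumes some model has already produced a predicted score for every item; the function name `top_n_unseen`, the use of `None` for missing ratings, and the example numbers are all hypothetical.

```python
def top_n_unseen(ratings_row, predicted_row, n):
    """Return indices of the n highest-scoring items the user has not rated.

    ratings_row uses None for missing cells of R; predicted_row holds the
    model's filled-in scores for every item.
    """
    unseen = [i for i, r in enumerate(ratings_row) if r is None]
    unseen.sort(key=lambda i: predicted_row[i], reverse=True)
    return unseen[:n]


# Hypothetical user row of R (5 items) and made-up model predictions.
ratings_row = [5, None, 3, None, None]
predicted_row = [4.8, 2.1, 3.2, 4.5, 3.9]
recommendations = top_n_unseen(ratings_row, predicted_row, n=2)
print(recommendations)
```

Items the user has already rated are excluded even when their predicted score is the highest, since only unseen items are recommendation candidates.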
MANY APPLICATIONS FROM SIMILAR PROBLEMS
Using autoencoders to generate recommendations
https://github.com/NVIDIA/DeepRecommender/
Getting Started
DLI Courses
- Deep Autoencoders for Recommender Systems
Papers
- AutoRec – Autoencoders meet collaborative filtering - http://users.cecs.anu.edu.au/~u5098633/papers/www15.pdf
- Training deep autoencoders for collaborative filtering - https://arxiv.org/pdf/1708.01715.pdf
Libraries
- https://github.com/NVIDIA/DeepRecommender
- https://github.com/geffy/tffm
- https://github.com/apache/incubator-mxnet/tree/master/example/recommenders
Datasets
- Netflix - https://netflixprize.com/
- MovieLens – https://grouplens.org/datasets/movielens/
- UC Irvine Online Retail Dataset - http://archive.ics.uci.edu/ml/datasets/online+retail
NVIDIA Tools
TESLA V100 32GB
WORLD’S MOST ADVANCED DATA CENTER GPU, NOW WITH 2X THE MEMORY
- 5,120 CUDA cores
- 640 NEW Tensor cores
- 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
- 20MB SM RF | 16MB Cache
- 32GB HBM2 @ 900GB/s | 300GB/s NVLink
FASTER RESULTS ON COMPLEX DL AND HPC
Up to 50% Faster Results With 2x The Memory
FASTER RESULTS
- Neural Machine Translation (NMT): 1.5X faster language translation (chart values: 1.2 step/sec, 0.8 step/sec)
- 3D FFT 1k x 1k x 1k: 1.5X faster calculations (chart values: 2.5TF, 3.8TF)
HIGHER ACCURACY
- R-CNN for object detection: VGG-16 (16 layers) on V100 16GB vs RN-152 (152 layers) on V100 32GB, 40% lower error rate
HIGHER RESOLUTION
- Unsupervised image translation: input winter photo, AI converts it to summer
- GAN image-to-image generation: 512x512 res images vs 1024x1024 res images, 4X higher resolution
Dual E5-2698v4 server, 512GB DDR4, Ubuntu 16.04, CUDA9, cuDNN7 | NMT is GNMT-like and run with TensorFlow NGC Container 18.01 (Batch Size = 128 (for 16GB) and 256 (for 32GB)) | FFT is with cufftbench 1k x 1k x 1k and comparing 2 V100 16GB (DGX1V) vs. 2 V100 32GB (DGX1V)
R-CNN for object detection at 1080P with Caffe | V100 16GB uses VGG16 | V100 32GB uses Resnet-152
GAN by NVRESEARCH (https://arxiv.org/pdf/1703.00848.pdf) | V100 16GB and V100 32GB with FP32
NEW TENSOR CORE BUILT FOR AI
Delivering 120 TFLOPS of DL Performance
- ALL MAJOR FRAMEWORKS
- VOLTA-OPTIMIZED cuDNN
- MATRIX DATA OPTIMIZATION: Dense Matrix of Tensor Compute
- TENSOR-OP CONVERSION: FP32 to Tensor Op Data for Frameworks
VOLTA TENSOR CORE
- 4x4 matrix processing array
- D[FP32] = A[FP16] * B[FP16] + C[FP32]
- Optimized For Deep Learning
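The mixed-precision operation D[FP32] = A[FP16] * B[FP16] + C[FP32] can be emulated on a CPU to see where precision is lost. The sketch below uses Python's `struct` module to round values to half and single precision; it accumulates in Python's double precision and rounds once at the end, so it only approximates what the hardware does. The function names and test matrices are assumptions for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tensor_core_mma(a, b, c):
    """Emulate D = A * B + C on 4x4 tiles with FP16 inputs and FP32 output.

    Products are accumulated in Python's double precision and rounded to
    FP32 once at the end, which only approximates real hardware behaviour.
    """
    size = 4
    d = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            acc = to_fp32(c[i][j])
            for k in range(size):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = to_fp32(acc)
    return d


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[0.1 * (i + j) for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
d = tensor_core_mma(identity, b, zeros)
print(d[0][1], b[0][1])
```

Multiplying by the identity returns B rounded to FP16, which shows that the inputs lose precision before the multiply while the accumulation itself keeps the wider format.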
NVIDIA DGX
AI Supercomputer-in-a-Box
- 960 TFLOPS | 8x Tesla V100 16GB | NVLink Hybrid Cube Mesh
- 2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U, 3200W
INTRODUCING NVIDIA DGX-2
THE WORLD’S FIRST 2 PETAFLOPS SYSTEM
THE WORLD’S MOST POWERFUL AI SYSTEM FOR THE MOST COMPLEX AI CHALLENGES
- DGX-2 is the newest addition to the DGX family, powered by DGX software
- Deliver accelerated AI-at-scale deployment and simplified operations
- Step up to DGX-2 for unrestricted model parallelism and faster time-to-solution
10X PERFORMANCE GAIN IN LESS THAN A YEAR
PyTorch Stack: Time to Train FAIRSEQ
- DGX-1 (Sep ’17): 15 days
- DGX-2 (Q3 ’18): 1.5 days
Software improvements across the stack, including NCCL, cuDNN, etc.
NVSWITCH
WORLD’S HIGHEST BANDWIDTH ON-NODE SWITCH
- 7.2 Terabits/sec or 900 GB/sec
- 18 NVLINK ports | 50GB/s per port bi-directional
- Fully-connected crossbar
- 2 billion transistors | 47.5mm x 47.5mm package
NVSWITCH
ENABLES THE WORLD’S LARGEST GPU
- 16 Tesla V100 32GB Connected by New NVSwitch
- 2 petaFLOPS of DL Compute
- Unified 512GB HBM2 GPU Memory Space
- 300GB/sec Every GPU-to-GPU
- 2.4TB/sec of Total Cross-section Bandwidth
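The headline numbers on the two NVSwitch slides are consistent with each other, as the short arithmetic check below suggests. The per-port, per-GPU, and GPU-count inputs come from the slides; reading "cross-section bandwidth" as bisection bandwidth (half of the GPUs sending at full rate to the other half) is an assumption.

```python
# Per-switch figures quoted for NVSwitch.
ports = 18
gb_per_sec_per_port = 50                      # bi-directional, gigabytes/sec

switch_gb_per_sec = ports * gb_per_sec_per_port          # gigabytes/sec
switch_tbit_per_sec = switch_gb_per_sec * 8 / 1000       # terabits/sec

# System-level figures quoted for 16 GPUs behind NVSwitch.
gpus = 16
hbm2_gb_per_gpu = 32
gpu_to_gpu_gb_per_sec = 300

unified_memory_gb = gpus * hbm2_gb_per_gpu
# Assumption: "cross-section" means bisection bandwidth, i.e. half of the
# GPUs sending at full rate to the other half.
bisection_tb_per_sec = (gpus // 2) * gpu_to_gpu_gb_per_sec / 1000

print(switch_gb_per_sec, switch_tbit_per_sec, unified_memory_gb, bisection_tb_per_sec)
```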
CHALLENGES WITH DEEP LEARNING
- Current DIY deep learning environments are complex and time consuming to build, test and maintain
- Requires high level of expertise to manage driver, library, framework dependencies
- Development of frameworks by the community is moving very fast
[Diagram: software stack of Open Source Frameworks, NVIDIA Libraries, NVIDIA Docker, NVIDIA Driver, NVIDIA GPU]
NVIDIA GPU CLOUD
Deep Learning Everywhere, For Everyone
NVIDIA GPU Cloud integrates GPU-optimized deep learning frameworks, runtimes, libraries, and OS into a ready-to-run container, available at no charge
- Innovate in minutes, not weeks: removes all the DIY complexity of deep learning software integration
- Always up to date: monthly updates by NVIDIA to ensure maximum performance
- Deep learning across platforms: containers run locally on DGX Systems and TITAN PCs, or on cloud service provider GPU instances
DEEP LEARNING ACROSS PLATFORMS
- NVIDIA Volta or NVIDIA Pascal-powered TITAN GPU
- NVIDIA DGX-1 and DGX Station
- Amazon EC2 P3 instances with NVIDIA Volta
KUBERNETES on NVIDIA GPUs
Container Orchestration for DL Training & Inference
- Scale-up Thousands of GPUs Instantly
- Self-healing Cluster Orchestration
- GPU Optimized Out-of-the-Box
- Powered by NVIDIA Container Runtime
- Included with Enterprise Support on DGX
- Available end of April 2018
[Diagram: NVIDIA GPU CLOUD, KUBERNETES, NVIDIA CONTAINER RUNTIME, NVIDIA GPUs (AWS-EC2 | GCP | Azure | DGX)]
TENSORRT DEPLOYMENT WORKFLOW
Step 1: Optimize trained model
- Trained Neural Network > Import Model > TensorRT Optimizer > Serialize Engine > Optimized Plans (Plan 1, Plan 2, Plan 3)
Step 2: Deploy optimized plans with runtime
- Optimized Plans (Plan 1, Plan 2, Plan 3) > De-serialize Engine > TensorRT Runtime Engine > Deploy Runtime (Embedded, Automotive, Data center)
TensorRT INTEGRATED WITH TensorFlow
Delivers 8x Faster Inference with TensorFlow + TRT
Available in TensorFlow 1.7
https://github.com/tensorflow
ResNet-50 on TensorFlow, images/sec @ 7ms latency
- CPU (FP32): 11 *
- V100 (TensorFlow, FP32): 325
- V100 Tensor Cores (TensorFlow + TensorRT): 2,657
* Best CPU latency measured at 83 ms
CPU: Skylake Gold 6140, 2.5GHz, Ubuntu 16.04; 18 CPU threads. Volta V100 SXM; CUDA (384.111; v9.0.176); Batch size: CPU=1, TF_GPU=2, TF-TRT=16 w/ latency=6ms
NVIDIA TensorRT 4 RC NOW AVAILABLE
RNN and MLP Layers, ONNX Import, NVIDIA DRIVE Support
Maximize RNN and MLP Throughput
- Speed up speech, audio and recommender app inference performance through new layers and optimizations
- Recommendation Engine: 45X speedup, TensorRT vs CPU
Optimize and Deploy ONNX Models
- Easily import and accelerate inference for ONNX frameworks (PyTorch, Caffe 2, CNTK, MxNet and Chainer)
Support for NVIDIA DRIVE Xavier
- Deploy optimized deep learning inference models
Free download to members of NVIDIA Developer Program: developer.nvidia.com/tensorrt
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
developer.nvidia.com/digits
Interactive deep learning training application for engineers and data scientists
- Simplify deep neural network training with an interactive interface to train and validate, and visualize results
- Built-in workflows for image classification, object detection and image segmentation
- Improve model accuracy with pre-trained models from the DIGITS Model Store
- Faster time to solution with multi-GPU acceleration
DIGITS DEEP LEARNING WORKFLOWS
IMAGE CLASSIFICATION
- Classify images into classes or categories
- Object of interest could be anywhere in the image
- Example output: 98% Dog, 2% Cat
OBJECT DETECTION
- Find instances of objects in an image
- Objects are identified with bounding boxes
IMAGE SEGMENTATION
- Partition image into multiple regions
- Regions are classified at the pixel level
WHAT’S NEW IN DIGITS 6?
TENSORFLOW SUPPORT
- Train TensorFlow Models Interactively with DIGITS
NEW PRE-TRAINED MODELS
- Image Classification: VGG-16, ResNet50
- Object Detection: DetectNet