slide-1
SLIDE 1

Eric Thorsen, Global Retail Business Development

REVOLUTIONIZING RETAIL WITH ARTIFICIAL INTELLIGENCE

slide-2
SLIDE 2

2

CHALLENGES FACING CONSUMER INDUSTRIES

Demographic Changes

  • Millennials outnumber Baby Boomers
  • “Digital Natives” demand changing experience

Consumer Demand

  • Specific
  • Impatient
  • Particular

Omnichannel Constraints

  • Mobile
  • Web
  • Stores

Digital Competition

  • Emergence of new digital shopping experiences
  • Emergence of device proxies

slide-3
SLIDE 3

3

AI FOR RETAIL

SUPPLY CHAIN | STORE OPERATIONS | CORPORATE HEADQUARTERS

slide-4
SLIDE 4

4

AR/VR CONSUMER INTERACTION

AI ONLINE & IN THE STORE

SHELF ANALYSIS, CONSUMER ADVICE TARGETED RECOMMENDATIONS

slide-5
SLIDE 5

5

VIDEO RECOMMENDATIONS

RECOMMENDATION ENGINES ON GPU CLOUD

SONG RECOMMENDATIONS TARGETED RECOMMENDATIONS

slide-6
SLIDE 6

6

DYNAMIC SUPPLY CHAIN REAL-TIME RE-ROUTING

AI IN SUPPLY CHAIN

WAREHOUSE OPTIMIZATION COLLABORATIVE PLANNING AND REPLENISHMENT

slide-7
SLIDE 7

7

AI AT CORPORATE HQ

SINGLE VIEW OF CONSUMER
DEMAND SIGNAL ANALYSIS
AD SPEND OPTIMIZATION
PREDICTIVE ANALYTICS

slide-8
SLIDE 8

8

GPU-ACCELERATED ECOSYSTEM

PLAN: Assortment Planning | CPFR | Seasonal Promotions | Product Design | Open to Buy

BUY (BUILD): Procurement | Vendor Management | Quality Inspection | Manufacturing Automation

MOVE: Inventory & Route Optimization | Telemetry | Autonomous Vehicles and Drones | Demand Driven Supply Network

SELL: Recommendation Logic | Magic Mirror | Clienteling | Path to Purchase | Frictionless Commerce

SERVICE: Reverse Logistics | Returns Management | Call Center Optimization | Upsell / Cross Sell

Collaborative Design / Shelf Optimization
AR/VR Customer Experience
Windows 10 Acceleration / Knowledge worker enablement
CSP, NGC, DGX (Training)
TRT (Inference)
GPU Accelerated Applications: Space Planning, Optimization, SAP Leonardo, SAP HANA
Accelerated Analytics: Kinetica, MapD, Graphistry, H2O

Pro Viz Deep Learning HPC GRA GRID

Loss Prevention, Shopper Tracking, Robotics, Frictionless Commerce

Video Analytics

Quality Inspection Consumer Engagement / Recommendation Engine NN Asst Planning / Forecast & Replenishment NN

slide-9
SLIDE 9

9

GPUs PROVIDE BETTER DATA CENTER TCO

160 CPU Servers: 65,000 Watts
1 NVIDIA HGX with 8 Tesla V100 GPUs: 3,000 Watts

1/6th the cost, 1/20th the power, 4 racks in a box
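As a quick arithmetic check of the power claim, the ratio can be computed from the two wattages quoted on the slide. The variable names are illustrative only, and the cost claim cannot be checked because no prices are given.

```python
# Power figures quoted on the slide.
cpu_cluster_watts = 65_000   # 160 CPU servers
hgx_watts = 3_000            # 1 NVIDIA HGX with 8 Tesla V100 GPUs

# Ratio of the two power draws; "1/20th the power" is the rounded claim.
power_ratio = cpu_cluster_watts / hgx_watts
print(f"CPU cluster draws about {power_ratio:.1f}x the power of the HGX system")
```

The ratio comes out slightly above 20, which is consistent with the rounded "1/20th the power" figure.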

slide-10
SLIDE 10

10

RISE OF GPU-ENABLED COMPUTING

[Chart, 1980 to 2020, log scale from 10^2 to 10^7. Single-threaded perf: 1.5X per year, later 1.1X per year. GPU-Computing perf: 1.5X per year, 1000X by 2025. Stack labels: APPLICATIONS, SYSTEMS, ALGORITHMS, CUDA, ARCHITECTURE.]

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
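The growth rates on this slide can be related with a short compound-growth calculation. This is only a sketch: the slide does not state the starting year for the 1000X projection, so the code computes how many years of 1.5X annual growth are needed to reach 1000X and what 1.1X per year yields over the same span. The function name is hypothetical.

```python
import math

def years_to_reach(multiple: float, yearly_growth: float) -> float:
    """Years of compound growth needed to reach `multiple`."""
    return math.log(multiple) / math.log(yearly_growth)

# How long does 1.5X per year take to reach 1000X?
gpu_years = years_to_reach(1000, 1.5)
# What does 1.1X per year deliver over that same period?
slow_gain_same_period = 1.1 ** gpu_years

print(f"1.5X per year reaches 1000X in about {gpu_years:.1f} years")
print(f"1.1X per year over the same period gives about {slow_gain_same_period:.1f}X")
```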

slide-11
SLIDE 11

11

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.


NVIDIA DEEP LEARNING EVERYWHERE, EVERY PLATFORM

TITAN X

PC Development

DGX-1

AI Supercomputing Optimized Deep Learning Software

TESLA

Servers in every shape and size

CLOUD

Everywhere

slide-12
SLIDE 12

12


PERFORMANCE FROM THE DATA CENTER

Graphics accelerated virtual desktops and applications

All devices have graphics
Virtual machines also need a GPU

slide-13
SLIDE 13

13


NVIDIA GPUs EVERYWHERE

120+ servers from more than 30 system vendors

Industry Standard Servers | Hyper-Converged Infrastructure | Blade Servers | Public Cloud

slide-14
SLIDE 14

14

Inception Partners – AI Startups in Retail

slide-15
SLIDE 15

15


RESOURCES: GTC & DLI

GPU Technology Conference 2018

http://www.gputechconf.com/

  • Retail Breakfast to share best practices and lessons learned
  • Selective Retail Business tracks highlighting AI success
  • Deep dive hands-on sessions to experience AI
  • Customer stories showing success using AI, ML, and DL

DLI WORKSHOPS

https://www.nvidia.com/en-us/deep-learning-ai/education/

slide-16
SLIDE 16
slide-17
SLIDE 17

October 2017

RETAIL CUSTOMER STORIES

slide-18
SLIDE 18

18

USPS delivers more than 150 billion pieces of mail each year, a logistics operation that is at a scale second to none. After experiencing increased delays and instances of fraud, USPS needed a different approach to data analytics. Using Kinetica’s GPU-accelerated solution, USPS achieved near-immediate analysis of data from over 213,000 scanning devices at post offices and processing facilities around the country. Last year, USPS delivered 154 billion pieces of mail, while driving 70 million fewer miles, saving 7 million gallons of fuel and preventing 70,000 tons of carbon emissions.

slide-19
SLIDE 19

19

RETAIL INVENTORY MANAGEMENT

Safety Stock

Optimization of safety stock for each store/item. A home-grown algorithm was ported from a CPU cluster to GPU. Time required dropped from hundreds of days on a single CPU node to a few hours on a single GPU (x4) node, a speedup of approximately 700x.

Time Forecasting models

Hundreds of millions of store/item combinations are forecast weekly. Multiple models are used to forecast, including Holt-Winters, ARIMA and GLM. NVIDIA provided a GPU version of Holt-Winters, specified and integrated by the customer. Comparative tests on 8 million store/items showed a reduction in time from 15 minutes (across approx. 38 servers) to 24 seconds on 1 GPU (x4) node.

The GPU version could allow a daily forecast because of its speed and scaling abilities.
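To make the forecasting model named above concrete, here is a minimal CPU sketch of additive Holt-Winters (triple exponential smoothing) for a single store/item series. It is not the GPU version described on the slide. The function name, the initialization scheme, the smoothing parameters, and the sample data are all assumptions chosen for illustration.

```python
def holt_winters_additive(y, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters (triple exponential smoothing) forecast.

    y: list of observations, at least two full seasons long.
    Returns `horizon` forecasts that continue the series.
    """
    m = season_len
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initial estimates taken from the first two seasons.
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    season = [y[i] - first for i in range(m)]

    # Update level, trend and seasonal terms one observation at a time.
    for t, obs in enumerate(y):
        s = season[t % m]
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - level) + (1 - gamma) * s

    n = len(y)
    return [level + (h + 1) * trend + season[(n + h) % m] for h in range(horizon)]

# Hypothetical series with a season length of 4 and a mild upward trend.
sales = [10, 12, 14, 12, 11, 13, 15, 13, 12, 14, 16, 14]
forecast = holt_winters_additive(sales, season_len=4, alpha=0.5, beta=0.1, gamma=0.1, horizon=4)
print([round(v, 1) for v in forecast])
```

Each store/item series is processed independently, which is one reason this kind of workload parallelizes well across many series.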

slide-20
SLIDE 20

20

AI IMPROVES THE CUSTOMER EXPERIENCE

AI is dramatically changing the online shopping experience, with tangible improvements for retailers and consumers. In 2016, British online grocery giant Ocado improved customer service with its AI-enhanced contact center. It is also applying machine learning and NVIDIA GPUs to develop humanoid robotics to assist maintenance technicians, and advanced computer vision for image classification and recognition to replace barcode systems. Computer vision will expedite the picking process and better ensure orders are filled correctly, so customers receive exactly what they ordered.

slide-21
SLIDE 21

21


AI-DRIVEN SMART SHOPPING

According to Forrester, e-commerce was a $390B market in 2016 and is expected to double by 2024. E-commerce company Jet.com (acquired by Walmart) partners with multitudes of suppliers with different offerings at different prices. Jet uses GPU-accelerated AI to drive its smart cart solution, which fulfills orders at the lowest prices through the smart bundling of supplier offers. The platform finds the ideal merchant and warehouse combination to lower the total order cost. The bigger the shopping cart, the greater the savings that can be generated.

slide-22
SLIDE 22

22


DISCOVER MORE WITH DEEP LEARNING

Online shopping can be convenient, but searching through multiple websites can be arduous and time-consuming. Pinterest makes it easy for users to quickly discover things they love. Automatic object detection lets users search for products within a Pin’s image, and Shop the Look lets users buy items seen in fashion and home décor Pins. Scientists on Pinterest’s visual search team use GPU-accelerated deep learning to teach their system to recognize image features using a dataset of billions of Pins and compute similarity scores to identify the best matches. One visual search study reports a 50% improvement in user engagement and traffic.

slide-23
SLIDE 23

23


AI TOOL LETS YOU APPLY BEFORE YOU BUY

Testing different types of makeup can take hours and be a frustrating experience. ModiFace is using GPUs and facial modeling technology to help consumers explore and select the ideal products. ModiFace developed the ‘Sephora Virtual Artist’, an online tool that allows consumers to virtually experiment with new makeup without having to leave their computer screen. With technology for skin analysis and facial visualization, ModiFace and its AI features have introduced a more efficient way to style oneself.

slide-24
SLIDE 24

24


AI PERSONALIZES SKIN CARE

Using the wrong skincare products can be a major cause of customer dissatisfaction so Olay is arming women with the knowledge they need to make informed product purchase decisions. Its Olay Skin Advisor is a GPU-accelerated AI tool that works on any mobile device — users provide a selfie, information about age, skin issues, skin type and product preferences, and the tool advises how to improve trouble areas using a daily regime of recommended Olay products. After four weeks 94% of women who tried the skin advisor continued to use the products it recommended.

slide-25
SLIDE 25

25


REINVENTING RETAIL BY COMBINING ART AND AI

In fashion, styles change quickly, but the fundamental customer experience (brick-and-mortar stores and traditional online shopping sites) hasn’t changed much in the past decade. Stitch Fix broke that mold with a fashion styling service that combines the art of personal styling with data analytics insights powered by GPU-accelerated deep learning. Stitch Fix’s 50+ style recommendation algorithms match clothing and accessories to clients based on their unique style preferences. Most recently, Stitch Fix changed the game again with a deep learning image recognition system that locates fashion items for clients based on their shared Pinterest boards.

slide-26
SLIDE 26

26


REDEFINING CYBERSECURITY WITH AI

We depend on a safe cyberspace for just about every aspect of our lives. Cyber attacks can be devastating, and in today’s world mutations have become the rule, not the exception. Cylance leverages GPU-driven deep learning to predict and prevent malicious code execution by identifying indicators of an attack. CylancePROTECT immediately prevented the execution of the May 2017 WannaCry attack on 100% of its customers’ endpoints.

slide-27
SLIDE 27

27

AI TOOL BOOSTS CUSTOMER SERVICE

KLM’s 235 social media service agents engage in 15K conversations a week, 24/7. To contend with the overwhelming volume of messages, KLM uses GPU-accelerated deep learning to predict the best response to an incoming message and shows it to a contact center agent for approval or personalization before sending it to the customer. The resulting time savings mean KLM service agents can focus on customers with more pressing needs and handle a greater volume of questions while still maintaining a high degree of customer satisfaction.

slide-28
SLIDE 28

28


A NEW WAVE OF AI BUSINESS APPLICATIONS

Many brands rely on sponsoring televised events, yet impact is difficult to track. Manual tracking takes up to six weeks to measure ROI and even longer to adjust expenditures. SAP Brand Impact, powered by NVIDIA deep learning, measures brand attributes in near real-time with superhuman accuracy thanks to deep neural networks trained on NVIDIA DGX-1 and TensorRT to provide video inference analysis. Results are immediate, accurate and auditable, and delivered in a day.

slide-29
SLIDE 29

29

A NEW WAVE OF AI BUSINESS APPLICATIONS

Brand impact measurement on televised events in real time vs 6 weeks
Immediate, accurate and auditable, delivered in a day
Brand Impact, Service Ticketing, Invoice-to-Record applications

slide-30
SLIDE 30

30


BETTER DATA, SMARTER BUILDINGS

According to the EPA, 62% of the U.S.'s electricity is consumed by the commercial and industrial segments. But how much of that consumption is inefficient? Verdigris is on a mission to help businesses eliminate wasteful energy spend with their Smart Building optimization solutions. Verdigris is harnessing the power of data and GPU-driven deep learning to continually audit and analyze electronic signatures of individual devices to learn what's "normal" and identify patterns and instances of energy waste. And with real-time monitoring and alerts, response teams can react to solve problems immediately.

slide-31
SLIDE 31

31

CONSUMER MONITORING

Standard traffic counters measure traffic into and out of the store. Computer vision offers enhancements by providing:

  • Unique identity detection: integrated into loyalty program where appropriate or available
  • Age / ethnicity segmentation: detect age groups, including children and seniors
  • Shopping behavior tracking: groups, couples, individuals
  • Traffic patterns: identify path to purchase, hot zones, cold zones, dwell points. Helps retailers make decisions on item placement or promotions
  • Integration into app-based recommendation logic: ability to launch targeted promotions based on proximity, past purchase, and consumer profile

Multiple camera signals can be stitched together to detect patterns within the store. Exterior cameras can determine shopper density based on parking. Origin tracking can identify external traffic sources and/or co-marketing opportunities.

slide-32
SLIDE 32

32

YouTube Link: https://youtu.be/AMDiR61f86Y

Warehouse / Distribution Center Optimization

Warehouses and DCs are not built for consumer traffic and can pose health and safety challenges for workers. During peak season, shelves fill up and working space is reduced to a minimum, making it harder for humans to navigate safely and accurately measure inventory. IFM uses NVIDIA Jetson technology mounted on a drone to autonomously monitor inventory positions in the DC or warehouse. As an Inception partner, IFM is closely aligned with NVIDIA and is poised to deliver incredible impact on retail and supply chain business processes.

slide-33
SLIDE 33

33


YouTube Link: https://youtu.be/l7NPmJP462M

Shelf Scanning Robotics

Store associates are representatives of the brand and the face of the retail organization. It makes sense to reduce the time they spend on tasks that are not consumer-facing. Performing inventory counts, replacing misplaced items, or scanning for out-of-stock situations are examples of basic, repetitive, and non-impactful operations for store associates. Fellow Robots has created a solution to scan shelves, monitor misplaced items, and act as a wayfinder kiosk for consumers. This allows associates to interact with the shopping public, improving consumer satisfaction and raising revenue through larger shopping baskets. As an Inception partner, Fellow is closely aligned with NVIDIA and is poised to deliver incredible impact on retail business processes.

slide-34
SLIDE 34

34


TRAFFIC PATTERNS LOSS PREVENTION

Using existing cameras, a retailer can install highly effective computer vision algorithms to detect shopper traffic patterns and prevent loss. In the US, loss prevention (LP) is a $48B problem impacting all retailers. At the same time, investment in LP staff is flat or shrinking. While the average cost of a shoplifting incident is doubling to $798, 30% of inventory shrinkage is an inside job. Computer vision can identify theft, shrinkage, and shoplifting incidents. This new technology can bring a fresh approach to a longstanding problem for retail.

slide-35
SLIDE 35

35


COLLABORATIVE DESIGN

Photorealistic Models Interactive Physics Design Flow Integration Collaboration

slide-36
SLIDE 36
slide-37
SLIDE 37

ART OF THE POSSIBLE

Paul Hendricks Solutions Architect phendricks@nvidia.com

The State of AI in Retail

slide-38
SLIDE 38

38


INTRODUCTION

  • Paul Hendricks is a Solutions Architect at NVIDIA, helping enterprise customers with their deep learning and AI initiatives.
  • Paul's background is primarily in retail; he has spent the past 5 years working with many Fortune 500 retail companies to implement data science and AI solutions.
  • Prior to joining NVIDIA, Paul worked at Victoria’s Secret as a Data Scientist, building models to understand customer propensity to purchase and how to optimize assortment in stores.
  • Currently, Paul's research at NVIDIA focuses on using deep learning in intelligent video analytics and recommendation systems.

slide-39
SLIDE 39

39


slide-40
SLIDE 40

40

Intelligent Video Analytics

slide-41
SLIDE 41

41

Object Detection

  • Data: Images
  • Goal: Identify objects in an image, and output bounding boxes around the objects and their classes

Problem Background

slide-42
SLIDE 42

42

Frictionless Checkout

https://www.standardcognition.com/

slide-43
SLIDE 43

43

Localizing Algorithms

Sliding windows

  • If one of the windows only has half of the dog, the activation may not be strong enough
  • Using small windows and small strides will be very computationally intensive
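The cost trade-off in the sliding-window bullets can be sketched by counting window positions. The image size, window sizes, and strides below are arbitrary assumptions; in a real detector each window would also require a classifier forward pass, which is where the cost comes from.

```python
import numpy as np

def sliding_windows(image, window, stride):
    """Yield (row, col, patch) for every window position in a 2-D image."""
    height, width = image.shape
    for r in range(0, height - window + 1, stride):
        for c in range(0, width - window + 1, stride):
            yield r, c, image[r:r + window, c:c + window]

def count_windows(height, width, window, stride):
    """Closed-form count of window positions, matching sliding_windows."""
    rows = (height - window) // stride + 1
    cols = (width - window) // stride + 1
    return rows * cols

image = np.zeros((224, 224), dtype=np.float32)

# Large windows with a large stride: few positions, but objects may be cut in half.
coarse = sum(1 for _ in sliding_windows(image, window=64, stride=32))
# Small windows with a small stride: far more positions to classify.
fine = sum(1 for _ in sliding_windows(image, window=32, stride=4))

print(coarse, fine)
```

Shrinking the window and stride multiplies the number of classifier evaluations, which is the computational burden the slide refers to.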

slide-44
SLIDE 44

44

Localizing Algorithms

Fully convolutional neural network

  • Since convolutions are basically sliding windows, we can try replacing the fully connected layers with convolutional layers

  • Bounding boxes generated are not very accurate
slide-45
SLIDE 45

45

Localizing Algorithms

Region proposals

  • Selects blob-like structures and proposes these as the regions to be passed into a CNN

  • This concept is similar to sliding window
slide-46
SLIDE 46

46

Localizing Algorithms

Single shot detection

  • This algorithm predicts the coordinates of the bounding boxes as well as the class of the objects
  • Fast, since the model looks at the image only once (e.g. YOLO)
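To illustrate how a single forward pass can yield both boxes and classes, the sketch below decodes a simplified grid-style output tensor. The tensor layout (one box per cell, stored as x, y, w, h, confidence, then class scores) is an assumption made for illustration; real single-shot models such as YOLO use richer layouts and add steps like non-maximum suppression.

```python
import numpy as np

def decode_grid(pred, conf_threshold=0.5):
    """Turn an (S, S, 5 + C) grid prediction into a list of detections.

    Assumed layout per cell: [x, y, w, h, confidence, class scores...],
    with x, y in [0, 1] relative to the cell and w, h relative to the image.
    """
    grid = pred.shape[0]
    detections = []
    for row in range(grid):
        for col in range(grid):
            x, y, w, h, conf = pred[row, col, :5]
            if conf < conf_threshold:
                continue
            # Convert cell-relative center to image-relative center.
            cx = (col + x) / grid
            cy = (row + y) / grid
            class_id = int(np.argmax(pred[row, col, 5:]))
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            detections.append((class_id, float(conf), box))
    return detections

# Hypothetical 2x2 grid with 3 classes and a single confident cell.
pred = np.zeros((2, 2, 5 + 3), dtype=np.float32)
pred[1, 0] = [0.5, 0.5, 0.4, 0.2, 0.9, 0.1, 0.7, 0.2]
detections = decode_grid(pred)
print(detections)
```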
slide-47
SLIDE 47

47

Getting Started

DLI Courses

  • Object Detection with DIGITS - https://nvlabs.qwiklab.com/focuses/4125

Papers

  • Fully Convolutional Networks for Semantic Segmentation - https://arxiv.org/pdf/1605.06211.pdf
  • Rich feature hierarchies for accurate object detection and semantic segmentation - https://arxiv.org/pdf/1311.2524.pdf
  • Fast R-CNN - https://arxiv.org/pdf/1504.08083.pdf
  • Faster R-CNN: Towards real-time object detection with region proposal networks - https://arxiv.org/pdf/1506.01497.pdf
  • YOLO9000: Better, Faster, Stronger - https://arxiv.org/pdf/1612.08242.pdf

Libraries

  • https://github.com/pjreddie/darknet
  • https://github.com/tensorflow/models/tree/master/research/object_detection

Datasets

  • COCO - http://cocodataset.org/
  • ImageNet - https://www.kaggle.com/c/imagenet-object-detection-challenge
slide-48
SLIDE 48

48

Anomaly Detection

slide-49
SLIDE 49

49

Anomaly Detection

  • Data: Image, sensor data (time series), text data
  • Goal: Detect if the data being generated is anomalous

Problem Background

slide-50
SLIDE 50

50


slide-51
SLIDE 51

51


UNSUPERVISED LEARNING

Anomaly detection using deep learning

Deep Autoencoder Network

  • Input layer: size of data vector
  • Bottleneck layer: summarized representation (‘embedding’)
  • Output layer: same dimensionality as input
  • Reconstruction error: high errors indicate potential anomaly

Input: $X$    Output: $\tilde{X}$

Reconstruction Error: $X - \tilde{X}$
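The reconstruction-error idea can be sketched without a deep learning framework by using a linear autoencoder, which is equivalent to projecting onto principal components. This is a deliberately swapped-in stand-in for the deep autoencoder on the slide, and the toy data, bottleneck size, and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "normal" data: 3-D points that lie close to a 1-D line.
direction = np.array([1.0, 2.0, -1.0])
normal = rng.normal(size=(200, 1)) * direction + 0.05 * rng.normal(size=(200, 3))

# Fit: the top principal direction plays the role of the bottleneck layer.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:1]  # bottleneck of size 1

def reconstruction_error(x):
    """Encode to the bottleneck, decode, and return ||X - X_tilde|| per row."""
    embedding = (x - mean) @ components.T
    x_tilde = embedding @ components + mean
    return np.linalg.norm(x - x_tilde, axis=1)

typical = reconstruction_error(normal).mean()
outlier = reconstruction_error(np.array([[3.0, -3.0, 3.0]]))[0]
print(typical, outlier)
```

Points that follow the structure of the training data reconstruct well, while a point far from that structure produces a much larger error.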

slide-52
SLIDE 52

52


UNSUPERVISED LEARNING

DL anomaly detection in time series

Time Series Signals

Split into sliding windows $w_1, w_2, w_3, \ldots, w_N$

Normalization and preprocessing
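The windowing and normalization steps for time series can be sketched as follows. Window length, stride, and per-window z-scoring are assumptions; normalizing with statistics from the whole training signal is an equally plausible choice.

```python
import numpy as np

def make_windows(signal, window, stride=1):
    """Split a 1-D signal into overlapping windows w_1..w_N (one per row).

    Assumes the signal is at least `window` samples long.
    """
    signal = np.asarray(signal, dtype=np.float64)
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])

def normalize(windows, eps=1e-8):
    """Z-score each window independently."""
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return (windows - mean) / (std + eps)

# Hypothetical sensor signal: 500 samples of a sine wave.
signal = np.sin(np.linspace(0, 20, 500))
windows = normalize(make_windows(signal, window=50, stride=10))
print(windows.shape)
```

Each row can then be fed to an autoencoder as one input vector.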

slide-53
SLIDE 53

53


Input

UNSUPERVISED LEARNING

Detecting anomalies via reconstruction error

slide-54
SLIDE 54

54


slide-55
SLIDE 55

55


Output (Reconstruction) Input

UNSUPERVISED LEARNING

Detecting anomalies via reconstruction error

slide-56
SLIDE 56

56


UNSUPERVISED LEARNING

Detecting anomalies via reconstruction error

  • Reconstruction error (RE) as a proxy to outliers
  • Whenever RE is high, consider it a red flag
  • Threshold can be set using statistical bounds

[Figure panels: Input, Output (Reconstruction), Reconstruction vs Input]
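One simple way to set a threshold from statistical bounds is the mean plus a few standard deviations of the reconstruction errors observed on normal data. The multiplier k=3 and the simulated baseline errors below are assumptions; percentile-based bounds are a common alternative.

```python
import numpy as np

def fit_threshold(normal_errors, k=3.0):
    """Statistical bound: mean + k standard deviations of errors on normal data."""
    normal_errors = np.asarray(normal_errors, dtype=np.float64)
    return normal_errors.mean() + k * normal_errors.std()

def flag_anomalies(errors, threshold):
    """Return a boolean array: True where the reconstruction error is a red flag."""
    return np.asarray(errors) > threshold

# Simulated reconstruction errors collected on data known to be normal.
rng = np.random.default_rng(1)
baseline_errors = np.abs(rng.normal(0.0, 0.1, size=1000))

threshold = fit_threshold(baseline_errors)
flags = flag_anomalies([0.05, 0.12, 0.9], threshold)
print(threshold, flags)
```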

slide-57
SLIDE 57

57

Getting Started

DLI Courses

  • Introduction to Autoencoders
  • Anomaly Detection with Variational Autoencoders - https://nvlabs.qwiklab.com/focuses/8362

Papers & Books

  • Autoencoders - https://papers.nips.cc/paper/798-autoencoders-minimum-description-length-and-helmholtz-free-energy.pdf
  • Deep Learning, Chapter 13 - http://a.co/1vbPNXr
  • Hands on Machine Learning with Scikit-Learn & TensorFlow, Chapter 13 - http://a.co/aImsrRT

Datasets

  • Fashion MNIST - https://github.com/zalandoresearch/fashion-mnist
  • Deep Fashion - http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
  • UT Zappos 50k - http://vision.cs.utexas.edu/projects/finegrained/utzap50k/
slide-58
SLIDE 58

58

Recommendation Systems

slide-59
SLIDE 59

59

RECOMMENDATION SYSTEMS

Problem Background

  • Data: Matrix R [ rows are users, columns are items, cell values are ratings ]
  • Goal: Compute missing values in R; the top N unseen items are good recommendation candidates
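The problem setup can be sketched with a tiny ratings matrix. The completion step here is a low-rank SVD on a mean-filled matrix, used only as a simple stand-in for a learned model; the ratings, the rank, and the function names are assumptions.

```python
import numpy as np

# Rows are users, columns are items; np.nan marks a missing rating.
R = np.array([
    [5.0, 4.0, np.nan, 1.0, np.nan],
    [4.0, np.nan, 5.0, 1.0, 2.0],
    [1.0, 2.0, np.nan, 5.0, 4.0],
    [np.nan, 1.0, 2.0, 4.0, np.nan],
])

def complete_low_rank(ratings, rank=2):
    """Fill missing cells with item means, then keep the top `rank` singular values."""
    seen = ~np.isnan(ratings)
    item_means = np.nanmean(ratings, axis=0)
    filled = np.where(seen, ratings, item_means)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def top_n_unseen(ratings, predictions, user, n=2):
    """Indices of the n unseen items with the highest predicted rating."""
    unseen = np.flatnonzero(np.isnan(ratings[user]))
    ranked = unseen[np.argsort(-predictions[user, unseen])]
    return ranked[:n].tolist()

predictions = complete_low_rank(R)
recommendations = top_n_unseen(R, predictions, user=0)
print(recommendations)
```

Only items the user has not rated are eligible, which matches the "top N unseen items" goal stated above.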

slide-60
SLIDE 60

60

MANY APPLICATIONS FROM SIMILAR PROBLEMS

Using autoencoders to generate recommendations

slide-61
SLIDE 61

61


https://github.com/NVIDIA/DeepRecommender/

slide-62
SLIDE 62

62

[Figure: sample user ratings, values 2 to 5]

slide-63
SLIDE 63

63

Getting Started

DLI Courses

  • Deep Autoencoders for Recommender Systems

Papers

  • AutoRec – Autoencoders meet collaborative filtering - http://users.cecs.anu.edu.au/~u5098633/papers/www15.pdf
  • Training deep autoencoders for collaborative filtering- https://arxiv.org/pdf/1708.01715.pdf

Libraries

  • https://github.com/NVIDIA/DeepRecommender
  • https://github.com/geffy/tffm
  • https://github.com/apache/incubator-mxnet/tree/master/example/recommenders

Datasets

  • Netflix - https://netflixprize.com/
  • MovieLens – https://grouplens.org/datasets/movielens/
  • UC Irvine Online Retail Dataset - http://archive.ics.uci.edu/ml/datasets/online+retail
slide-64
SLIDE 64

64

NVIDIA Tools

slide-65
SLIDE 65

65

TESLA V100 32GB

WORLD’S MOST ADVANCED DATA CENTER GPU, NOW WITH 2X THE MEMORY

5,120 CUDA cores
640 NEW Tensor cores
7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
20MB SM RF | 16MB Cache
32GB HBM2 @ 900GB/s | 300GB/s NVLink

slide-66
SLIDE 66

66


FASTER RESULTS ON COMPLEX DL AND HPC

Up to 50% Faster Results With 2x The Memory

FASTER RESULTS

  • Neural Machine Translation (NMT): 0.8 step/sec vs 1.2 step/sec, 1.5X Faster Language Translation
  • 3D FFT 1k x 1k x 1k: 2.5TF vs 3.8TF, 1.5X Faster Calculations

HIGHER RESOLUTION

  • GAN Image to ImageGen (Unsupervised Image Translation: input winter photo, AI converts it to summer): 512x512 res images vs 1024x1024 res images, 4X Higher resolution

HIGHER ACCURACY

  • Accuracy with VGG-16 (16 layers) on V100 16GB vs RN-152 (152 layers) on V100 32GB: 40% Lower Error Rate

Dual E5-2698v4 server, 512GB DDR4, Ubuntu 16.04, CUDA9, cuDNN7 | NMT is GNMT-like and run with TensorFlow NGC Container 18.01 (Batch Size = 128 (for 16GB) and 256 (for 32GB)) | FFT is with cufftbench 1k x 1k x 1k and comparing 2 V100 16GB (DGX1V) vs. 2 V100 32GB (DGX1V)

R-CNN for object detection at 1080P with Caffe | V100 16GB uses VGG16 | V100 32GB uses Resnet-152

GAN by NVRESEARCH (https://arxiv.org/pdf/1703.00848.pdf) | V100 16GB and V100 32GB with FP32

slide-67
SLIDE 67

67


NEW TENSOR CORE BUILT FOR AI

Delivering 120 TFLOPS of DL Performance

ALL MAJOR FRAMEWORKS
VOLTA-OPTIMIZED cuDNN

MATRIX DATA OPTIMIZATION: Dense Matrix of Tensor Compute
TENSOR-OP CONVERSION: FP32 to Tensor Op Data for Frameworks

VOLTA TENSOR CORE

4x4 matrix processing array
D[FP32] = A[FP16] * B[FP16] + C[FP32]
Optimized For Deep Learning
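The mixed-precision operation D[FP32] = A[FP16] * B[FP16] + C[FP32] can be emulated on the CPU with NumPy to show the data types involved. This is only an illustration of the arithmetic and says nothing about how the hardware schedules it; the function name is hypothetical.

```python
import numpy as np

def tensor_core_fma(a_fp16, b_fp16, c_fp32):
    """Emulate D[FP32] = A[FP16] * B[FP16] + C[FP32] on 4x4 tiles."""
    assert a_fp16.dtype == np.float16 and b_fp16.dtype == np.float16
    assert c_fp32.dtype == np.float32
    # Inputs are stored in FP16, but products are accumulated in FP32.
    return a_fp16.astype(np.float32) @ b_fp16.astype(np.float32) + c_fp32

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)
B = rng.standard_normal((4, 4)).astype(np.float16)
C = rng.standard_normal((4, 4)).astype(np.float32)

D = tensor_core_fma(A, B, C)
print(D.dtype, D.shape)
```

Keeping the accumulator in FP32 limits the rounding error that would build up if the sums were also held in FP16.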

slide-68
SLIDE 68

68


NVIDIA DGX

AI Supercomputer-in-a-Box

960 TFLOPS | 8x Tesla V100 16GB | NVLink Hybrid Cube Mesh
2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U, 3200W

slide-69
SLIDE 69

69

INTRODUCING NVIDIA DGX-2

THE WORLD’S FIRST 2 PETAFLOPS SYSTEM

THE WORLD’S MOST POWERFUL AI SYSTEM FOR THE MOST COMPLEX AI CHALLENGES

  • DGX-2 is the newest addition to the DGX family, powered by DGX software
  • Deliver accelerated AI-at-scale deployment and simplified operations
  • Step up to DGX-2 for unrestricted model parallelism and faster time-to-solution

slide-70
SLIDE 70

70


10X PERFORMANCE GAIN IN LESS THAN A YEAR

PyTorch Stack: Time to Train FAIRSEQ

  • DGX-1V (DGX-1, Sep ’17): 15 days
  • DGX-2 (Q3 ’18): 1.5 days

Software improvements across the stack, including NCCL, cuDNN, etc.

slide-71
SLIDE 71

71

NVSWITCH

WORLD’S HIGHEST BANDWIDTH ON-NODE SWITCH

7.2 Terabits/sec or 900 GB/sec
18 NVLINK ports | 50GB/s per port bi-directional
Fully-connected crossbar
2 billion transistors | 47.5mm x 47.5mm package
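The aggregate bandwidth figures follow from the per-port numbers on the slide, as this short check shows. It assumes decimal units (1 terabit = 1000 gigabits); the variable names are illustrative.

```python
ports = 18
gb_per_sec_per_port = 50  # gigabytes per second, bi-directional, per port

# Aggregate switch bandwidth in gigabytes per second.
total_gb_per_sec = ports * gb_per_sec_per_port
# Convert bytes to bits (x8) and giga to tera (/1000).
total_tbit_per_sec = total_gb_per_sec * 8 / 1000

print(total_gb_per_sec, total_tbit_per_sec)
```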

slide-72
SLIDE 72

72

NVSWITCH

ENABLES THE WORLD’S LARGEST GPU

16 Tesla V100 32GB Connected by New NVSwitch
2 petaFLOPS of DL Compute
Unified 512GB HBM2 GPU Memory Space
300GB/sec Every GPU-to-GPU
2.4TB/sec of Total Cross-section Bandwidth

slide-73
SLIDE 73

73


CHALLENGES WITH DEEP LEARNING

  • Current DIY deep learning environments are complex and time consuming to build, test and maintain
  • Requires high level of expertise to manage driver, library, framework dependencies
  • Development of frameworks by the community is moving very fast

Stack: Open Source Frameworks | NVIDIA Libraries | NVIDIA Docker | NVIDIA Driver | NVIDIA GPU

slide-74
SLIDE 74

74


NVIDIA GPU CLOUD

Deep Learning Everywhere, For Everyone

NVIDIA GPU Cloud integrates GPU-optimized deep learning frameworks, runtimes, libraries, and OS into a ready-to-run container, available at no charge.

  • Innovate in minutes, not weeks: removes all the DIY complexity of deep learning software integration
  • Always up to date: monthly updates by NVIDIA to ensure maximum performance
  • Deep learning across platforms: containers run locally on DGX Systems and TITAN PCs, or on cloud service provider GPU instances

slide-75
SLIDE 75

75

DEEP LEARNING ACROSS PLATFORMS

NVIDIA Volta or NVIDIA Pascal-powered TITAN GPU
NVIDIA DGX-1 and DGX Station
Amazon EC2 P3 instances with NVIDIA Volta

slide-76
SLIDE 76

76

KUBERNETES on NVIDIA GPUs

  • Scale-up Thousands of GPUs Instantly
  • Self-healing Cluster Orchestration
  • GPU Optimized Out-of-the-Box
  • Powered by NVIDIA Container Runtime
  • Included with Enterprise Support on DGX
  • Available end of April 2018

76

Container Orchestration for DL Training & Inference

NVIDIA GPUs AWS-EC2 | GCP | Azure | DGX NVIDIA CONTAINER RUNTIME KUBERNETES NVIDIA GPU CLOUD

slide-77
SLIDE 77

77


TENSORRT DEPLOYMENT WORKFLOW

Step 1: Optimize trained model

Trained Neural Network → Import Model → TensorRT Optimizer → Serialize Engine → Optimized Plans (Plan 1, Plan 2, Plan 3)

Step 2: Deploy optimized plans with runtime

Optimized Plans (Plan 1, Plan 2, Plan 3) → De-serialize Engine → TensorRT Runtime Engine → Deploy Runtime (Embedded, Automotive, Data center)

slide-78
SLIDE 78

78


TensorRT INTEGRATED WITH TensorFlow

Delivers 8x Faster Inference with TensorFlow + TRT

Available in TensorFlow 1.7

https://github.com/tensorflow

ResNet-50 on TensorFlow, Images/sec @ 7ms Latency:

  • CPU (FP32)*: 11
  • V100 (TensorFlow, FP32): 325
  • V100 Tensor Cores (TensorFlow + TensorRT): 2,657

* Best CPU latency measured at 83 ms

CPU: Skylake Gold 6140, 2.5GHz, Ubuntu 16.04; 18 CPU threads. Volta V100 SXM; CUDA (384.111; v9.0.176); Batch size: CPU=1, TF_GPU=2, TF-TRT=16 w/ latency=6ms

slide-79
SLIDE 79

79


NVIDIA TensorRT 4 RC NOW AVAILABLE

RNN and MLP Layers | ONNX Import | NVIDIA DRIVE Support

Free download to members of NVIDIA Developer Program: developer.nvidia.com/tensorrt

Maximize RNN and MLP Throughput

Speed up speech, audio and recommender app inference performance through new layers and optimizations

[Chart: Recommendation Engine, CPU vs TensorRT, 45X Speedup]

Optimize and Deploy ONNX Models

Easily import and accelerate inference for ONNX frameworks (PyTorch, Caffe 2, CNTK, MxNet and Chainer)

Support for NVIDIA DRIVE Xavier

Deploy optimized deep learning inference models

slide-80
SLIDE 80

80

NVIDIA DIGITS

Interactive Deep Learning GPU Training System

developer.nvidia.com/digits

Interactive deep learning training application for engineers and data scientists

  • Simplify deep neural network training with an interactive interface to train, validate, and visualize results
  • Built-in workflows for image classification, object detection and image segmentation
  • Improve model accuracy with pre-trained models from the DIGITS Model Store
  • Faster time to solution with multi-GPU acceleration

slide-81
SLIDE 81

81

DIGITS DEEP LEARNING WORKFLOWS

IMAGE CLASSIFICATION

  • Classify images into classes or categories
  • Object of interest could be anywhere in the image
  • Example output: 98% Dog, 2% Cat

OBJECT DETECTION

  • Find instances of objects in an image
  • Objects are identified with bounding boxes

IMAGE SEGMENTATION

  • Partition image into multiple regions
  • Regions are classified at the pixel level

slide-82
SLIDE 82

82


WHAT’S NEW IN DIGITS 6?

TENSORFLOW SUPPORT

  • Train TensorFlow Models Interactively with DIGITS

NEW PRE-TRAINED MODELS

  • Image Classification: VGG-16, ResNet50
  • Object Detection: DetectNet

slide-83
SLIDE 83