How Orange Successfully Deploys GPU Infrastructure for AI - PowerPoint PPT Presentation



SLIDE 1

AI WEBINAR

Date/Time: Tuesday, June 23 | 9 am PST

How Orange Successfully Deploys GPU Infrastructure for AI

SLIDE 2

Presenter: Stéphane Maillan

Orange AI Infrastructure

Your Host: Tom Leyden VP Marketing AI WEBINAR

What’s next in technology and innovation?

How Orange Successfully Deploys GPU Infrastructure for AI

SLIDE 3

How Orange Intends to Deploy GPU Infrastructure for Data / AI

S. Maillan

SLIDE 4

Interne Orange (Orange internal)

  • About me
  • GPU
  • AI phases
  • First considerations

SUMMARY

SLIDE 5

About me

SLIDE 6

  • GPU: very high parallel processing capability (limited memory)
  • CPU: high parallel processing capability (2 TB memory)
  • FPGA: very high parallel processing capability (programmable)
  • ASIC / AI chips: extreme parallel processing capability

GPU / ACCELERATOR
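The trade-off above (raw parallel throughput vs. memory) can be sketched with a toy roofline-style estimate. All device specs and workload sizes below are illustrative assumptions, not vendor figures:

```python
# Toy roofline-style estimate: a workload's run time is bounded either by
# compute throughput or by memory traffic, whichever is slower.

def run_time_s(flops, bytes_moved, peak_flops, mem_bw_bytes):
    """Lower bound on run time: limited by compute or by memory bandwidth."""
    return max(flops / peak_flops, bytes_moved / mem_bw_bytes)

# Hypothetical devices (order-of-magnitude only).
cpu = {"peak_flops": 2e12, "mem_bw": 2e11}    # ~2 TFLOP/s, ~200 GB/s
gpu = {"peak_flops": 2e13, "mem_bw": 1.5e12}  # ~20 TFLOP/s, ~1.5 TB/s

# A dense training step: 1e13 FLOPs touching 1e11 bytes.
flops, data = 1e13, 1e11

t_cpu = run_time_s(flops, data, cpu["peak_flops"], cpu["mem_bw"])
t_gpu = run_time_s(flops, data, gpu["peak_flops"], gpu["mem_bw"])
print(f"CPU: {t_cpu:.2f}s  GPU: {t_gpu:.3f}s  speedup: {t_cpu / t_gpu:.0f}x")
```

With these assumed specs the workload is compute-bound on both devices, so the speedup tracks the ratio of peak FLOP rates; a memory-bound workload would instead track the bandwidth ratio.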

SLIDE 7

  • TRAINING: data +++++ (data-heavy)
  • INFERENCE: data ++ / very low, real-time response time
  • (ANALYTICS): data +++++++ (the most data-heavy)

GPU / ACCELERATOR & AI PHASES

SLIDE 8

GPU / ACCELERATOR & AI PHASES

SLIDE 9

EXECUTING A WORKLOAD: AT FIRST

  • Code
  • Data
  • Computing resources

SLIDE 10

RESOURCE ADDRESSING

Efficiently sharing GPUs

  • Dedicated: local GPU machines
  • Shared: single server
  • Distributed: cluster
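A minimal sketch of how those three addressing modes might be chosen, assuming a hypothetical policy where a job is placed on a local GPU when it fits, on one multi-GPU server when possible, and on a cluster otherwise (the threshold of 8 GPUs per server is an assumption):

```python
# Toy placement policy for the three GPU addressing modes above.

def addressing_mode(gpus_needed, gpus_per_server=8):
    """Pick dedicated / shared / distributed based on job size."""
    if gpus_needed <= 1:
        return "dedicated"    # local GPU machine
    if gpus_needed <= gpus_per_server:
        return "shared"       # single multi-GPU server
    return "distributed"      # cluster of servers

print(addressing_mode(1))    # dedicated
print(addressing_mode(4))    # shared
print(addressing_mode(32))   # distributed
```

A real scheduler would also weigh locality, quota and interconnect cost; this only illustrates the taxonomy.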

SLIDE 11

RESOURCE ADDRESSING

Parallel processing

SLIDE 12

PARALLEL RESOURCE ADDRESSING

Architecture

SLIDE 13

Composed / Disaggregated / Distributed / Composable

PCI fabric - NVSwitch fabric - RDMA fabric

SLIDE 14

Composed / Rack Appliance

NVSwitch Fabric

Pros:
  • Best-in-class, extremely low latency
  • Best-in-class high bandwidth (300 Gb/s)
  • Extreme performance
  • The latest DGX A100 supports all AI phases, with GPU sharing/slicing capability!

Cons:
  • Acquisition cost
  • Rack scale
  • Proprietary box
SLIDE 15

Composed / Rack Appliance

NVSwitch Fabric - the latest DGX A100:

  • All AI phases: GPU slicing capability!
  • 1 TB memory
  • PCIe 4 + AMD Rome
  • Mellanox ConnectX-6
  • 1/10 the cost
  • 1/20 the power

Cons:
  • Acquisition cost
  • Rack scale
  • Proprietary box
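The slicing capability mentioned above (MIG on the A100, which partitions one GPU into up to 7 isolated instances) can be caricatured as a packing problem. The greedy policy and the request sizes below are illustrative assumptions; real MIG exposes a fixed set of profiles rather than arbitrary memory splits:

```python
# Sketch of MIG-style GPU slicing: pack memory requests into one 40 GB GPU,
# capped at 7 instances (the A100 MIG instance limit).

def slice_gpu(requests_gb, total_gb=40, max_instances=7):
    """Greedily admit requests that fit; return the granted slice sizes."""
    granted, used = [], 0
    for req in requests_gb:
        if len(granted) < max_instances and used + req <= total_gb:
            granted.append(req)
            used += req
    return granted

print(slice_gpu([5, 10, 20, 10]))  # -> [5, 10, 20]; last request doesn't fit
```

This is why slicing matters for the AI phases slide: many small inference jobs can share one GPU that a single training job would otherwise monopolize.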
SLIDE 16

Disaggregated / Composable

PCI Fabric

Pros:
  • Extremely low latency
  • High bandwidth
  • Composable PCIe
  • "Local" framework
  • Cloud compliant

Cons:
  • CPU and RAM are not composable
  • Proprietary hardware
  • Proprietary software
SLIDE 17

Disaggregated / Distributed / Composable

RDMA Fabric

Pros:
  • Low latency
  • High bandwidth
  • Commodity hardware
  • Covers all use cases
  • Composable storage
  • Distributed framework
  • DC scale
  • Cloud compliant

Cons:
  • Distributed framework
  • Latency

SLIDE 18

RDMA

SLIDE 19

Disaggregated / Distributed / Composable

GPU DISAGGREGATION

SLIDE 20

DATA ?

Architecture

SLIDE 21

The FIRST key: SDS

Data disaggregation: low-latency Software-Defined Storage

SLIDE 22

SDS

Promises:
  • Commodity hardware
  • No CPU/RAM bottleneck
  • Progressive cost
  • Full scale-up/out

DATA ?

Software-Defined Storage

SLIDE 23

DATA fabric? In-Network Computing

[Diagram: two nodes, each with CPU, GPU, FPGA, PMEM and NVMe, joined by a fabric interconnect. The fabric provides RDMA (high bandwidth, low latency), SHARP, FPGA/TLS offload, GPUDirect, IPsec offload and NVMe over Fabrics; MPI and rCUDA run on top; security is enforced in the fabric.]

In-network computing is key for efficiency.

SLIDE 24

Distributing GPU Workloads

  • A GPU scheduler is key to efficiency

SLIDE 25

Distributing GPU Workloads

Interesting GPU schedulers:

  • Run.ai
  • Slurm

SLIDE 26

Distributing GPU Workloads

Features:

  • GPU reservation and quota
  • GPU job migration
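The reservation-and-quota feature can be sketched as a toy admission controller: each team holds a GPU quota and jobs are admitted only while the team stays under it. This is an illustrative model only; Run.ai and Slurm implement far richer policies (fair-share, preemption, migration):

```python
# Toy GPU quota scheduler: per-team quotas, admit-or-reject on submit.

class QuotaScheduler:
    def __init__(self, quotas):
        self.quotas = dict(quotas)            # team -> max GPUs
        self.in_use = {t: 0 for t in quotas}  # team -> GPUs held

    def submit(self, team, gpus):
        """Admit the job only if it keeps the team within its quota."""
        if self.in_use[team] + gpus <= self.quotas[team]:
            self.in_use[team] += gpus
            return True
        return False

    def release(self, team, gpus):
        """Return GPUs when a job finishes."""
        self.in_use[team] -= gpus

sched = QuotaScheduler({"vision": 4, "nlp": 2})
print(sched.submit("vision", 3))  # True: 3 of 4 used
print(sched.submit("vision", 2))  # False: would exceed the quota
sched.release("vision", 3)
print(sched.submit("vision", 2))  # True again after release
```

Job migration, the second bullet, would extend this by checkpointing a running job and re-admitting it elsewhere; that part is not modeled here.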

SLIDE 27

The way I feel it:

SLIDE 28

Disaggregated / Distributed / Composable

RDMA Fabric

Pros:
  • Low latency
  • High bandwidth
  • Commodity hardware
  • Covers all use cases
  • Composable storage
  • Distributed framework
  • DC scale
  • Cloud compliant

Cons:
  • Distributed framework
  • Latency

SLIDE 29

Disaggregated / Distributed / Composable

PCIe / RoCE / Distributed (HW and SW)

SLIDE 30

rCUDA - http://www.rcuda.net/

  • Remote CUDA
  • Limited to CUDA calls
  • University project

Plus:
  • Distributed resource pools
  • Performance & efficiency
  • Transparent usage (TBC)
  • TensorFlow support

Distribution Layer / Composable
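The rCUDA idea above, an application calls what looks like a local GPU API while each call is forwarded to a remote server that owns the GPU, can be sketched with a toy proxy. Here the "server" is a local function standing in for a networked GPU node, and the `scale` kernel is invented for illustration; real rCUDA intercepts actual CUDA calls over the network:

```python
# Toy model of remote-GPU call forwarding: serialize the call, ship it to
# the GPU owner, return the deserialized result.

import json

def remote_gpu_server(request_json):
    """Stand-in for the GPU node: runs a named 'kernel' on the payload."""
    req = json.loads(request_json)
    kernels = {"scale": lambda xs, k: [x * k for x in xs]}
    result = kernels[req["kernel"]](req["data"], req["arg"])
    return json.dumps({"result": result})

class RemoteGPU:
    """Client-side proxy: looks like a local device, executes remotely."""
    def scale(self, data, k):
        req = json.dumps({"kernel": "scale", "data": data, "arg": k})
        return json.loads(remote_gpu_server(req))["result"]

gpu = RemoteGPU()
print(gpu.scale([1, 2, 3], 10))  # [10, 20, 30]
```

The serialization hop is exactly where the slide's "limited to CUDA calls" and latency caveats come from: every forwarded call pays a marshalling and network cost.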

SLIDE 31

rCUDA

GPU DISAGGREGATION

SLIDE 32

DATA fabric ?

In Network Computing

SLIDE 33

Low Latency Software Defined Storage

SLIDE 34

GPUDirect Storage

  • Unbeatable performance
  • API transport: +5 µs
  • Protection levels / flexibility: RAID 0 / 1 / 10 / erasure coding
  • Volume latency: 40 µs - 300 µs
  • RDDA: 0% CPU on the storage servers!
  • RDMA & TCP
  • Financial efficiency: scale up/out, disk-based licensing model

Low Latency Software Defined Storage
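A quick back-of-envelope check on the slide's own numbers shows why a +5 µs API/transport cost is acceptable against a 40-300 µs volume latency:

```python
# Relative overhead of the +5 µs transport cost at each end of the
# quoted 40-300 µs volume-latency range.

def overhead_pct(volume_us, transport_us=5):
    """Transport cost as a percentage of total volume latency."""
    return 100 * transport_us / volume_us

print(f"{overhead_pct(40):.1f}%")   # 12.5% at the 40 µs floor
print(f"{overhead_pct(300):.2f}%")  # 1.67% at the 300 µs ceiling
```

Even in the worst case the transport adds about an eighth of the latency budget, which is the argument for remote, software-defined storage over local disks here.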

SLIDE 35

GPUDirect Storage

SLIDE 36

THANKS

SLIDE 37

Thank you!