How Orange Successfully Deploys GPU Infrastructure for AI - PowerPoint PPT Presentation



SLIDE 1

AI WEBINAR

Date/Time: Tuesday, June 23 | 9 am PST

How Orange Successfully Deploys GPU Infrastructure for AI

SLIDE 2

Presenter: Stéphane Maillan

Orange AI Infrastructure

Your Host: Tom Leyden VP Marketing AI WEBINAR

What’s next in technology and innovation?

How Orange Successfully Deploys GPU Infrastructure for AI

SLIDE 3

How Orange Intends to Deploy GPU Infrastructure for Data / AI

S. Maillan

SLIDE 4

Interne Orange (Orange internal)

  • About me
  • GPU
  • AI phases
  • First considerations

SUMMARY

SLIDE 5

About me

SLIDE 6

  • GPU: very high parallel processing capability (limited memory)
  • CPU: high parallel processing capability (2 TB memory)
  • FPGA: very high parallel processing capability (programmable)
  • ASIC / AI chips: extreme parallel processing capability

GPU / ACCELERATOR
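The trade-off above (raw parallel throughput vs. memory) can be sketched with a toy roofline-style estimate. All device specs and workload sizes below are illustrative assumptions, not vendor figures:

```python
# Toy roofline-style estimate: a workload's run time is bounded either by
# compute throughput or by memory traffic, whichever is slower.

def run_time_s(flops, bytes_moved, peak_flops, mem_bw_bytes):
    """Lower bound on run time: limited by compute or by memory bandwidth."""
    return max(flops / peak_flops, bytes_moved / mem_bw_bytes)

# Hypothetical devices (order-of-magnitude only).
cpu = {"peak_flops": 2e12, "mem_bw": 2e11}    # ~2 TFLOP/s, ~200 GB/s
gpu = {"peak_flops": 2e13, "mem_bw": 1.5e12}  # ~20 TFLOP/s, ~1.5 TB/s

# A dense training step: 1e13 FLOPs touching 1e11 bytes.
flops, data = 1e13, 1e11

t_cpu = run_time_s(flops, data, cpu["peak_flops"], cpu["mem_bw"])
t_gpu = run_time_s(flops, data, gpu["peak_flops"], gpu["mem_bw"])
print(f"CPU: {t_cpu:.2f}s  GPU: {t_gpu:.3f}s  speedup: {t_cpu / t_gpu:.0f}x")
```

With these assumed specs the workload is compute-bound on both devices, so the speedup tracks the ratio of peak FLOP rates; a memory-bound workload would instead track the bandwidth ratio.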

SLIDE 7

  • TRAINING: data +++++ (data-heavy)
  • INFERENCE: data ++ / very low, real-time response time
  • (ANALYTICS): data +++++++ (the most data-heavy)

GPU / ACCELERATOR & AI PHASES

SLIDE 8

GPU / ACCELERATOR & AI PHASES

SLIDE 9

EXECUTING A WORKLOAD: AT FIRST

  • Code
  • Data
  • Computing resources

SLIDE 10

RESOURCE ADDRESSING

Efficiently sharing GPUs

  • Dedicated: local GPU machines
  • Shared: single server
  • Distributed: cluster
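A minimal sketch of how those three addressing modes might be chosen, assuming a hypothetical policy where a job is placed on a local GPU when it fits, on one multi-GPU server when possible, and on a cluster otherwise (the threshold of 8 GPUs per server is an assumption):

```python
# Toy placement policy for the three GPU addressing modes above.

def addressing_mode(gpus_needed, gpus_per_server=8):
    """Pick dedicated / shared / distributed based on job size."""
    if gpus_needed <= 1:
        return "dedicated"    # local GPU machine
    if gpus_needed <= gpus_per_server:
        return "shared"       # single multi-GPU server
    return "distributed"      # cluster of servers

print(addressing_mode(1))    # dedicated
print(addressing_mode(4))    # shared
print(addressing_mode(32))   # distributed
```

A real scheduler would also weigh locality, quota and interconnect cost; this only illustrates the taxonomy.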

SLIDE 11

RESOURCE ADDRESSING

Parallel processing

SLIDE 12

PARALLEL RESOURCE ADDRESSING

Architecture

SLIDE 13

Composed / Disaggregated / Distributed / Composable

PCI fabric - NVSwitch fabric - RDMA fabric

SLIDE 14

Composed / Rack Appliance

NVSwitch Fabric

Pros:
  • Best-in-class, extremely low latency
  • Best-in-class high bandwidth (300 Gb/s)
  • Extreme performance
  • The latest DGX A100 supports all AI phases, with GPU sharing/slicing capability!

Cons:
  • Acquisition cost
  • Rack scale
  • Proprietary box
SLIDE 15

Composed / Rack Appliance

NVSwitch Fabric - the latest DGX A100:

  • All AI phases: GPU slicing capability!
  • 1 TB memory
  • PCIe 4 + AMD Rome
  • Mellanox ConnectX-6
  • 1/10 the cost
  • 1/20 the power

Cons:
  • Acquisition cost
  • Rack scale
  • Proprietary box
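The slicing capability mentioned above (MIG on the A100, which partitions one GPU into up to 7 isolated instances) can be caricatured as a packing problem. The greedy policy and the request sizes below are illustrative assumptions; real MIG exposes a fixed set of profiles rather than arbitrary memory splits:

```python
# Sketch of MIG-style GPU slicing: pack memory requests into one 40 GB GPU,
# capped at 7 instances (the A100 MIG instance limit).

def slice_gpu(requests_gb, total_gb=40, max_instances=7):
    """Greedily admit requests that fit; return the granted slice sizes."""
    granted, used = [], 0
    for req in requests_gb:
        if len(granted) < max_instances and used + req <= total_gb:
            granted.append(req)
            used += req
    return granted

print(slice_gpu([5, 10, 20, 10]))  # -> [5, 10, 20]; last request doesn't fit
```

This is why slicing matters for the AI phases slide: many small inference jobs can share one GPU that a single training job would otherwise monopolize.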
SLIDE 16

Disaggregated / Composable

PCI Fabric

Pros:
  • Extremely low latency
  • High bandwidth
  • Composable PCIe
  • "Local" framework
  • Cloud compliant

Cons:
  • CPU and RAM are not composable
  • Proprietary hardware
  • Proprietary software
SLIDE 17

Disaggregated / Distributed / Composable

RDMA Fabric

Pros:
  • Low latency
  • High bandwidth
  • Commodity hardware
  • Covers all use cases
  • Composable storage
  • Distributed framework
  • DC scale
  • Cloud compliant

Cons:
  • Distributed framework
  • Latency

SLIDE 18

RDMA

SLIDE 19

Disaggregated / Distributed / Composable

GPU DISAGGREGATION

SLIDE 20

DATA ?

Architecture

SLIDE 21

The FIRST key: SDS

Data disaggregation: low-latency Software-Defined Storage

SLIDE 22

SDS

Promises:
  • Commodity hardware
  • No CPU/RAM bottleneck
  • Progressive cost
  • Full scale-up/out

DATA ?

Software-Defined Storage

SLIDE 23

DATA fabric? In-Network Computing

[Diagram: two nodes, each with CPU, GPU, FPGA, PMEM and NVMe, joined by a fabric interconnect. The fabric provides RDMA (high bandwidth, low latency), SHARP, FPGA/TLS offload, GPUDirect, IPsec offload and NVMe over Fabrics; MPI and rCUDA run on top; security is enforced in the fabric.]

In-network computing is key for efficiency.

SLIDE 24

Distributing GPU Workloads

  • A GPU scheduler is key to efficiency

SLIDE 25

Distributing GPU Workloads

Interesting GPU schedulers:

  • Run.ai
  • Slurm

SLIDE 26

Distributing GPU Workloads

Features:

  • GPU reservation and quota
  • GPU job migration
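The reservation-and-quota feature can be sketched as a toy admission controller: each team holds a GPU quota and jobs are admitted only while the team stays under it. This is an illustrative model only; Run.ai and Slurm implement far richer policies (fair-share, preemption, migration):

```python
# Toy GPU quota scheduler: per-team quotas, admit-or-reject on submit.

class QuotaScheduler:
    def __init__(self, quotas):
        self.quotas = dict(quotas)            # team -> max GPUs
        self.in_use = {t: 0 for t in quotas}  # team -> GPUs held

    def submit(self, team, gpus):
        """Admit the job only if it keeps the team within its quota."""
        if self.in_use[team] + gpus <= self.quotas[team]:
            self.in_use[team] += gpus
            return True
        return False

    def release(self, team, gpus):
        """Return GPUs when a job finishes."""
        self.in_use[team] -= gpus

sched = QuotaScheduler({"vision": 4, "nlp": 2})
print(sched.submit("vision", 3))  # True: 3 of 4 used
print(sched.submit("vision", 2))  # False: would exceed the quota
sched.release("vision", 3)
print(sched.submit("vision", 2))  # True again after release
```

Job migration, the second bullet, would extend this by checkpointing a running job and re-admitting it elsewhere; that part is not modeled here.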

SLIDE 27

The way I feel it:

SLIDE 28

Disaggregated / Distributed / Composable

RDMA Fabric

Pros:
  • Low latency
  • High bandwidth
  • Commodity hardware
  • Covers all use cases
  • Composable storage
  • Distributed framework
  • DC scale
  • Cloud compliant

Cons:
  • Distributed framework
  • Latency

SLIDE 29

Disaggregated / Distributed / Composable

PCIe / RoCE / Distributed (HW and SW)

SLIDE 30

rCUDA - http://www.rcuda.net/

  • Remote CUDA
  • Limited to CUDA calls
  • University project

Plus:
  • Distributed resource pools
  • Performance & efficiency
  • Transparent usage (TBC)
  • TensorFlow support

Distribution Layer / Composable
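The rCUDA idea above, an application calls what looks like a local GPU API while each call is forwarded to a remote server that owns the GPU, can be sketched with a toy proxy. Here the "server" is a local function standing in for a networked GPU node, and the `scale` kernel is invented for illustration; real rCUDA intercepts actual CUDA calls over the network:

```python
# Toy model of remote-GPU call forwarding: serialize the call, ship it to
# the GPU owner, return the deserialized result.

import json

def remote_gpu_server(request_json):
    """Stand-in for the GPU node: runs a named 'kernel' on the payload."""
    req = json.loads(request_json)
    kernels = {"scale": lambda xs, k: [x * k for x in xs]}
    result = kernels[req["kernel"]](req["data"], req["arg"])
    return json.dumps({"result": result})

class RemoteGPU:
    """Client-side proxy: looks like a local device, executes remotely."""
    def scale(self, data, k):
        req = json.dumps({"kernel": "scale", "data": data, "arg": k})
        return json.loads(remote_gpu_server(req))["result"]

gpu = RemoteGPU()
print(gpu.scale([1, 2, 3], 10))  # [10, 20, 30]
```

The serialization hop is exactly where the slide's "limited to CUDA calls" and latency caveats come from: every forwarded call pays a marshalling and network cost.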

SLIDE 31

rCUDA

GPU DISAGGREGATION

SLIDE 32

DATA fabric ?

In Network Computing

SLIDE 33

Low Latency Software Defined Storage

SLIDE 34

GPUDirect Storage

  • Unbeatable performance
  • API transport: +5 µs
  • Protection levels / flexibility: RAID 0 / 1 / 10 / erasure coding
  • Volume latency: 40 µs - 300 µs
  • RDDA: 0% CPU on the storage servers!
  • RDMA & TCP
  • Financial efficiency: scale up/out, disk-based licensing model

Low Latency Software Defined Storage
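A quick back-of-envelope check on the slide's own numbers shows why a +5 µs API/transport cost is acceptable against a 40-300 µs volume latency:

```python
# Relative overhead of the +5 µs transport cost at each end of the
# quoted 40-300 µs volume-latency range.

def overhead_pct(volume_us, transport_us=5):
    """Transport cost as a percentage of total volume latency."""
    return 100 * transport_us / volume_us

print(f"{overhead_pct(40):.1f}%")   # 12.5% at the 40 µs floor
print(f"{overhead_pct(300):.2f}%")  # 1.67% at the 300 µs ceiling
```

Even in the worst case the transport adds about an eighth of the latency budget, which is the argument for remote, software-defined storage over local disks here.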

SLIDE 35

GPUDirect Storage

SLIDE 36

THANKS

SLIDE 37

Thank you!