How Orange Successfully Deploys GPU Infrastructure for AI


  1. How Orange Successfully Deploys GPU Infrastructure for AI. AI WEBINAR. Date/Time: Tuesday, June 23 | 9 am PST

  2. What’s next in technology and innovation? How Orange Successfully Deploys GPU Infrastructure for AI. AI WEBINAR. Presenter: Stéphane Maillan (Orange AI Infrastructure). Your Host: Tom Leyden (VP Marketing)

  3. How Orange Intends to Deploy GPU Infrastructure for Data / AI (S. Maillan)

  4. SUMMARY • About me • GPU • AI phases • First considerations (Interne Orange)

  5. About me

  6. GPU / ACCELERATOR
     • GPU: very high parallel processing capability (limited memory)
     • CPU: high parallel processing capability (2 TB memory)
     • FPGA: very high parallel processing capability (programmable)
     • ASIC/AI chips: extreme parallel processing capability
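The accelerator comparison on this slide can be captured as a small lookup table. A minimal Python sketch, illustrative only: the `ACCELERATORS` table and `pick_accelerator` helper are hypothetical names, and the selection rule encodes nothing beyond the traits listed above.

```python
# Traits taken from the slide: parallelism level and the main caveat per device class.
ACCELERATORS = {
    "GPU":  {"parallelism": "very high", "note": "limited memory"},
    "CPU":  {"parallelism": "high",      "note": "2 TB memory"},
    "FPGA": {"parallelism": "very high", "note": "programmable"},
    "ASIC": {"parallelism": "extreme",   "note": "AI chips"},
}

def pick_accelerator(need_reprogrammable=False, need_big_memory=False):
    """Toy selection rule over the slide's comparison table."""
    if need_big_memory:
        return "CPU"   # the only class listed with multi-TB memory
    if need_reprogrammable:
        return "FPGA"  # programmable logic
    return "GPU"       # default massively parallel choice
```

In practice the choice also depends on cost, software ecosystem, and the AI phase being served, which the next slides cover.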

  7. GPU / ACCELERATOR & AI PHASES
     • TRAINING: data +++++
     • INFERENCE: data ++ / very low to real-time response time
     • (ANALYTIC): data +++++++

  8. GPU / ACCELERATOR & AI PHASES

  9. EXECUTING A WORKLOAD: AT FIRST
     • CODE
     • DATA
     • COMPUTING RESOURCES

  10. RESOURCE ADDRESSING: Efficiently sharing GPUs
     • Dedicated: local GPU machines
     • Shared: single server
     • Distributed: cluster
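For the "shared: single server" case, one common mechanism is scoping each worker process to its own GPUs through the `CUDA_VISIBLE_DEVICES` environment variable, which the CUDA runtime honors. A minimal sketch, assuming a fixed slice size per worker (the `assign_gpus` helper is a hypothetical name):

```python
import os

def assign_gpus(worker_id, gpus_per_worker=2, total_gpus=8):
    """Compute the CUDA_VISIBLE_DEVICES value that restricts one worker
    to its own contiguous slice of a shared multi-GPU server."""
    start = worker_id * gpus_per_worker
    stop = min(start + gpus_per_worker, total_gpus)
    return ",".join(str(i) for i in range(start, stop))

# Worker 1 on an 8-GPU server is limited to GPUs 2 and 3;
# CUDA in this process will only enumerate those two devices.
os.environ["CUDA_VISIBLE_DEVICES"] = assign_gpus(1)  # "2,3"
```

This gives coarse isolation only; finer sharing (slicing a single GPU) needs features like the DGX A100 capability discussed later.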

  11. RESOURCE ADDRESSING: Parallel processing

  12. PARALLEL RESOURCE ADDRESSING: Architecture

  13. Composed / Disaggregated / Distributed / Composable: RDMA fabric, PCI fabric, NVSwitch fabric

  14. Composed / Rack Appliance (NVSwitch fabric)
     • Pros: best-in-class extreme low latency • best-in-class high bandwidth (300 Gb/s) • extreme performance • the latest DGX A100 allows all phases, with GPU sharing/slicing capability!
     • Cons: acquisition cost • rack scale • proprietary box

  15. Composed / Rack Appliance: the latest DGX A100 (NVSwitch fabric)
     • Pros: all AI phases (GPU slicing capability!) • 1 TB memory • PCIe 4 + AMD Rome • Mellanox ConnectX-6 • 1/10 the cost • 1/20 the power
     • Cons: acquisition cost • rack scale • proprietary box
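The "GPU slicing" called out here is NVIDIA's Multi-Instance GPU (MIG) feature, which can split one A100 into up to seven isolated instances. A hedged sketch of the setup flow, based on the documented `nvidia-smi` MIG workflow; the `mig_setup_commands` helper is hypothetical, and valid profile names and counts vary by GPU:

```python
def mig_setup_commands(profile="1g.5gb", count=7):
    """Build the nvidia-smi invocations that enable MIG mode and then
    create `count` GPU instances (with compute instances, -C) of one profile."""
    cmds = [["nvidia-smi", "-mig", "1"]]  # enable MIG mode (may require a GPU reset)
    cmds += [["nvidia-smi", "mig", "-cgi", profile, "-C"] for _ in range(count)]
    return cmds

# On a real DGX A100 these would be run with root privileges, e.g. via subprocess.
commands = mig_setup_commands()
```

Each resulting MIG instance appears as a separate device, so the scheduler can hand slices of one physical GPU to different jobs.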

  16. Disaggregated / Composable (PCI fabric)
     • Pros: extreme low latency • high bandwidth • composable PCI • "local" framework • cloud compliant
     • Cons: no CPU and RAM composability • proprietary hardware • proprietary software

  17. Disaggregated / Distributed / Composable (RDMA fabric)
     • Pros: low latency • high bandwidth • commodity hardware • covers all use cases • composable storage • distributed framework • DC scale • cloud compliant
     • Cons: distributed framework • latency

  18. RDMA

  19. Disaggregated / Distributed / Composable: GPU disaggregation

  20. DATA? Architecture

  21. The FIRST key: SDS. Data disaggregation: low-latency Software Defined Storage

  22. DATA? Software Defined Storage promises
     • no CPU/RAM bottleneck
     • commodity hardware
     • progressive cost
     • full scale-up/out

  23. DATA fabric? In-network computing: a fabric interconnect with high bandwidth and low latency offloads work from CPUs, GPUs, FPGAs and PMEM/NVMe devices: RDMA, NVMe over Fabrics, GPUDirect, MPI, rCUDA, SHARP, security (IPsec offload, TLS offload). In-network computing is key for efficiency.

  24. Distributing GPU workloads: the GPU scheduler is a key to efficiency

  25. Distributing GPU workloads: interesting GPU schedulers
     • Run.ai
     • Slurm

  26. Distributing GPU workloads: features
     • GPU reservation and quota
     • GPU job migration
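The reservation-and-quota feature can be illustrated with a toy allocator. This is a minimal sketch of the idea only, not how Run.ai or Slurm actually implement it; the class and method names are hypothetical:

```python
class GpuQuotaScheduler:
    """Toy scheduler: admit a job only if free GPUs remain
    AND the submitting team stays within its quota."""

    def __init__(self, total_gpus, quotas):
        self.free = total_gpus
        self.quotas = dict(quotas)           # team -> max concurrent GPUs
        self.used = {t: 0 for t in quotas}   # team -> GPUs currently held

    def submit(self, team, gpus):
        """Return True if the job is admitted, False if it must queue."""
        if gpus <= self.free and self.used[team] + gpus <= self.quotas[team]:
            self.free -= gpus
            self.used[team] += gpus
            return True
        return False

    def release(self, team, gpus):
        """Give GPUs back when a job finishes."""
        self.used[team] -= gpus
        self.free += gpus
```

A real scheduler adds queueing, preemption, and the job migration mentioned above, but the admission check is the core of quota enforcement.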

  27. The way I see it:

  28. Disaggregated / Distributed / Composable (RDMA fabric)
     • Pros: low latency • high bandwidth • commodity hardware • covers all use cases • composable storage • distributed framework • DC scale • cloud compliant
     • Cons: distributed framework • latency

  29. Disaggregated / Distributed / Composable: SW distributed, HW PCIe / RoCE

  30. Distribution layer / Composable: rCUDA (http://www.rcuda.net/)
     • Pros: remote CUDA • distributed resource pools • transparent usage (tbc) • TensorFlow support
     • Cons: limited to CUDA calls • performance & efficiency • university project
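rCUDA works by intercepting CUDA API calls on the client and forwarding them over the network to a server that owns the GPU. A toy sketch of that call-forwarding idea over a local socket pair; all names here are hypothetical, and real rCUDA forwards actual CUDA calls, typically over RDMA for performance:

```python
import pickle
import socket
import threading

def remote_call(sock, fn_name, *args):
    """Client side: serialize a call descriptor, wait for the result,
    mimicking how rCUDA ships CUDA API calls to a remote GPU server."""
    sock.sendall(pickle.dumps((fn_name, args)))
    return pickle.loads(sock.recv(65536))

def gpu_server(sock):
    """Toy 'GPU server': receive one call, execute it locally, reply.
    The lambda stands in for work a real server would run on its GPU."""
    ops = {"vector_add": lambda a, b: [x + y for x, y in zip(a, b)]}
    fn_name, args = pickle.loads(sock.recv(65536))
    sock.sendall(pickle.dumps(ops[fn_name](*args)))

client, server = socket.socketpair()
threading.Thread(target=gpu_server, args=(server,)).start()
result = remote_call(client, "vector_add", [1, 2, 3], [10, 20, 30])  # [11, 22, 33]
```

The appeal is that the client application keeps its "local" CUDA programming model while the GPUs sit anywhere in the cluster; the cost, as the slide notes, is the extra network hop on every call.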

  31. rCUDA: GPU disaggregation

  32. DATA fabric? In-network computing

  33. Low-latency Software Defined Storage

  34. Low-latency Software Defined Storage
     • Unbeatable performance: GPUDirect Storage • transport adds ~5 µs • volume latency 40–300 µs • RDDA: 0% CPU on the storage servers! • RDMA & TCP
     • Flexibility: API • scale-up/out • protection levels: RAID 0 / 1 / 10 / erasure coding
     • Financial efficiency: disk-based licensing model
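The protection levels listed (RAID and erasure coding) all rest on redundancy math; the simplest case is single-parity, RAID-5-style XOR, sketched below. Helper names are hypothetical; production erasure codes (e.g. Reed-Solomon) tolerate multiple failures.

```python
def xor_parity(blocks):
    """Parity block = byte-wise XOR of all equal-sized data blocks."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = bytes(p ^ b for p, b in zip(parity, block))
    return parity

def recover_lost_block(surviving_blocks, parity):
    """XOR of the survivors and the parity reconstructs the single lost block."""
    return xor_parity(list(surviving_blocks) + [parity])

data = [b"GPU", b"SDS", b"AI!"]          # three equal-sized data blocks
parity = xor_parity(data)                # stored on a fourth device
rebuilt = recover_lost_block([data[0], data[2]], parity)  # == b"SDS"
```

One parity block survives any single-device loss at the cost of one extra block of capacity, which is why the slide treats the protection level as a tunable trade-off rather than a fixed choice.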

  35. GPUDirect Storage

  36. THANKS

  37. Thank you!
