Use Tesla to provide first GPU VM Service in China Feng Zhu - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Focus • Service • Neutrality

Use Tesla to provide first GPU VM Service in China

Feng Zhu

slide-2
SLIDE 2

Outline

  • UCloud Introduction
  • K80 GPU VM
  • P40 GPU VM
  • UCloud GPU PaaS Service: UAI-Service
  • UCloud GPU ecosystem

slide-3
SLIDE 3

About UCloud

  • Top 3 IaaS Provider in China
  • Founded in 2012
  • HQ in Shanghai
  • Has served 50,000+ enterprises

slide-4
SLIDE 4

Data Centers

Locations: LA, DC, Frankfurt, SG, Bangkok, Seoul, HK, SH1, ZJ, GZ, BJ1, TW, BJ2, SH2

14 Global Regions

slide-5
SLIDE 5

UCloud Product Line

slide-6
SLIDE 6

GPU Timeline

  • 2012: UCloud founded
  • 2015.11: K80 GPU VM
  • 2016.2: K80 GPU Physical Machine
  • 2017.5: P40 GPU VM
  • 2017.?: P40 GPU Physical Machine

slide-7
SLIDE 7

GPU Decision: Virtualization

Options compared: PCI pass-through and GRID (the comparison table is illegible in the source; a √ marks the chosen option)

slide-8
SLIDE 8

VM Advantage

  • Flexibility for VM configuration
  • CPU, memory, disk size, and GPU count are all configurable
  • Flexible SDN networking
  • All mainstream operating systems supported, Windows and Linux
  • CentOS 6.5/CentOS 7.0/Ubuntu 14.04/Ubuntu 12.04/Gentoo 2.2/Win 2008/Win 2012
  • Fast Deployment
  • From a self-defined image, 1000 VMs can be deployed in 1 minute
slide-9
SLIDE 9

VM Performance Degradation

  • With pass-through technology, there is almost no performance degradation

Degradation   Virtualization   Bare Metal
(illegible)   99.3%            100%
(illegible)   98%              100%

(Bar chart of the same figures; axis from 97.00% to 101.00%.)

slide-10
SLIDE 10

UCloud GPU Virtualization – DL test

  • Caffe Performance (Ubuntu)

Cases         iters   GPU(secs)   CPU(secs)   Speedup
(illegible)   10000   253.9       900.3       3.5
(illegible)   5000    374.2       2931.0      7.8
(illegible)   10000   3138.9      17634.8     5.6
(illegible)   50000   2584.2      8937.1      3.5
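The Speedup column appears to be CPU time divided by GPU time. A minimal Python sketch of that calculation follows; the function name `speedup` and the timings are hypothetical, chosen only for illustration.

```python
def speedup(cpu_secs: float, gpu_secs: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    if gpu_secs <= 0:
        raise ValueError("gpu_secs must be positive")
    return cpu_secs / gpu_secs


# Hypothetical timings for illustration only (not taken from the slides).
example = round(speedup(cpu_secs=900.0, gpu_secs=250.0), 1)
print(example)
```

A value above 1 means the GPU run finished faster than the CPU run.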

slide-11
SLIDE 11

UCloud GPU Virtualization – DL test(2)

  • Theano/Keras (Ubuntu)
Case names and the second column's header are illegible in the source; rows are in slide order.

(unlabeled)   GPU   CPU    Speedup
20000         47    239    5.1
20000         11    563    51.2
20000         27    236    8.7
45000         5     32     6.4
50000         198   2670   13.5
950           9     22     2.4
60000         23    1332   57.9
60000         270   451    1.7
60000         1     5      5.0

slide-12
SLIDE 12

K80 Physical Machine

Hardware Specification

Field labels are illegible in the source; legible values, in slide order: K80; 2630 x 2; 192; 2T; 10 x 4.

slide-13
SLIDE 13

VM Configuration - K80

VM configurations with one or two GPUs per VM:

CPU: 4, 8, 16
Memory: 8, 16, 32, 64
Disk: 100G to 1T

slide-14
SLIDE 14

Flexible VMs Save Cost

Configuration   Fixed    Flexible
CPU             16       4
Memory          96       8
Disk            1T       100
GPU             1        1
Price           4,300    2,200
USD Price       $615     $315

(Bar chart of the two prices; axis from 0 to 10000.)
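To make the saving concrete, here is a small Python sketch. It assumes the two USD prices on this slide read $615 for the fixed configuration and $315 for the flexible one; the function name `saving_percent` is hypothetical.

```python
def saving_percent(fixed_price: float, flexible_price: float) -> float:
    """Percentage saved by choosing the flexible configuration."""
    if fixed_price <= 0:
        raise ValueError("fixed_price must be positive")
    return (fixed_price - flexible_price) / fixed_price * 100


# Assumed USD prices read from the slide: fixed vs. flexible configuration.
saving = saving_percent(615, 315)
print(f"{saving:.1f}% cheaper")
```

Under those assumed prices, the flexible configuration costs a little under half as much as the fixed one.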

slide-15
SLIDE 15

GPU VM Features

  • VPC: Networking
  • Image: Self-defined images, deploy 1000 VMs in 1 min
  • Snapshot: Data backup
  • DataArk: 24-hour continuous data protection, can roll back to any second
  • Hotfix: Kernel patch without system shutdown
  • Rescale: Resize CPU/Memory/Disk anytime

slide-16
SLIDE 16

Storage Solution

  • Disk: Local SSD disk
  • UDisk: NAS, no limit on device numbers
  • UFS: NFS file system
  • UFile: Object storage
  • UArchive: Low-cost cloud archive

slide-17
SLIDE 17

Create GPU VM

slide-18
SLIDE 18

Create GPU VM

slide-19
SLIDE 19

Create GPU VM

slide-20
SLIDE 20

P40 Physical Machine

Hardware Specification

Field labels are illegible in the source; legible values, in slide order: P40 x 4; 2650 x 2; 256; 3T; 10 x 4.

slide-21
SLIDE 21

VM Configuration - P40

VM configurations with up to four GPUs per VM:

CPU: 4, 8, 16, 24, 32
Memory: 8, 16, 32, 64, 96, 128
Disk: 100G to 1T

slide-22
SLIDE 22

P40 Price

Configuration   Spec 1   Spec 2
CPU             4        32
Memory          8        128
Disk            100      1T
GPU             1        4
Price           4,000    18,200
USD Price       $570     $2,600

(Bar chart comparing K80 and P40; axis from 0 to 20000, bar values illegible.)

slide-23
SLIDE 23

UAI-Service Overview

(Architecture diagram; the only legible label is "Resources".)

slide-24
SLIDE 24

Distributed Training Layout

(Layout diagram; legible labels: Resources, Storage, Features.)

slide-25
SLIDE 25

Distributed Training Process

(Process diagram; legible labels: Resources, Storage. It shows three numbered steps whose names are illegible.)

slide-26
SLIDE 26

Online Inference Layout

  • Resources: FPGA, GPU, CPU
  • User: SDK/Web
  • Frameworks: TensorFlow, Keras, MXNet
  • Stages: Test Env, Eval, Deploy, Running
  • Online Inference System
  • Storage: /task1/code, /data, /ckpt, /log
  • Docker Images

slide-27
SLIDE 27

Online Inference Process

  • Steps: 1. Upload, 2. Test & Eval, 3. Deploy
  • Storage: /task1/code, /ckpt
  • Components: SDK/Web, Test Env, Tester, ULB, Docker containers, Docker Perf, Docker Resource

slide-28
SLIDE 28

Online Inference API/SDK

  • Components: User, ULB, Service Docker instances, Deploy, Docker Resource
  • Features: AB Test, Scalable, Perf report, Model update, Rollback

slide-29
SLIDE 29

GPU Scenario

  • HPC: Gene Sequencing, Weather Forecasting
  • Rendering: Picture/Film/ACG Rendering, Online Rendering (Maya, 3Dmax, Unity)
  • Deep Learning: Advertisement CTR, Face Recognition, Voice Recognition
  • Simulator

slide-30
SLIDE 30

GPU Scenario

Advertisement CTR Face Recognition Voice Recognition

Neural Network Model

Big Data Neural Network Model

Neural Network Model

User Input Output

Training: Compute-Intensive (GPU)

Online Service: Compute-Sensitive (GPU)

slide-31
SLIDE 31

GPU Scenario: Example

  • CTR (click-through rate) estimation
  • (three further bullet points are illegible in the source)
slide-32
SLIDE 32

GPU Scenario: Example

  • CTR (click-through rate) estimation

x = [Weekday=Wednesday, Gender=Male, City=Shanghai]
x = [0,0,1,0,0,0,0, 0,1, 0,0,1,0…0]

CTR Estimate Model

Percent of Click: 25%
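The input encoding on this slide is one-hot encoding: each categorical field becomes a block of zeros with a single 1 at the position of its value. A minimal Python sketch under stated assumptions: the vocabularies, their ordering (Monday first, "Female" before "Male"), and the four-city list are hypothetical, chosen only so that the result matches the visible part of the slide's vector. The CTR model itself is not shown.

```python
# Hypothetical vocabularies; the real feature dictionaries are not given on the slide.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
GENDERS = ["Female", "Male"]
CITIES = ["Beijing", "Guangzhou", "Shanghai", "Shenzhen"]


def one_hot(value, vocabulary):
    """Return a list of zeros with a single 1 at the index of `value`."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1
    return vector


def encode(sample):
    """Concatenate the one-hot blocks for weekday, gender, and city."""
    return (one_hot(sample["Weekday"], WEEKDAYS)
            + one_hot(sample["Gender"], GENDERS)
            + one_hot(sample["City"], CITIES))


x = encode({"Weekday": "Wednesday", "Gender": "Male", "City": "Shanghai"})
print(x)
```

With a real city vocabulary the last block would be much longer, which is why the slide abbreviates it with "…".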

slide-33
SLIDE 33

Thank You

www.ucloud.cn
