Mars + RAPIDS GPU
Contents

- Background
- What Mars + RAPIDS can do
- How Mars + RAPIDS does it
- Performance and outlook
The machine learning lifecycle

Data → data processing / data analysis → feature engineering / model training → model deployment / maintenance / improvement
Trained model: { new data } → { predictions }
Data processing and feature engineering often take up 80% of the time.
[Figure: Google Trends (worldwide)]

The ever-growing data science stack

[Figure: technology stacks for the Data Scientist and Data Engineer roles]
Mars: a parallel and distributed accelerator for NumPy, pandas, and scikit-learn, letting you process more data.
NumPy:

import numpy as np
from scipy.special import erf


def black_scholes(P, S, T, rate, vol):
    a = np.log(P / S)
    b = T * -rate

    z = T * (vol * vol * 2)
    c = 0.25 * z
    y = 1.0 / np.sqrt(z)

    w1 = (a - b + c) * y
    w2 = (a - b - c) * y

    d1 = 0.5 + 0.5 * erf(w1)
    d2 = 0.5 + 0.5 * erf(w2)

    Se = np.exp(b) * S

    call = P * d1 - Se * d2
    put = call - P + Se

    return call, put


N = 50000000
price = np.random.uniform(10.0, 50.0, N)
strike = np.random.uniform(10.0, 50.0, N)
t = np.random.uniform(1.0, 2.0, N)
print(black_scholes(price, strike, t, 0.1, 0.2))

Mars tensor:

import mars.tensor as mt
from mars.tensor.special import erf


def black_scholes(P, S, T, rate, vol):
    a = mt.log(P / S)
    b = T * -rate

    z = T * (vol * vol * 2)
    c = 0.25 * z
    y = 1.0 / mt.sqrt(z)

    w1 = (a - b + c) * y
    w2 = (a - b - c) * y

    d1 = 0.5 + 0.5 * erf(w1)
    d2 = 0.5 + 0.5 * erf(w2)

    Se = mt.exp(b) * S

    call = P * d1 - Se * d2
    put = call - P + Se

    return call, put


N = 50000000
price = mt.random.uniform(10.0, 50.0, N)
strike = mt.random.uniform(10.0, 50.0, N)
t = mt.random.uniform(1.0, 2.0, N)
print(mt.ExecutableTuple(black_scholes(price, strike, t, 0.1, 0.2)).execute())
NumPy: runtime 11.9 s, peak memory 5479.47 MiB. Mars tensor: runtime 5.48 s, peak memory 1647.85 MiB.
pandas:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100000000, 4),
                  columns=list('abcd'))
print(df.sum())

Mars DataFrame:

import mars.tensor as mt
import mars.dataframe as md

df = md.DataFrame(mt.random.rand(100000000, 4),
                  columns=list('abcd'))
print(df.sum().execute())
pandas: runtime 18.7 s, peak memory 3430.29 MiB. Mars DataFrame: runtime 5.25 s, peak memory 2007.92 MiB.
scikit-learn:

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Mars learn:

from sklearn.datasets import make_blobs
from mars.learn.decomposition import PCA

X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_.execute())
print(pca.explained_variance_.execute())
scikit-learn: runtime 19.1 s, peak memory 7314.82 MiB. Mars learn: runtime 12.8 s, peak memory 3814.32 MiB.
The machine learning lifecycle, revisited

Data → data processing / data analysis → feature engineering / model training → model deployment / maintenance / improvement
The data-processing and feature-engineering stages, which often take 80% of the time, both support GPU acceleration. What about the GPU in Mars?
NumPy vs CuPy:

In [1]: import numpy as np

In [4]: %%time
   ...: a = np.random.rand(8000, 10)
   ...: _ = ((a[:, np.newaxis, :] - a) ** 2).sum(axis=-1)
CPU times: user 17 s, sys: 1.84 s, total: 18.8 s
Wall time: 5.23 s

In [2]: import cupy as cp

In [5]: %%time
   ...: a = cp.random.rand(8000, 10)
   ...: _ = ((a[:, cp.newaxis, :] - a) ** 2).sum(axis=-1)
CPU times: user 590 ms, sys: 292 ms, total: 882 ms
Wall time: 880 ms
pandas vs RAPIDS cuDF:

In [6]: %%time
   ...: import pandas as pd
   ...: ratings = pd.read_csv('ml-20m/ratings.csv')
   ...: ratings.groupby('userId').agg({'rating': ['sum', 'mean', 'max', 'min']})
CPU times: user 10.5 s, sys: 1.58 s, total: 12.1 s
Wall time: 18 s

In [7]: %%time
   ...: import cudf
   ...: ratings = cudf.read_csv('ml-20m/ratings.csv')
   ...: ratings.groupby('userId').agg({'rating': ['sum', 'mean', 'max', 'min']})
CPU times: user 1.2 s, sys: 409 ms, total: 1.61 s
Wall time: 1.66 s
scikit-learn vs RAPIDS cuML:

In [4]: import pandas as pd
In [5]: from sklearn.neighbors import NearestNeighbors

In [6]: %%time
   ...: df = pd.read_csv('data.csv')
   ...: nn = NearestNeighbors(n_neighbors=10)
   ...: nn.fit(df)
   ...: neighbors = nn.kneighbors(df)
CPU times: user 3min 34s, sys: 1.73 s, total: 3min 36s
Wall time: 1min 52s

In [1]: import cudf
In [2]: from cuml.neighbors import NearestNeighbors

In [3]: %%time
   ...: df = cudf.read_csv('data.csv')
   ...: nn = NearestNeighbors(n_neighbors=10)
   ...: nn.fit(df)
   ...: neighbors = nn.kneighbors(df)
CPU times: user 41.6 s, sys: 2.84 s, total: 44.4 s
Wall time: 17.8 s
Mars + RAPIDS: process more data, faster.
Mars tensor: implements about 70% of the common NumPy interfaces.
- Tensor creation: ones, empty, zeros, ones_like, …
- Random sampling: rand, randint, beta, binomial, …
- Basic manipulations: astype, transpose, broadcast_to, sort, …
- Aggregation: sum, nansum, max, all, mean, …
- Indexing: slice, boolean indexing, fancy indexing, newaxis, Ellipsis
- Discrete Fourier transform
- Linear algebra: QR, SVD, Cholesky, inv, norm, …
Mars DataFrame and Mars learn

- DataFrame interfaces implemented: https://github.com/mars-project/mars/issues/495
  - DataFrame creation: DataFrame, from_records
  - IO: read_csv
  - Basic arithmetic operations
  - Math operations
  - Indexing: iloc, column selection, set_index
  - Reduction: aggregation
  - Groupby: grouped aggregation
  - merge/join
- Learn:
  - Decomposition: PCA, TruncatedSVD
  - TensorFlow: run_tensorflow_script; MarsDataset in progress
  - XGBoost: XGBClassifier, XGBRegressor
  - PyTorch: in progress
Scale up and scale out: a single 24-core machine vs 4 × 24-core machines; 1 × Tesla V100 vs 4 × Tesla V100.
Monte Carlo estimation of π
Single GPU (n_parallel=1):

In [4]: %%time
   ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2), gpu=True)
   ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4 / 2000000000)
   ...:       .execute(n_parallel=1))
3.14157076
CPU times: user 2.72 s, sys: 1.27 s, total: 3.99 s
Wall time: 3.98 s

Single machine, CPU:

In [3]: %%time
   ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2))
   ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4 / 2000000000).execute())
3.14160312
CPU times: user 3min 31s, sys: 1min 42s, total: 5min 14s
Wall time: 25.8 s

Four GPUs (n_parallel=4):

In [4]: %%time
   ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2), gpu=True)
   ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4 / 2000000000)
   ...:       .execute(n_parallel=4))
3.14156894
CPU times: user 1.64 s, sys: 918 ms, total: 2.56 s
Wall time: 2.4 s

Distributed cluster:

In [4]: from mars.session import new_session
In [5]: new_session('http://192.168.0.111:40002').as_default()
In [6]: %%time
   ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2))
   ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4 / 2000000000).execute())
3.141611406
CPU times: user 12.2 ms, sys: 2.02 ms, total: 14.3 ms
Wall time: 7.66 s
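For reference, the estimator itself is plain NumPy. A minimal sketch, with the sample count cut down from 2 billion so it runs in seconds; `estimate_pi` is a hypothetical helper name, not Mars API:

```python
import numpy as np

def estimate_pi(n, seed=0):
    """Monte Carlo estimate of pi: the fraction of uniform points in the
    [-1, 1] square that fall inside the unit circle, times 4."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n, 2))
    inside = (np.linalg.norm(pts, axis=1) < 1).sum()
    return inside * 4 / n

print(estimate_pi(1_000_000))  # prints a value close to 3.14159
```

The Mars version is the same expression, only with `mt` instead of `np` and an explicit `.execute()`, which is what lets it run chunked on CPUs, GPUs, or a cluster.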
How does Mars achieve parallel and distributed execution? Let's look at the design philosophy behind Mars.
Philosophy 1: divide and conquer
In [1]: import mars.tensor as mt
In [2]: import mars.dataframe as md
In [3]: a = mt.ones((10, 10), chunk_size=5)
In [4]: a[5, 5] = 8
In [5]: df = md.DataFrame(a)
In [6]: s = df.sum()
In [7]: s.execute()
Out[7]:
0    10.0
1    10.0
2    10.0
3    10.0
4    10.0
5    17.0
6    10.0
7    10.0
8    10.0
9    10.0
dtype: float64
[Figure: coarse-grained computation graph. Ones → TensorData → IndexSetValue(indexes=(5, 5), value=8) → TensorData → FromTensor → DataFrameData → Sum → SeriesData. The user-facing objects tensor(a), DataFrame(df), and Series(s) are Tileables; each wraps a TileableData produced by an Operand.]
Tiling turns the coarse-grained graph into a fine-grained chunk graph.

[Figure: after Tile, every coarse operand expands per chunk. Ones produces four TensorChunkData chunks (0,0), (0,1), (1,0), (1,1); IndexSetValue(value=8) applies only to the chunk containing element (5, 5), with chunk-local index (0, 0); FromTensor converts each chunk to a DataFrameChunkData; Sum runs per chunk, partial results are combined by Concat, and a final Sum produces the SeriesChunkData result.]
[Figure: fine-grained graph before and after operator fusion. Chains of chunk operators with no branching are fused into Compose nodes, so each chunk's pipeline runs as a single composed operator before the final Concat/Sum reduction.]
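The divide-and-conquer scheme can be mimicked in plain NumPy. An illustrative sketch, not Mars internals; `tiled_column_sum` is a hypothetical helper that mirrors the per-chunk Sum followed by a combining step in the chunk graph:

```python
import numpy as np

def tiled_column_sum(a, chunk_size):
    """Column sums computed chunk by chunk: a per-chunk Sum produces
    partial results, which are then combined into the final answer."""
    n_rows, n_cols = a.shape
    partials = []
    for i in range(0, n_rows, chunk_size):
        for j in range(0, n_cols, chunk_size):
            chunk = a[i:i + chunk_size, j:j + chunk_size]
            # per-chunk Sum: one partial vector of column sums
            partials.append((j, chunk.sum(axis=0)))
    # combine step: accumulate partials that cover the same columns
    result = np.zeros(n_cols)
    for j, part in partials:
        result[j:j + len(part)] += part
    return result

a = np.ones((10, 10))
a[5, 5] = 8
print(tiled_column_sum(a, chunk_size=5))  # column 5 sums to 17, the rest to 10
```

Because each chunk's partial sum is independent, the per-chunk work can run on different cores, GPUs, or workers; only the small partial results need to be moved and merged.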
Philosophy 2: stand on the shoulders of giants
Operator implementations:

| Module    | CPU                                     | GPU                                    | CPU operator fusion        | GPU operator fusion |
|-----------|-----------------------------------------|----------------------------------------|----------------------------|---------------------|
| Tensor    | NumPy                                   | CuPy                                   | NumExpr, Jax (in progress) | CuPy                |
| DataFrame | pandas                                  | RAPIDS cuDF                            | not yet supported          | not yet supported   |
| Learn     | scikit-learn/XGBoost/TensorFlow/PyTorch | RAPIDS cuML/XGBoost/TensorFlow/PyTorch | not yet supported          | not yet supported   |
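What operator fusion buys can be shown with a toy sketch (illustrative only; Mars delegates the real thing to NumExpr or CuPy as noted above, and `unfused`/`fused` are hypothetical names): an unfused chain materializes one intermediate array per operator, while a fused kernel runs the whole expression per element in a single pass.

```python
import math
import numpy as np

def unfused(x):
    # Each operator allocates a full intermediate array.
    t1 = x * 2
    t2 = t1 + 1
    return np.sqrt(t2)

def fused(x):
    # The whole chain runs per element in one pass, with no
    # intermediate arrays, which is what a fused kernel does.
    out = np.empty_like(x)
    for i, v in enumerate(x.flat):
        out.flat[i] = math.sqrt(v * 2 + 1)
    return out

x = np.arange(4.0)
print(np.allclose(unfused(x), fused(x)))  # True
```

In practice the fused loop is compiled (NumExpr bytecode, a CuPy kernel), so it also runs faster, not just with less memory.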
Philosophy 3: multiple strategies, smarter execution
Take a DataFrame with columns A, B, C split into three chunks (rows 0–999, 1000–1999, 2000–2999) and a GroupByAgg with by='B':

- Strategy 1, tree reduction: each chunk runs its own GroupByAgg, the partial results are combined by Concat, and a final GroupByAgg merges them. Works well when there are few distinct groups.
- Strategy 2, shuffle: each chunk's GroupByAgg output is repartitioned by key so that every downstream GroupByAgg owns a disjoint set of groups. Works better when there are many distinct groups.
- Strategy 3, adaptive: choose between tree reduction and shuffle based on the observed data; possibly AI-based in the future?
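The two base strategies can be sketched in pure Python (an illustrative toy, not Mars code; chunks here are lists of (key, value) pairs and the aggregation is a sum):

```python
from collections import defaultdict

def chunk_agg(chunk):
    """Per-chunk GroupByAgg: sum values by key within one chunk."""
    out = defaultdict(int)
    for key, value in chunk:
        out[key] += value
    return dict(out)

def tree_reduction(chunks):
    """Strategy 1: aggregate each chunk, then merge the small partial
    results. Cheap when there are few distinct keys."""
    merged = defaultdict(int)
    for partial in map(chunk_agg, chunks):
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)

def shuffle(chunks, n_reducers):
    """Strategy 2: repartition rows by hash(key) so each reducer owns a
    disjoint set of keys, then aggregate each partition independently."""
    partitions = [[] for _ in range(n_reducers)]
    for chunk in chunks:
        for key, value in chunk:
            partitions[hash(key) % n_reducers].append((key, value))
    result = {}
    for part in partitions:
        result.update(chunk_agg(part))
    return result

chunks = [[('a', 1), ('b', 2)], [('a', 3), ('c', 4)], [('b', 5)]]
print(tree_reduction(chunks))  # {'a': 4, 'b': 7, 'c': 4}
print(shuffle(chunks, n_reducers=2) == tree_reduction(chunks))  # True
```

Both give the same answer; they differ in cost. Tree reduction moves only the partial aggregates, so it wins when groups are few; shuffle moves the data itself but never re-merges large partials, so it wins when nearly every row is its own group.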
The client talks to the server through a REST service:

import mars.tensor as mt
import mars.dataframe as md
from mars.session import new_session

new_session('http://web:12345').as_default()

a = mt.random.rand(10, 10, chunk_size=5)
df = md.DataFrame(a)
print(df.sum().execute())
[Figure: execution pipeline. The client builds the coarse-grained graph Rand → TensorData → FromTensor → DataFrameData → Sum → SeriesData and serializes it; the server deserializes it, tiles it into a fine-grained chunk graph (per-chunk Rand → FromTensor → Sum, then Concat and a final Sum), fuses chunk operator chains into Compose nodes, and inserts Fetch nodes for chunk data that lives on other workers.]
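The client/server handoff can be sketched as follows. This is a hypothetical JSON encoding for illustration only (Mars uses its own serialization format): the client encodes the coarse-grained graph as operand nodes plus data-flow edges, and the server rebuilds it before tiling.

```python
import json

# Hypothetical encoding of the coarse-grained graph
# Rand -> TensorData -> FromTensor -> DataFrameData -> Sum -> SeriesData.
graph = {
    "nodes": [
        {"id": 0, "op": "Rand", "params": {"shape": [10, 10], "chunk_size": 5}},
        {"id": 1, "op": "FromTensor", "params": {}},
        {"id": 2, "op": "Sum", "params": {"axis": 0}},
    ],
    "edges": [[0, 1], [1, 2]],  # data flows from op 0 to 1 to 2
}

payload = json.dumps(graph)     # client side: serialize, POST to the REST API
received = json.loads(payload)  # server side: deserialize, then tile

print([n["op"] for n in received["nodes"]])  # ['Rand', 'FromTensor', 'Sum']
```

Shipping the graph rather than the data is the key point: the client stays thin (note the ~14 ms of client CPU time in the distributed Monte Carlo run), and all tiling, fusion, and scheduling happen server-side.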
[Figure: the fused chunk graph is partitioned across schedulers; workers execute their operands against local storage tiers: processes (CPUs/GPU), shared memory/GPU memory, and disk/cloud storage.]
Comparing Mars with Dask and Spark
| Aspect | Mars | Dask | Spark |
|--------|------|------|-------|
| API richness | 😮 (NumPy, pandas, scikit-learn) | 😂 (NumPy, pandas, scikit-learn, delayed) | 😂 (SQL/DataFrame, RDD, Spark Streaming, MLlib, GraphX) |
| Community | 😮 | 😂 | 😂 |
| Scheduling | 😮 (good Kubernetes support) | 😂 (fair Kubernetes support; also Yarn, HPC) | 😂 (Kubernetes, Yarn, Mesos) |
| Fine-grained computation graph | 😂 | 😂 | 😮 |
| Runtime optimization (operator fusion) | 😂 | 😮 | 😂 |
| Mutable data | 😂 | 😮 | 😮 |
| Multi-language support by design | 😂 | 😮 | 😂 |
| Fault tolerance | 😂 (worker-level) | 😮 | 😂 |
| No single point of failure for the distributed master | 😂 | 😮 | 😂 |
Mars vs Dask performance (CPU, single machine)

- MacBook Pro, 2.2 GHz Intel Core i7, 16 GB RAM
- Mars 0.3.0a2, Dask 2.2.0
Runtime in seconds (lower is better):

| Benchmark      | Dask   | Mars |
|----------------|--------|------|
| Black-Scholes  | 36.47  | 35   |
| Cholesky       | 54.92  | 53   |
| Dot            | 83.37  | 31   |
| FFT            | 18.10  | 17   |
| Inv            | 143.64 | 51   |
| LU             | 43.97  | 44   |
| Monte-Carlo-Pi | 48.50  | 36   |
| QR             | 24.90  | 22   |
| RNG            | 6.04   | 5    |
| SVD            | 27.33  | 25   |
Mars vs Dask performance (CPU, distributed)

- 1 scheduler, 4 workers (24 cores, 80 GB each)
- Annoyingly, Dask sometimes crashed during our benchmark runs.
Runtime in seconds (lower is better):

| Benchmark      | Dask     | Mars |
|----------------|----------|------|
| Black-Scholes  | 148.95   | 28   |
| Cholesky       | 164.93   | 159  |
| Dot            | 2,361.37 | 844  |
| FFT            | 343.62   | 269  |
| Inv            | 448.64   | 355  |
| LU             | 621.78   | 549  |
| Monte-Carlo-Pi | 442.58   | 131  |
| QR             | 163.64   | 111  |
| RNG            | 6.84     | 6    |
| SVD            | 226.57   | 132  |
Mars vs Dask performance (GPU and CPU)

- 500 GB RAM, 96 cores, NVIDIA V100 GPU
Data size: 5 GB. Runtime in seconds:

|          | groupby1 | groupby2 | groupby3 |
|----------|----------|----------|----------|
| Dask-GPU | 11.00    | 12.20    | 11.70    |
| Mars-GPU | 9        | 10       | 9        |
| Dask-CPU | 94       | 95       | 100      |
| Mars-CPU | 24.8     | 41.2     | 39.5     |
- groupby1: df.groupby('id1').agg({'v1': 'sum'})
- groupby2: df.groupby(['id1', 'id2']).agg({'v1': 'sum'})
- groupby3: df.groupby(['id6']).agg({'v1': 'sum', 'v2': 'sum', 'v3': 'sum'})
Data size: 20 GB. Runtime in seconds:

|          | groupby1 | groupby2 | groupby3 |
|----------|----------|----------|----------|
| Dask-GPU | 45.10    | 49.10    | 50.80    |
| Mars-GPU | 38       | 44       | 36       |
| Dask-CPU | 383      | 385      | 375      |
| Mars-CPU | 82       | 135      | 115      |
Data size: 500 MB. Runtime in seconds:

|          | groupby1 | groupby2 | groupby3 |
|----------|----------|----------|----------|
| Dask-GPU | 1.29     | 1.55     | 2.57     |
| Mars-GPU | 0.95     | 1.20     | 2.25     |
| Dask-CPU | 8.49     | 8.56     | 8.44     |
| Mars-CPU | 4.65     | 7.5      | 4.73     |
- pip install pymars
- Mars source: https://github.com/mars-project/mars
- Mars documentation: https://docs.mars-project.io/zh_CN/latest/
- Dual-version releases
- Directions:
  - The community is the priority
  - Technical:
    - Roadmap and enhancement proposals: https://github.com/mars-project/mars/issues/537
    - Better GPU support; faster execution and data transfer; lower scheduling overhead
    - Richer DataFrame, learn, and tensor interfaces; better interoperability with deep learning frameworks
    - Optimize the Mars actors layer for more efficient execution and networking
    - Support more schedulers