Mars + RAPIDS GPU


  1. When Mars meets RAPIDS: principles and practice of GPU-accelerated distributed processing of massive data. Alibaba Cloud Intelligence: Qin Xuye, He Kaisheng

  2. Agenda: Background; What Mars + RAPIDS can do; How Mars + RAPIDS does it; Performance and outlook

  3. The machine learning lifecycle: data processing / data analysis, then feature engineering / model training, then model deployment / maintenance / improvement; {new data} flows in and the trained model produces {predictions}. Data processing and analysis typically take up 80% of the time.

  4. Google Trends (worldwide)

  5. The ever-growing data science stack

  6. Data Engineer vs. Data Scientist

  7. Mars: a parallel and distributed accelerator for numpy, pandas, and scikit-learn; process more data

  8. NumPy vs. Mars tensor (Black-Scholes on N = 50,000,000 samples)

     NumPy (runtime: 11.9 s, peak memory: 5479.47 MiB):

         import numpy as np
         from scipy.special import erf

         def black_scholes(P, S, T, rate, vol):
             a = np.log(P / S)
             b = T * -rate

             z = T * (vol * vol * 2)
             c = 0.25 * z
             y = 1.0 / np.sqrt(z)

             w1 = (a - b + c) * y
             w2 = (a - b - c) * y

             d1 = 0.5 + 0.5 * erf(w1)
             d2 = 0.5 + 0.5 * erf(w2)

             Se = np.exp(b) * S

             call = P * d1 - Se * d2
             put = call - P + Se

             return call, put

         N = 50000000
         price = np.random.uniform(10.0, 50.0, N)
         strike = np.random.uniform(10.0, 50.0, N)
         t = np.random.uniform(1.0, 2.0, N)
         print(black_scholes(price, strike, t, 0.1, 0.2))

     Mars tensor (runtime: 5.48 s, peak memory: 1647.85 MiB): the same code
     with np replaced by mt, except for the imports and the final line:

         import mars.tensor as mt
         from mars.tensor.special import erf

         # ... black_scholes body unchanged, with np replaced by mt ...

         print(mt.ExecutableTuple(black_scholes(price, strike, t, 0.1, 0.2)).execute())
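     Note the trailing .execute() in the Mars version: Mars operations are
     lazy. They only record a computation graph, and execute() triggers the
     actual, possibly parallel, evaluation; ExecutableTuple lets the two
     returned tensors be executed in a single pass. A minimal sketch of the
     same deferred-execution pattern (not from the slides):

         import mars.tensor as mt

         t = mt.ones((100, 100)).sum()  # only builds a graph, nothing runs yet
         print(t.execute())             # triggers the computation: 10000.0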

  9. pandas vs. Mars DataFrame

     pandas (runtime: 18.7 s, peak memory: 3430.29 MiB):

         import numpy as np
         import pandas as pd

         df = pd.DataFrame(np.random.rand(100000000, 4),
                           columns=list('abcd'))
         print(df.sum())

     Mars DataFrame (runtime: 5.25 s, peak memory: 2007.92 MiB):

         import mars.tensor as mt
         import mars.dataframe as md

         df = md.DataFrame(mt.random.rand(100000000, 4),
                           columns=list('abcd'))
         print(df.sum().execute())

  10. scikit-learn vs. Mars learn

      scikit-learn (runtime: 19.1 s, peak memory: 7314.82 MiB):

          from sklearn.datasets import make_blobs
          from sklearn.decomposition.pca import PCA

          X, y = make_blobs(
              n_samples=100000000, n_features=3,
              centers=[[3, 3, 3], [0, 0, 0],
                       [1, 1, 1], [2, 2, 2]],
              cluster_std=[0.2, 0.1, 0.2, 0.2],
              random_state=9)
          pca = PCA(n_components=3)
          pca.fit(X)
          print(pca.explained_variance_ratio_)
          print(pca.explained_variance_)

      Mars learn (runtime: 12.8 s, peak memory: 3814.32 MiB): the same code
      importing PCA from mars.learn.decomposition instead, with the results
      executed explicitly:

          from mars.learn.decomposition import PCA

          # ... make_blobs and fit unchanged ...

          print(pca.explained_variance_ratio_.execute())
          print(pca.explained_variance_.execute())

  11. The machine learning lifecycle, revisited: feature engineering / model training and model deployment / maintenance / improvement both support GPU acceleration, but what about data processing / data analysis, the part that typically takes up 80% of the time? GPU???

  12. NumPy vs. CuPy

      NumPy:

          In [1]: import numpy as np

          In [4]: %%time
             ...: a = np.random.rand(8000, 10)
             ...: _ = ((a[:, np.newaxis, :] - a) ** 2).sum(axis=-1)
             ...:
          CPU times: user 17 s, sys: 1.84 s, total: 18.8 s
          Wall time: 5.23 s

      CuPy:

          In [2]: import cupy as cp

          In [5]: %%time
             ...: a = cp.random.rand(8000, 10)
             ...: _ = ((a[:, cp.newaxis, :] - a) ** 2).sum(axis=-1)
             ...:
          CPU times: user 590 ms, sys: 292 ms, total: 882 ms
          Wall time: 880 ms
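      A caveat when timing GPU code like this: CuPy launches kernels
      asynchronously, so a fair wall-clock measurement should synchronize the
      device before the timer stops. A minimal sketch (the synchronize call is
      standard CuPy API, not shown on the slide):

          import cupy as cp

          a = cp.random.rand(8000, 10)
          # pairwise squared Euclidean distances via broadcasting,
          # the same computation as above
          d = ((a[:, cp.newaxis, :] - a) ** 2).sum(axis=-1)
          cp.cuda.Device().synchronize()  # wait for all kernels to finish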

  13. pandas vs. RAPIDS cuDF

      pandas:

          In [6]: %%time
             ...: import pandas as pd
             ...: ratings = pd.read_csv('ml-20m/ratings.csv')
             ...: ratings.groupby('userId').agg({'rating': ['sum', 'mean', 'max', 'min']})
             ...:
          CPU times: user 10.5 s, sys: 1.58 s, total: 12.1 s
          Wall time: 18 s

      cuDF:

          In [7]: %%time
             ...: import cudf
             ...: ratings = cudf.read_csv('ml-20m/ratings.csv')
             ...: ratings.groupby('userId').agg({'rating': ['sum', 'mean', 'max', 'min']})
             ...:
          CPU times: user 1.2 s, sys: 409 ms, total: 1.61 s
          Wall time: 1.66 s

  14. scikit-learn vs. RAPIDS cuML

      scikit-learn:

          In [4]: import pandas as pd
          In [5]: from sklearn.neighbors import NearestNeighbors

          In [6]: %%time
             ...: df = pd.read_csv('data.csv')
             ...: nn = NearestNeighbors(n_neighbors=10)
             ...: nn.fit(df)
             ...: neighbors = nn.kneighbors(df)
             ...:
          CPU times: user 3min 34s, sys: 1.73 s, total: 3min 36s
          Wall time: 1min 52s

      cuML:

          In [1]: import cudf
          In [2]: from cuml.neighbors import NearestNeighbors

          In [3]: %%time
             ...: df = cudf.read_csv('data.csv')
             ...: nn = NearestNeighbors(n_neighbors=10)
             ...: nn.fit(df)
             ...: neighbors = nn.kneighbors(df)
             ...:
          CPU times: user 41.6 s, sys: 2.84 s, total: 44.4 s
          Wall time: 17.8 s

  15. Mars + RAPIDS: process more data, faster

  16. Mars tensor: implements about 70% of the common NumPy interface (a short
      usage sketch follows the list)

      - Tensor creation: ones, empty, zeros, ones_like, ...
      - Random sampling: rand, randint, beta, binomial, ...
      - Indexing: slice, boolean indexing, fancy indexing, newaxis, Ellipsis, ...
      - Aggregation: sum, nansum, max, all, mean, ...
      - Basic manipulations: astype, transpose, broadcast_to, sort, ...
      - Discrete Fourier transform
      - Linear algebra: QR, SVD, Cholesky, inv, norm, ...
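      To make the coverage concrete, here is a short, hypothetical sketch that
      touches several of the listed areas (creation, random sampling, fancy
      indexing, aggregation, linear algebra); it is an illustration, not code
      from the slides:

          import mars.tensor as mt

          a = mt.ones((1000, 100))           # tensor creation
          r = mt.random.rand(1000, 100)      # random sampling
          b = (a + r)[:, [0, 2, 4]]          # arithmetic + fancy indexing
          m = b.mean(axis=0)                 # aggregation
          q, _ = mt.linalg.qr(r)             # linear algebra (QR)
          print(m.execute())
          print(q.execute())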

  17. Mars DataFrame and learn

      - Implemented DataFrame interfaces (see https://github.com/mars-project/mars/issues/495):
        - DataFrame creation: DataFrame, from_records
        - IO: read_csv
        - Basic arithmetic
        - Math operations
        - Indexing: iloc, column selection, set_index
        - Reduction: aggregations
        - Groupby: grouped aggregations
        - merge/join
        (a usage sketch follows this list)
      - Learn:
        - Decomposition: PCA, TruncatedSVD
        - TensorFlow: run_tensorflow_script; MarsDataset in progress
        - XGBoost: XGBClassifier, XGBRegressor
        - PyTorch: in progress
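      A minimal usage sketch combining several of the listed DataFrame
      interfaces (read_csv, groupby, aggregation), assuming the spelling
      mirrors pandas; the file name is hypothetical:

          import mars.dataframe as md

          # IO: read the CSV in parallel, chunk by chunk
          ratings = md.read_csv('ratings.csv')
          # Groupby + Reduction: the same aggregation as the cuDF example above
          agg = ratings.groupby('userId').agg(
              {'rating': ['sum', 'mean', 'max', 'min']})
          print(agg.execute())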

  18. Monte Carlo estimation of pi: sample points uniformly in [-1, 1]^2; the
      fraction that lands inside the unit circle approximates pi / 4, hence
      the final multiplication by 4.

      Scale up (GPUs on a single machine, via gpu=True and n_parallel):

      1 x Tesla V100:

          In [4]: %%time
             ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2), gpu=True)
             ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4
             ...:        / 2000000000).execute(n_parallel=1))
             ...:
          3.14157076
          CPU times: user 2.72 s, sys: 1.27 s, total: 3.99 s
          Wall time: 3.98 s

      4 x Tesla V100:

          In [4]: %%time
             ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2), gpu=True)
             ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4
             ...:        / 2000000000).execute(n_parallel=4))
             ...:
          3.14156894
          CPU times: user 1.64 s, sys: 918 ms, total: 2.56 s
          Wall time: 2.4 s

      Scale out (CPU cluster, via new_session pointing at a Mars web endpoint):

      24 cores, single machine:

          In [3]: %%time
             ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2))
             ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4
             ...:        / 2000000000).execute())
             ...:
          3.14160312
          CPU times: user 3min 31s, sys: 1min 42s, total: 5min 14s
          Wall time: 25.8 s

      4 x 24 cores, distributed:

          In [4]: from mars.session import new_session
          In [5]: new_session('http://192.168.0.111:40002').as_default()

          In [6]: %%time
             ...: a = mt.random.uniform(-1, 1, size=(2000000000, 2))
             ...: print(((mt.linalg.norm(a, axis=1) < 1).sum() * 4
             ...:        / 2000000000).execute())
             ...:
          3.141611406
          CPU times: user 12.2 ms, sys: 2.02 ms, total: 14.3 ms
          Wall time: 7.66 s

      In the distributed run the client's CPU time is only milliseconds
      because the computation happens on the cluster, not in the local
      process.

  19. How does Mars achieve parallelism and distribution? Let's look at the design philosophy behind Mars.

  20. Philosophy 1: divide and conquer. The coarse-grained computation graph

          In [1]: import mars.tensor as mt
          In [2]: import mars.dataframe as md
          In [3]: a = mt.ones((10, 10), chunk_size=5)
          In [4]: a[5, 5] = 8
          In [5]: df = md.DataFrame(a)
          In [6]: s = df.sum()
          In [7]: s.execute()
          Out[7]:
          0    10.0
          1    10.0
          2    10.0
          3    10.0
          4    10.0
          5    17.0
          6    10.0
          7    10.0
          8    10.0
          9    10.0
          dtype: float64

      [Diagram: the coarse-grained graph for this code. Each user-facing
      object (tensor(a), DataFrame(df), Series(s)) is a Tileable whose data
      (TensorData, DataFrameData, SeriesData, all TileableData) is produced by
      an Operand: Ones -> IndexSetValue (indexes: (5, 5), value: 8) ->
      FromTensor -> Sum.]

  21. Coarse-grained graph -> Tile -> fine-grained graph

      [Diagram: tiling expands the coarse-grained graph into a fine-grained
      chunk graph. Ones becomes four Ones chunks, TensorChunkData (0,0),
      (0,1), (1,0), (1,1); IndexSetValue is applied only to the chunk that
      owns global index (5, 5), rewritten to local index (0, 0) with value 8;
      FromTensor runs per chunk to produce DataFrameChunkData; per-chunk Sum,
      Concat, and a final Sum yield the SeriesChunkData results.]
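      To see what tiling does operationally, here is a plain-NumPy sketch of
      the 2 x 2 chunking from slide 20: the (10, 10) ones tensor splits into
      four (5, 5) chunks, the setitem at (5, 5) is rewritten against the one
      chunk that owns it, and column sums are computed per chunk and then
      combined. This illustrates the idea only; it is not Mars's actual
      implementation:

          import numpy as np

          chunk_size = 5
          # four (5, 5) chunks of the (10, 10) ones tensor
          chunks = {(i, j): np.ones((chunk_size, chunk_size))
                    for i in range(2) for j in range(2)}

          # global index (5, 5) falls into chunk (1, 1) at local index (0, 0)
          chunks[1, 1][0, 0] = 8

          # per-chunk column sums, concatenated within each chunk row,
          # then summed across chunk rows
          col_sums = sum(
              np.concatenate([chunks[i, j].sum(axis=0) for j in range(2)])
              for i in range(2))
          print(col_sums)  # [10. 10. 10. 10. 10. 17. 10. 10. 10. 10.]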
