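[Figure: map of training sites in London, Tokyo, California, and Ohio; links between sites are annotated with latencies from 35 ms to 277 ms and bandwidths from 13 Mbps to 63 Mbps]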
Distributed Training Across the World
Ligeng Zhu, Yao Lu, Hongzhou Lin, Yujun Lin, Song Han
NeurIPS 19 MLSys

Why Distributed Training?
Model sizes
[Figure: Delayed Distributed vs. Normal Distributed]
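With a delay of t steps, the sum over i = 0, ..., n-1 splits in two:

$$\sum_{i=0}^{n-1} (\cdot) \;=\; \underbrace{\sum_{i=0}^{n-1-t} (\cdot)}_{\text{same as normal distributed training}} \;+\; \underbrace{\sum_{i=n-t}^{n-1} (\cdot)}_{\text{difference caused by local update}}$$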
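The split of the sum at i = n - t can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides preserve only the summation limits, so the summands are assumed here to be per-step gradients scaled by a learning rate, averaged across workers in the synchronized part and taken from one worker alone in the local part. The names `normal_weights`, `delayed_weights`, `grads`, and `lr` are hypothetical.

```python
import numpy as np

def normal_weights(w0, grads, lr):
    """Normal distributed training: every step i = 0..n-1 uses the
    gradient averaged over all workers.

    grads has shape (steps, workers, dim) and is treated as given data
    (an assumption; in real training gradients depend on the weights).
    """
    averaged = grads.mean(axis=1)              # (steps, dim)
    return w0 - lr * averaged.sum(axis=0)

def delayed_weights(w0, grads, lr, t, worker):
    """Delayed update as seen by one worker.

    Steps i = 0..n-1-t use the averaged gradient (same as normal
    distributed training); the last t steps i = n-t..n-1 use only this
    worker's local gradient (difference caused by local update).
    """
    n = grads.shape[0]
    split = max(n - t, 0)
    synced = grads[:split].mean(axis=1).sum(axis=0)   # i = 0 .. n-1-t
    local = grads[split:, worker].sum(axis=0)         # i = n-t .. n-1
    return w0 - lr * (synced + local)

rng = np.random.default_rng(0)
grads = rng.normal(size=(6, 3, 4))   # 6 steps, 3 workers, 4 parameters
w0 = np.zeros(4)
w_normal = normal_weights(w0, grads, lr=0.1)
w_delayed = delayed_weights(w0, grads, lr=0.1, t=2, worker=0)
```

With `t=0` the local part is empty and the two functions agree, which matches the "same as normal distributed training" annotation on the first sum.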
[Figure: Normal Distributed vs. Temporal Sparse, each shown for Node 1 and Node 2]
[Equations: temporal sparse update rules of the form u_n + (…) and w_n + (…), with sums starting at i = np or i = np+1]
[Figure: Scalability (0.00 to 0.80) vs. Network Latency (ms, 1 to 5000) for delay = 4, 8, 12, 16, 20]
Differentiable Model
tensor([[[[-5.3668e+01, -1.0342e+01, -3.1377e+00], [-7.5185e-01, 1.6444e+01, -2.1058e+01], [-8.7487e+00, -5.0473e+00, -5.5008e+00]],
[1] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov. Exploiting unintended feature leakage in collaborative learning.
[2] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models.
[Figure: two Differentiable Model blocks feeding an MSE loss]
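The MSE comparison can be sketched as a gradient match loss, the quantity plotted on the later slides. This is a minimal sketch under stated assumptions: the slides do not specify the model, so a one-layer linear regressor stands in for the "Differentiable Model", and the names `linear_mse_grad`, `gradient_match_loss`, and all data values are hypothetical.

```python
import numpy as np

def linear_mse_grad(w, x, y):
    """Gradient of 0.5 * (w @ x - y)**2 with respect to w.

    A stand-in for the gradients a differentiable model would share.
    """
    return (w @ x - y) * x

def gradient_match_loss(g_dummy, g_real):
    """Mean squared error between dummy and real gradients."""
    return float(np.mean((g_dummy - g_real) ** 2))

w = np.array([0.5, -1.0, 2.0])

# Real (private) training example and an unrelated dummy guess.
x_real, y_real = np.array([1.0, 2.0, 3.0]), 1.0
x_dummy, y_dummy = np.array([0.0, 1.0, 0.0]), 0.0

g_real = linear_mse_grad(w, x_real, y_real)
loss_same = gradient_match_loss(linear_mse_grad(w, x_real, y_real), g_real)
loss_dummy = gradient_match_loss(linear_mse_grad(w, x_dummy, y_dummy), g_real)
```

The loss is zero when the dummy input equals the real one and positive otherwise, so an attacker can treat it as an objective and adjust the dummy input to drive it down.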
[Figure: Gradient Match Loss (0.000 to 0.150) vs. Iterations (200 to 1200) with Gaussian noise, gaussian-10^-4 through gaussian-10^-1; regions labeled Deep Leakage, Leak with artifacts, No leak]
[Figure: Gradient Match Loss (0.000 to 0.150) vs. Iterations (200 to 1200) with Laplacian noise, laplacian-10^-4 through laplacian-10^-1; regions labeled Deep Leakage, Leak with artifacts, No leak]
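The noise defense in these plots can be sketched as adding random noise to a gradient before it is shared. This is a minimal sketch, not the authors' code: the legends give only values from 10^-4 to 10^-1, so treating them as the `scale` parameter of the distribution is an assumption, and `add_noise` is a hypothetical helper.

```python
import numpy as np

def add_noise(grad, scale, kind="gaussian", rng=None):
    """Return grad plus zero-mean Gaussian or Laplacian noise.

    `scale` is passed as the distribution's scale parameter
    (an assumption; the slides do not say whether the legend values
    are standard deviations or variances).
    """
    rng = np.random.default_rng() if rng is None else rng
    if kind == "gaussian":
        noise = rng.normal(0.0, scale, size=grad.shape)
    elif kind == "laplacian":
        noise = rng.laplace(0.0, scale, size=grad.shape)
    else:
        raise ValueError(f"unknown noise kind: {kind!r}")
    return grad + noise

rng = np.random.default_rng(0)
grad = np.zeros(10_000)
noisy_small = add_noise(grad, 1e-4, "gaussian", rng)
noisy_large = add_noise(grad, 1e-1, "laplacian", rng)
```

Larger scales perturb the shared gradient more, which corresponds to moving from the "Deep Leakage" region toward "No leak" in the plots.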
[Figure: Gradient Match Loss (0.000 to 0.150) vs. Iterations (200 to 1200) with gradient pruning at prune ratios of 1%, 10%, 20%, 30%, 50%, and 70%; regions labeled Deep Leakage, Leak with artifacts, No leak]
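Gradient pruning at a given prune ratio can be sketched as zeroing the smallest-magnitude entries of the gradient before sharing it. This is a minimal sketch, assuming magnitude-based pruning; the slides give only the ratios, and `prune_gradient` is a hypothetical helper.

```python
import numpy as np

def prune_gradient(grad, ratio):
    """Zero the `ratio` fraction of entries with the smallest magnitude."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    flat = grad.ravel()
    k = int(round(ratio * flat.size))      # number of entries to drop
    if k == 0:
        return grad.copy()
    pruned = flat.copy()
    # argpartition places the indices of the k smallest magnitudes first.
    smallest = np.argpartition(np.abs(flat), k - 1)[:k]
    pruned[smallest] = 0.0
    return pruned.reshape(grad.shape)

grad = np.array([0.1, -5.0, 0.02, 3.0, -0.3])
pruned_40 = prune_gradient(grad, 0.4)      # drops 0.1 and 0.02
```

The input array is left unchanged; the function returns a pruned copy with the same shape.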
[Figure: Gradient Match Loss (0.000 to 0.150) vs. Iterations (200 to 1200) with half-precision gradients, IEEE-fp16 and B-fp16; region labeled Deep Leakage]
Advertisement: I am applying for Ph.D.