Distributed Training Across the World




slide-1
SLIDE 1

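[Figure: world map of the four sites (London, Tokyo, California, Ohio) with the measured links between them; latencies of 35 ms, 140 ms, 183 ms and 277 ms, bandwidths of 13 Mbps, 17 Mbps, 23 Mbps and 63 Mbps]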

Distributed Training Across the World

1

Ligeng Zhu, Yao Lu, Hongzhou Lin, Yujun Lin, Song Han
NeurIPS 2019 MLSys

slide-2
SLIDE 2

Why Distributed Training?

2

  • Model sizes
  • AlexNet (8 layers) -> VGG (16 layers) -> ResNet (152 layers)
  • Dataset sizes
  • CIFAR (50k) -> ImageNet (1.2M) -> Google JFT (300M)

Even with modern GPUs, say an eight-V100 server, it still takes days or even weeks to train a model.

slide-3
SLIDE 3

What is Distributed Training?

3

Conventional SGD

  • 1. Sample (X, y) from dataset
  • 2. Forward to compute loss.
  • 3. Backward to compute gradients.
  • 4. Apply gradients to update model.

Distributed SGD

  • 1. Sample (X, y) from dataset
  • 2. Forward to compute loss.
  • 3. Backward to compute gradients.
  • 4. Synchronize gradients
  • 5. Apply gradients to update model.
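The five steps of distributed SGD can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the authors' implementation: the one-parameter model, the two worker batches, and the function names are assumptions made up for the example, and the "synchronize" step is simulated by a plain average, as an all-reduce would compute.

```python
def local_gradient(w, batch):
    """Gradient of the mean of 0.5 * (w * x - y)^2 over one worker's batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def distributed_sgd_step(w, worker_batches, lr):
    # Steps 1-3: every worker runs forward/backward on its own batch.
    grads = [local_gradient(w, batch) for batch in worker_batches]
    # Step 4: synchronize gradients (simulated here by averaging them).
    avg_grad = sum(grads) / len(grads)
    # Step 5: every worker applies the same averaged gradient.
    return w - lr * avg_grad


# Two hypothetical workers, each holding samples of y = 3x.
worker_batches = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = distributed_sgd_step(w, worker_batches, lr=0.05)
```

Conventional SGD is the special case with a single worker, where step 4 has nothing to average.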
slide-4
SLIDE 4

Why Learning across Geographical Locations?

4

"It's not who has the best algorithm that wins. It's who has the most data." (Andrew Ng)

However, collecting data is always difficult, and sometimes even illegal.

slide-5
SLIDE 5

Why Learning across Geographical Locations?

5

Collaborative / Federated Learning

Data never leaves the local device.

slide-6
SLIDE 6

Communication Limits Scalability

6

Latency

  • InfiniBand: < 0.002 ms
  • Mobile network: ~50 ms (4G) / ~10 ms (5G)

Bandwidth

  • InfiniBand: up to 100 Gb/s
  • Mobile network: 100 Mb/s (4G), 1 Gb/s (5G)

Shanghai - Boston:

  • Bandwidth: 10 Mb/s with a high variance.
  • Latency: 78 ms (ideal) / >700 ms (real world)

What we need

  • Bandwidth as high as 900 Mb/s.
  • Latency as low as 1 ms.

Bandwidth is easy to increase. Latency is hard to improve :(

Ideal round-trip latency: 11,725 km × 2 / (3 × 10^8 m/s) ≈ 78.17 ms
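The ideal Shanghai - Boston figure is just the round-trip distance divided by the speed of light. The short check below reproduces that arithmetic; the function name is made up for the example, and the distance and the rounded speed of light are the values quoted on the slide.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8  # rounded value used on the slide


def ideal_round_trip_ms(distance_km):
    """Lower bound on round-trip latency over a straight path, in milliseconds."""
    round_trip_m = 2 * distance_km * 1000
    return round_trip_m / SPEED_OF_LIGHT_M_PER_S * 1000


shanghai_boston_ms = ideal_round_trip_ms(11_725)
print(f"{shanghai_boston_ms:.2f} ms")
```

No amount of extra bandwidth lowers this bound, which is why the slide treats latency as the hard problem.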

slide-7
SLIDE 7

Latency is critical

7

slide-8
SLIDE 8

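[Figure: world map of the four sites (London, Tokyo, California, Ohio) with the measured links between them; latencies of 35 ms, 140 ms, 183 ms and 277 ms, bandwidths of 13 Mbps, 17 Mbps, 23 Mbps and 63 Mbps]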

Distributed Training Across the World

8

Ligeng Zhu, Yao Lu, Hongzhou Lin, Yujun Lin, Song Han
NeurIPS 2019 MLSys

slide-9
SLIDE 9

Delayed Update: sync stale gradients

9

Conventional Distributed SGD at step i

  • 1. Sample (X, y) from dataset
  • 2. Forward to compute loss.
  • 3. Backward to compute gradients.
  • 4. Synchronize step i's gradients
  • 5. Apply gradients to update model.

Delayed Update at step i

  • 1. Sample (X, y) from dataset
  • 2. Forward to compute loss.
  • 3. Backward to compute gradients.
  • 4. Synchronize step (i - t)'s gradients
  • 5. Apply gradients to update model.

slide-10
SLIDE 10

Delayed Update: put off the sync barrier

10

[Figure: per-step compute and sync timelines for Normal Distributed and Delayed Distributed training]

Normal Distributed: gradients from time stamp i are synced before the (i+1)-th update.

Delayed Distributed: gradients from time stamp (i - t) are synced before the (i+1)-th update.

slide-11
SLIDE 11

Preserve Accuracy by Compensation

11

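Gradients from step (i − t) are synced at the step i update.

Vanilla SGD:

$$ w_n = w_0 - \gamma \sum_{i=0}^{n-1} v_i, \qquad \text{e.g.}\quad w_3 = w_0 - \gamma (v_0 + v_1 + v_2) $$

Delayed Update, for example, if t = 2 (here $v_i$ is the local gradient and $\bar{v}_i$ the synced average):

$$ w_3 = w_0 - \gamma (v_0 + v_1 + v_2) - \gamma (\bar{v}_0 - v_0) = w_0 - \gamma (\bar{v}_0 + v_1 + v_2) $$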

slide-12
SLIDE 12

Preserve Accuracy by Compensation

12

$$ w_{n,j} = w_0 + \underbrace{\sum_{i=0}^{n-1-t} \Delta w_i}_{\text{same as normal distributed training}} + \underbrace{\sum_{i=n-t}^{n-1} \Delta w_{i,j}}_{\text{difference caused by local update}} $$

The first sum carries global information; the second carries local gradients.

Convergence of SGD:

$$ O\!\left(\frac{1}{\sqrt{NJ}}\right) $$

Theoretical Convergence:

$$ O\!\left(\frac{1}{\sqrt{NJ}}\right) + O\!\left(\frac{t^2 J}{N}\right) $$
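The decomposition on this slide (synchronized updates up to step n − 1 − t, local updates for the last t steps) can be checked with a small simulation. This is a sketch under stated assumptions rather than the authors' code: each node j minimizes a toy quadratic 0.5 * (w − target_j)^2, the "network" is a Python list, and all names are hypothetical.

```python
def delayed_update(w0, targets, lr, t, steps):
    """Simulate delayed update: the averaged gradient of a step arrives t steps late."""
    num_nodes = len(targets)
    w = [w0] * num_nodes  # per-node weights
    local = []            # local[i][j] = gradient of node j at step i
    for i in range(steps):
        grads = [w[j] - targets[j] for j in range(num_nodes)]
        local.append(grads)
        # Apply the local gradient immediately, without waiting for the sync.
        w = [w[j] - lr * grads[j] for j in range(num_nodes)]
        if i >= t:
            # The average for step (i - t) has arrived: swap it in for the
            # local gradient that was applied back then (the compensation).
            stale = local[i - t]
            avg = sum(stale) / num_nodes
            w = [w[j] - lr * (avg - stale[j]) for j in range(num_nodes)]
    return w, local


def decomposition(w0, lr, t, local, j):
    """Closed form: averaged gradients for old steps, local ones for the last t."""
    n = len(local)
    synced = sum(sum(g) / len(g) for g in local[: n - t])
    recent = sum(g[j] for g in local[n - t:])
    return w0 - lr * synced - lr * recent


w0, lr, t, steps = 0.0, 0.1, 2, 50
weights, local = delayed_update(w0, targets=[1.0, 3.0], lr=lr, t=t, steps=steps)
```

In this toy run the simulated weights agree with the closed form on every node, and the average of the node weights approaches the joint optimum (2.0) even though each node still carries its last t local updates.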

slide-13
SLIDE 13

Delayed Update: put off the sync barrier

13

Wait time:

  • Naive Distributed SGD: $T_{\text{communicate}} - T_{\text{overlap}}$
  • Delayed Update: $\max(0,\; T_{\text{communicate}} - T_{\text{overlap}} - t \times T_{\text{compute}})$

Delayed update speeds up training and also tolerates high latency!
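The two wait-time expressions are easy to compare numerically. The helper names and the timings below are illustrative assumptions, not measurements from the talk.

```python
def naive_wait(t_communicate, t_overlap):
    """Idle time per step when every step blocks on its own gradient sync."""
    return t_communicate - t_overlap


def delayed_wait(t_communicate, t_overlap, t, t_compute):
    """Idle time per step when the sync barrier is put off by t steps."""
    return max(0.0, t_communicate - t_overlap - t * t_compute)


# Hypothetical timings in seconds.
t_communicate, t_overlap, t_compute = 6.0, 1.0, 0.25
wait_naive = naive_wait(t_communicate, t_overlap)
wait_delay_4 = delayed_wait(t_communicate, t_overlap, t=4, t_compute=t_compute)
wait_delay_20 = delayed_wait(t_communicate, t_overlap, t=20, t_compute=t_compute)
```

With these numbers a delay of 4 steps hides one second of the wait, and a delay of 20 steps hides all of it.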

slide-14
SLIDE 14

Delayed Update: put off the sync barrier

14

  • Accuracy: preserved.
  • Latency issue: solved. A delay of 20 steps tolerates 6 s of latency.
  • Remaining issues: bandwidth / congestion.

slide-15
SLIDE 15

Temporally Sparse Update: periodically sync

15

Conventional Distributed SGD at step i

  • 1. Sample (X, y) from dataset
  • 2. Forward to compute loss.
  • 3. Backward to compute gradients.
  • 4. Synchronize step i's gradients
  • 5. Apply gradients to update model.

Temporally Sparse Update at step i

  • 1. Sample (X, y) from dataset
  • 2. Forward to compute loss.
  • 3. Backward to compute gradients.
  • 4. Synchronize the gradients of steps [i - d, i) if i mod d == 0.
  • 5. Apply gradients to update model.
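The sync rule in step 4 can be written out as a schedule. The sketch below only lists which steps trigger a sync and which gradient steps each sync carries; the function name is hypothetical, and starting the count at step 1 (so that no sync is attempted before any gradient exists) is an assumption.

```python
def sync_schedule(num_steps, d):
    """Map each syncing step i (i mod d == 0) to the gradient steps [i - d, i) it sends."""
    return {i: range(i - d, i) for i in range(1, num_steps + 1) if i % d == 0}


schedule = sync_schedule(num_steps=8, d=4)
for step, sent in schedule.items():
    print(f"step {step}: sync gradients of steps {list(sent)}")

# With d = 1 every step syncs, which is conventional distributed SGD.
dense = sync_schedule(num_steps=8, d=1)
```

Eight steps with d = 4 need two sync rounds instead of eight, which is where the reduction in sync frequency comes from.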

slide-16
SLIDE 16

Temporally Sparse Update: reduce sync frequency

16

[Figure: sync timelines of Node 1 and Node 2 for Normal Distributed and Temporal Sparse training]

Normal Distributed: gradients from time stamp i are synced before the (i+1)-th update.

Temporal Sparse: gradients from time stamps (i-p, i] are synced before the (i+1)-th update.

slide-17
SLIDE 17

Temporally Sparse Update: reduce sync frequency

17

Gradients from time stamps (i-p, i] are synced before the (i+1)-th update. Similarly, we compute the compensation. On node j, the local gradients are replaced by the synced averages, written here with a bar:

$$ (v_{n-p+1,j}, v_{n-p+2,j}, \ldots, v_{n,j}) \leftarrow (\bar{v}_{n-p+1}, \bar{v}_{n-p+2}, \ldots, \bar{v}_n) $$

In the formulas below the node index is omitted, and j in the last formula is a summation index.

Vanilla SGD:

$$ w'_n = w_n + \Big( \sum_{i=n-p+1}^{n-1} \bar{v}_i - \sum_{i=n-p+1}^{n-1} v_i \Big) $$

Momentum SGD:

$$ u'_n = u_n + \Big( \sum_{i=n-p+1}^{n} m^{n-i} \bar{v}_i - \sum_{i=n-p+1}^{n} m^{n-i} v_i \Big) $$

$$ w'_n = w_n + \Big( \sum_{i=n-p}^{n} \sum_{j=n-p}^{i} m^{i-j} \bar{v}_j - \sum_{i=n-p}^{n} \sum_{j=n-p}^{i} m^{i-j} v_j \Big) $$
slide-18
SLIDE 18

Temporally Sparse Update: reduce sync frequency

18

Wait time:

  • Naive Distributed SGD: $T_{\text{communicate}} - T_{\text{overlap}}$
  • Temporally Sparse Update: $\dfrac{T_{\text{communicate}} - T_{\text{overlap}}}{P}$

Bandwidth:

  • Naive Distributed SGD: $\dfrac{\|W\|}{T_{\text{training}}}$
  • Temporally Sparse Update: $\dfrac{\|W\|}{P \times T_{\text{training}}}$

Temporal sparsity alleviates congestion and improves amortized latency / bandwidth.
slide-19
SLIDE 19

Temporally Sparse Update: reduce sync frequency

19

  • Congestion: solved.
  • Bandwidth: (900 / P) Mb/s.

However, even with temporal_sparse = 20, the required 45 Mb/s is not always achievable, e.g., on a cross-continent connection.
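The bandwidth figure follows directly from dividing the dense requirement by the sync interval P. The sketch below reproduces the 45 Mb/s number and, as an illustration, inverts the formula to ask how large P would have to be for a given link. The function names are made up, and the 13 Mb/s link is the slowest value shown on the title slide.

```python
import math

DENSE_BANDWIDTH_MBPS = 900.0  # requirement without temporal sparsity, from the slides


def required_bandwidth_mbps(p, dense_mbps=DENSE_BANDWIDTH_MBPS):
    """Amortized bandwidth needed when gradients are synced once every p steps."""
    return dense_mbps / p


def min_sync_interval(link_mbps, dense_mbps=DENSE_BANDWIDTH_MBPS):
    """Smallest integer p whose amortized requirement fits the given link."""
    return math.ceil(dense_mbps / link_mbps)


needed_at_20 = required_bandwidth_mbps(20)
interval_for_13 = min_sync_interval(13.0)
```

Under these assumptions a 13 Mb/s link would need P of about 70, well beyond the temporal_sparse = 20 setting, which matches the slide's point that temporal sparsity alone does not close the bandwidth gap.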

slide-20
SLIDE 20

Results

20

[Figure: Scalability (0.00 to 0.80) versus Network Latency (ms; 1, 5, 10, 50, 100, 500, 1000, 5000) for the original training and for delay = 4, 8, 12, 16, 20]

slide-21
SLIDE 21

Results

21

  • p3.8xlarge instances on AWS (4 × V100)
  • 4 instances at 4 different places
  • Ohio, California, Tokyo, London
  • Bandwidth: ~15 Mb/s; latency: ~300 ms
  • The scalability of naive training: 0.02
  • The scalability of DTC training: 0.72 (36x better!)
slide-22
SLIDE 22

Deep Leakage from Gradients

22

Ligeng Zhu, Zhijian Liu, Song Han
NeurIPS 2019

slide-23
SLIDE 23

Is gradient safe to share?

23

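[Figure: private training data → Differentiable Model (Pred: cat, Pred: dog) → Loss → Gradients, which are public; a question mark asks whether the private data can be recovered from them]

Gradients:

tensor([[[[-5.3668e+01, -1.0342e+01, -3.1377e+00], [-7.5185e-01, 1.6444e+01, -2.1058e+01], [-8.7487e+00, -5.0473e+00, -5.5008e+00]],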

slide-24
SLIDE 24

Gradient is not safe to share!

24

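[Figure: private training data → Differentiable Model (Pred: cat, Pred: dog) → Loss → Gradients, which are public; the private data is recovered from the public gradients]

Gradients:

tensor([[[[-5.3668e+01, -1.0342e+01, -3.1377e+00], [-7.5185e-01, 1.6444e+01, -2.1058e+01], [-8.7487e+00, -5.0473e+00, -5.5008e+00]],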

slide-25
SLIDE 25

Conventional Shallow Leakage

25

Gradients:

tensor([[[[-5.3668e+01, -1.0342e+01, -3.1377e+00], [-7.5185e-01, 1.6444e+01, -2.1058e+01], [-8.7487e+00, -5.0473e+00, -5.5008e+00]],

  • Membership Inference: whether a record is used in the batch.
  • Property Inference: whether a sample with a certain property is in the batch.

[1] L. Melis, C. Song, E. D. Cristofaro, and V. Shmatikov. Exploiting unintended feature leakage in collaborative learning.
[2] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models.

But, can we obtain the original training data?

slide-26
SLIDE 26

Deep Leakage from Gradients

26

Normal Training: forward-backward, update model weights

[Figure: input → Differentiable Model → Pred: cat → Loss → Gradients]

slide-27
SLIDE 27

Deep Leakage from Gradients

27

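Normal Training: forward-backward, update model weights

[Figure: input → Differentiable Model → Pred: cat → Loss → Gradients]

Deep Leakage: forward-backward, update the inputs

[Figure: dummy input → Differentiable Model → Pred: [random] → Loss → Gradients]

The MSE between the two sets of gradients is the objective that drives the update of the dummy inputs.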
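The attack loop (start from a random dummy input, compare its gradients with the shared ones, and update the input rather than the weights) can be sketched on a toy model in pure Python. This is an illustration under loud assumptions, not the authors' setup: the model is a single linear unit with bias and a squared loss instead of a deep network, the gradient of the matching loss is derived by hand, and plain gradient descent with step halving stands in for the L-BFGS optimizer named in the talk. All function and variable names are hypothetical.

```python
import random


def shared_gradients(w, b, x, y):
    """Gradients of 0.5 * (w.x + b - y)^2 with respect to w and b."""
    r = sum(wi * xi for wi, xi in zip(w, x)) + b - y  # residual
    return [r * xi for xi in x], r


def match_loss(w, b, x, y, target):
    """Squared distance between the dummy gradients and the shared gradients."""
    g_w, g_b = shared_gradients(w, b, x, y)
    t_w, t_b = target
    return sum((a - c) ** 2 for a, c in zip(g_w, t_w)) + (g_b - t_b) ** 2


def match_loss_grad(w, b, x, y, target):
    """Hand-derived gradient of match_loss with respect to the dummy x and y."""
    g_w, g_b = shared_gradients(w, b, x, y)  # g_b equals the residual r
    t_w, t_b = target
    e_w = [a - c for a, c in zip(g_w, t_w)]
    e_b = g_b - t_b
    s = sum(e * xi for e, xi in zip(e_w, x)) + e_b
    dx = [2 * (s * wi + g_b * e) for wi, e in zip(w, e_w)]
    dy = -2 * s
    return dx, dy


def leak(w, b, target, iters=2000, seed=0):
    """Update the dummy inputs (not the weights) to match the shared gradients."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in w]  # random dummy input
    y = rng.gauss(0, 1)               # random dummy label
    loss = match_loss(w, b, x, y, target)
    history = [loss]
    for _ in range(iters):
        dx, dy = match_loss_grad(w, b, x, y, target)
        step = 1.0
        while step > 1e-12:  # halve the step until the loss decreases
            new_x = [xi - step * d for xi, d in zip(x, dx)]
            new_y = y - step * dy
            new_loss = match_loss(w, b, new_x, new_y, target)
            if new_loss < loss:
                x, y, loss = new_x, new_y, new_loss
                break
            step /= 2
        history.append(loss)
    return x, y, history


w, b = [0.5, -1.0, 2.0], 0.1                  # model weights known to the attacker
secret_x, secret_y = [1.0, 2.0, -1.0], 0.5    # private training sample
target = shared_gradients(w, b, secret_x, secret_y)
recovered_x, recovered_y, history = leak(w, b, target)
print("gradient match loss:", history[0], "->", history[-1])
```

Comparing `recovered_x` with `secret_x` after the run shows how close the dummy input gets; how well this toy optimizer converges depends on the random start, so the example makes no claim about the exact values.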

slide-28
SLIDE 28

Recovering Visualization (bs=1)

28

Model: ResNet18
Dataset: CIFAR100
Optimizer: L-BFGS, 300 iterations

slide-29
SLIDE 29

Recovering Visualization (bs=8)

29

Model: ResNet18
Dataset: CIFAR100
Optimizer: L-BFGS, 300 iterations

slide-30
SLIDE 30

Experiment on Bert

30

  • For discrete words, the embeddings are taken as the input.

GT: 【[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]】

DLG: 【. who is jim henson ? . jim henson was a puppet ##eer .】

Random Init: 【a2 furnished angel compromise springsteen ##lice ##ulated sal ##n ##ory moshe unitary ##tori commercial】

slide-31
SLIDE 31

Experiment on Bert

31

Iters=0: tilting fill given **less word **itude fine **nton overheard living vegas **vac **vation *f forte **dis cerambycidae ellison **don yards marne **kali

Iters=10: tilting fill given **less full solicitor other ligue shrill living vegas rider treatment carry played sculptures lifelong ellison net yards marne **kali

Iters=20: registration , volunteer applications , at student travel application open the ; week of played ; child care will be glare .

Iters=30: registration, volunteer applications, and student travel application open the first week of september . child care will be available

Original text: Registration, volunteer applications, and student travel application open the first week of September. Child care will be available.

slide-32
SLIDE 32

Defense Strategy

32

[Figure: Gradient Match Loss (0.000 to 0.150) versus Iterations (200 to 1200) with Gaussian noise added to the gradients: original, gaussian-10^-4, gaussian-10^-3, gaussian-10^-2, gaussian-10^-1. Curves are grouped into Deep Leakage, Leak with artifacts, and No leak.]

[Figure: Gradient Match Loss (0.000 to 0.150) versus Iterations (200 to 1200) with Laplacian noise added to the gradients: original, laplacian-10^-4, laplacian-10^-3, laplacian-10^-2, laplacian-10^-1. Curves are grouped into Deep Leakage, Leak with artifacts, and No leak.]

slide-33
SLIDE 33

Defense Strategy

33

[Figure: Gradient Match Loss (0.000 to 0.150) versus Iterations (200 to 1200) with gradient pruning: original, prune-ratio-1%, 10%, 20%, 30%, 50%, 70%. Curves are grouped into Deep Leakage, Leak with artifacts, and No leak.]

[Figure: Gradient Match Loss (0.000 to 0.150) versus Iterations (200 to 1200) with half-precision gradients: original, IEEE-fp16, B-fp16. All curves fall under Deep Leakage.]
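Two of the defenses compared in these plots, adding noise to the gradients and pruning small gradient entries, are simple to express on a flat list of gradient values. The sketch below is illustrative only: the function names are made up, reading the plot labels (10^-4 to 10^-1) as noise variances is an assumption, and the sample values are the first nine entries of the gradient tensor shown earlier in the deck.

```python
import math
import random


def add_gaussian_noise(grad, variance, seed=0):
    """Return a copy of grad with zero-mean Gaussian noise of the given variance."""
    rng = random.Random(seed)
    std = math.sqrt(variance)
    return [g + rng.gauss(0.0, std) for g in grad]


def prune_gradient(grad, ratio):
    """Zero out the `ratio` fraction of entries with the smallest magnitude."""
    k = int(len(grad) * ratio)
    smallest = sorted(range(len(grad)), key=lambda i: abs(grad[i]))[:k]
    dropped = set(smallest)
    return [0.0 if i in dropped else g for i, g in enumerate(grad)]


grad = [-53.668, -10.342, -3.1377, -0.75185, 16.444, -21.058,
        -8.7487, -5.0473, -5.5008]
noisy = add_gaussian_noise(grad, variance=1e-2)
pruned = prune_gradient(grad, ratio=0.3)
```

A node would apply one of these transforms to its gradients before sharing them; the plots indicate that the strength of the noise or the pruning ratio decides whether the attack still leaks data.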

slide-34
SLIDE 34

34

Thank you!

Advertisement: I am applying for Ph.D.