Local SGD for non-i.i.d. data Konstantin Mishchenko Work done - - PowerPoint PPT Presentation



SLIDE 1

Local SGD for non-i.i.d. data

Konstantin Mishchenko

Work done together with Ahmed Khaled and Peter Richtárik

SLIDE 2

Problem

Convex

\[ \min_x \frac{1}{M} \sum_{m=1}^{M} f_m(x) \]
SLIDE 3

Problem

Convex

In practice, usually a neural network

\[ \min_x \frac{1}{M} \sum_{m=1}^{M} f_m(x) \]
SLIDE 4

Problem

\[ \min_x \frac{1}{M} \sum_{m=1}^{M} f_m(x) \]

\[ f_m(x) = \mathbb{E}_{\xi} f_m(x; \xi) \]
SLIDE 5

Local SGD

\[ \min_x \frac{1}{M} \sum_{m=1}^{M} f_m(x) \]

\[ x_{t+1}^m = \begin{cases} \hat{x}_{t+1}, & \text{if } t \bmod H = 0, \\ x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m), & \text{otherwise.} \end{cases} \]
SLIDE 6

Local SGD

\[ \min_x \frac{1}{M} \sum_{m=1}^{M} f_m(x) \]

\[ x_{t+1}^m = \begin{cases} \frac{1}{M} \sum_{j=1}^{M} \bigl( x_t^j - \gamma \nabla f_j(x_t^j; \xi_t^j) \bigr), & \text{if } t \bmod H = 0, \\ x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m), & \text{otherwise.} \end{cases} \]
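The update rule on this slide can be sketched in a few lines of Python. This is an illustration only, not the authors' code: it assumes scalar quadratic local objectives f_m(x) = 0.5 * a_m * (x - b_m)^2 and Gaussian gradient noise, neither of which comes from the talk, and the names `local_sgd`, `a`, `b`, and `noise` are hypothetical.

```python
import random


def local_sgd(a, b, x0, gamma, H, T, noise=0.0, seed=0):
    """Run Local SGD with M = len(a) workers and return the final average iterate.

    Worker m holds the (assumed) objective f_m(x) = 0.5 * a[m] * (x - b[m])**2.
    """
    rng = random.Random(seed)
    M = len(a)
    x = [x0] * M  # every worker starts from the same point
    for t in range(T):
        # Local stochastic gradient step: exact gradient plus Gaussian noise.
        stepped = [
            x[m] - gamma * (a[m] * (x[m] - b[m]) + rng.gauss(0.0, noise))
            for m in range(M)
        ]
        if t % H == 0:
            # Communication step: all workers adopt the average of the steps.
            avg = sum(stepped) / M
            x = [avg] * M
        else:
            # No communication: each worker keeps its own iterate.
            x = stepped
    return sum(x) / M


if __name__ == "__main__":
    a, b = [1.0, 3.0], [0.0, 4.0]  # the average objective is minimized at x* = 3
    print(local_sgd(a, b, x0=0.0, gamma=0.1, H=1, T=500))
    print(local_sgd(a, b, x0=0.0, gamma=0.1, H=4, T=500))
```

With H = 1 and no noise, the run reduces to gradient descent on the average objective and approaches x* = 3. With H = 4, the heterogeneous workers drift apart between communications, so the final average settles near x* but not exactly at it.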
SLIDE 7

Local SGD

\[ \min_x \frac{1}{M} \sum_{m=1}^{M} f_m(x) \]

\[ x_{t+1}^m = \begin{cases} \hat{x}_{t+1}, & \text{if } t \bmod H = 0, \\ x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m), & \text{otherwise.} \end{cases} \]

$H = 1 \longrightarrow$ minibatch SGD

$H = T \longrightarrow$ one-shot averaging
SLIDE 8

Local GD

\[ x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m) \]
SLIDE 9

The Variance of Local GD

\[ \sigma_f^2 \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \|\nabla f_m(x^*)\|^2 \]

\[ x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m) \]
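As a small worked example of the quantity defined on this slide, the sketch below evaluates the variance at the optimum for scalar quadratics f_m(x) = 0.5 * a_m * (x - b_m)^2. The quadratic model and the name `sigma_f_squared` are assumptions made for illustration; they are not from the talk.

```python
def sigma_f_squared(a, b):
    """Average squared gradient norm of the local objectives at the global optimum.

    Assumes f_m(x) = 0.5 * a[m] * (x - b[m])**2 with a[m] > 0, so that the
    minimizer of the average objective is x* = sum(a*b) / sum(a).
    """
    M = len(a)
    x_star = sum(am * bm for am, bm in zip(a, b)) / sum(a)
    local_grads = [am * (x_star - bm) for am, bm in zip(a, b)]
    return sum(g * g for g in local_grads) / M


if __name__ == "__main__":
    # Heterogeneous data: local minimizers 0 and 4 disagree.
    print(sigma_f_squared([1.0, 3.0], [0.0, 4.0]))
    # Identical local minimizers: every local gradient vanishes at x*.
    print(sigma_f_squared([1.0, 3.0], [2.0, 2.0]))
```

The quantity is zero exactly when every local gradient vanishes at x*, which is why it serves as a measure of data heterogeneity.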
SLIDE 10

Analysis difficulties in local GD

\[ x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m) \]

\[ \hat{x}_t \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} x_t^m \]
SLIDE 11

Analysis difficulties in local GD

\[ x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m) \]

\[ V_t \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \|x_t^m - \hat{x}_t\|^2 \]

\[ \hat{x}_t \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} x_t^m \]

\[ g_t \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \nabla f_m(x_t^m) \]
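The three quantities on this slide are easy to compute for a toy problem, and doing so shows why they are convenient: averaging the local GD steps gives an average iterate that moves by minus gamma times g_t. The sketch below assumes scalar quadratics f_m(x) = 0.5 * a_m * (x - b_m)^2; the helper names `averages` and `local_step` are hypothetical.

```python
def averages(x, a, b):
    """Return (x_hat, V, g) for local iterates x under the assumed quadratics."""
    M = len(x)
    x_hat = sum(x) / M                                      # average iterate
    V = sum((xm - x_hat) ** 2 for xm in x) / M              # iterate variance
    g = sum(am * (xm - bm) for xm, am, bm in zip(x, a, b)) / M  # average gradient
    return x_hat, V, g


def local_step(x, a, b, gamma):
    """One local GD step on every worker, with no communication."""
    return [xm - gamma * am * (xm - bm) for xm, am, bm in zip(x, a, b)]


if __name__ == "__main__":
    x, a, b, gamma = [1.0, 2.0], [1.0, 3.0], [0.0, 4.0], 0.1
    x_hat, V, g = averages(x, a, b)
    x_next = local_step(x, a, b, gamma)
    # Both printed values agree: the average of the new iterates
    # equals x_hat - gamma * g.
    print(sum(x_next) / len(x_next), x_hat - gamma * g)
```

Note that g_t averages gradients taken at different points x_t^m, so it is generally not the gradient of the average objective at the average iterate; V_t measures how far apart those points are.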
SLIDE 12

Theorem

Choose $H$ such that $H \le \frac{\sqrt{T}}{\sqrt{M}}$; then $\gamma = \frac{\sqrt{M}}{4L\sqrt{T}} \le \frac{1}{4HL}$, and hence

\[ f(\hat{x}_T) - f(x^*) \le \frac{8L \|x_0 - x^*\|^2}{\sqrt{MT}} + \frac{3M \sigma_f^2 H^2}{2LT}. \]

To get a convergence rate of $1/\sqrt{MT}$ we can choose $H = O(T^{1/4} M^{-3/4})$, which implies a total number of $\Omega(T^{3/4} M^{3/4})$ communication steps. If a rate of $1/\sqrt{T}$ is desired instead, we can choose a larger $H = O(T^{1/4})$.
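The two choices of H can be checked by direct substitution into the second term of the bound. The short derivation below is a sketch that treats H as a real number and ignores rounding to an integer.

```latex
% Choice H = T^{1/4} M^{-3/4}: the second term matches the 1/sqrt(MT) rate.
\frac{3M\sigma_f^2 H^2}{2LT}
  = \frac{3M\sigma_f^2 \, T^{1/2} M^{-3/2}}{2LT}
  = \frac{3\sigma_f^2}{2L\sqrt{MT}},
\qquad
\frac{T}{H} = T^{3/4} M^{3/4} \ \text{communication steps.}

% Choice H = T^{1/4}: the second term decays only at the 1/sqrt(T) rate.
\frac{3M\sigma_f^2 H^2}{2LT}
  = \frac{3M\sigma_f^2 \, T^{1/2}}{2LT}
  = \frac{3M\sigma_f^2}{2L\sqrt{T}}.
```

Both choices respect the requirement $H \le \sqrt{T}/\sqrt{M}$: the first because $T^{1/4} M^{-3/4} \le T^{1/2} M^{-1/2}$ whenever $TM \ge 1$, and the second whenever $M \le \sqrt{T}$.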
SLIDE 13

Plots

SLIDE 14

Plots

SLIDE 15

Local SGD

\[ x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m) \]

\[ \mathbb{E}_{\xi} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le \sigma^2 \]

\[ \mathbb{E}_{\xi} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le 4L D_{f_m}(x, x^*) + 2\sigma^2 \]
SLIDE 16

Local SGD

\[ x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m) \]

\[ \mathbb{E}_{\xi} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le 4L D_{f_m}(x, x^*) + 2\sigma^2 \]

\[ \sigma_{\mathrm{dif}}^2 \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_{\xi} \|\nabla f_m(x^*; \xi)\|^2 \]
SLIDE 17

Local SGD

\[ x_{t+1}^m = x_t^m - \gamma \nabla f_m(x_t^m; \xi_t^m) \]

\[ \mathbb{E}_{\xi} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le 4L D_{f_m}(x, x^*) + 2\sigma^2 \]

\[ D_f(x, y) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle \]

\[ \sigma_{\mathrm{dif}}^2 \overset{\text{def}}{=} \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_{\xi} \|\nabla f_m(x^*; \xi)\|^2 \]
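The Bregman divergence D_f on this slide is simple to evaluate numerically. The sketch below handles scalar functions only and uses a quadratic as the example; the function name `bregman` and the example quadratic are illustrative assumptions, not part of the talk.

```python
def bregman(f, grad_f, x, y):
    """Bregman divergence D_f(x, y) = f(x) - f(y) - <grad f(y), x - y> for scalar x, y."""
    return f(x) - f(y) - grad_f(y) * (x - y)


if __name__ == "__main__":
    a, b = 3.0, 4.0
    f = lambda x: 0.5 * a * (x - b) ** 2      # example convex function
    grad_f = lambda x: a * (x - b)
    # For this quadratic, D_f(x, y) reduces to 0.5 * a * (x - y)**2.
    print(bregman(f, grad_f, 1.0, 2.0), 0.5 * a * (1.0 - 2.0) ** 2)
```

For a convex f the divergence is nonnegative, and it is zero when x = y, so it can act as a distance-like measure between x and x* in the variance bound.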
SLIDE 18

Theorem

Choose $H$ such that $H \le \frac{\sqrt{T}}{\sqrt{M}}$; then $\gamma = \frac{\sqrt{M}}{8L\sqrt{T}} \le \frac{1}{8HL}$ and

\[ \mathbb{E} f(\hat{x}_T) - f(x^*) \le \frac{32L \|\hat{x}_0 - x^*\|^2}{\sqrt{MT}} + \frac{5\sigma_{\mathrm{dif}}^2}{2L\sqrt{MT}} + \frac{\sigma_{\mathrm{dif}}^2 M (H-1)^2}{4LT}. \]
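To get a feel for the three terms in this bound, the sketch below plugs numbers into the right-hand side. It is a plain transcription of the formula on the slide; the function name `local_sgd_bound`, the argument names, and the example constants are assumptions chosen for illustration.

```python
import math


def local_sgd_bound(L, M, T, H, dist0_sq, sigma_dif_sq):
    """Return (gamma, bound) from the theorem for smoothness L, M workers, T steps.

    dist0_sq is the squared initial distance to x*, sigma_dif_sq is sigma_dif^2.
    """
    if H * math.sqrt(M) > math.sqrt(T):
        raise ValueError("the theorem requires H <= sqrt(T) / sqrt(M)")
    gamma = math.sqrt(M) / (8 * L * math.sqrt(T))
    root_mt = math.sqrt(M * T)
    optimization = 32 * L * dist0_sq / root_mt           # initial-distance term
    noise = 5 * sigma_dif_sq / (2 * L * root_mt)         # gradient-noise term
    drift = sigma_dif_sq * M * (H - 1) ** 2 / (4 * L * T)  # cost of local steps
    return gamma, optimization + noise + drift


if __name__ == "__main__":
    for H in (1, 3, 50):
        print(H, local_sgd_bound(L=1.0, M=4, T=10_000, H=H,
                                 dist0_sq=1.0, sigma_dif_sq=1.0))
```

The last term is the only one that depends on H, and it vanishes at H = 1, where every step is a communication step.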
SLIDE 19

Theorem

Choose $H$ such that $H \le \frac{\sqrt{T}}{\sqrt{M}}$; then $\gamma = \frac{\sqrt{M}}{8L\sqrt{T}} \le \frac{1}{8HL}$ and

\[ \mathbb{E} f(\hat{x}_T) - f(x^*) \le \frac{32L \|\hat{x}_0 - x^*\|^2}{\sqrt{MT}} + \frac{5\sigma_{\mathrm{dif}}^2}{2L\sqrt{MT}} + \frac{\sigma_{\mathrm{dif}}^2 M (H-1)^2}{4LT}. \]

Optimal $H$ is $H = 1 + \lfloor T^{1/4} M^{-3/2} \rfloor$
SLIDE 20

Theorem

Choose $H$ such that $H \le \frac{\sqrt{T}}{\sqrt{M}}$; then $\gamma = \frac{\sqrt{M}}{8L\sqrt{T}} \le \frac{1}{8HL}$ and

\[ \mathbb{E} f(\hat{x}_T) - f(x^*) \le \frac{32L \|\hat{x}_0 - x^*\|^2}{\sqrt{MT}} + \frac{5\sigma_{\mathrm{dif}}^2}{2L\sqrt{MT}} + \frac{\sigma_{\mathrm{dif}}^2 M (H-1)^2}{4LT}. \]

Optimal $H$ is $H = 1 + \lfloor T^{1/4} M^{-3/2} \rfloor$

Improves to $H = 1 + \lfloor T^{1/2} M^{-3/2} \rfloor$

if $\mathbb{E} \|\nabla f_m(x; \xi) - \nabla f_m(x)\|^2 \le \sigma^2$
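The two choices of H stated on this slide can be compared numerically. The sketch below evaluates both formulas exactly as they are printed on the slide; the function names and the example values of T and M are hypothetical.

```python
import math


def sync_period_heterogeneous(T, M):
    """H = 1 + floor(T^(1/4) * M^(-3/2)), as printed on the slide."""
    return 1 + math.floor(math.sqrt(math.sqrt(T)) / (M * math.sqrt(M)))


def sync_period_bounded_variance(T, M):
    """H = 1 + floor(T^(1/2) * M^(-3/2)), the improved choice on the slide."""
    return 1 + math.floor(math.sqrt(T) / (M * math.sqrt(M)))


if __name__ == "__main__":
    T, M = 10**8, 4
    for name, H in (("heterogeneous", sync_period_heterogeneous(T, M)),
                    ("bounded variance", sync_period_bounded_variance(T, M))):
        # T / H approximates the number of communication rounds.
        print(name, "H =", H, "rounds ~", T // H)
```

A larger H means fewer communication rounds for the same number of local steps, which is the practical benefit of the improved choice under the uniformly bounded variance assumption.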
SLIDE 21

Plot

SLIDE 22

Open questions

Meta-Learning

We can learn an "improvable" model
SLIDE 23

Open questions

Meta-Learning

<latexit sha1_base64="9wBgO6sksKgNqJXrL3BFOSczoWE=">AB9HicbVA9SwNBEN3zM8avqKXNYhBsDHex0DJoY6EQwXxAcoS5zVyZG/v3N0LhJDfYWOhiK0/xs5/4ya5QhMfDzem2FmXpAIro3rfjsrq2vrG5u5rfz2zu7efuHgsK7jVDGsVjEqhmARsEl1gw3ApuJQogCgY1gcDP1G0NUmsfy0YwS9CPoSR5yBsZK/j0aOL9DUJLXqdQdEvuDHSZeBkpkgzVTuGr3Y1ZGqE0TIDWLc9NjD8GZTgTOMm3U40JsAH0sGWphAi1P54dPaGnVunSMFa2pKEz9fEGCKtR1FgOyMwfb3oTcX/vFZqwit/zGWSGpRsvihMBTUxnSZAu1whM2JkCTDF7a2U9UEBMzanvA3BW3x5mdTLJe+iVH4oFyvXWRw5ckxOyBnxyCWpkFtSJTXCyBN5Jq/kzRk6L8678zFvXGymSPyB87nD3JHkeI=</latexit>

We can learn an ”improvable” model

<latexit sha1_base64="cf1mxSBI3k5WISD1FcFGxPGR48=">AC3icbZC7TgJBFIZn8Y431NJmAjGxIrtYaEm0sdREhAQIOTscYMJcNjOzJITQ2/gqNhYaY+sL2Pk2DriFgn8yZf/nDMz548Twa0Lw68gt7K6tr6xuZXf3tnd2y8cHN5bnRqGNaFNo0YLAqusOa4E9hIDIKMBdbj4dWsXh+hsVyrOzdOsC2hr3iPM3De6hSKdaQMFBUIRlEPRS4To0fgLyhSqbsoOoVSWA7nosQZVAimW46hc9WV7NUonJMgLXNKExcewLGcSZwm+lFhNgQ+hj06MCibY9me8ypSfe6dKeNv4oR+fu74kJSGvHMvadEtzALtZm5n+1Zup6F+0JV0nqULGfh3qpoE7TWTC0yw0yJ8YegBnu/0rZAw5+PL+xCixZWX4b5Sjs7KldtKqXqZxbFJjkmRnJKInJMquSY3pEYeSBP5IW8Bo/Bc/AWvP+05oJs5oj8UfDxDVGamfY=</latexit>

min

x

1 m

M

X

m=1

fm(x γrfm(x))

<latexit sha1_base64="Q/icloOoJWldrnOyvZ5panAg1c=">ACJXicbVBNSwMxEM36WetX1aOXYBHswbJbBT0oFL14ERSsCt26zKbZGpklyQrLcv+GS/+FS8eFBE8+VdMPw5qfTDweG+GmXlhwpk2rvpTE3PzM7NFxaKi0vLK6ultfVrHaeK0AaJeaxuQ9CUM0kbhlObxNFQYSc3oTd04F/80CVZrG8Mv2EtgR0JIsYAWOloHTkCyaDHvYjBSTz8kzkvk5FkIljL787x1Egdnp4F/sdEAKwLyHkMFIrlaBUdqvuEHiSeGNSRmNcBKU3vx2TVFBpCAetm56bmFYGyjDCaV70U0TIF3o0KalEgTVrWz4ZY63rdLGUaxsSYOH6s+JDITWfRHaTgHmXv/1BuJ/XjM10WErYzJDZVktChKOTYxHkSG20xRYnjfEiCK2VsxuQebl7HBFm0I3t+XJ8l1rertVWuX+X6yTiOAtpEW2gHegA1dEZukANRNAjekav6M15cl6cd+dj1DrljGc20C84X9/fa6Qu</latexit>
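The meta-learning objective on this slide scores a shared model x by the loss each worker gets after taking one local gradient step from it. A minimal sketch, again assuming scalar quadratics f_m(x) = 0.5 * a_m * (x - b_m)^2 (an assumption for illustration; `meta_objective` is a hypothetical name):

```python
def meta_objective(x, a, b, gamma):
    """Average of f_m evaluated after one local gradient step from x."""
    M = len(a)
    total = 0.0
    for am, bm in zip(a, b):
        adapted = x - gamma * am * (x - bm)       # one local gradient step from x
        total += 0.5 * am * (adapted - bm) ** 2   # f_m at the adapted point
    return total / M


if __name__ == "__main__":
    a, b = [1.0, 3.0], [0.0, 4.0]
    # gamma = 0 recovers the original average objective at x.
    print(meta_objective(3.0, a, b, gamma=0.0))
    # A positive step size scores x by how well it adapts to each worker.
    print(meta_objective(3.0, a, b, gamma=0.1))
```

In this toy setting the objective at x = 3 drops once the local step is allowed, which is the sense in which the shared model is "improvable" by each worker.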
SLIDE 24

Reference

Better Communication Complexity for Local SGD, arXiv:1909.04746

First Analysis of Local GD on Heterogeneous Data, arXiv:1909.04715

NeurIPS workshop on Federated Learning: http://federated-learning.org/fl-neurips-2019/