
Escaping Saddle Points with Adaptive Gradient Methods, Matthew Staib (presentation slides)

Authors: Matthew Staib¹, Sashank Reddi², Satyen Kale², Sanjiv Kumar², Suvrit Sra¹ (¹ MIT EECS, ² Google Research, New York)


  1. Escaping Saddle Points with Adaptive Gradient Methods. Matthew Staib¹, Sashank Reddi², Satyen Kale², Sanjiv Kumar², Suvrit Sra¹ (¹ MIT EECS, ² Google Research, New York)

  2. Adam, RMSProp and friends
     • Empirically: good non-convex performance
     • Limited theory, some non-convergence results [e.g. Reddi et al. '18]
     • Our take: adaptive methods escape saddles (in words: via isotropic noise) and reach second-order stationary points (SOSPs)
     This paper: the first second-order rates for adaptive methods

  7. $x_{t+1} \leftarrow x_t - \eta g_t$
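The update on this slide, and the kind of adaptive update the talk is about, can be sketched in a few lines of plain Python. This is a toy illustration only, not code or an experiment from the talk. The saddle function f(x) = x[0]**2 - x[1]**2, the Gaussian noise model, the step sizes, and the names `sgd_step`, `rmsprop_step` and `noisy_saddle_grad` are all assumptions made for the example. `rmsprop_step` follows the standard textbook form of RMSProp.

```python
import math
import random

def sgd_step(x, g, eta):
    """Plain update from the slide: x_{t+1} = x_t - eta * g_t, per coordinate."""
    return [xi - eta * gi for xi, gi in zip(x, g)]

def rmsprop_step(x, g, v, eta, beta=0.9, eps=1e-8):
    """Textbook RMSProp-style step (hyperparameter defaults are assumptions).

    v is a running average of squared gradients; each gradient coordinate
    is divided by sqrt(v) + eps before the step is taken.
    """
    v_new = [beta * vi + (1.0 - beta) * gi * gi for vi, gi in zip(v, g)]
    x_new = [xi - eta * gi / (math.sqrt(vi) + eps)
             for xi, gi, vi in zip(x, g, v_new)]
    return x_new, v_new

def noisy_saddle_grad(x, rng, sigma=0.01):
    """Gradient of the assumed test function f(x) = x[0]**2 - x[1]**2,
    plus independent Gaussian noise in each coordinate."""
    return [2.0 * x[0] + rng.gauss(0.0, sigma),
            -2.0 * x[1] + rng.gauss(0.0, sigma)]

rng = random.Random(0)
x_sgd = [0.0, 0.0]               # start exactly at the saddle point
x_rms, v = [0.0, 0.0], [0.0, 0.0]
for _ in range(100):
    x_sgd = sgd_step(x_sgd, noisy_saddle_grad(x_sgd, rng), eta=0.05)
    x_rms, v = rmsprop_step(x_rms, noisy_saddle_grad(x_rms, rng), v, eta=0.05)

# In both runs the noise moves the iterate off the saddle, after which it
# drifts along the negative-curvature coordinate x[1].
print("SGD:    ", x_sgd)
print("RMSProp:", x_rms)
```

With a noiseless gradient both iterates would stay at the origin forever, since the gradient there is exactly zero. The sketch only shows that noise is what moves them off the saddle; it says nothing about the rates the talk establishes.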
