
Obtaining Adjustable Regularization for Free via Iterate Averaging - PowerPoint PPT Presentation



  1. Obtaining Adjustable Regularization for Free via Iterate Averaging. Jingfeng Wu, Vladimir Braverman, Lin F. Yang. Johns Hopkins University & UCLA. June 2020.

  2. Searching for the optimal hyperparameter: a core ML/optimization problem. min_w L(w) + λR(w), where L(w) is the main loss, R(w) is the regularization term, and λ is the hyperparameter. GD/SGD with learning rate (step size) η: w_{k+1} = w_k − η(∇L(w_k) + λ∇R(w_k)), so that w_k → w*_λ.
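To make the cost structure concrete, here is a minimal schematic of this standard workflow (a sketch, not code from the talk; `grad_L`, `grad_R`, and the other names are hypothetical): every candidate λ requires its own full optimization run.

```python
def train_regularized(grad_L, grad_R, lam, eta, steps, w0):
    """Plain (S)GD on the regularized objective L(w) + lam * R(w)."""
    w = w0
    for _ in range(steps):
        # w_{k+1} = w_k - eta * (grad L(w_k) + lam * grad R(w_k))
        w = w - eta * (grad_L(w) + lam * grad_R(w))
    return w  # approximates w*_lam; a new lam means re-running from scratch
```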

  3. Re-running the optimizer is expensive! :( ResNet-50 + ImageNet + 8 GPUs:
  • A single round of training takes about 3 days.
  • Almost a year to try a hundred different hyperparameters.
  Can we obtain adjustable regularization for free?

  4. Iterate averaging ⇒ regularization (Neu et al.). [Figure: contours of the loss L(w); the SGD path for solving min L(w); geometric averaging of that path lands at the solution of min L(w) + λR(w).]

  5. Iterate averaging protocol:
  • Require: a stored optimization path
  • Input: a hyperparameter λ
  • Compute a weighting scheme
  • Average the path
  • Output: the regularized solution
  Iterate averaging is cheap :) But Neu et al.'s result is limited :(
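A minimal sketch of this protocol in Python (a hypothetical helper, not the authors' code), assuming the geometric weighting scheme that the next slide makes precise:

```python
import numpy as np

def average_path(path, lam, eta):
    """path: stored iterates [w_0, w_1, ...] from one (unregularized) run;
    lam: desired regularization strength; eta: step size used for the run."""
    p = 1.0 / (1.0 + lam * eta)                      # geometric ratio
    weights = (1.0 - p) * p ** np.arange(len(path))  # p_k = (1 - p) p^k
    weights /= weights.sum()                         # renormalize truncated tail
    return weights @ np.asarray(path)                # the regularized solution
```

Averaging is one weighted pass over the stored path, versus a full re-training run per λ.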

  6. Formally, Neu et al. show that for
  • linear regression: L(w) = (1/2n) Σ_{i=1}^n (w^T x_i − y_i)^2,
  • ℓ₂-regularization: R(w) = (1/2)‖w‖₂²,
  • a GD/SGD path: w_{k+1} = w_k − η∇L(w_k),
  • geometric averaging: p_k = (1 − p)p^k with p = 1/(1 + λη),
  the average p_1 w_1 + p_2 w_2 + ··· + p_k w_k solves min_w L(w) + λR(w).
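The claim is easy to check numerically. Below is a small self-contained sketch (our own, not the authors' code) that runs plain GD on least squares, averages the iterates geometrically, and compares against the closed-form ridge solution; it assumes initialization at zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

eta, lam, steps = 0.01, 1.0, 20000
p = 1.0 / (1.0 + lam * eta)          # geometric ratio from the slide

w = np.zeros(d)                      # w_0 = 0 (the averaging argument assumes this)
avg, total = np.zeros(d), 0.0
for k in range(steps):
    c = (1.0 - p) * p ** k           # weight p_k = (1 - p) p^k for iterate w_k
    avg, total = avg + c * w, total + c
    grad = X.T @ (X @ w - y) / n     # gradient of the *unregularized* loss
    w = w - eta * grad               # plain GD; no regularizer anywhere
avg /= total                         # renormalize the truncated geometric tail

# Closed-form solution of min_w ||Xw - y||^2 / (2n) + lam * ||w||^2 / 2
w_ridge = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
print(np.linalg.norm(avg - w_ridge))  # should print a tiny number
```

Note that λ enters only through the averaging weights, so the same stored path serves every λ.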

  7. Our contributions :) Iterate averaging works for more general
  1. regularizers ⇐ generalized ℓ₂-regularizer
  2. optimizers ⇐ Nesterov's acceleration
  3. objectives ⇐ strongly convex and smooth losses
  4. deep neural networks! (empirically)

  8. Generalized ℓ₂-regularization: R(w) = (1/2) w^T Q w. Use a preconditioned GD/SGD path instead: w_{k+1} = w_k − ηQ^{-1}∇L(w_k). Then the geometric average p_1 w_1 + p_2 w_2 + ··· + p_k w_k solves min_w L(w) + λR(w).
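A quick numerical check of this generalized case (again our own sketch, with an arbitrarily chosen positive-definite Q for illustration): the averaged preconditioned-GD path matches the closed-form minimizer of L(w) + λ·(1/2)wᵀQw.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)
Q = np.diag(np.arange(1.0, d + 1.0))  # any positive-definite Q (our choice)

eta, lam, steps = 0.01, 1.0, 20000
p = 1.0 / (1.0 + lam * eta)
Q_inv = np.linalg.inv(Q)

w, avg, total = np.zeros(d), np.zeros(d), 0.0  # again w_0 = 0
for k in range(steps):
    c = (1.0 - p) * p ** k
    avg, total = avg + c * w, total + c
    grad = X.T @ (X @ w - y) / n
    w = w - eta * Q_inv @ grad        # preconditioned step: w - eta * Q^{-1} grad
avg /= total

# Closed form for min_w ||Xw - y||^2 / (2n) + lam * w^T Q w / 2
w_reg = np.linalg.solve(X.T @ X / n + lam * Q, X.T @ y / n)
print(np.linalg.norm(avg - w_reg))    # should be ~0
```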
