
SLIDE 1

Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search

Youhei Akimoto (presenter; University of Tsukuba / RIKEN AIP), Shinichi Shirakawa (Yokohama National University), Nozomu Yoshinari (Yokohama National University), Kento Uchida (Yokohama National University), Shota Saito (Yokohama National University), Kouhei Nishida (Shinshu University)

SLIDE 2

Neural Network Architectures

Choosing an architecture for a task (dataset) is trial and error: VGGNet, ResNet, Inception, ...

Sometimes...
  • a known architecture, often pre-trained on some datasets, works well on our task. Happy!

Other times...
  • we have to find a good one, or
  • design a brand-new architecture and train it.

SLIDE 3

One-Shot Neural Architecture Search

Joint optimization of architecture c and weights w:

    max_{w, c} f(w, c)

[Figure: a cell mapping x_t to x_{t+1} through four candidate operations: Conv 3x3 (weights W1), Conv 5x5 (weights W2), max pooling, avg pooling. Weights w = (W1, W2); the one-hot architecture vector c = (0, 0, 1, 0) selects max pooling.]

NAS as hyper-parameter search:

    max_c f(w*(c), c)   subject to   w*(c) = argmax_w f(w, c)

  • 1 evaluation of c = 1 training

One-shot NAS:
  • optimization of w and c within 1 training
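As a concrete illustration of the cell above, a one-hot architecture vector can be read as an operation selector. This is a hypothetical toy sketch (made-up names and stand-in operations, not the authors' implementation):

```python
import numpy as np

# Toy sketch: the one-hot architecture vector c selects one of four
# candidate operations mapping x_t to x_{t+1}; w = (W1, W2) holds the
# trainable weights of the two convolution candidates.
def forward(x, w, c):
    W1, W2 = w
    ops = [
        lambda v: W1 * v,   # stand-in for Conv 3x3 with weights W1
        lambda v: W2 * v,   # stand-in for Conv 5x5 with weights W2
        np.max,             # max pooling (collapsed to a scalar here)
        np.mean,            # avg pooling
    ]
    return ops[int(np.argmax(c))](x)

x_t = np.array([1.0, 2.0, 3.0])
y = forward(x_t, w=(0.5, 2.0), c=(0, 0, 1, 0))  # c selects max pooling
print(y)  # 3.0
```

With c fixed, only the selected operation's weights receive gradient signal, which is why evaluating a candidate c ordinarily costs one full training run.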

SLIDE 4

Difficulties for Practitioners

How to choose / tune the search strategy?

Both a search space and a search strategy must be chosen. Gradient-based method:

    w ← w + ε_w ∇_w f(w, c(θ))
    θ ← θ + ε_θ ∇_θ f(w, c(θ))

Other choices:
  • Evolutionary computation based
  • Reinforcement learning based

Hyper-parameter: step-size ε

  • How to treat integer variables such as #filters?
  • How to tune the hyper-parameters in such situations?
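To make the role of the step-size hyper-parameters concrete, here is a toy sketch of the alternating gradient updates above, applied to a made-up differentiable objective f(w, θ) = -(w - θ)² - (θ - 1)² (not a real NAS objective; its maximum is at w = θ = 1):

```python
# Alternating gradient ascent with two step sizes, on a toy objective.
def grad_w(w, th):
    return -2.0 * (w - th)                     # df/dw

def grad_th(w, th):
    return 2.0 * (w - th) - 2.0 * (th - 1.0)   # df/dtheta

w, th = 0.0, 0.0
eps_w, eps_th = 0.1, 0.1   # the step-size hyper-parameters in question
for _ in range(2000):
    w += eps_w * grad_w(w, th)
    th += eps_th * grad_th(w, th)
print(round(w, 3), round(th, 3))  # both converge to the optimum at 1.0
```

Too large a step size makes the iteration diverge; too small and it crawls. That tuning burden is exactly the practical difficulty the slide points out.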
SLIDE 5

Contributions

Novel search strategy for one-shot NAS:
  • 1. Arbitrary search space (categorical + ordinal)
  • 2. Robust against its inputs (hyper-parameters and search space)

Our approach:

1. Stochastic relaxation:

    max_{w, c} f(w, c)   ⇒   max_{w, θ} J(w, θ) := ∫ f(w, c) p(c | θ) dc

  • p(c | θ): exponential family, so J is differentiable w.r.t. both w and θ

2. Stochastic natural gradient + adaptive step-size:

    w^{t+1} = w^t + ε_w ∇̂_w J(w^t, θ^t)
    θ^{t+1} = θ^t + ε_θ F(θ^t)^{-1} ∇̂_θ J(w^{t+1}, θ^t)   (natural gradient; ∇̂ denotes a stochastic estimate)

Monotone improvement under an appropriate step-size:

    J(w^t, θ^t) < J(w^{t+1}, θ^t) < J(w^{t+1}, θ^{t+1})
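For the categorical family in expectation parametrization, the natural gradient has a simple closed form: F(θ)⁻¹ ∇_θ ln p(c | θ) = c − θ for one-hot c. Below is a minimal sketch of the resulting θ-update with a fixed step size (the talk's adaptive step-size mechanism is omitted, and the per-category utilities f are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

K = 4
theta = np.full(K, 1.0 / K)            # categorical distribution over 4 choices
f = np.array([0.0, 0.2, 1.0, 0.1])     # hypothetical utility of each category
eps_theta = 0.1                        # fixed step size (no adaptation here)

for _ in range(300):
    ks = rng.choice(K, size=8, p=theta)   # sample c ~ p(. | theta)
    onehots = np.eye(K)[ks]
    u = f[ks] - f[ks].mean()              # baseline-subtracted utilities
    # stochastic natural gradient step: mean of u * (c - theta)
    theta += eps_theta * (u[:, None] * (onehots - theta)).mean(axis=0)
    theta = np.clip(theta, 1e-9, None)
    theta /= theta.sum()

print(int(np.argmax(theta)))  # the distribution concentrates on category 2
```

The (c − θ) form is what makes the update invariant to the chosen parametrization; the adaptive mechanism of the talk additionally scales the effective step size from the observed signal-to-noise of this stochastic estimate.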

SLIDE 6

Results and Details

  • Faster than, and competitive in accuracy with, other one-shot NAS methods

Details will be presented at Poster #53.