Guided Evolutionary Strategies
Augmenting random search with surrogate gradients


SLIDE 1

Guided Evolutionary Strategies

Augmenting random search with surrogate gradients

Niru Maheswaranathan // Google Research, Brain Team
Joint work with: Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

SLIDE 2

Optimizing with surrogate gradients

Surrogate gradients: directions that are correlated with the true gradient (but may be biased)

Example applications:

  • Neural networks with non-differentiable layers
  • Meta-learning (where computing an exact meta-gradient is costly)
  • Gradients from surrogate models (synthetic gradients, black-box attacks)
SLIDE 7

Optimizing with surrogate gradients

Surrogate gradients sit on a spectrum between zeroth- and first-order optimization:

Zeroth-order
  • only function values, f(x)

Guided ES
  • surrogate gradients

First-order
  • gradient information, ∇f(x)
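To make the zeroth-order end of this spectrum concrete, here is a minimal NumPy sketch (an illustration, not the talk's code) of an antithetic random-search gradient estimate that touches only function values:

```python
import numpy as np

def vanilla_es_grad(f, x, sigma=0.1, num_pairs=100, rng=None):
    """Zeroth-order (vanilla ES) gradient estimate from function values only.

    Uses antithetic sampling: for each perturbation eps ~ N(0, sigma^2 I),
    evaluate f at x + eps and x - eps and form a finite-difference
    estimate along eps.
    """
    rng = np.random.default_rng(rng)
    n = x.size
    g = np.zeros(n)
    for _ in range(num_pairs):
        eps = sigma * rng.standard_normal(n)
        g += eps * (f(x + eps) - f(x - eps))
    return g / (2 * sigma**2 * num_pairs)

# On a quadratic f(x) = 0.5 * ||x||^2 the true gradient is x, so the
# estimate should point (noisily) in the same direction as x.
f = lambda x: 0.5 * np.sum(x**2)
x = np.array([1.0, -2.0, 3.0])
g = vanilla_es_grad(f, x, num_pairs=500, rng=0)
```

No gradient of `f` is ever taken; the estimate is unbiased but its variance grows with the parameter dimension, which is the weakness that surrogate gradients are meant to address.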

SLIDE 11

Guided evolutionary strategies

[Schematic: loss surface]

Guiding distribution: sample perturbations

ε ∼ N(0, Σ)

Gradient estimate:

g = β / (2σ²P) Σ_{i=1}^{P} εᵢ (f(x + εᵢ) − f(x − εᵢ))

SLIDE 16

Guided evolutionary strategies

Choosing the guiding distribution

Standard (vanilla) ES: identity covariance (α = 1 below)

Σ = (α/n) I + ((1 − α)/k) UUᵀ

β: hyperparameter
n: parameter dimension

SLIDE 17

Guided evolutionary strategies

Choosing the guiding distribution

Guided ES: identity + low-rank covariance

Σ = (α/n) I + ((1 − α)/k) UUᵀ,  U ∈ ℝⁿˣᵏ

β: hyperparameter
n: parameter dimension
k: subspace dimension

Guiding subspace: the columns of U are surrogate gradients
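Putting the pieces together, here is an illustrative NumPy sketch (a reading of the slides, not the authors' released code) that samples perturbations from the guiding distribution without ever forming Σ explicitly, then computes the antithetic gradient estimate:

```python
import numpy as np

def guided_es_grad(f, x, U, alpha=0.5, beta=2.0, sigma=0.1,
                   num_pairs=50, rng=None):
    """Guided ES gradient estimate (sketch).

    Perturbations are drawn from N(0, sigma^2 * Sigma) with
    Sigma = (alpha / n) I + ((1 - alpha) / k) U U^T,
    where the columns of U (shape n x k) are surrogate gradients.
    """
    rng = np.random.default_rng(rng)
    n, k = U.shape
    g = np.zeros(n)
    for _ in range(num_pairs):
        # eps ~ N(0, sigma^2 * Sigma), sampled as a sum of an isotropic
        # component and a component in the guiding subspace spanned by U:
        eps = sigma * (np.sqrt(alpha / n) * rng.standard_normal(n)
                       + np.sqrt((1 - alpha) / k) * U @ rng.standard_normal(k))
        g += eps * (f(x + eps) - f(x - eps))
    return beta / (2 * sigma**2 * num_pairs) * g

# Example: a quadratic loss with a biased surrogate gradient as the guide.
f = lambda x: 0.5 * np.sum(x**2)
x = np.array([1.0, -2.0, 3.0])
surrogate = x + 1.0                              # true gradient plus a bias
U = (surrogate / np.linalg.norm(surrogate)).reshape(-1, 1)
g = guided_es_grad(f, x, U, num_pairs=200, rng=0)
```

The two-term sampling trick reproduces the covariance Σ exactly (the cross term vanishes in expectation), so only an n×k matrix is ever stored.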

SLIDE 18

Demo: perturbed quadratic

Quadratic function with a bias added to the gradient
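The demo setup can be reproduced in a few lines (a hypothetical reconstruction; the actual demo code is not shown in the slides). Gradient descent that follows the biased surrogate gradient converges to the wrong point, which is what motivates correcting it with function evaluations as in Guided ES:

```python
import numpy as np

# Hypothetical reconstruction of the "perturbed quadratic" demo:
# a quadratic loss whose available gradient has a fixed bias added.
n = 10
rng = np.random.default_rng(0)
bias = rng.standard_normal(n)

def loss(x):
    return 0.5 * np.sum(x**2)        # true minimum at x = 0

def surrogate_grad(x):
    return x + bias                   # true gradient plus a constant bias

# Descending the surrogate gradient converges to x = -bias, not x = 0:
x = np.ones(n)
for _ in range(1000):
    x = x - 0.1 * surrogate_grad(x)

residual_loss = loss(x)  # stuck near 0.5 * ||bias||^2 rather than 0
```

The fixed point of the update x ← x − 0.1·(x + bias) is x = −bias, so the loss plateaus at a nonzero value no matter how long you run.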

SLIDE 21

Example applications

  • Unrolled optimization: surrogate gradient from one step of BPTT
  • Synthetic gradients: surrogate gradient from a synthetic-gradient model

SLIDE 23

Summary

Guided Evolutionary Strategies: an optimization algorithm for when you only have access to surrogate gradients

Learn more at our poster: Pacific Ballroom #146
Code: brain-research/guided-evolutionary-strategies
Twitter: @niru_m

SLIDE 24

Choosing optimal hyperparameters

Guided ES: identity + low-rank covariance

Σ = (α/n) I + ((1 − α)/k) UUᵀ

[Plot: optimal hyperparameter (α)]