Guided Evolutionary Strategies
Niru Maheswaranathan // Google Research, Brain Team
Joint work with: Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein
Augmenting random search with surrogate gradients

Optimizing with surrogate gradients
Surrogate gradients: directions that are correlated with the true gradient (but may be biased), used in place of exact gradient information, ∇f(x).

Example applications
Schematic (figure: loss surface): sample perturbations from a guiding distribution, then combine them into a gradient estimate.
$$g = \frac{\beta}{2\sigma^2 P} \sum_{i=1}^{P} \epsilon_i \left( f(x + \epsilon_i) - f(x - \epsilon_i) \right)$$
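The antithetic ES estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `es_gradient` and its default parameters are assumptions, and here the perturbations are drawn from an isotropic Gaussian (the vanilla-ES case on the next slide).

```python
import numpy as np

def es_gradient(f, x, P=100, sigma=0.1, beta=1.0, rng=None):
    """Antithetic ES gradient estimate (hypothetical helper, not the paper's code):
    g = beta / (2 * sigma**2 * P) * sum_i eps_i * (f(x + eps_i) - f(x - eps_i)),
    with eps_i drawn i.i.d. from N(0, sigma^2 I)."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    g = np.zeros(n)
    for _ in range(P):
        eps = sigma * rng.standard_normal(n)
        # antithetic pair: evaluate at x + eps and x - eps
        g += eps * (f(x + eps) - f(x - eps))
    return beta * g / (2 * sigma**2 * P)
```

For a smooth test function such as f(x) = ||x||², the estimate concentrates around the true gradient 2x as the number of pairs P grows.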
Choosing the guiding distribution

Standard (vanilla) ES: identity covariance.
Guided ES: identity plus low-rank covariance; the columns of the guiding subspace are surrogate gradients.
(β: hyperparameter; n: parameter dimension; k: subspace dimension)
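A sketch of sampling from such a guiding distribution, assuming (as in the Guided ES paper) a covariance of the form σ²(α/n · I + (1−α)/k · UUᵀ), where U is an orthonormal basis of the subspace spanned by the k surrogate gradients and α trades off the two terms. The function name and argument conventions are mine, not the authors'.

```python
import numpy as np

def sample_guided_perturbation(surrogate_grads, sigma=0.1, alpha=0.5, rng=None):
    """Sample eps ~ N(0, sigma^2 * (alpha/n * I + (1 - alpha)/k * U U^T)).
    surrogate_grads: list of k surrogate-gradient vectors of dimension n."""
    rng = np.random.default_rng() if rng is None else rng
    G = np.atleast_2d(np.asarray(surrogate_grads, dtype=float)).T  # shape (n, k)
    n, k = G.shape
    U, _ = np.linalg.qr(G)  # orthonormal basis of the guiding subspace
    full = np.sqrt(alpha / n) * rng.standard_normal(n)        # isotropic component
    guided = np.sqrt((1 - alpha) / k) * (U @ rng.standard_normal(k))  # subspace component
    return sigma * (full + guided)
```

With α = 1 this reduces to isotropic (vanilla ES) sampling; with α = 0 every perturbation lies entirely in the surrogate-gradient subspace.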
Perturbed quadratic: a quadratic function with a bias added to the gradient.
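The perturbed-quadratic setup can be mocked up as follows: a random positive semi-definite quadratic whose surrogate gradient is the true gradient plus a fixed bias vector, so the two are correlated but not equal. All names here are illustrative, and the bias scale is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n))
M = A.T @ A / n                      # random positive semi-definite quadratic form
bias = 0.5 * rng.standard_normal(n)  # fixed bias added to every gradient query

def f(x):
    return 0.5 * x @ M @ x

def true_grad(x):
    return M @ x

def surrogate_grad(x):
    # correlated with the true gradient, but biased by a constant offset
    return M @ x + bias
```

Feeding `surrogate_grad` into a guided search (and `f` into the function evaluations) reproduces the flavor of this experiment: the surrogate points roughly the right way, but following it alone would converge to the wrong point.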
Unrolled optimization: surrogate gradient from one step of BPTT (backpropagation through time).

Synthetic gradients: surrogate gradient comes from a learned synthetic-gradient model.
Guided Evolutionary Strategies: an optimization algorithm for when you only have access to surrogate gradients.
Learn more at our poster: Pacific Ballroom #146
Code: brain-research/guided-evolutionary-strategies // @niru_m
(Results figure: optimal hyperparameter (α) for Guided ES with the identity plus low-rank covariance.)