SLIDE 1

Stochastic Search using the Natural Gradient

Efficient Natural Evolution Strategies (eNES)

Yi Sun, Daan Wierstra, Tom Schaul, and Jürgen Schmidhuber
{yi,daan,tom,juergen}@idsia.ch

IDSIA, Galleria 2, Manno 6928, Switzerland

June 17th, 2009

SLIDE 2

Blackbox Optimization

Goal: Maximize some unknown 'fitness' function f(z), z ∈ ℝ^d.

[Figure: example fitness landscape.]

Challenge: Complex fitness landscapes.

- Local optima, saddle points, etc.
- Highly non-isotropic (ill-shaped) local behavior.

[Figure: contours of an ill-shaped, non-isotropic fitness landscape.]

- Correlation between all dimensions.
- Expensive fitness evaluations.
- High dimensionality, d up to hundreds.

Powerful methods are required to solve such problems.

SLIDE 3

Stochastic Search Algorithms

Basic idea: Optimization using a population of samples.

Typical flow of a stochastic search algorithm:

Initialization → Sampling from the search distribution → Evaluating fitnesses of samples → Updating the search distribution → (loop back to sampling)
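Below is a minimal sketch of this generic loop. The fitness function, the sampler, and the update rule are caller-supplied placeholders (hypothetical names, not from the paper); the concrete eNES choices follow on the next slides.

```python
import numpy as np

def stochastic_search(f, theta0, sample, update, n=50, iters=100):
    """Generic stochastic-search loop.
    f: fitness function; theta0: initial search-distribution parameters;
    sample(theta, n): draw n candidates; update(theta, Z, fit): new parameters."""
    theta = theta0
    for _ in range(iters):
        Z = sample(theta, n)                 # sample from the search distribution
        fit = np.array([f(z) for z in Z])    # evaluate fitnesses
        theta = update(theta, Z, fit)        # update the search distribution
    return theta
```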

SLIDE 4

Stochastic Gradient Ascent

Let p(·|θ) be the search distribution. We want to update θ towards better expected fitness:

    J(θ) = E[f | θ] = ∫ f(z) p(z|θ) dz.

The most straightforward way is gradient ascent:

    θ ← θ + α ∇_θ J(θ).

We can compute the 'vanilla' gradient as

    ∇_θ J(θ) = ∫ f(z) ∇_θ p(z|θ) dz
             = ∫ f(z) p(z|θ) · [∇_θ p(z|θ) / p(z|θ)] dz    (log-likelihood trick)
             = E[ f(z) ∇_θ log p(z|θ) | θ ].

SLIDE 5

Stochastic Gradient Ascent

Using the Monte-Carlo estimate

    ∇_θ J(θ) = E[ f(z) ∇_θ log p(z|θ) | θ ]
             ≈ (1/n) ∑_{i=1}^{n} f(z_i) ∇_θ log p(z_i|θ)
             = (1/n) G f,

with

    G = [∇_θ log p(z_1|θ), ..., ∇_θ log p(z_n|θ)],    f = [f(z_1), ..., f(z_n)]ᵀ.

Now the problem is to compute ∇_θ log p(z|θ). A closed-form expression can be obtained if p(z|θ) is a Gaussian distribution.
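As a small sketch, the estimate (1/n) G f can be assembled as follows, assuming a caller-supplied score function (the closed-form scores are only derived on the next slide):

```python
import numpy as np

def vanilla_gradient(scores, fitnesses):
    """scores: list of score vectors ∇_θ log p(z_i|θ);
    fitnesses: f(z_1), ..., f(z_n). Returns (1/n) G f."""
    G = np.column_stack(scores)             # columns of G are the score vectors
    f = np.asarray(fitnesses, dtype=float)
    return G @ f / len(f)
```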

SLIDE 6

The Gaussian Search Distribution

The search distribution is given by

    p(z|θ) = N(z | x, C).

We use the parameter set θ = ⟨x, A⟩, with A being the Cholesky factor of C, i.e., A is an upper triangular matrix (UTM) and C = AᵀA.

There is no redundancy in θ, since C is symmetric.

∇_θ log p(z|θ) can be computed in closed form:

    ∇_x log p(z|θ) = C⁻¹ (z − x),
    ∇_A log p(z|θ) = A⁻ᵀ (z − x)(z − x)ᵀ C⁻¹ − (diag A)⁻¹.

The estimated gradient ∇ˢ_θ J(θ) (the sample estimate of ∇_θ J(θ)) can then be computed from ∇_θ log p(z_1|θ), ..., ∇_θ log p(z_n|θ).
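A minimal sketch of these closed-form scores, assuming the ⟨x, A⟩ parameterization above; the upper-triangular restriction is applied at the end, since only those entries of A are free parameters:

```python
import numpy as np

def gaussian_scores(z, x, A):
    """Return (∇_x log p, ∇_A log p) for p(z|θ) = N(z | x, AᵀA),
    with A upper triangular."""
    C = A.T @ A
    s = z - x
    Cinv_s = np.linalg.solve(C, s)          # C⁻¹ (z − x)
    At_inv_s = np.linalg.solve(A.T, s)      # A⁻ᵀ (z − x)
    grad_x = Cinv_s
    grad_A = np.outer(At_inv_s, Cinv_s) - np.diag(1.0 / np.diag(A))
    return grad_x, np.triu(grad_A)          # keep only the free entries of A
```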

SLIDE 7

Stochastic Gradient Ascent

    θ ← θ + α ∇ˢ_θ J(θ) = θ + (α/n) G f

SLIDE 8

Novel Ideas in eNES

1. Use the Natural Gradient instead of the vanilla gradient.
2. The natural gradient is computed in an Exact and Efficient way.
3. Use Importance Mixing to reuse previously evaluated samples.
4. Introduce an Optimal Fitness Baseline to reduce the variance of the gradient estimation.

SLIDE 9

1. Why Natural Gradient?

The vanilla gradient doesn't work well:

- Over-aggressive steps on ridges.
- Too-small steps on plateaus.
- Slow or premature convergence, non-robust performance.

Basic idea of the natural gradient:

- Steepest-ascent direction when correlations between the elements of θ are taken into account.
- Re-weights gradient elements according to their respective uncertainties.
- Isotropic convergence on ill-shaped fitness surfaces.

SLIDE 10

1. Formulation of Natural Gradient

Assume the distance between two adjacent distributions p(·|θ) and p(·|θ + δθ) is defined by their KL divergence. The natural gradient ∇̃_θ J(θ) is then given by the necessary condition

    F ∇̃_θ J(θ) = ∇_θ J(θ),

where F is the Fisher information matrix (FIM) of θ (intuitively, the normalized covariance of the gradient):

    F = E[ (∇_θ log p(z|θ)) (∇_θ log p(z|θ))ᵀ ].

In general, F may not be invertible. If F is invertible, we can compute the (estimated) natural gradient as

    ∇̃_θ J(θ) = F⁻¹ ∇_θ J(θ),    ∇̃ˢ_θ J(θ) = F⁻¹ ∇ˢ_θ J(θ).
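In practice the defining condition is a linear system, so a sketch need not form F⁻¹ explicitly:

```python
import numpy as np

def natural_gradient(F, grad_J):
    """Solve F g = ∇_θ J(θ) for the natural gradient g,
    rather than inverting F (cheaper and numerically safer)."""
    return np.linalg.solve(F, grad_J)
```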

SLIDE 11

2. Property of FIM in the Gaussian Case

Let θ = ⟨x, A⟩. Under this setting we find (quite luckily):

- The Fisher information matrix is indeed invertible.
- The Fisher information matrix is block diagonal:

      F = blkdiag(C⁻¹, F_1, ..., F_d).

- C⁻¹ is the FIM for x.
- F_k is the FIM for the (d − k + 1 non-zero elements in the) k-th row of A.
- The FIM suggests a natural grouping of the elements of θ.

SLIDE 12

2. Efficient Inverse of FIM

Computing the natural gradient requires the inverse of F.

- Naively, F is a matrix of size O(d²) × O(d²), so inverting F requires O(d⁶) time.
- We already found that F is block diagonal, so inverting F requires only O(d⁴) time.
- We can do better! Using the special form of each sub-block, the complexity is reduced to O(d³).

The estimated natural gradient is then computed as

    ∇̃ˢ_θ J(θ) = (1/n) F⁻¹ G f,

with overall complexity O(d³).
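A sketch of the block-diagonal shortcut (this is the generic O(d⁴) version; the paper's O(d³) scheme additionally exploits the internal structure of each F_k, which is not reproduced here):

```python
import numpy as np

def blockwise_solve(blocks, grad_parts):
    """blocks: FIM sub-blocks [C⁻¹-block for x, F_1, ..., F_d];
    grad_parts: matching segments of the vanilla gradient.
    Solves each small system instead of one O(d²) x O(d²) system."""
    return [np.linalg.solve(B, g) for B, g in zip(blocks, grad_parts)]
```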

SLIDE 13

3. Importance Mixing

At each cycle, we need to evaluate n new samples. It is common that the updated θ^(t) is close to θ^(t−1).

Problem: Redundant fitness evaluations in the overlapping high-density area.

Importance mixing: Generate samples in less-explored areas, while keeping the updated batch conformed to the new search distribution.

Reusing samples means fewer fitness evaluations.

SLIDE 14

3. Importance Mixing

Formally, importance mixing is carried out by two rejection-sampling passes.

Forward pass: For each sample z from the previous batch, accept it with probability

    min{ 1, p(z|θ^(t)) / p(z|θ^(t−1)) }.

Backward pass: Accept each newly generated sample z with probability

    max{ 0, 1 − p(z|θ^(t−1)) / p(z|θ^(t)) },

until the batch size is reached.
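A sketch of the two passes, using SciPy's Gaussian density for illustration. (The paper also enforces a minimal refresh rate so the backward pass always makes progress; that detail is omitted here, as it is on the slide.)

```python
import numpy as np
from scipy.stats import multivariate_normal

def importance_mixing(old_batch, x_old, C_old, x_new, C_new, n, rng):
    """Reuse old samples where the old and new densities overlap, then top up."""
    p_old = multivariate_normal(x_old, C_old).pdf
    p_new = multivariate_normal(x_new, C_new).pdf
    # Forward pass: keep old samples that still fit the new distribution.
    batch = [z for z in old_batch
             if rng.random() < min(1.0, p_new(z) / p_old(z))]
    # Backward pass: draw fresh samples, biased toward less-explored areas.
    while len(batch) < n:
        z = rng.multivariate_normal(x_new, C_new)
        if rng.random() < max(0.0, 1.0 - p_old(z) / p_new(z)):
            batch.append(z)
    return np.array(batch[:n])
```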

SLIDE 15

4. Optimal Fitness Baseline

A typical problem with Monte-Carlo gradient estimation is that the variance is too big. A fitness baseline is introduced to reduce the variance:

    ∇_θ J = ∇_θ ∫ f(z) p(z|θ) dz − ∇_θ ∫ b · p(z|θ) dz        (the second term equals 0)
          = ∇_θ ∫ [f(z) − b] p(z|θ) dz,

where b is called the fitness baseline.

Adding the baseline b doesn't affect the expectation of ∇_θ J, but it does affect the variance of the estimate. For the natural gradient,

    V[∇̃_θ J(θ)] ∝ b² E[uᵀu] − 2b E[uᵀv] + const,

with u = F⁻¹ ∇_θ log p(z|θ) and v = f(z) · u.
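Spelling out the one-step minimization behind the optimal baseline on the next slide (differentiate the quadratic form above with respect to b and set it to zero):

```latex
\frac{\mathrm{d}}{\mathrm{d}b}\Bigl(b^{2}\,\mathbb{E}[u^{\top}u] - 2b\,\mathbb{E}[u^{\top}v]\Bigr)
  = 2b\,\mathbb{E}[u^{\top}u] - 2\,\mathbb{E}[u^{\top}v] = 0
\quad\Longrightarrow\quad
b^{*} = \frac{\mathbb{E}[u^{\top}v]}{\mathbb{E}[u^{\top}u]}.
```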

SLIDE 16

4. Optimal Fitness Baseline

V[∇̃_θ J(θ)] is a quadratic form in b, so we can minimize it. The optimal fitness baseline is given by

    b = E[uᵀv] / E[uᵀu] ≈ (∑_{i=1}^{n} u_iᵀ v_i) / (∑_{i=1}^{n} u_iᵀ u_i).

The natural gradient is then estimated by

    ∇̃ˢ_θ J(θ) = (1/n) F⁻¹ G (f − b).

Better: use different baselines b_j for different (groups of) parameters θ_j, further reducing the variance. The block-diagonal structure of F suggests using a block fitness baseline, where a different baseline value is computed for each group of parameters in θ.
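A sketch of the scalar baseline estimate (the block variant would repeat this per parameter group). Since v_i = f(z_i) u_i, we have u_iᵀ v_i = f(z_i) u_iᵀ u_i:

```python
import numpy as np

def optimal_baseline(U, fitnesses):
    """U: array of shape (n, dim), rows u_i = F⁻¹ ∇_θ log p(z_i|θ);
    fitnesses: f(z_1), ..., f(z_n). Returns the scalar baseline b."""
    f = np.asarray(fitnesses, dtype=float)
    uu = np.einsum('ij,ij->i', U, U)   # u_iᵀ u_i
    return (f * uu).sum() / uu.sum()   # Σ u_iᵀ v_i / Σ u_iᵀ u_i
```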

SLIDE 17

Putting Things Together

Initialization
loop:
    Update the population using importance mixing
    Evaluate the newly generated samples
    Compute the optimal baseline b and ∇̃ˢ_θ J(θ)
    Update: θ ← θ + α ∇̃ˢ_θ J(θ)
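As an end-to-end illustration, here is a minimal runnable sketch of this loop, restricted for brevity to adapting the mean x under a fixed covariance C. In that special case the FIM for x is C⁻¹, so u_i = F⁻¹ C⁻¹(z_i − x) = z_i − x, and the baselined natural-gradient step reduces to (α/n) Σ (f_i − b)(z_i − x). The full eNES additionally adapts A and reuses samples via importance mixing; the fitness function below is a toy example, not from the paper:

```python
import numpy as np

def sphere(z):
    return -np.sum(z ** 2)          # toy fitness; maximum at the origin

def nes_mean_only(f, x0, C, n=50, alpha=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        Z = rng.multivariate_normal(x, C, size=n)   # sample the batch
        fit = np.array([f(z) for z in Z])           # evaluate fitnesses
        u = Z - x                                   # u_i = z_i − x (see above)
        uu = np.einsum('ij,ij->i', u, u)
        b = (fit * uu).sum() / uu.sum()             # optimal scalar baseline
        x = x + alpha * ((fit - b)[:, None] * u).mean(axis=0)
    return x

print(nes_mean_only(sphere, x0=[3.0, -2.0], C=np.eye(2)))  # converges near [0, 0]
```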

SLIDE 18

Empirical Results - Standard Blackbox Benchmarks

[Figure: fitness vs. number of evaluations on 50-dimensional unimodal benchmarks: Cigar, DiffPow, Ellipsoid, ParabR, Schwefel, SharpR, Sphere, Tablet.]

SLIDE 19

Empirical Results - Multimodal

[Figure: eNES search trajectory on a deceptive multimodal fitness landscape.]

eNES is able to jump over deceptive local optima.

SLIDE 20

Empirical Results - Double Pole Balancing

[Figure: cart-pole system with two poles; force F applied to the cart at position x, pole angles β₁ and β₂.]

Non-Markovian double pole balancing, average numbers of evaluations:

Method   SANE     ESP    NEAT   CMA    CoSyNE  FEM    NES
Eval.    262,700  7,374  6,929  3,521  1,249   2,099  1,753

SLIDE 21

Summary

- We derived a clear blackbox optimization algorithm from first principles.
- Derivation of the exact Fisher information matrix.
- Efficient computation of the FIM inverse.
- Importance mixing reduces the number of fitness evaluations.
- Optimal fitness baselines improve the performance.
- Competitive performance on standard benchmarks, including non-Markovian double pole balancing tasks.

SLIDE 22

Empirical Results - Importance Mixing and Optimal Baseline

Percentage of runs that prematurely converged, varying the type of fitness baseline used:

Baseline   Premature convergence
None       52%
Uniform    50%
Block      0%

Importance mixing reduces the number of fitness evaluations by a factor of 3 to 4.