Covariance Matrix Adaptation Covariance Matrix Adaptation Evolution - PowerPoint PPT Presentation

Covariance Matrix Adaptation Covariance Matrix Adaptation

Evolution Strategies Recalling New search points are sampled normally distributed x i ⇠ m + σ N i ( 0 , C ) for i = 1 , . . . , λ where x i , m 2 R n , σ 2 R + , as perturbations of m , C 2 R n ⇥ n where I the mean vector m 2 R n represents the favorite solution I the so-called step-size σ 2 R + controls the step length I the covariance matrix C 2 R n × n determines the shape of the distribution ellipsoid The remaining question is how to update C .

Covariance Matrix Adaptation Rank-One Update y w = P µ y i ⇠ N i ( 0 , C ) m m + σ y w , i = 1 w i y i : λ , initial distribution, C = I

Covariance Matrix Adaptation Rank-One Update y w = P µ y i ⇠ N i ( 0 , C ) m m + σ y w , i = 1 w i y i : λ , y w , movement of the population mean m (disregarding σ )

Covariance Matrix Adaptation Rank-One Update y w = P µ y i ⇠ N i ( 0 , C ) m m + σ y w , i = 1 w i y i : λ , mixture of distribution C and step y w , C 0 . 8 ⇥ C + 0 . 2 ⇥ y w y T w

Covariance Matrix Adaptation Rank-One Update y w = P µ y i ⇠ N i ( 0 , C ) m m + σ y w , i = 1 w i y i : λ , new distribution (disregarding σ )

Covariance Matrix Adaptation Rank-One Update y w = P µ y i ⇠ N i ( 0 , C ) m m + σ y w , i = 1 w i y i : λ , movement of the population mean m

Covariance Matrix Adaptation Rank-One Update y w = P µ y i ⇠ N i ( 0 , C ) m m + σ y w , i = 1 w i y i : λ , mixture of distribution C and step y w , C 0 . 8 ⇥ C + 0 . 2 ⇥ y w y T w

Covariance Matrix Adaptation Rank-One Update y w = P µ y i ⇠ N i ( 0 , C ) m m + σ y w , i = 1 w i y i : λ , new distribution, C 0 . 8 ⇥ C + 0 . 2 ⇥ y w y T w the ruling principle: the adaptation increases the likelihood of successful steps, y w , to appear again

Covariance Matrix Adaptation Rank-One Update Initialize m 2 R n , and C = I , set σ = 1, learning rate c cov ⇡ 2 / n 2 While not terminate y i ⇠ N i ( 0 , C ) , = m + σ y i , x i µ X m + σ y w where y w = m w i y i : λ i = 1 1 ( 1 � c cov ) C + c cov µ w y w y T where µ w = i = 1 w i 2 � 1 C P µ w | {z } rank-one

Problem Statement Stochastic search algorithms - basics Adaptive Evolution Strategies Mean Vector Adaptation Step-size control Covariance Matrix Adaptation Rank-One Update Cumulation—the Evolution Path Rank- µ Update

Cumulation The Evolution Path Evolution Path Conceptually, the evolution path is the search path the strategy takes over a number of iteration steps. It can be expressed as a sum of consecutive steps of the mean m . An exponentially weighted sum of steps y w is used g X y ( i ) ( 1 � c c ) g − i p c / w | {z } i = 0 exponentially fading weights

Cumulation The Evolution Path Evolution Path Conceptually, the evolution path is the search path the strategy takes over a number of iteration steps. It can be expressed as a sum of consecutive steps of the mean m . An exponentially weighted sum of steps y w is used g X y ( i ) ( 1 � c c ) g − i p c / w | {z } i = 0 exponentially fading weights The recursive construction of the evolution path (cumulation): 1 � ( 1 � c c ) 2 p µ w p ( 1 � c c ) p c + p c y w |{z} | {z } | {z } decay factor normalization factor m − m old input = σ 1 where µ w = P w i 2 , c c ⌧ 1. History information is accumulated in the evolution path.

Cumulation Utilizing the Evolution Path w = � y w ( � y w ) T the sign of y w We used y w y T w for updating C . Because y w y T is lost.

Cumulation Utilizing the Evolution Path w = � y w ( � y w ) T the sign of y w We used y w y T w for updating C . Because y w y T is lost. The sign information is (re-)introduced by using the evolution path . 1 � ( 1 � c c ) 2 p µ w p ( 1 � c c ) p c + p c y w | {z } | {z } decay factor normalization factor T ( 1 � c cov ) C + c cov p c p c C | {z } rank-one 1 where µ w = P w i 2 , c c ⌧ 1.

Using an evolution path for the rank-one update of the covariance matrix reduces the number of function evaluations to adapt to a straight ridge from O ( n 2 ) to O ( n ) . ( 3 ) The overall model complexity is n 2 but important parts of the model can be learned in time of order n 3Hansen, Müller and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1) , pp. 1-18

Rank- µ Update = m + σ y i , N i ( 0 , C ) , y i ∼ x i P µ m + σ y w = i = 1 w i y i : λ ← y w m The rank- µ update extends the update rule for large population sizes λ using µ > 1 vectors to update C at each iteration step.

Rank- µ Update = m + σ y i , N i ( 0 , C ) , y i ∼ x i P µ m + σ y w = i = 1 w i y i : λ ← y w m The rank- µ update extends the update rule for large population sizes λ using µ > 1 vectors to update C at each iteration step. The matrix µ X w i y i : λ y T C µ = i : λ i = 1 computes a weighted mean of the outer products of the best µ steps and has rank min( µ, n ) with probability one.

Rank- µ Update = m + σ y i , N i ( 0 , C ) , y i ∼ x i P µ m + σ y w = i = 1 w i y i : λ ← y w m The rank- µ update extends the update rule for large population sizes λ using µ > 1 vectors to update C at each iteration step. The matrix µ X w i y i : λ y T C µ = i : λ i = 1 computes a weighted mean of the outer products of the best µ steps and has rank min( µ, n ) with probability one. The rank- µ update then reads C ( 1 � c cov ) C + c cov C µ where c cov ⇡ µ w / n 2 and c cov  1.

P y i : λ y T P y i : λ m + 1 1 y i ∼ N ( 0 , C ) C µ = = m + σ y i , m new ← x i i : λ µ µ C ( 1 − 1 ) × C + 1 × C µ ← sampling of calculating C where new distribution µ = 50, w 1 = · · · = λ = 150 solutions w µ = 1 where C = I and µ , and σ = 1 c cov = 1

The rank- µ update I increases the possible learning rate in large populations roughly from 2 / n 2 to µ w / n 2 I can reduce the number of necessary iterations roughly from O ( n 2 ) to O ( n ) ( 4 ) given µ w / λ / n Therefore the rank- µ update is the primary mechanism whenever a large population size is used say λ � 3 n + 10 4Hansen, Müller, and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1) , pp. 1-18

The rank- µ update I increases the possible learning rate in large populations roughly from 2 / n 2 to µ w / n 2 I can reduce the number of necessary iterations roughly from O ( n 2 ) to O ( n ) ( 4 ) given µ w / λ / n Therefore the rank- µ update is the primary mechanism whenever a large population size is used say λ � 3 n + 10 The rank-one update I uses the evolution path and reduces the number of necessary function evaluations to learn straight ridges from O ( n 2 ) to O ( n ) . 4Hansen, Müller, and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1) , pp. 1-18

The rank- µ update I increases the possible learning rate in large populations roughly from 2 / n 2 to µ w / n 2 I can reduce the number of necessary iterations roughly from O ( n 2 ) to O ( n ) ( 4 ) given µ w / λ / n Therefore the rank- µ update is the primary mechanism whenever a large population size is used say λ � 3 n + 10 The rank-one update I uses the evolution path and reduces the number of necessary function evaluations to learn straight ridges from O ( n 2 ) to O ( n ) . Rank-one update and rank- µ update can be combined 4Hansen, Müller, and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1) , pp. 1-18

Summary of Equations The Covariance Matrix Adaptation Evolution Strategy Input: m 2 R n , σ 2 R + , λ Initialize: C = I , and p c = 0 , p σ = 0 , Set: c c ⇡ 4 / n , c σ ⇡ 4 / n , c 1 ⇡ 2 / n 2 , c µ ⇡ µ w / n 2 , c 1 + c µ  1, p µ w 1 d σ ⇡ 1 + n , and w i = 1 ... λ such that µ w = i = 1 w i 2 ⇡ 0 . 3 λ P µ While not terminate y i ⇠ N i ( 0 , C ) , x i = m + σ y i , for i = 1 , . . . , λ sampling m P µ where y w = P µ i = 1 w i x i : λ = m + σ y w i = 1 w i y i : λ update mean 1 � ( 1 � c c ) 2 p µ w y w p I { k p σ k < 1 . 5 p n } p c ( 1 � c c ) p c + 1 cumulation for C 1 � ( 1 � c σ ) 2 p µ w C � 1 p 2 y w p σ ( 1 � c σ ) p σ + cumulation for σ C ( 1 � c 1 � c µ ) C + c 1 p c p c T + c µ P µ i = 1 w i y i : λ y T update C i : λ ⇣ ⇣ ⌘⌘ k p σ k c σ σ σ ⇥ exp E k N ( 0 , I ) k � 1 update of σ d σ Not covered on this slide: termination, restarts, useful output, boundaries and encoding

Rank-one and Rank-mu updates

Rank-one and Rank-mu update - default pop size

Rank-one and Rank-mu update - larger pop size

Covariance Matrix Adaptation Covariance Matrix Adaptation Evolution - PowerPoint PPT Presentation

Covariance Matrix Adaptation Covariance Matrix Adaptation Evolution Strategies Recalling New search points are sampled normally distributed x i m + N i ( 0 , C ) for i = 1 , . . . , where x i , m 2 R n , 2 R + , as perturbations of

Lecture 14 Covariance Functions 3/08/2018 1 More on Covariance Functions 2 Nugget Covariance

Covariance Matrices and Covariance Operators Theory and Applications H` a Quang Minh Functional

Dynamic Adaptation Dynamic Adaptation Dynamic Adaptation Dynamic Adaptation Minema Minema

Week 1: Introduc/on Precision and covariance matrix 2 1.2C

Implementation of Covariance Matrix on ReconstructedParticle C. Calancha ILD Analysis &

A Tale of Two Theories: A Tale of Two Theories: Reconciling Reconciling random matrix theory

Matrix Algebra of Sample Statistics James H. Steiger Department of Psychology and Human

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Matrix Multiplication Matrix Multiplication via Matrix-Vector Mult Defn. If matrix A is m n

Coastal Adaptation Kellie Fisher FCERM Senior Advisor Why Adaptation? Adaptation to a

Modelling covariance kernels for nonstationary random fields Christopher G. Small University of

Covariance & anchored t ypes 1 Covariance? Wit hin t he t ype syst em of a programming

Covariance and spectrum Repetition Covariance function: r w ( ) Ew ( t + ) w T ( t )

Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence GPMC 6th February 2017

Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence GPSS 13th September 2016

High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains Majid

Model-Based Evolutionary Algorithms Part 1: Estimation of Distribution Algorithms Dirk Thierens

Evolutionary Algorithms General Concepts Prof. Thomas Bck Natural Computing Group

INTRODUCTION TO EVOLUTION STRATEGY ALGORITHMS James Gleeson Eric Langlois William Saunders

Evolution Strategies Distributed deep reinforcement learning (blog.otoro.net) Evolutionary

Natural Computing Lecture 9: Evolutionary Strategies Michael Herrmann INFR09038

Understanding causation: the practicalities Jackie Chappell and Aaron Sloman Centre for

Artificial Intelligence Local and Randomized/StochasGc Search Lecture 6 CS 444 Spring 2019

Hybrid Deep Learning Topology for Image Classification Petru Radu petru.radu@ness.com 27 th

Covariance Matrix Adaptation Covariance Matrix Adaptation Evolution - PowerPoint PPT Presentation

Covariance Matrix Adaptation Covariance Matrix Adaptation Evolution Strategies Recalling New search points are sampled normally distributed x i m + N i ( 0 , C ) for i = 1 , . . . , where x i , m 2 R n , 2 R + , as perturbations of

Lecture 14 Covariance Functions 3/08/2018 1 More on Covariance Functions 2 Nugget Covariance

Covariance Matrices and Covariance Operators Theory and Applications H` a Quang Minh Functional

Dynamic Adaptation Dynamic Adaptation Dynamic Adaptation Dynamic Adaptation Minema Minema

Week 1: Introduc/on Precision and covariance matrix 2 1.2C

Implementation of Covariance Matrix on ReconstructedParticle C. Calancha ILD Analysis &amp;

A Tale of Two Theories: A Tale of Two Theories: Reconciling Reconciling random matrix theory

Matrix Algebra of Sample Statistics James H. Steiger Department of Psychology and Human

[3] The Matrix What is a matrix? Traditional answer Neo: What is the Matrix? Trinity: The answer

Matrix Multiplication Matrix Multiplication via Matrix-Vector Mult Defn. If matrix A is m n

Coastal Adaptation Kellie Fisher FCERM Senior Advisor Why Adaptation? Adaptation to a

Modelling covariance kernels for nonstationary random fields Christopher G. Small University of

Covariance &amp; anchored t ypes 1 Covariance? Wit hin t he t ype syst em of a programming

Covariance and spectrum Repetition Covariance function: r w ( ) Ew ( t + ) w T ( t )

Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence GPMC 6th February 2017

Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence GPSS 13th September 2016

High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains Majid

Model-Based Evolutionary Algorithms Part 1: Estimation of Distribution Algorithms Dirk Thierens

Evolutionary Algorithms General Concepts Prof. Thomas Bck Natural Computing Group

INTRODUCTION TO EVOLUTION STRATEGY ALGORITHMS James Gleeson Eric Langlois William Saunders

Evolution Strategies Distributed deep reinforcement learning (blog.otoro.net) Evolutionary

Natural Computing Lecture 9: Evolutionary Strategies Michael Herrmann INFR09038

Understanding causation: the practicalities Jackie Chappell and Aaron Sloman Centre for

Artificial Intelligence Local and Randomized/StochasGc Search Lecture 6 CS 444 Spring 2019

Hybrid Deep Learning Topology for Image Classification Petru Radu petru.radu@ness.com 27 th

Implementation of Covariance Matrix on ReconstructedParticle C. Calancha ILD Analysis &

Covariance & anchored t ypes 1 Covariance? Wit hin t he t ype syst em of a programming