Covariance Matrix Adaptation


  1. Covariance Matrix Adaptation

  2. Evolution Strategies: Recalling
New search points are sampled normally distributed,
  x_i ∼ m + σ N_i(0, C)  for i = 1, …, λ
as perturbations of m, where x_i, m ∈ R^n, σ ∈ R_+, and C ∈ R^{n×n}:
  - the mean vector m ∈ R^n represents the favorite solution
  - the so-called step-size σ ∈ R_+ controls the step length
  - the covariance matrix C ∈ R^{n×n} determines the shape of the distribution ellipsoid
The remaining question is how to update C.
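The sampling step above can be sketched in a few lines of NumPy. This is an illustrative sketch, not part of the slides: the toy values for n and λ and the Cholesky-based sampler are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n, lam = 5, 10            # dimension and population size (toy values, our choice)
m = np.zeros(n)           # mean vector: the current favorite solution
sigma = 1.0               # step-size: controls the overall step length
C = np.eye(n)             # covariance matrix: shape of the distribution ellipsoid

# y_i ~ N(0, C) via a Cholesky factor of C, then x_i = m + sigma * y_i
A = np.linalg.cholesky(C)
y = rng.standard_normal((lam, n)) @ A.T
x = m + sigma * y         # lam candidate solutions, one per row
```

Each row of x is one sampled search point; selection and the updates of m, σ, and C then act on these samples.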

  3.–11. Covariance Matrix Adaptation: Rank-One Update (animation frames)
All frames share the sampling and update equations
  y_i ∼ N_i(0, C),  y_w = Σ_{i=1}^{µ} w_i y_{i:λ},  m ← m + σ y_w
and show, in turn:
  - the initial distribution, C = I
  - y_w, the movement of the population mean m (disregarding σ)
  - the mixture of distribution C and step y_w: C ← 0.8 × C + 0.2 × y_w y_w^T
  - the new distribution (disregarding σ)
  - the movement of the population mean m
  - the mixture of distribution C and step y_w: C ← 0.8 × C + 0.2 × y_w y_w^T
  - the new distribution: C ← 0.8 × C + 0.2 × y_w y_w^T
The ruling principle: the adaptation increases the likelihood of successful steps, y_w, to appear again.

  12. Covariance Matrix Adaptation: Rank-One Update
Initialize m ∈ R^n and C = I, set σ = 1, learning rate c_cov ≈ 2/n².
While not terminate:
  x_i = m + σ y_i,  y_i ∼ N_i(0, C)
  m ← m + σ y_w  where  y_w = Σ_{i=1}^{µ} w_i y_{i:λ}
  C ← (1 − c_cov) C + c_cov µ_w y_w y_w^T   (rank-one)
where µ_w = 1 / Σ_{i=1}^{µ} w_i² ≥ 1.
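The loop above can be sketched as follows. The objective f, the toy values of n, λ, µ, and the equal weights are our choices for illustration; step-size control is a separate mechanism not covered here, so σ stays fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return float(x @ x)               # toy sphere objective (our choice)

n, lam, mu = 4, 12, 6
w = np.ones(mu) / mu                  # equal recombination weights
mu_w = 1.0 / np.sum(w**2)             # here mu_w = mu = 6
c_cov = 2.0 / n**2                    # learning rate from the slide
m = rng.standard_normal(n)
sigma = 0.3                           # fixed; step-size control is separate
C = np.eye(n)

for _ in range(100):
    A = np.linalg.cholesky(C)
    y = rng.standard_normal((lam, n)) @ A.T   # y_i ~ N(0, C)
    x = m + sigma * y                         # x_i = m + sigma * y_i
    order = np.argsort([f(xi) for xi in x])   # best candidates first
    y_w = w @ y[order[:mu]]                   # y_w = sum_i w_i y_{i:lambda}
    m = m + sigma * y_w                       # move the mean
    # rank-one update: C <- (1 - c_cov) C + c_cov * mu_w * y_w y_w^T
    C = (1 - c_cov) * C + c_cov * mu_w * np.outer(y_w, y_w)
```

The rank-one term adds variance along the direction of the last successful step y_w, while the decay factor (1 − c_cov) keeps C symmetric positive definite.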

  13. Outline: Problem Statement · Stochastic search algorithms: basics · Adaptive Evolution Strategies · Mean Vector Adaptation · Step-size control · Covariance Matrix Adaptation · Rank-One Update · Cumulation, the Evolution Path · Rank-µ Update

  14. Cumulation: The Evolution Path
Conceptually, the evolution path is the search path the strategy takes over a number of iteration steps. It can be expressed as a sum of consecutive steps of the mean m. An exponentially weighted sum of steps y_w is used:
  p_c ∝ Σ_{i=0}^{g} (1 − c_c)^{g−i} y_w^{(i)}
where the factors (1 − c_c)^{g−i} are exponentially fading weights.

  15. Cumulation: The Evolution Path (continued)
The recursive construction of the evolution path (cumulation):
  p_c ← (1 − c_c) p_c + √(1 − (1 − c_c)²) √µ_w y_w
with decay factor (1 − c_c), normalization factor √(1 − (1 − c_c)²) √µ_w, and input y_w = (m − m_old)/σ, where µ_w = 1/Σ w_i² and c_c ≪ 1. History information is accumulated in the evolution path.
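A toy demo of the cumulation recursion (the numerical values are ours, not from the slides): consistently directed steps make the evolution path long, while alternating steps cancel and keep it short.

```python
import numpy as np

n = 3
c_c = 0.2            # c_c << 1 in practice
mu_w = 4.0

def cumulate(steps):
    """Apply the recursion p_c <- (1-c_c) p_c + sqrt(1-(1-c_c)^2) sqrt(mu_w) y_w."""
    p_c = np.zeros(n)
    for y_w in steps:
        p_c = (1 - c_c) * p_c + np.sqrt(1 - (1 - c_c)**2) * np.sqrt(mu_w) * y_w
    return p_c

e1 = np.array([1.0, 0.0, 0.0])
p_same = cumulate([e1] * 10)        # ten identical steps: path grows long
p_flip = cumulate([e1, -e1] * 5)    # steps that cancel: path stays short

print(np.linalg.norm(p_same) > np.linalg.norm(p_flip))  # True
```

This is exactly the history information the path carries: correlated successive steps are distinguishable from uncorrelated or oscillating ones.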

  16.–17. Cumulation: Utilizing the Evolution Path
We used y_w y_w^T for updating C. Because y_w y_w^T = (−y_w)(−y_w)^T, the sign of y_w is lost.

  18. Cumulation: Utilizing the Evolution Path
We used y_w y_w^T for updating C. Because y_w y_w^T = (−y_w)(−y_w)^T, the sign of y_w is lost. The sign information is (re-)introduced by using the evolution path:
  p_c ← (1 − c_c) p_c + √(1 − (1 − c_c)²) √µ_w y_w
  C ← (1 − c_cov) C + c_cov p_c p_c^T   (rank-one)
where µ_w = 1/Σ w_i² and c_c ≪ 1.
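The sign-loss observation is easy to verify numerically: the outer product is identical for y_w and −y_w, so it cannot distinguish the two directions.

```python
import numpy as np

# y_w y_w^T == (-y_w)(-y_w)^T: the outer product discards the sign of y_w
y_w = np.array([1.0, -2.0, 0.5])
print(np.allclose(np.outer(y_w, y_w), np.outer(-y_w, -y_w)))  # True
```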

  19. Using an evolution path for the rank-one update of the covariance matrix reduces the number of function evaluations to adapt to a straight ridge from O(n²) to O(n).(3) The overall model complexity is n², but important parts of the model can be learned in time of order n.
(3) Hansen, Müller and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1), pp. 1–18.

  20. Rank-µ Update
  y_i ∼ N_i(0, C),  x_i = m + σ y_i,  m ← m + σ y_w,  y_w = Σ_{i=1}^{µ} w_i y_{i:λ}
The rank-µ update extends the update rule for large population sizes λ, using µ > 1 vectors to update C at each iteration step.

  21. Rank-µ Update (continued)
The matrix
  C_µ = Σ_{i=1}^{µ} w_i y_{i:λ} y_{i:λ}^T
computes a weighted mean of the outer products of the best µ steps and has rank min(µ, n) with probability one.

  22. Rank-µ Update (continued)
The rank-µ update then reads
  C ← (1 − c_cov) C + c_cov C_µ
where c_cov ≈ µ_w/n² and c_cov ≤ 1.
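A minimal sketch of C_µ and the rank-µ update; the toy values of n and µ, the equal weights, and the stand-in for the selected steps are our choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu = 6, 3
w = np.ones(mu) / mu                    # toy recombination weights (our choice)
y_best = rng.standard_normal((mu, n))   # stand-in for the selected steps y_{i:lambda}

# C_mu: weighted mean of outer products of the best mu steps
C_mu = (y_best * w[:, None]).T @ y_best
print(np.linalg.matrix_rank(C_mu))      # min(mu, n) = 3 with probability one

# rank-mu update with c_cov ~ mu_w / n^2
mu_w = 1.0 / np.sum(w**2)
c_cov = mu_w / n**2
C = np.eye(n)
C = (1 - c_cov) * C + c_cov * C_mu
```

Because C_µ pools µ directions per iteration instead of one, the learning rate can grow roughly with µ_w, which is what makes this update effective for large populations.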

  23. [Animation: sampling of λ = 150 solutions where C = I and σ = 1; calculating C_µ where µ = 50, w_1 = ⋯ = w_µ = 1/µ, and c_cov = 1; the new distribution]
  y_i ∼ N(0, C),  x_i = m + σ y_i,  C_µ = (1/µ) Σ y_{i:λ} y_{i:λ}^T,  m_new ← m + (1/µ) Σ y_{i:λ},  C ← (1 − 1) × C + 1 × C_µ

  24. The rank-µ update
  - increases the possible learning rate in large populations, roughly from 2/n² to µ_w/n²
  - can reduce the number of necessary iterations roughly from O(n²) to O(n),(4) given µ_w ∝ λ ∝ n
Therefore the rank-µ update is the primary mechanism whenever a large population size is used, say λ ≥ 3n + 10.
(4) Hansen, Müller, and Koumoutsakos 2003. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 11(1), pp. 1–18.

  25. (continued) The rank-one update
  - uses the evolution path and reduces the number of necessary function evaluations to learn straight ridges from O(n²) to O(n).

  26. (continued) Rank-one update and rank-µ update can be combined.

  27. Summary of Equations: The Covariance Matrix Adaptation Evolution Strategy
Input: m ∈ R^n, σ ∈ R_+, λ
Initialize: C = I, p_c = 0, p_σ = 0
Set: c_c ≈ 4/n, c_σ ≈ 4/n, c_1 ≈ 2/n², c_µ ≈ µ_w/n², c_1 + c_µ ≤ 1, d_σ ≈ 1 + √(µ_w/n), and w_{i=1,…,λ} such that µ_w = 1/Σ_{i=1}^{µ} w_i² ≈ 0.3 λ
While not terminate:
  x_i = m + σ y_i,  y_i ∼ N_i(0, C),  for i = 1, …, λ      (sampling)
  m ← Σ_{i=1}^{µ} w_i x_{i:λ} = m + σ y_w  where  y_w = Σ_{i=1}^{µ} w_i y_{i:λ}      (update mean)
  p_c ← (1 − c_c) p_c + 1_{‖p_σ‖ < 1.5√n} √(1 − (1 − c_c)²) √µ_w y_w      (cumulation for C)
  p_σ ← (1 − c_σ) p_σ + √(1 − (1 − c_σ)²) √µ_w C^{−1/2} y_w      (cumulation for σ)
  C ← (1 − c_1 − c_µ) C + c_1 p_c p_c^T + c_µ Σ_{i=1}^{µ} w_i y_{i:λ} y_{i:λ}^T      (update C)
  σ ← σ × exp( (c_σ/d_σ) ( ‖p_σ‖ / E‖N(0, I)‖ − 1 ) )      (update of σ)
Not covered on this slide: termination, restarts, useful output, boundaries and encoding.
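The full loop above can be sketched end to end on a toy sphere function. This is a minimal sketch under stated assumptions, not a production implementation (Hansen's pycma has more careful defaults): the constants follow the slide's rough settings, while the population size, the log-decreasing weights, and the approximation of E‖N(0, I)‖ are common CMA-ES defaults of our choosing.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return float(x @ x)                      # toy sphere objective

n = 10
lam = 4 + int(3 * np.log(n))                 # common default population size
mu = lam // 2
w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
w = w / w.sum()                              # positive, decreasing weights
mu_w = 1.0 / np.sum(w**2)

c_c = 4.0 / n
c_sigma = 4.0 / n
c_1 = 2.0 / n**2
c_mu = min(mu_w / n**2, 1.0 - c_1)
d_sigma = 1.0 + np.sqrt(mu_w / n)
chi_n = np.sqrt(n) * (1 - 1/(4*n) + 1/(21*n**2))   # approx. E||N(0, I)||

m = rng.standard_normal(n)
sigma = 0.5
C = np.eye(n)
p_c = np.zeros(n)
p_sigma = np.zeros(n)

for _ in range(500):
    d2, B = np.linalg.eigh(C)                # C = B diag(d2) B^T
    D = np.sqrt(d2)
    z = rng.standard_normal((lam, n))
    y = (z * D) @ B.T                        # y_i ~ N(0, C)            (sampling)
    x = m + sigma * y
    order = np.argsort([f(xi) for xi in x])
    y_sel = y[order[:mu]]
    y_w = w @ y_sel
    m = m + sigma * y_w                      #                          (update mean)

    C_inv_half = (B / D) @ B.T               # C^{-1/2}
    p_sigma = (1 - c_sigma) * p_sigma \
        + np.sqrt(1 - (1 - c_sigma)**2) * np.sqrt(mu_w) * (C_inv_half @ y_w)
    h = float(np.linalg.norm(p_sigma) < 1.5 * np.sqrt(n))
    p_c = (1 - c_c) * p_c \
        + h * np.sqrt(1 - (1 - c_c)**2) * np.sqrt(mu_w) * y_w

    C_mu = (y_sel * w[:, None]).T @ y_sel    # sum_i w_i y_{i:lam} y_{i:lam}^T
    C = (1 - c_1 - c_mu) * C + c_1 * np.outer(p_c, p_c) + c_mu * C_mu
    C = (C + C.T) / 2                        # keep C exactly symmetric

    sigma *= np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1))

print(f(m))
```

Each line of the loop corresponds to one equation of the summary slide: sampling, mean update, the two cumulation paths, the combined rank-one plus rank-µ update of C, and the step-size update.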

  28. Rank-one and rank-µ updates

  29. Rank-one and rank-µ update: default population size

  30. Rank-one and rank-µ update: larger population size
