
SLIDE 1

Information Geometric Optimization

How information theory sheds new light on black-box optimization

Anne Auger, Inria and CMAP

SLIDE 2

Main reference: Y. Ollivier, L. Arnold, A. Auger, N. Hansen, Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles, JMLR (accepted)

SLIDE 3

Black-Box Optimization

Optimize f : Ω → ℝ

discrete optimization: Ω = {0, 1}^n
continuous optimization: Ω ⊂ ℝ^n

SLIDE 4

Black-Box Optimization

[diagram: x ∈ Ω → black box → f(x) ∈ ℝ]

Optimize f : Ω → ℝ

discrete optimization: Ω = {0, 1}^n
continuous optimization: Ω ⊂ ℝ^n

gradients not available or not useful

SLIDE 5

Adaptive Stochastic Black-Box Algorithm

Sample candidate solutions:
X^i_{t+1} = Sol(θ_t, U^i_{t+1}),  i = 1, …, λ

Evaluate the solutions on f and update the state of the algorithm:
θ_{t+1} = F(θ_t, (X^1_{t+1}, f(X^1_{t+1})), …, (X^λ_{t+1}, f(X^λ_{t+1})))

θ_t: state of the algorithm
{U_{t+1}, t ∈ ℕ} i.i.d.

SLIDE 6

Comparison-based Stochastic Algorithms

Sample candidate solutions:
X^i_{t+1} = Sol(θ_t, U^i_{t+1}),  i = 1, …, λ

Evaluate and rank the solutions:
f(X^{S(1)}_{t+1}) ≤ … ≤ f(X^{S(λ)}_{t+1}),  S the permutation giving the indices of the ordered solutions

Update the state of the algorithm:
θ_{t+1} = F(θ_t, U^{S(1)}_{t+1}, …, U^{S(λ)}_{t+1})

Invariance to strictly increasing transformations of f
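To make the template concrete, here is a minimal Python sketch (the helper names `sample` and `update`, the vector-valued state, and the Gaussian choice for U are illustrative assumptions, not part of the slides). Because the update sees only the ordering S of the candidates, never the f-values themselves, replacing f by g∘f for any strictly increasing g leaves the trajectory unchanged.

```python
import numpy as np

def comparison_based_step(theta, f, sample, update, lam, rng):
    """One iteration of a comparison-based stochastic algorithm.

    sample(theta, u) plays the role of Sol(theta_t, U^i_{t+1});
    update(theta, sorted_U) plays the role of F(theta_t, U^{S(1)}, ..., U^{S(lam)}).
    """
    U = [rng.standard_normal(len(theta)) for _ in range(lam)]  # U^i_{t+1} i.i.d.
    X = [sample(theta, u) for u in U]                          # candidate solutions
    S = np.argsort([f(x) for x in X])                          # f(X^{S(1)}) <= ... <= f(X^{S(lam)})
    return update(theta, [U[i] for i in S])                    # update uses only the ranking
```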

SLIDE 7

Overview

➊ Black-Box Optimization: typical difficulties
➋ Information Geometric Optimization
➌ Invariance
➍ Recovering well-known algorithms: CMA-ES; PBIL, cGA

SLIDE 8

Information Geometric Optimization: Setting

Family of probability distributions (P_θ)_{θ∈Θ} on Ω, with continuous multicomponent parameter θ ∈ Θ

SLIDE 9

Information Geometric Optimization: Setting

Family of probability distributions (P_θ)_{θ∈Θ} on Ω, with continuous multicomponent parameter θ ∈ Θ

Example: Ω = ℝ^n, P_θ multivariate normal distribution with θ = (m, C)

Θ: statistical manifold

SLIDE 10

Changing Viewpoint I

Transform the original optimization problem min_{x∈Ω} f(x) on Ω
into an optimization problem on Θ: minimize

F(θ) = ∫ f(x) P_θ(dx)

SLIDE 11

Changing Viewpoint I

Transform the original optimization problem min_{x∈Ω} f(x) on Ω
into an optimization problem on Θ: minimize

F(θ) = ∫ f(x) P_θ(dx)

Minimizing F: find the Dirac-delta distribution concentrated on argmin_x f(x)

[Wierstra et al. 2014]

SLIDE 12

Changing Viewpoint I

Transform the original optimization problem min_{x∈Ω} f(x) on Ω
into an optimization problem on Θ: minimize

F(θ) = ∫ f(x) P_θ(dx)

Minimizing F: find the Dirac-delta distribution concentrated on argmin_x f(x)

But F is not invariant to strictly increasing transformations of f
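A quick numerical illustration of the non-invariance (a hedged numpy sketch; the sphere function and the cube map are illustrative choices): estimating F(θ) = ∫ f(x) P_θ(dx) by Monte Carlo, the value changes under the strictly increasing transformation g(u) = u³, even though the ranking of the samples does not.

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = np.zeros(2), 1.0                        # P_theta = N(m, sigma^2 I)
X = m + sigma * rng.standard_normal((100_000, 2))  # samples from P_theta

fX = np.sum(X**2, axis=1)                          # f = sphere function
print(fX.mean())       # Monte Carlo estimate of F(theta) for f (~2.0)
print((fX**3).mean())  # estimate of ∫ g(f(x)) dP, g(u) = u^3: a different
                       # function of theta (~48.0), although ranks are unchanged
```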

SLIDE 13

Changing Viewpoint II

Transform the original optimization problem min_{x∈Ω} f(x) on Ω
into an optimization problem on Θ, invariant under strictly increasing transformations of f: maximize

J_{θ_t}(θ) = ∫ W^f_{θ_t}(x) P_θ(dx),  where  W^f_{θ_t}(x) = w(P_{θ_t}[y : f(y) ≤ f(x)])

with w : [0, 1] → ℝ a decreasing weight function

Rationale: f(x) “small” ↔ W^f_{θ_t}(x) “large”

[Ollivier et al.]
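A small numpy sketch of this quantile rewriting (the truncation weight w(q) = 1 for q ≤ 1/4, 0 otherwise, is an assumed choice of decreasing weight function): W^f_{θt}(x) is estimated per sample from ranks alone, so it is identical for f and for g∘f with g strictly increasing.

```python
import numpy as np

rng = np.random.default_rng(1)
w = lambda q: (q <= 0.25).astype(float)    # assumed decreasing weight w : [0,1] -> R

X = rng.standard_normal((1000, 2))         # samples from P_{theta_t}
fX = np.sum(X**2, axis=1)                  # f = sphere function (illustrative)

q = (fX[:, None] >= fX[None, :]).mean(axis=1)   # estimates P_{theta_t}[y : f(y) <= f(x)]
W = w(q)                                        # preference W^f_{theta_t} at each sample

# g(u) = u^3 is strictly increasing, so the quantiles, hence W, are unchanged:
q3 = ((fX**3)[:, None] >= (fX**3)[None, :]).mean(axis=1)
assert np.array_equal(W, w(q3))
```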

SLIDE 14

Information Geometric Optimization

Maximize J_{θ_t}(θ) = ∫ W^f_{θ_t}(x) P_θ(dx) by performing a natural gradient step on Θ:

θ_{t+δt} = θ_t + δt ∇̃_θ J_{θ_t}(θ)|_{θ=θ_t}


SLIDE 16

Natural Gradient

Fisher Information Metric

Natural gradient ∇̃_θ: the gradient wrt the Fisher metric, defined via the Fisher matrix:

I_{ij}(θ) = ∫ (∂ ln P_θ(x)/∂θ_i) (∂ ln P_θ(x)/∂θ_j) P_θ(dx) = − ∫ (∂² ln P_θ(x)/∂θ_i ∂θ_j) P_θ(dx)

∇̃_θ = I(θ)⁻¹ ∂/∂θ
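A hedged numpy sketch for a concrete case, the 1-D Gaussian with θ = (m, σ), whose closed-form Fisher matrix is diag(1/σ², 2/σ²): the matrix is estimated as the second moment of the score, and a natural-gradient direction is obtained by solving I(θ) ∇̃ = ∇. The example gradient vector is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(2)
m, sigma = 0.5, 2.0
x = m + sigma * rng.standard_normal(1_000_000)   # samples from P_theta = N(m, sigma^2)

# Score: partial derivatives of ln P_theta(x) wrt theta = (m, sigma)
score = np.stack([(x - m) / sigma**2,
                  ((x - m)**2 - sigma**2) / sigma**3])

I = score @ score.T / x.size                     # Monte Carlo Fisher matrix
print(I)                                         # ~ diag(1/sigma^2, 2/sigma^2) = diag(0.25, 0.5)

grad = np.array([0.3, -0.1])                     # some vanilla gradient (illustrative values)
print(np.linalg.solve(I, grad))                  # natural gradient: I^{-1} grad
```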

SLIDE 17

Fisher Information Metric

Kullback–Leibler divergence: a measure of “distance” between distributions:

KL(P_{θ′} ‖ P_θ) = ∫ ln( P_{θ′}(dx) / P_θ(dx) ) P_{θ′}(dx)

Relation between the KL divergence and the Fisher matrix: the metric is equivalently defined via the second-order expansion of KL:

KL(P_{θ+δθ} ‖ P_θ) = (1/2) Σ_{i,j} I_{ij}(θ) δθ_i δθ_j + O(‖δθ‖³)
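A quick numeric check of this expansion in the simplest family, Bernoulli(p), whose Fisher information has the standard closed form I(p) = 1/(p(1−p)):

```python
import numpy as np

def kl_bernoulli(p1, p0):
    """KL(Ber(p1) || Ber(p0))."""
    return p1 * np.log(p1 / p0) + (1 - p1) * np.log((1 - p1) / (1 - p0))

p, dp = 0.3, 1e-3
fisher = 1.0 / (p * (1 - p))              # I(p) for the Bernoulli family
print(kl_bernoulli(p + dp, p))            # exact KL
print(0.5 * fisher * dp**2)               # (1/2) I(p) dp^2: agrees up to O(dp^3)
```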

SLIDE 18

Natural Gradient

Natural gradient ∇̃_θ: the gradient wrt the Fisher metric, defined via the Fisher matrix:

I_{ij}(θ) = ∫ (∂ ln P_θ(x)/∂θ_i) (∂ ln P_θ(x)/∂θ_j) P_θ(dx) = − ∫ (∂² ln P_θ(x)/∂θ_i ∂θ_j) P_θ(dx)

Intrinsic: independent of the chosen parametrization θ of P_θ.
The Fisher metric is essentially the only way to obtain this property [Amari, Nagaoka 1993].

SLIDE 19

Information Geometric Optimization

Maximize J_{θ_t}(θ) by performing a natural gradient step on Θ:

θ_{t+δt} = θ_t + δt ∇̃_θ ∫ W^f_{θ_t}(x) P_θ(dx)
         = θ_t + δt ∫ W^f_{θ_t}(x) ∇̃_θ ln P_θ(x)|_{θ=θ_t} P_{θ_t}(dx)
         = θ_t + δt ∫ w(P_{θ_t}[y : f(y) ≤ f(x)]) ∇̃_θ ln P_θ(x)|_{θ=θ_t} P_{θ_t}(dx)

The update does not depend on ∇f.


SLIDE 21

Information Geometric Optimization

Maximize J_{θ_t}(θ) by performing a natural gradient step on Θ:

θ_{t+δt} = θ_t + δt ∇̃_θ ∫ W^f_{θ_t}(x) P_θ(dx)
         = θ_t + δt ∫ W^f_{θ_t}(x) (∇̃_θ P_θ(x)|_{θ=θ_t} / P_{θ_t}(x)) P_{θ_t}(x) dx
         = θ_t + δt ∫ W^f_{θ_t}(x) ∇̃_θ ln P_θ(x)|_{θ=θ_t} P_{θ_t}(dx)
         = θ_t + δt ∫ w(P_{θ_t}[y : f(y) ≤ f(x)]) ∇̃_θ ln P_θ(x)|_{θ=θ_t} P_{θ_t}(dx)

The update does not depend on ∇f.

SLIDE 22

Information Geometric Optimization

Maximize J_{θ_t}(θ) by performing a natural gradient step on Θ:

θ_{t+δt} = θ_t + δt ∇̃_θ ∫ W^f_{θ_t}(x) P_θ(dx)
         = θ_t + δt ∫ W^f_{θ_t}(x) (∇̃_θ P_θ(x)|_{θ=θ_t} / P_{θ_t}(x)) P_{θ_t}(x) dx
         = θ_t + δt ∫ W^f_{θ_t}(x) ∇̃_θ ln P_θ(x)|_{θ=θ_t} P_{θ_t}(dx)
         = θ_t + δt ∫ w(P_{θ_t}[y : f(y) ≤ f(x)]) ∇̃_θ ln P_θ(x)|_{θ=θ_t} P_{θ_t}(dx)

The update does not depend on ∇f.

➊ IGO flow: the continuous-time limit δt → 0
➋ IGO algorithms: discretization of the integrals
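The equality between the first and third lines is the log-likelihood trick, ∇_θ ∫ W dP_θ = ∫ W ∇_θ ln P_θ dP_θ (the natural gradient only multiplies both sides by I⁻¹). A numerical sanity check for the mean of a 1-D unit-variance Gaussian, with an arbitrary illustrative integrand standing in for W^f_{θt}:

```python
import numpy as np

rng = np.random.default_rng(3)
W = lambda x: np.exp(-x**2)               # stand-in for W^f_{theta_t} (illustrative)
u = rng.standard_normal(1_000_000)        # common random numbers; x = m + u, P_theta = N(m, 1)

def J(m):                                 # Monte Carlo estimate of ∫ W(x) P_theta(dx)
    return W(m + u).mean()

m, eps = 0.4, 1e-4
print((J(m + eps) - J(m - eps)) / (2 * eps))   # finite-difference d/dm of the integral
print((W(m + u) * u).mean())                   # ∫ W(x) (d ln P/dm) dP, since d ln P/dm = x - m
```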

SLIDE 23

IGO Gradient Flow

Information Geometric Optimization: the set of continuous-time trajectories in Θ-space defined by the ODE

dθ_t/dt = ∫ W^f_{θ_t}(x) ∇̃_θ ln P_θ(x)|_{θ=θ_t} P_{θ_t}(dx)

[Ollivier et al.]

SLIDE 24

Information Geometric Optimization Algorithm

Monte Carlo approximation of the integrals: sample X_i ∼ P_{θ_t}, i = 1, …, N.

w(P_{θ_t}[y : f(y) ≤ f(x)]) ≈ w((rk(X_i) + 1/2) / N),  where rk(X_i) = #{j | f(X_j) < f(X_i)}

IGO algorithm:

θ_{t+δt} = θ_t + δt (1/N) Σ_{i=1}^N w((rk(X_i) + 1/2)/N) ∇̃_θ ln P_θ(X_i)|_{θ=θ_t}
         = θ_t + δt Σ_{i=1}^N ŵ_i ∇̃_θ ln P_θ(X_i)|_{θ=θ_t}

SLIDE 25

IGO Algorithm

Monte Carlo approximation of the integrals: sample X_i ∼ P_{θ_t}, i = 1, …, N.

w(P_{θ_t}[y : f(y) ≤ f(x)]) ≈ w((rk(X_i) + 1/2) / N)

θ_{t+δt} = θ_t + δt (1/N) Σ_{i=1}^N w((rk(X_i) + 1/2)/N) ∇̃_θ ln P_θ(X_i)|_{θ=θ_t}
         = θ_t + δt Σ_{i=1}^N ŵ_i ∇̃_θ ln P_θ(X_i)|_{θ=θ_t},  with ŵ_i = (1/N) w((rk(X_i) + 1/2)/N)

a consistent estimator of the integral  [Ollivier et al.]
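In code, one IGO iteration needs only the samples, their ranks, and the natural gradient of the log-likelihood. A generic hedged sketch (the callables `nat_grad_log_p`, returning ∇̃_θ ln P_θ(x) for the chosen family, and `w` are supplied by the user; both names are mine, not the paper's):

```python
import numpy as np

def igo_weights(fvals, w):
    """w_hat_i = (1/N) w((rk(X_i) + 1/2) / N), rk(X_i) = #{j : f(X_j) < f(X_i)}."""
    fvals = np.asarray(fvals)
    N = len(fvals)
    rk = (fvals[:, None] > fvals[None, :]).sum(axis=1)   # strict-inequality ranks
    return w((rk + 0.5) / N) / N

def igo_step(theta, X, fvals, nat_grad_log_p, w, dt):
    """theta_{t+dt} = theta_t + dt * sum_i w_hat_i * nat-grad of ln P_theta(X_i)."""
    w_hat = igo_weights(fvals, w)
    return theta + dt * sum(wi * nat_grad_log_p(theta, x) for wi, x in zip(w_hat, X))
```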

SLIDE 26

Instantiation of IGO: Multivariate Normal Distributions

P_θ multivariate normal distribution, θ = (m, C). The IGO algorithm reads:

m_{t+δt} = m_t + δt Σ_{i=1}^N ŵ_i (X_i − m_t)

C_{t+δt} = C_t + δt Σ_{i=1}^N ŵ_i ((X_i − m_t)(X_i − m_t)^T − C_t)

Recovers the CMA-ES with rank-mu update [Akimoto et al. 2010],
with N = λ, δt the learning rate for the covariance matrix, and an additional learning rate for the mean.
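A self-contained toy run of this Gaussian instantiation on the sphere function (a hedged sketch: the weight function, learning rates, population size, and iteration count are illustrative choices; the full CMA-ES additionally uses step-size adaptation and cumulation):

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sum(x**2)                       # sphere function (illustrative test problem)
n, N = 5, 50                                     # dimension, population size (N = lambda)
dt_m, dt_c = 1.0, 0.1                            # larger learning rate for the mean, per the slide
w = lambda q: np.maximum(0.0, 1.0 - 2.0 * q)     # assumed decreasing weight on [0, 1]

m, C = rng.standard_normal(n), np.eye(n)         # theta = (m, C)
for _ in range(500):
    X = rng.multivariate_normal(m, C, size=N)    # X_i ~ N(m_t, C_t)
    fv = np.array([f(x) for x in X])
    rk = (fv[:, None] > fv[None, :]).sum(axis=1) # rk(X_i) = #{j : f(X_j) < f(X_i)}
    w_hat = w((rk + 0.5) / N) / N
    Y = X - m
    m = m + dt_m * (w_hat @ Y)                              # mean update
    C = C + dt_c * (Y.T * w_hat) @ Y - dt_c * w_hat.sum() * C   # rank-mu covariance update
print(f(m))                                      # should be far below its starting value
```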

SLIDE 27

Instantiation of IGO: Bernoulli Measures

Ω = {0, 1}^d,  P_θ(x) = p_{θ_1}(x_1) ⋯ p_{θ_d}(x_d): family of Bernoulli measures

Recovers PBIL (Population-Based Incremental Learning) [Baluja, Caruana 1995]
and the cGA (compact Genetic Algorithm) [Harik et al. 1999]
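For this family the Fisher matrix is diagonal with entries 1/(p_j(1−p_j)) and exactly cancels the score's denominator, so the natural gradient of ln P_θ at x is simply x − p componentwise, and the IGO step becomes a PBIL/cGA-style probability update. A toy sketch on OneMax (the weight function, rates, iteration count, and boundary clipping are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
d, N, dt = 20, 50, 0.2
f = lambda x: -x.sum()                           # OneMax, written as minimization
w = lambda q: (q <= 0.25).astype(float)          # assumed selection weight

p = np.full(d, 0.5)                              # theta = (p_1, ..., p_d)
for _ in range(200):
    X = (rng.random((N, d)) < p).astype(float)   # X_i ~ Ber(p_1) x ... x Ber(p_d)
    fv = np.array([f(x) for x in X])
    rk = (fv[:, None] > fv[None, :]).sum(axis=1) # strict-inequality ranks
    w_hat = w((rk + 0.5) / N) / N
    p += dt * (w_hat @ (X - p))                  # natural gradient of ln P_theta(x) is x - p
    p = np.clip(p, 0.01, 0.99)                   # keep away from the degenerate boundary
print(p.round(2))                                # probabilities should have drifted toward 1
```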

SLIDE 28

Conclusions

Information Geometric Optimization framework:
• a unified picture of discrete and continuous optimization
• theoretical foundations for existing algorithms
• some parts of the CMA-ES algorithm are not explained by the IGO framework

New algorithms: large-scale variant of CMA-ES based on IGO, …

CMA-ES is state of the art in continuous black-box optimization; its step-size adaptation and cumulation mechanisms are among the parts not derived from IGO.

SLIDE 29

References

[Ollivier et al.] Y. Ollivier, L. Arnold, A. Auger, N. Hansen. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. JMLR (accepted).
[Akimoto et al. 2010] Y. Akimoto, Y. Nagata, I. Ono, S. Kobayashi. Bidirectional relation between CMA evolution strategies and natural evolution strategies. PPSN 2010.
[Hansen et al. 2003] N. Hansen, S.D. Müller, P. Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). ECJ 2003.
[Amari, Nagaoka 1993] S. Amari, H. Nagaoka. Methods of Information Geometry. 1993.
[Baluja, Caruana 1995] S. Baluja, R. Caruana. Removing the genetics from the standard genetic algorithm. ICML 1995.
[Harik et al. 1999] G.R. Harik, F.G. Lobo, D.E. Goldberg. The compact genetic algorithm. IEEE Transactions on Evolutionary Computation, 1999.
[Wierstra et al. 2014] D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, J. Peters, J. Schmidhuber. Natural evolution strategies. JMLR 2014.