
DM812 Metaheuristics, Lecture 12: Model Based Metaheuristics and Continuous Optimization



  1. Outline

     DM812 Metaheuristics, Lecture 12
     Marco Chiarandini
     Numerical Analysis, Department of Mathematics and Computer Science
     University of Southern Denmark, Odense, Denmark
     <marco@imada.sdu.dk>

     1. Model Based Metaheuristics
        Cross Entropy Method
     2. Continuous Optimization
        Numerical Analysis

     Model Based Metaheuristics

     Key idea: use rare-event simulation and importance sampling to proceed
     towards good solutions:
     - generate random solution samples according to a specified mechanism;
     - update the parameters of the random mechanism so as to produce a
       better "sample" in the next round (a minimal sketch of this
       generate/update loop follows).
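     The following is a minimal sketch of that loop, assuming a toy OneMax
     objective (maximize the number of ones in a bit string) and a vector of
     independent Bernoulli parameters as the random mechanism; all names and
     constants here are illustrative, not from the lecture.

```python
# Minimal cross-entropy sketch on OneMax: the "specified mechanism" is a
# vector p of independent Bernoulli sampling probabilities, updated from
# the elite (top rho) fraction of each sample. Constants are illustrative.
import numpy as np

def cem_onemax(n=20, N=100, rho=0.1, alpha=0.7, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                                  # mechanism parameters
    for _ in range(iters):
        samples = (rng.random((N, n)) < p).astype(int)   # generate N samples
        scores = samples.sum(axis=1)                     # f(s) = number of ones
        gamma = np.quantile(scores, 1 - rho)             # elite threshold
        elite = samples[scores >= gamma]
        p = alpha * elite.mean(axis=0) + (1 - alpha) * p # smoothed update
    return p

print(np.round(cem_onemax(), 2))   # parameters drift towards all ones
```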

  2. Cross Entropy Method: Estimation

     Notation:
     - S: a finite set of states;
     - f: a real-valued performance function on S;
     - max_{s∈S} f(s) = γ* = f(s*)  (our problem);
     - {p(s, θ) | θ ∈ Θ}: a family of discrete probability mass functions on S;
     - E_θ[f(s)] = Σ_{s∈S} f(s) p(s, θ).

     We are interested in the probability that f(s) is greater than some
     threshold γ under the distribution p(·, θ'):

         ℓ = Pr_{θ'}(f(s) ≥ γ) = Σ_{s∈S} I{f(s) ≥ γ} p(s, θ') = E_{θ'}[I{f(s) ≥ γ}]

     If this probability is very small, then we call {f(s) ≥ γ} a rare event.

     Monte Carlo simulation: draw a random sample s_1, ..., s_N from p(·, θ')
     and compute the unbiased estimator of ℓ:

         ℓ̂ = (1/N) Σ_{i=1}^{N} I{f(s_i) ≥ γ}

     If the probability of sampling a solution with f(s_i) ≥ γ is small, the
     estimate is not accurate.

     Importance sampling: use a different probability function g on S to
     sample the solutions:

         ℓ = Σ_{s∈S} I{f(s) ≥ γ} (p(s, θ') / g(s)) g(s) = E_g[I{f(s) ≥ γ} p(s, θ') / g(s)]

     and compute the unbiased estimator of ℓ:

         ℓ̂ = (1/N) Σ_{i=1}^{N} I{f(s_i) ≥ γ} p(s_i, θ') / g(s_i)

     How to determine g? The best choice would be

         g*(s) := I{f(s) ≥ γ} p(s, θ') / ℓ,

     since substituting g* yields ℓ̂ = ℓ exactly. But ℓ is unknown. It is
     convenient to choose g from the family {p(·, θ)}: choose the parameter θ
     such that the difference between g = p(·, θ) and g* is minimal. The
     cross entropy, or Kullback-Leibler distance, is a measure of the
     distance between two probability distribution functions:

         D(g*, g) = E_{g*}[ln(g*(s) / g(s))]

     Generalizing to probability density functions and Lebesgue integrals:

         min_θ D(g*, p(·, θ)) = min_θ ( ∫ g*(s) ln g*(s) ds − ∫ g*(s) ln p(s, θ) ds )

     Minimizing this distance by means of sampling estimation leads to

         θ̂ = argmax_θ E_{θ''}[ I{f(s) ≥ γ} (p(s, θ') / p(s, θ'')) ln p(s, θ) ],

     a (convex) stochastic program. In some cases it can be solved in closed
     form (e.g., exponential, Bernoulli). The same result can be obtained by
     maximum likelihood estimation over the solutions s_i with performance ≥ γ:

         L = max_θ Π_{i: f(s_i) ≥ γ} p(s_i, θ)
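     A small numeric check of the importance-sampling estimator above, under
     assumptions chosen purely for illustration (X ~ Exp(1) in place of the
     discrete model, g an exponential with mean γ, so that the exact value
     e^(−γ) is available for comparison):

```python
# Rare-event estimation: ℓ = Pr(X ≥ γ) for X ~ Exp(1), exact value e^(-γ).
# Crude Monte Carlo almost never observes the event; sampling from
# g = Exp(mean γ) and reweighting by the likelihood ratio p/g recovers ℓ.
# The choice of g here is an illustrative assumption, not from the slides.
import numpy as np

rng = np.random.default_rng(1)
gamma, N = 20.0, 100_000

x = rng.exponential(1.0, N)                       # crude MC under p = Exp(1)
print("crude MC:   ", (x >= gamma).mean())        # almost always 0.0

y = rng.exponential(gamma, N)                     # sample under g
w = np.exp(-y) / (np.exp(-y / gamma) / gamma)     # likelihood ratio p(y)/g(y)
print("IS estimate:", ((y >= gamma) * w).mean())
print("exact:      ", np.exp(-gamma))             # ≈ 2.06e-09
```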

  3. Cross Entropy Method: The Algorithm

     Estimation via the stochastic counterpart:

         θ̂ = argmax_θ (1/N) Σ_{i=1}^{N} I{f(s_i) ≥ γ} (p(s_i, θ') / p(s_i, θ'')) ln p(s_i, θ)

     where s_1, ..., s_N is a random sample from p(·, θ'').

     But there are still problems with sampling due to rare events.
     Solution: a two-phase iterative approach:
     - construct a sequence of levels γ̂_1, γ̂_2, ..., γ̂_t, and
     - construct a sequence of parameters θ̂_1, θ̂_2, ..., θ̂_t,
     such that γ̂_t is close to the optimal value γ* and θ̂_t assigns maximal
     probability to sampling high-quality solutions.

     Cross Entropy Method (CEM):
         define θ̂_0; set t = 1
         while termination criterion is not satisfied do
             generate a sample (s_1, s_2, ..., s_N) from the pdf p(·; θ̂_{t−1})
             set γ̂_t equal to the (1 − ρ)-quantile of the sample with respect
             to f, i.e., γ̂_t = f_(⌈(1−ρ)N⌉), the ⌈(1−ρ)N⌉-th order statistic
             of the sample performances
             use the same sample (s_1, s_2, ..., s_N) to solve the stochastic
             program
                 θ̂_t = argmax_θ (1/N) Σ_{i=1}^{N} I{f(s_i) ≥ γ̂_t} ln p(s_i; θ)

     Termination criterion: stop if for some t ≥ d (with, e.g., d = 5)

         γ̂_t = γ̂_{t−1} = ... = γ̂_{t−d}

     Smoothed updating: θ̂_t = α θ̂'_t + (1 − α) θ̂_{t−1}, with 0.4 ≤ α ≤ 0.9,
     where θ̂'_t is the solution of the stochastic counterpart.

     Parameters:
     - N = c·n, where n is the size of the problem (the number of choices
       available for each solution component) and c > 1 (5 ≤ c ≤ 10);
     - ρ ≈ 0.01 for n ≥ 100, and ρ ≈ ln(n)/n for n < 100.

     Example: TSP
     - Solution representation: permutation representation.
     - Probabilistic model: matrix P where p_ij represents the probability of
       vertex j following vertex i.
     - Tour construction (specific for tours; a sketch in code follows):
           define P^(1) = P and X_1 = 1; let k = 1
           while k < n − 1 do
               obtain P^(k+1) from P^(k) by setting the X_k-th column of
               P^(k) to zero and normalizing the rows to sum up to 1
               generate X_{k+1} from the distribution formed by the X_k-th
               row of P^(k+1)
               set k = k + 1
     - Update: take p_ij to be the fraction of times the transition from i
       to j occurred in those cycles that have f(s) ≤ γ̂ (the TSP is a
       minimization problem).
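     A sketch of the tour-construction step described above, assuming
     0-indexed vertices, a uniform initial matrix P, and hypothetical helper
     names; renormalizing only the row being sampled from is equivalent, for
     sampling purposes, to renormalizing all rows.

```python
# Sample one cycle from the transition matrix P by repeatedly zeroing the
# column of the vertex just visited, renormalizing the current row, and
# drawing the next vertex from it. Assumes P is n x n with zero diagonal.
import numpy as np

def sample_tour(P, rng):
    n = P.shape[0]
    Q = P.astype(float).copy()
    tour = [0]                                    # X_1: start at vertex 0
    for _ in range(n - 1):
        Q[:, tour[-1]] = 0.0                      # forbid revisiting X_k
        row = Q[tour[-1]] / Q[tour[-1]].sum()     # X_k-th row, renormalized
        tour.append(int(rng.choice(n, p=row)))    # draw X_{k+1}
    return tour

rng = np.random.default_rng(0)
n = 6
P = np.full((n, n), 1.0 / (n - 1))                # uniform model to start
np.fill_diagonal(P, 0.0)
print(sample_tour(P, rng))                        # a random Hamiltonian cycle
```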

  4. Continuous Optimization

     We look at unconstrained optimization of continuous, non-linear,
     non-convex, possibly non-differentiable functions. There are many
     applications, above all in statistical estimation (e.g., likelihood
     estimation). Typically few variables are involved (curse of
     dimensionality).

     Gradient Descent (differentiable functions)

     f(x) decreases fastest when moving in the direction of the negative
     gradient of f. Hence the iteration

         x_{n+1} = x_n − γ_n ∇f(x_n)

     converges for an appropriate x_0 and for γ_n > 0 small enough. The
     problem is choosing γ_n (see the sketch after this slide).

     Secant Method

     If the problem is one-dimensional and f is hard to differentiate:

         x_{n+1} = x_n − f(x_n) (x_n − x_{n−1}) / (f(x_n) − f(x_{n−1}))

     Standard Test Functions

     Rosenbrock's banana function:

         f(x, y) = (1 − x)² + 100(y − x²)²

     with global minimum at (x, y) = (1, 1), where f(x, y) = 0. Its
     multidimensional extension is

         f(x) = Σ_{i=1}^{N−1} [ (1 − x_i)² + 100(x_{i+1} − x_i²)² ],  x ∈ R^N,

     with global minimum at (x_1, ..., x_N) = (1, ..., 1). Other standard
     test functions include Rastrigin's, Schwefel's, and the Sphere function.
     Continue at: http://www.cs.bham.ac.uk/research/projects/ecb/
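     As a hedged illustration of both the iteration and the difficulty of
     choosing γ, a fixed-step gradient descent on Rosenbrock's banana
     function; the step size, starting point, and iteration budget are
     illustrative assumptions, not prescriptions from the slides.

```python
# Fixed-step gradient descent on Rosenbrock's f(x,y) = (1-x)^2 + 100(y-x^2)^2.
# A fixed gamma must be tiny to stay stable, so progress along the curved
# valley towards the minimum at (1, 1) is very slow.
import numpy as np

def rosenbrock(v):
    x, y = v
    return (1 - x)**2 + 100 * (y - x**2)**2

def rosenbrock_grad(v):
    x, y = v
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2),
                     200 * (y - x**2)])

v = np.array([-1.2, 1.0])            # classical starting point
gamma = 1e-3                         # fixed step size (the hard choice)
for _ in range(50_000):
    v = v - gamma * rosenbrock_grad(v)
print(v, rosenbrock(v))              # slowly approaches (1, 1), f -> 0
```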

  5. Smooth Functions

     Newton's method in one dimension (f twice differentiable): the Taylor
     expansion of f at x,

         f(x + Δx) ≈ f(x) + f'(x) Δx + (1/2) f''(x) Δx²,

     attains its extremum when Δx solves the linear equation

         f'(x) + f''(x) Δx = 0   (a minimum when f''(x) > 0).

     Hence, if x_0 is chosen appropriately, the sequence

         x_{n+1} = x_n − f'(x_n) / f''(x_n),   n ≥ 0,

     converges to x*.

     Newton's method generalized to several dimensions: the first derivative
     becomes the gradient ∇f(x), and the reciprocal of the second derivative
     becomes the inverse of the Hessian matrix Hf(x):

         x_{n+1} = x_n − [Hf(x_n)]^{−1} ∇f(x_n),   n ≥ 0.

     Newton's method converges much faster towards a local maximum or
     minimum than gradient descent. However, computing the inverse of the
     Hessian may be an expensive operation, so approximations may be used
     instead:

     Quasi-Newton methods
     - Conjugate Gradient [Fletcher and Reeves (1964)]
     - BFGS (variable metric algorithm) [Broyden, Fletcher, Goldfarb and
       Shanno (1970)]
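     A sketch of the multidimensional Newton iteration above, applied to the
     Rosenbrock function from the previous slide for contrast with gradient
     descent; the starting point and iteration count are illustrative, and
     the Hessian system is solved rather than explicitly inverted.

```python
# Newton's method on Rosenbrock: x_{n+1} = x_n - [Hf(x_n)]^{-1} grad f(x_n),
# implemented with a linear solve instead of an explicit matrix inverse.
import numpy as np

def grad(v):
    x, y = v
    return np.array([-2 * (1 - x) - 400 * x * (y - x**2),
                     200 * (y - x**2)])

def hess(v):
    x, y = v
    return np.array([[2 + 1200 * x**2 - 400 * y, -400 * x],
                     [-400 * x, 200.0]])

v = np.array([-1.2, 1.0])
for _ in range(10):
    v = v - np.linalg.solve(hess(v), grad(v))   # Newton step
print(v)   # reaches (1, 1) in a handful of steps from this starting point
```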

