Effects of Constant Optimization by Nonlinear Least Squares Minimization in Symbolic Regression
SLIDE 1

Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Michael Affenzeller, and Stefan Wagner

Effects of Constant Optimization by Nonlinear Least Squares Minimization in Symbolic Regression

Contact: Michael Kommenda
Heuristic and Evolutionary Algorithms Lab (HEAL)
Softwarepark 11, A-4232 Hagenberg
e-mail: michael.kommenda@fh-hagenberg.at
Web: http://heal.heuristiclab.com, http://heureka.heuristiclab.com

SLIDE 2

Symbolic Regression


Model a relationship between the input variables x and the target variable y without any predefined structure. The error Ξ΅ is minimized by an evolutionary algorithm, which simultaneously searches for:

  • Model structure
  • Used variables
  • Constants / weights
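Stated formally (a standard formulation, reconstructed for clarity rather than quoted from the slide):

```latex
% Symbolic regression: the EA evolves the expression f itself,
% together with its variables and constants, to minimize the error
y_i = f(x_i) + \varepsilon_i, \qquad \min_{f}\; \sum_{i=1}^{n} \varepsilon_i^{2}
```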
SLIDE 3

Research Assumption


The correct model structure is found during algorithm execution, but is not recognized due to misleading or wrong constants.

[Figure: two expression trees with identical structure over X and Y but different constant values (1.2, 5.0, 0.3, 2.0 vs. 1.2, 1.0, 0.6)]

SLIDE 4

Constants in Symbolic Regression


Ephemeral Random Constants

  • Randomly initialized constants
  • Remain fixed during the algorithm run

Evolutionary Constants

  • Updated by mutation

ο€­ π·π‘œπ‘“π‘₯ = π·π‘π‘šπ‘’ + 𝑂 0, 𝜏 ο€­ π·π‘œπ‘“π‘₯ = π·π‘π‘šπ‘’ βˆ— 𝑂 1, 𝜏

Finding correct constants

  • combination of existing values
  • mutation of constant symbol nodes

  – undirected changes to values

[Figure: expression tree with constant leaf nodes 1.2, 1.0, 0.3, 2.0]

SLIDE 5

Summary of Previous Research


  • Faster genetic programming based on local gradient search of numeric leaf values (Topchy and Punch, GECCO 2001)
  • Improving gene expression programming performance by using differential evolution (Zhang et al., ICMLA 2007)
  • Evolution Strategies for Constants Optimization in Genetic Programming (Alonso, ICTAI 2009)
  • Differential Evolution of Constants in Genetic Programming Improves Efficacy and Bloat (Mukherjee and Eppstein, GECCO 2012)

SLIDE 6

Linear Scaling


Improving Symbolic Regression with Interval Arithmetic and Linear Scaling (Keijzer, EuroGP 2003): use Pearson's RΒ² as fitness function and perform linear scaling.

  • Removes necessity to find correct offset and scale
  • Computationally efficient

Outperforms the local gradient search
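Because the optimal offset a and slope b have a closed-form least-squares solution, scaling each model before evaluation is cheap. A minimal sketch, assuming NumPy arrays of targets y and raw model outputs f:

```python
import numpy as np

def linear_scaling(y, f):
    """Offset a and slope b such that a + b*f fits y in the least-squares
    sense; scoring with R² is equivalent to evaluating the optimally
    scaled model, so GP need not evolve offset and scale itself."""
    b = np.cov(y, f, bias=True)[0, 1] / np.var(f)  # slope = cov(y, f) / var(f)
    a = np.mean(y) - b * np.mean(f)                # offset through the means
    return a, b
```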

SLIDE 7

Constant Optimization

Concept

  • Treat all constants as parameters
  • Local optimization step
  • Multidimensional optimization

Levenberg-Marquardt Algorithm

  • Least squares fitting of model parameters to empirical data
  • π‘π‘—π‘œπ‘—π‘›π‘—π‘¨π‘“ 𝑅 𝛾 =

𝑧𝑗 βˆ’ 𝑔 𝑦𝑗, 𝛾

2 𝑛 𝑗=1

  • Uses gradient and Jacobian matrix information
  • Implemented e.g. by ALGLIB
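The slides point to ALGLIB's implementation; purely as an illustration, the same fit can be sketched with SciPy's Levenberg-Marquardt backend (the model structure and data below are made up for the example):

```python
import numpy as np
from scipy.optimize import least_squares

def model(beta, x):
    # hypothetical structure found by GP; beta are its constants
    return beta[0] + beta[1] * x[:, 0] * x[:, 1]

def residuals(beta, x, y):
    return y - model(beta, x)              # r_i = y_i - f(x_i, beta)

x = np.random.rand(100, 2)
y = 1.2 + 0.5 * x[:, 0] * x[:, 1]          # synthetic target

beta0 = np.array([1.0, 1.0])               # starting point taken from the tree
fit = least_squares(residuals, beta0, args=(x, y), method="lm")
print(fit.x)                               # optimized constants, ~ [1.2, 0.5]
```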

SLIDE 8

Gradient Calculation


Transformation of symbolic expression tree

  • Extract initial numerical values (starting point)
  • Add scaling tree nodes

Automatic differentiation

  • Provided e.g. by AutoDiff
  • Numerical gradient calculation in one pass
  • Faster compared to symbolic differentiation

Update tree with optimized values

  • Optionally calculate new fitness

$\nabla f = \left(\frac{\partial f}{\partial \beta_1}, \frac{\partial f}{\partial \beta_2}, \ldots, \frac{\partial f}{\partial \beta_n}\right)$

[Figure: an expression tree with numeric leaves (3.12, 0.06, 1.0) is transformed into a parameterized tree whose constants β₁, …, β₆ include the added scaling nodes]
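AutoDiff (the .NET library named above) computes the gradient alongside the function value; as a language-neutral illustration only, here is a forward-mode sketch with dual numbers (this variant needs one pass per parameter, whereas the one-pass behaviour mentioned on the slide refers to the library's own evaluation):

```python
class Dual:
    """Dual number val + eps*dot with eps² = 0; dot carries ∂/∂β_i."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def grad(f, beta):
    """∇f = (∂f/∂β₁, …, ∂f/∂βₙ), one forward pass per parameter."""
    return [f([Dual(b, float(i == j)) for j, b in enumerate(beta)]).dot
            for i in range(len(beta))]

# f(β) = β₁ + β₂·β₃ at (3.12, 0.06, 1.0)  ->  gradient (1.0, 1.0, 0.06)
print(grad(lambda b: b[0] + b[1] * b[2], [3.12, 0.06, 1.0]))
```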

SLIDE 9

Constant Optimization Improvement

$Improvement = Quality_{optimized} - Quality_{original}$

Exemplary GP run:

  • Average and median improvement stay constantly low
  • Maximum improvement almost reaches the best quality found
  • Crossover worsens good individuals
  • The quality of a few individuals can be increased dramatically

SLIDE 10

Problems


Symbolic regression benchmarks

  • Better GP Benchmarks: Community Survey Results and Proposals (White et al., GPEM 2013)


Problem          Function                                                              Training  Test
Nguyen-7         f(x) = ln(x + 1) + ln(x² + 1)                                         20        500
Keijzer-6        f(x, y, z) = 30xz / ((x − 10)y²)                                      20        120
Vladislavleva-4  f(x₁, …, x₅) = 10 / (5 + Σᡢ (xᡢ − 3)²)                                1024      5000
Pagie-1          f(x, y) = 1 / (1 + x⁻⁴) + 1 / (1 + y⁻⁴)                               676       1000
Poly-10          f(x₁, …, x₁₀) = x₁x₂ + x₃x₄ + x₅x₆ + x₁x₇x₉ + x₃x₆x₁₀                 250       250
Friedman-2       f(x₁, …, x₁₀) = 10 sin(πx₁x₂) + 20(x₃ − 0.5)² + 10x₄ + 5x₅ + N(0, 1)  500       5000
Tower            Real-world data                                                       3136      1863

SLIDE 11

Algorithm Configurations


Genetic Programming with strict offspring selection

  • Only child individuals with better quality compared to the fitter parent are accepted in the new generation (sketched after the parameter list below)

Varying parameters

  • Population size of 500, 1000, and 5000 for runs without constant optimization
  • Probability for constant optimization 25%, 50%, and 100% (population size 500)

All other parameters were not modified

  • Maximum selection pressure of 100 was used as termination criterion
  • Size constraints of tree length 50 and depth 12
  • Mutation rate of 25%
  • Function set consists solely of arithmetic functions (except Nguyen-7)
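For context, the strict acceptance rule can be sketched as follows (a schematic only, assuming a maximized fitness; HeuristicLab's actual parent selection and selection-pressure bookkeeping are more involved):

```python
import random

def strict_os_generation(pop, crossover, mutate, fitness,
                         mutation_rate=0.25, max_sel_pressure=100):
    """One generation of strict offspring selection: a child is accepted
    only if it beats the fitter of its two parents; the ratio of created
    to accepted children acts as the termination criterion."""
    new_pop, created = [], 0
    while len(new_pop) < len(pop):
        created += 1
        p1, p2 = random.sample(pop, 2)
        child = crossover(p1, p2)
        if random.random() < mutation_rate:
            child = mutate(child)
        if fitness(child) > max(fitness(p1), fitness(p2)):
            new_pop.append(child)              # strictly better -> keep
        if created > max_sel_pressure * len(pop):
            return None                        # pressure limit -> stop the run
    return new_pop
```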

SLIDE 12

Results - Quality


Success rate (test RΒ² > 0.99)

[Bar chart: success rates (0.0–1.0) on Nguyen-7, Keijzer-6, Vladislavleva-4, Pagie-1, and Poly-10 for OSGP 500, OSGP 1000, OSGP 5000, CoOp 25%, CoOp 50%, and CoOp 100%]

SLIDE 13

Results - Quality


Noisy datasets

  • Success rate not applicable
  • RΒ² of best training solution (ΞΌ Β± Οƒ)


                 Friedman-2                       Tower
Configuration    Training        Test             Training        Test
OSGP 500         0.836 ± 0.027   0.768 ± 0.172    0.877 ± 0.007   0.876 ± 0.012
OSGP 1000        0.857 ± 0.036   0.831 ± 0.102    0.880 ± 0.006   0.877 ± 0.024
OSGP 5000        0.908 ± 0.035   0.836 ± 0.191    0.892 ± 0.006   0.890 ± 0.008
CoOp 25%         0.959 ± 0.001   0.871 ± 0.151    0.919 ± 0.006   0.916 ± 0.007
CoOp 50%         0.967 ± 0.000   0.920 ± 0.086    0.925 ± 0.005   0.921 ± 0.006
CoOp 100%        0.964 ± 0.000   0.864 ± 0.142    0.932 ± 0.005   0.927 ± 0.005

SLIDE 14

Results – LM Iterations


Constant optimization probability of 50%; varying numbers of iterations for the LM algorithm (3x, 5x, 10x)

  • Reported: success rate, or test RΒ² (ΞΌ Β± Οƒ) for the noisy datasets


Problem          OSGP 5000      CoOp 50% 3x    CoOp 50% 5x    CoOp 50% 10x
Nguyen-7         1.00           0.92           0.92           0.94
Keijzer-6        0.74           0.92           0.88           0.94
Vladislavleva-4  0.48           0.56           0.82           0.86
Pagie-1          0.20           0.26           0.52           0.74
Poly-10          0.62           0.78           0.88           0.94
Friedman-2       0.836 ± 0.191  0.946 ± 0.046  0.943 ± 0.076  0.920 ± 0.086
Tower            0.890 ± 0.009  0.902 ± 0.010  0.912 ± 0.008  0.921 ± 0.006

SLIDE 15

Results - Execution Effort


Execution effort relative to OSGP 500

[Bar chart: execution effort relative to OSGP 500 (0–35) on Nguyen-7, Keijzer-6, Vladislavleva-4, Pagie-1, Poly-10, Friedman-2, and Tower for OSGP 1000, OSGP 5000, CoOp 50% 3x, CoOp 50% 5x, and CoOp 50% 10x]

SLIDE 16

Feature Selection Problems


Artificial datasets

  • 100 input variables sampled from N(0, 1)
  • Target is a linear combination of 10 or 25 variables with weights from U(0, 10)
  • Noisy β†’ maximum achievable RΒ² = 0.90
  • Training 120 rows, test 500 rows
  • Population size 500
  • Constant optimization 50%, 5x iterations (a generation sketch follows below)
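One way such a dataset could be generated (a sketch; the exact noise model is an assumption, scaled so that the best attainable RΒ² is 0.90):

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows, n_vars, n_used = 620, 100, 10       # 120 training + 500 test rows

X = rng.standard_normal((n_rows, n_vars))   # 100 inputs ~ N(0, 1)
w = rng.uniform(0, 10, size=n_used)         # weights ~ U(0, 10)
signal = X[:, :n_used] @ w                  # linear combination of 10 variables

# R² = var(signal) / (var(signal) + var(noise)) = 0.90
# =>  var(noise) = var(signal) / 9
noise = rng.standard_normal(n_rows) * np.sqrt(signal.var() / 9)
y = signal + noise

X_train, y_train = X[:120], y[:120]
X_test,  y_test  = X[120:], y[120:]
```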

Observation

  • Constant optimization can lead to overfitting
  • Selection of correct features is also an issue

[Bar chart: training and test RΒ² (0.6–1.0) for OSGP vs. CoOp on the 10-feature and 25-feature problems]

SLIDE 17

Conclusion


Constant optimization improves the success rate and quality of models

  • Better results with smaller population size
  • Especially useful for post-processing of models

Removes the effort of evolving correct constants

  • Genetic programming can concentrate on the model structure and feature selection

Ready-to-use implementation in HeuristicLab

  • Configurable probability, iterations, random sampling
  • All experiments available for download
  • http://dev.heuristiclab.com/AdditionalMaterial

SLIDE 18

Michael Kommenda, Gabriel Kronberger, Stephan Winkler, Michael Affenzeller, and Stefan Wagner

Effects of Constant Optimization by Nonlinear Least Squares Minimization in Symbolic Regression

Contact: Michael Kommenda
Heuristic and Evolutionary Algorithms Lab (HEAL)
Softwarepark 11, A-4232 Hagenberg
e-mail: michael.kommenda@fh-hagenberg.at
Web: http://heal.heuristiclab.com, http://heureka.heuristiclab.com