Applications of Algorithmic Differentiation within Surrogate Model Generation
Dr. David Toal, Dr. Chris Brooks, Dr. Alex Forrester & Prof. Andy Keane
11th European Workshop on Automatic Differentiation, 9th December 2010

Presentation Overview
• Surrogate modelling and Kriging
• Algorithmic differentiation within surrogate model generation
  – Standard Kriging
  – Co-Kriging
  – Gradient enhanced Kriging

Surrogate Modelling
• Creation of a model of the response of an expensive black box function (e.g. CFD or FEA analyses)
• Such models can be used to:
  – Drive an optimisation of the objective function
  – Model constraints
  – Pass information between partners
  – Facilitate cross partner trade-off studies

Surrogate Modelling
[Figure: A typical surrogate based optimisation process. Design of Experiments → Surrogate Model Construction → Surrogate Searched For Good Designs → True Objective Function Evaluated → Stopping Criterion Met? (No: loop back; Yes: Finish)]
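The loop in this flowchart can be sketched in a few lines. This is a minimal illustration only: `expensive_function` is a hypothetical 1-D stand-in for a CFD or FEA analysis, the surrogate is a simple Gaussian radial basis interpolant rather than a kriging model, and the search only exploits the predictor's minimum. All names and settings are assumptions made for the example.

```python
import numpy as np

def expensive_function(x):
    # Hypothetical stand-in for an expensive black box analysis.
    return (6.0 * x - 2.0) ** 2 * np.sin(12.0 * x - 4.0)

def fit_rbf(x, y, width=0.2, nugget=1e-8):
    # Gaussian RBF interpolant; the small nugget keeps the solve stable.
    K = np.exp(-((x[:, None] - x[None, :]) / width) ** 2)
    weights = np.linalg.solve(K + nugget * np.eye(len(x)), y)
    return lambda q: np.exp(-((q[:, None] - x[None, :]) / width) ** 2) @ weights

def surrogate_optimise(n_initial=4, max_updates=10, tol=1e-3):
    x = np.linspace(0.0, 1.0, n_initial)        # design of experiments
    y = expensive_function(x)                   # initial true evaluations
    candidates = np.linspace(0.0, 1.0, 501)
    for _ in range(max_updates):
        model = fit_rbf(x, y)                   # surrogate model construction
        x_new = candidates[np.argmin(model(candidates))]  # search the surrogate
        if np.min(np.abs(x - x_new)) < tol:     # stopping criterion met
            break
        x = np.append(x, x_new)                 # evaluate the true function
        y = np.append(y, expensive_function(x_new))
    best = np.argmin(y)
    return x[best], y[best], len(x)
```

Calling `surrogate_optimise()` returns the best sampled design, its true objective value and the number of true evaluations used. A practical process would normally balance exploitation with exploration, for example through the predictor's error estimate.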

Surrogate Modelling
[Figure: An example of a surrogate based optimisation]

Kriging
• Kriging is a popular method of generating surrogate models
  – Produces an accurate predictor
  – Error estimates of the predictor are available
• However the construction of a kriging model requires the optimisation of a series of “hyperparameters”
  – θ - rate of correlation decrease for each dimension
  – p - the degree of smoothness
  – λ - regression constant

Kriging
• These parameters should be optimised after the inclusion of additional true objective function values
• However this continual optimisation can form a significant bottleneck in the overall optimisation process
[Figure: Increase in total tuning time with increasing problem dimensionality [1]]
[1] Toal, D.J.J., Forrester, A.I.J., Bressloff, N.W., Keane, A.J. & Holden, C.M.E., “An Adjoint for Likelihood Maximization”, Proceedings of the Royal Society A, Vol. 465 (2111), pp. 3267-3287, 2009

Kriging
• Kriging assumes that the correlation between two sample points is
  [equation]
• Where the hyperparameters θ and p are determined by a maximisation of the concentrated log likelihood
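For reference, the correlation and the concentrated log likelihood are commonly written as below. This is a sketch of the standard textbook form and is assumed here rather than taken from the slide. The symbols x_i^{(l)}, d, n, δ_ij, μ̂ and σ̂² are notational assumptions, and the source papers may scale θ and λ differently (for example logarithmically).

```latex
R_{ij} = \exp\!\left(-\sum_{l=1}^{d} \theta_l \left| x_i^{(l)} - x_j^{(l)} \right|^{p_l}\right) + \lambda\,\delta_{ij}

\phi = -\frac{n}{2}\ln\hat{\sigma}^2 - \frac{1}{2}\ln|R|,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\,(y - \mathbf{1}\hat{\mu})^{T} R^{-1} (y - \mathbf{1}\hat{\mu}),
\qquad
\hat{\mu} = \frac{\mathbf{1}^{T} R^{-1} y}{\mathbf{1}^{T} R^{-1} \mathbf{1}}
```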

Kriging
• The cost of evaluating the likelihood is mainly a result of the O(n³) factorisation of the correlation matrix
• For problems with large sample plans and large numbers of variables this optimisation can be expensive
• Research focused on accelerating this optimisation via
  – An efficient derivative calculation
  – Hybridised global optimisation algorithm
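A minimal NumPy sketch of the likelihood evaluation shows where the O(n³) factorisation sits. It assumes the common correlation form R_ij = exp(-Σ_l θ_l |x_il - x_jl|^p_l) + λ δ_ij; the function names and this parameterisation are assumptions for illustration, not the authors' code.

```python
import numpy as np

def correlation_matrix(X, theta, p, lam):
    # Assumed form: R_ij = exp(-sum_l theta_l |x_il - x_jl|^p_l) + lam * delta_ij
    diff = np.abs(X[:, None, :] - X[None, :, :])       # shape (n, n, d)
    psi = np.exp(-np.sum(theta * diff ** p, axis=2))   # shape (n, n)
    return psi + lam * np.eye(X.shape[0])

def concentrated_log_likelihood(X, y, theta, p, lam):
    n = X.shape[0]
    R = correlation_matrix(X, theta, p, lam)
    L = np.linalg.cholesky(R)          # the O(n^3) step, R = L L^T

    def solve(b):
        # Two solves with the Cholesky factor; a dedicated triangular
        # solver would normally be used here for efficiency.
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    ones = np.ones(n)
    mu = (ones @ solve(y)) / (ones @ solve(ones))   # estimate of the mean
    r = y - mu
    sigma2 = (r @ solve(r)) / n                     # estimate of the variance
    log_det = 2.0 * np.sum(np.log(np.diag(L)))      # ln|R| from the factor
    return -0.5 * n * np.log(sigma2) - 0.5 * log_det
```

Every likelihood evaluation inside the hyperparameter optimisation repeats the factorisation, which is why the cost grows quickly with the size of the sample plan.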

Kriging
• Initial attempt at an efficient derivative calculation focused on reverse algorithmic differentiation of the likelihood function [1]
[Figure: Comparison of relative derivative costs [1]]

Kriging
• Reverse mode calculation proved to be the most efficient
• Proved to be less sensitive to increasing sampling density [1]
[Figure: Comparison of relative derivative costs with changing sample size [1]]

Kriging
• This formulation required a reverse differentiation of the Cholesky factorisation
• Using the linear algebra results of Giles [2] the adjoint can be calculated more efficiently [3]
• The derivative calculation can now make complete use of available libraries for matrix and vector operations
[2] Giles, M., “Collected Matrix Derivative Results for Forward and Reverse Mode Algorithmic Differentiation”, Lecture Notes in Computational Science and Engineering, Vol. 64, pp. 35-44, 2008
[3] Toal, D.J.J., Bressloff, N.W., Keane, A.J. & Holden, C.M.E., “The Development of a Hybridized Particle Swarm for Kriging Hyperparameter Tuning”, Engineering Optimization, (Accepted for Publication)

Kriging
• From the likelihood function the adjoints of the variance and the determinant of the correlation matrix are
  [equation]
• Using Giles’ result for the adjoint of the second quadratic matrix product
  [equation]
• The component of the adjoint of R due to the variance is
  [equation]

Kriging
• Likewise, from Giles’ result for the determinant
  [equation]
• The component of the adjoint of R due to the determinant is
  [equation]
• Combining with the previous component gives
  [equation]
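The two components of the adjoint of R can be written out explicitly. The following is a sketch that assumes the concentrated log likelihood φ = -(n/2) ln σ̂² - (1/2) ln|R| with σ̂² = rᵀR⁻¹r / n and residual r = y - 1μ̂. These symbols are notational assumptions, so the expressions may differ in scaling from those on the slides. The two matrix identities are the results collected in [2].

```latex
\begin{aligned}
&\text{Seed } \bar{\phi} = 1: \qquad
\overline{\hat{\sigma}^2} = -\frac{n}{2\hat{\sigma}^2},
\qquad
\overline{|R|} = -\frac{1}{2|R|} \\[4pt]
&\text{Second quadratic form } C = B^{T} A^{-1} B: \qquad
\bar{A} = -A^{-T} B\,\bar{C}\,B^{T} A^{-T} \\[4pt]
&\text{Determinant } C = \det A: \qquad
\bar{A} = \bar{C}\,C\,A^{-T} \\[4pt]
&\bar{R}_{\sigma} = -\frac{1}{n}\,\overline{\hat{\sigma}^2}\,R^{-T} r\,r^{T} R^{-T}
 = \frac{1}{2\hat{\sigma}^2}\,R^{-1} r\,r^{T} R^{-1} \\[4pt]
&\bar{R}_{\det} = \overline{|R|}\,|R|\,R^{-T} = -\frac{1}{2} R^{-1} \\[4pt]
&\bar{R} = \bar{R}_{\sigma} + \bar{R}_{\det}
 = \frac{1}{2\hat{\sigma}^2}\,R^{-1} r\,r^{T} R^{-1} - \frac{1}{2} R^{-1}
\end{aligned}
```

The simplifications use the symmetry of R. No contribution arises through μ̂, because 1ᵀR⁻¹r = 0 at the estimated mean.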

Kriging
• The derivatives of the hyperparameters are therefore
  [equation]
• Although [symbol] must be calculated, components of [symbol] have already been calculated in the forward pass and have already been used to calculate the variance
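A compact NumPy sketch of a forward and reverse pass of this kind is given below. It assumes the correlation form R_ij = exp(-Σ_l θ_l |x_il - x_jl|^p_l) + λ δ_ij, holds p fixed, and returns derivatives with respect to θ and λ only. The function name and parameterisation are assumptions for illustration and are not the authors' implementation.

```python
import numpy as np

def likelihood_and_gradient(X, y, theta, p, lam):
    n = X.shape[0]
    dist_p = np.abs(X[:, None, :] - X[None, :, :]) ** p   # |x_il - x_jl|^p_l
    psi = np.exp(-np.sum(theta * dist_p, axis=2))
    R = psi + lam * np.eye(n)

    # Forward pass: one Cholesky factorisation, reused throughout.
    L = np.linalg.cholesky(R)
    R_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    ones = np.ones(n)
    mu = (ones @ R_inv @ y) / (ones @ R_inv @ ones)
    r = y - mu
    w = R_inv @ r                    # R^{-1} r, also needed by the adjoint
    sigma2 = (r @ w) / n
    phi = -0.5 * n * np.log(sigma2) - np.sum(np.log(np.diag(L)))

    # Reverse pass: adjoint of R from the variance and determinant terms.
    R_bar = np.outer(w, w) / (2.0 * sigma2) - 0.5 * R_inv

    # Chain through dR_ij/dtheta_l = -|x_il - x_jl|^p_l * psi_ij and dR/dlam = I.
    d_theta = -np.einsum("ij,ij,ijl->l", R_bar, psi, dist_p)
    d_lam = np.trace(R_bar)
    return phi, d_theta, d_lam
```

A single reverse pass yields the derivatives for every θ_l at once, whereas finite differencing would need at least one extra likelihood evaluation, and therefore one extra factorisation, per hyperparameter.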

Kriging
• This results in an increase in efficiency over the previous formulation (≈ 10%)
[Figure: Comparison of relative derivative costs]

Kriging
• However the likelihood function is multi-modal and therefore requires a global optimisation
• Derivative information was employed within a hybridised particle swarm algorithm [3]
• Used successfully in the optimisation of:
  – Analytical test functions [3]
  – Single & Multipoint aerofoil design optimisations [3,4]
[4] Toal, D.J.J. & Keane, A.J., “Efficient Multi-point Aerodynamic Design Optimization Via Co-Kriging”, Journal of Aircraft, (Under Review)

Co-Kriging
• Multiple levels of simulation fidelity can be employed to enhance the accuracy of a surrogate model
[Figure: Co-Kriging example [5]]
[5] Forrester, A.I.J., Sóbester, A. & Keane, A.J., “Engineering Design via Surrogate Modelling - A Practical Guide”, John Wiley & Sons, August 2008

Co-Kriging
• A surrogate of the expensive function is constructed from
  [equation]
• Where Z_c denotes a kriging model of the cheap function and Z_d a kriging model of the difference between cheap & expensive
• The derivatives of the hyperparameters of Z_c are identical to those of standard kriging
• As are the derivatives of θ, p and λ for Z_d
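The model is usually written in the following form, which is assumed here from the wider co-kriging literature rather than recovered from the slide. The symbol Z_e for the expensive-function model is a notational assumption.

```latex
Z_e(x) = \rho\, Z_c(x) + Z_d(x)
```

Here ρ scales the cheap model before the difference model Z_d is added.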

Co-Kriging
• The only difference is the inclusion of the scaling factor ρ
• A Kriging model is built of
  [equation]
• Using the results of Giles
  [equation]
• Which gives an overall derivative of
  [equation]
• As before [symbol] has already been calculated on the forward pass
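The derivative with respect to ρ can be sketched as follows, assuming the difference data d = y_e - ρ y_c evaluated at the expensive sample points and the same concentrated likelihood as for standard kriging. Under these assumptions the derivative reduces to (d - 1μ̂)ᵀ R_d⁻¹ y_c / σ̂². The function and argument names are hypothetical.

```python
import numpy as np

def difference_likelihood_and_drho(R_d, y_e, y_c_at_e, rho):
    # R_d: correlation matrix of the difference model (symmetric positive definite)
    # y_e: expensive responses; y_c_at_e: cheap responses at the same points
    n = len(y_e)
    d = y_e - rho * y_c_at_e                  # assumed difference data
    L = np.linalg.cholesky(R_d)

    def solve(b):
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    ones = np.ones(n)
    mu = (ones @ solve(d)) / (ones @ solve(ones))
    r = d - mu
    w = solve(r)                              # R_d^{-1} r from the forward pass
    sigma2 = (r @ w) / n
    phi = -0.5 * n * np.log(sigma2) - np.sum(np.log(np.diag(L)))

    # Only the variance term depends on rho; the determinant term does not.
    dphi_drho = (w @ y_c_at_e) / sigma2
    return phi, dphi_drho
```

The vector `w` is computed once in the forward pass and reused, so the derivative with respect to ρ costs only one additional inner product.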

Co-Kriging
• This formulation has been successfully employed in:
  – Multipoint aerofoil optimisation [4]
  – Compressor rotor optimisation [6]
[Figure: Baseline compressor rotor design and rotor optimised via co-kriging [6]]
[6] Brooks, C.J., Forrester, A.I.J., Keane, A.J. & Shahpar, S., “Multifidelity Optimisation of a Transonic Compressor Rotor”, 9th European Turbomachinery Conference, 21-25th March 2011, Istanbul, Turkey, (Under Review)

Gradient Enhanced Kriging
• Employs gradient information at each sample point
• Gradient information can be obtained from AD
• Significantly improves surrogate model accuracy
[Figure: Gradient enhanced kriging example]

Gradient Enhanced Kriging
• The improvement in accuracy comes at an increased hyperparameter tuning cost
• The inclusion of gradient information enlarges the correlation matrix
  – In traditional kriging the matrix is n × n
  – The matrix is now (d+1)n × (d+1)n
• This is often cited as a drawback of this method
• An adjoint formulation may accelerate the tuning process
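A short calculation illustrates how quickly the matrix grows. The sketch below assumes the O(n³) dense factorisation cost quoted earlier for standard kriging also applies to the enlarged matrix; the helper names are hypothetical.

```python
def gek_matrix_size(n, d):
    # Each of the n points contributes one function value and d gradient
    # components, giving a square matrix of side (d + 1) * n.
    return (d + 1) * n

def relative_factorisation_cost(d):
    # Cost ratio against the n x n case under an assumed O(n^3)
    # factorisation: ((d + 1) * n)^3 / n^3.
    return (d + 1) ** 3
```

For example, n = 50 points in d = 10 dimensions gives a matrix of side 550, and under this assumption each factorisation costs roughly 1331 times that of the standard 50 × 50 case.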

Conclusions
• Presented a brief introduction to surrogate modelling
• Illustrated the problem of hyperparameter tuning within surrogate based design optimisation
• Presented an adjoint of the concentrated likelihood function for both kriging and co-kriging
• Presented the need to accelerate the hyperparameter tuning of gradient enhanced kriging models

Questions?
