Computing Sparse Hessians with AD Andrea Walther Institute of - PowerPoint PPT Presentation

Computing Sparse Hessians with AD Andrea Walther Institute of Scientific Computing TU Dresden Hatfield, May 21, 2007

Overview Motivation 1 Computing Sparsity Patterns 2 Evaluating Hessian-matrix Products 3 Numerical Results 4 Conclusion and Outlook 5 Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 1 / 17

Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17

Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Remarks: 1 and 2 are performed only once Steps 3 evaluated for each x Step Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17

Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Remarks: 1 and 2 are performed only once Steps 3 evaluated for each x Step Topics of this talk: 1 : Generation of sparsity pattern Step 3 : Evaluating Hessian-matrix products Step Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17

Motivation Assumptions: f : R n → R , y = f ( x ) , twice continuously differentiable function evaluation consists of unary or binary operations Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 3 / 17

Motivation Assumptions: f : R n → R , y = f ( x ) , twice continuously differentiable function evaluation consists of unary or binary operations Algorithm I: Function Evaluation for i = 1 , . . . , n v i − n = x i for i = 1 , . . . , l v i = ϕ i ( v j ) j ≺ i y = v l with precedence relation j ≺ i : ϕ i ( v j ) j ≺ i = ϕ i ( v j ) or ϕ i ( v j ) j ≺ i = ϕ i ( v j , v l ) with j , l < i Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 3 / 17

Computing Sparsity Patterns Determining the sparsity pattern Index domains [Griewank 2000]: X i ≡ { j ≤ n : j − n ≺ ∗ i } for i = 1 − n , . . . , l One has: � j ≤ n : ∂ v i � ⊆ X i � = 0 ∂ x j Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 4 / 17

Computing Sparsity Patterns Determining the sparsity pattern Index domains [Griewank 2000]: X i ≡ { j ≤ n : j − n ≺ ∗ i } for i = 1 − n , . . . , l One has: � j ≤ n : ∂ v i � ⊆ X i � = 0 ∂ x j For sparse Hessians additionally nonlinear interaction domains ∂ 2 y � � j ≤ n : ⊆ N i � = 0 ∂ x i ∂ x j for i = 1 , . . . , n . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 4 / 17

Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i if v i = ϕ i ( v j , v ˆ  ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ  else ∀ k ∈ X j : N k ← N k ∪ X i if v i linear in v ˆ  then ∀ k ∈ X ˆ  : N k ← N k ∪ X j else ∀ k ∈ X ˆ  : N k ← N k ∪ X i Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17

Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i if v i = ϕ i ( v j , v ˆ  ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ  else ∀ k ∈ X j : N k ← N k ∪ X i if v i linear in v ˆ  then ∀ k ∈ X ˆ  : N k ← N k ∪ X j else ∀ k ∈ X ˆ  : N k ← N k ∪ X i Complexity ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17

Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j (1) if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i (2) if v i = ϕ i ( v j , v ˆ  ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ (3)  else ∀ k ∈ X j : N k ← N k ∪ X i (4) if v i linear in v ˆ  then ∀ k ∈ X ˆ  : N k ← N k ∪ X j (5) else ∀ k ∈ X ˆ  : N k ← N k ∪ X i (6) Complexity ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17

Computing Sparsity Patterns Theorem (Complexity result for Algorithm II) Let OPS ( NID ) denote the number of operations needed to generate all N i , 1 ≤ i ≤ n by Algorithm II. Then, the inequality l � � � ≤ 6 ( 1 + ˆ ¯ OPS NID n ) n i i = 1 is valid, where l is the number of elemental functions evaluated to compute the function value, ¯ n i = |X i | , and ˆ n = max 1 ≤ i ≤ n |N i | . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 6 / 17

Computing Sparsity Patterns Theorem (Complexity result for Algorithm II) Let OPS ( NID ) denote the number of operations needed to generate all N i , 1 ≤ i ≤ n by Algorithm II. Then, the inequality l � � � ≤ 6 ( 1 + ˆ ¯ OPS NID n ) n i i = 1 is valid, where l is the number of elemental functions evaluated to compute the function value, ¯ n i = |X i | , and ˆ n = max 1 ≤ i ≤ n |N i | . Proof: Analyse set operations: j ≺ i X j ) ≤ 3 ¯ (1): OPS( X i = � n i (2): OPS( ∀ k ∈ X i : N k = N k ∪ X i ) ≤ ¯ n i ( 2 ˆ n + ˆ n ) = 3 ˆ n ¯ n i (3) - (6): same as (2) Details see [Walther 2007] Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 6 / 17

Evaluating Hessian-matrix Products Second order adjoints computed with AD reverse forward ˙ x + ˙ y ⊤ f ′ ( x ) y ⊤ f ′′ ( x ) ˙ y ⊤ f ′ ( x ) x = ¯ ¯ x = ¯ ¯ ¯ y = f ( x ) diff diff Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 7 / 17

Evaluating Hessian-matrix Products Second order adjoints computed with AD reverse forward ˙ x + ˙ y ⊤ f ′ ( x ) y ⊤ f ′′ ( x ) ˙ y ⊤ f ′ ( x ) x = ¯ ¯ x = ¯ ¯ ¯ y = f ( x ) diff diff Complexity [Griewank 2000]: TIME ( H ( x ) ˙ x ) ≤ ω soad TIME ( f ( x )) ω soad ∈ [ 7 , 10 ] with TIME ( H ( x ) S ) ≤ ω soad p TIME ( f ( x )) with ω soad ∈ [ 7 , 10 ] . Reduction possible ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 7 / 17

Evaluating Hessian-matrix Products Computation of H ( x ) ˙ x ∈ R n for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , v i − n x i , v i − n = 0 , v i − n = 0 for i = 1 , . . . , l ∂ � = ϕ i ( v j ) j ≺ i , ˙ ϕ i ( v j ) j ≺ i ˙ v i v i = v j , ∂ v j j ≺ i y = ˙ ˙ ¯ v l = ¯ y = v l , v l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ � ¯ v j += ¯ ϕ i ( v j ) j ≺ i ˙ v i v k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , x i = v i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17

Evaluating Hessian-matrix Products Computation of H ( x ) ˙ X ∈ R n × p for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , V i − n X i , v i − n = 0 , V i − n = 0 for i = 1 , . . . , l ∂ ˙ ϕ i ( v j ) j ≺ i ˙ � v i = ϕ i ( v j ) j ≺ i , V i = V j , ∂ v j j ≺ i Y = ˙ ˙ ¯ v l = ¯ y = v l , V l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ ¯ ϕ i ( v j ) j ≺ i ˙ � V j += ¯ v i V k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , X i = V i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17

Evaluating Hessian-matrix Products Complexity ? TIME ( ˙ ¯ X ) ≤ ω soadp TIME ( f ( x )) with ω soadp ∈ [ 4 + 3 p , 4 + 6 p ] . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 9 / 17

Evaluating Hessian-matrix Products Complexity ? TIME ( ˙ ¯ X ) ≤ ω soadp TIME ( f ( x )) with ω soadp ∈ [ 4 + 3 p , 4 + 6 p ] . Reduction of run time complexity: From [ 7 p , 10 p ] to [ 4 + 3 p , 4 + 6 p ] due to reduced number of recalculations. For details see [Walther 2007] Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 9 / 17

Numerical Results Implementation New drivers for AD-tool ADOL-C hess_pat(tag, n, x, P, option); Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 10 / 17

Computing Sparse Hessians with AD Andrea Walther Institute of - PowerPoint PPT Presentation

Computing Sparse Hessians with AD Andrea Walther Institute of Scientific Computing TU Dresden Hatfield, May 21, 2007 Overview Motivation 1 Computing Sparsity Patterns 2 Evaluating Hessian-matrix Products 3 Numerical Results 4

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Optimal Design Scenario Problem: Definition and Evaluation of Projected Hessians Min f ( y, u

Sparse Matrices sparse many elements are zero dense few elements are zero Example Of

Sparse tensors are a natural way of representing real-world data 1 Sparse tensors are a natural

MLSS 06 - Canberra Elements Hierarchical Basis Sparse Grids Sparse Grids Combination

CNBC Matlab Mini-Course Sparse Matrices Sparse matrices provide an efficient means to store

Tutorial: TF-Ranking for sparse features Tutorial: TF-Ranking for sparse features This tutorial

Parallel Numerical Algorithms Chapter 4 Sparse Linear Systems Section 4.1 Direct Methods

Extremal results for sparse pseudorandom graphs Yufei Zhao Massachusetts Institute of Technology

Sparse Computations and Multi-BSP Albert-Jan Yzelman October 11, 2016 Parallel Computing &

Sparse sampling sampling design in design in Sparse population PK/PD studies studies

A User-Friendly Hybrid Sparse Matrix Class in C++ Conrad Sanderson, Ryan R. Curtin July 19, 2018

Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of

High-Dimensional Pattern Recognition via Sparse Representation Allen Y. Yang University of

Tutorial: Sparse Recovery Using Sparse Matrices Piotr Indyk MIT Problem Formulation

Sparse Recovery and Fourier Sampling Eric Price MIT Eric Price (MIT) Sparse Recovery and

Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin 1 , Aaditya

Lecture 1.5: Multisets and multichoosing Matthew Macauley Department of Mathematical Sciences

The Representability of Partial Recursive Yan Steimle Functions in Arithmetical Theories and

Order Statistics Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola

Slides Dynamic and precise Product catalog 2017 Latest version of the catalogs You can always

Cyclic Coded Integer-Forcing Equalization Or Ordentlich Joint work with Uri Erez EE-Systems, Tel

Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition

Clustering with k-means and Gaussian mixture distributions Machine Learning and Category

Computing Sparse Hessians with AD Andrea Walther Institute of - PowerPoint PPT Presentation

Computing Sparse Hessians with AD Andrea Walther Institute of Scientific Computing TU Dresden Hatfield, May 21, 2007 Overview Motivation 1 Computing Sparsity Patterns 2 Evaluating Hessian-matrix Products 3 Numerical Results 4

Sparse Matrices Example Of Sparse Matrices diagonal tridiagonal sparse many elements are

Optimal Design Scenario Problem: Definition and Evaluation of Projected Hessians Min f ( y, u

Sparse Matrices sparse many elements are zero dense few elements are zero Example Of

Sparse tensors are a natural way of representing real-world data 1 Sparse tensors are a natural

MLSS 06 - Canberra Elements Hierarchical Basis Sparse Grids Sparse Grids Combination

CNBC Matlab Mini-Course Sparse Matrices Sparse matrices provide an efficient means to store

Tutorial: TF-Ranking for sparse features Tutorial: TF-Ranking for sparse features This tutorial

Parallel Numerical Algorithms Chapter 4 Sparse Linear Systems Section 4.1 Direct Methods

Extremal results for sparse pseudorandom graphs Yufei Zhao Massachusetts Institute of Technology

Sparse Computations and Multi-BSP Albert-Jan Yzelman October 11, 2016 Parallel Computing &amp;

Sparse sampling sampling design in design in Sparse population PK/PD studies studies

A User-Friendly Hybrid Sparse Matrix Class in C++ Conrad Sanderson, Ryan R. Curtin July 19, 2018

Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of

High-Dimensional Pattern Recognition via Sparse Representation Allen Y. Yang University of

Tutorial: Sparse Recovery Using Sparse Matrices Piotr Indyk MIT Problem Formulation

Sparse Recovery and Fourier Sampling Eric Price MIT Eric Price (MIT) Sparse Recovery and

Are sample means in multi-armed bandits positively or negatively biased? Jaehyeok Shin 1 , Aaditya

Lecture 1.5: Multisets and multichoosing Matthew Macauley Department of Mathematical Sciences

The Representability of Partial Recursive Yan Steimle Functions in Arithmetical Theories and

Order Statistics Carola Wenk Slides courtesy of Charles Leiserson with small changes by Carola

Slides Dynamic and precise Product catalog 2017 Latest version of the catalogs You can always

Cyclic Coded Integer-Forcing Equalization Or Ordentlich Joint work with Uri Erez EE-Systems, Tel

Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition

Clustering with k-means and Gaussian mixture distributions Machine Learning and Category

Sparse Computations and Multi-BSP Albert-Jan Yzelman October 11, 2016 Parallel Computing &