computing sparse hessians with ad
play

Computing Sparse Hessians with AD Andrea Walther Institute of - PowerPoint PPT Presentation

Computing Sparse Hessians with AD Andrea Walther Institute of Scientific Computing TU Dresden Hatfield, May 21, 2007 Overview Motivation 1 Computing Sparsity Patterns 2 Evaluating Hessian-matrix Products 3 Numerical Results 4


  1. Computing Sparse Hessians with AD Andrea Walther Institute of Scientific Computing TU Dresden Hatfield, May 21, 2007

  2. Overview Motivation 1 Computing Sparsity Patterns 2 Evaluating Hessian-matrix Products 3 Numerical Results 4 Conclusion and Outlook 5 Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 1 / 17

  3. Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17

  4. Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Remarks: 1 and 2 are performed only once Steps 3 evaluated for each x Step Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17

  5. Motivation Efficient computation of Hessian H ( x ) 3 1 2 f P S B = H ( x ) S sparsity seed calculation function pattern matrix of entries Remarks: 1 and 2 are performed only once Steps 3 evaluated for each x Step Topics of this talk: 1 : Generation of sparsity pattern Step 3 : Evaluating Hessian-matrix products Step Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 2 / 17

  6. Motivation Assumptions: f : R n → R , y = f ( x ) , twice continuously differentiable function evaluation consists of unary or binary operations Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 3 / 17

  7. Motivation Assumptions: f : R n → R , y = f ( x ) , twice continuously differentiable function evaluation consists of unary or binary operations Algorithm I: Function Evaluation for i = 1 , . . . , n v i − n = x i for i = 1 , . . . , l v i = ϕ i ( v j ) j ≺ i y = v l with precedence relation j ≺ i : ϕ i ( v j ) j ≺ i = ϕ i ( v j ) or ϕ i ( v j ) j ≺ i = ϕ i ( v j , v l ) with j , l < i Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 3 / 17

  8. Computing Sparsity Patterns Determining the sparsity pattern Index domains [Griewank 2000]: X i ≡ { j ≤ n : j − n ≺ ∗ i } for i = 1 − n , . . . , l One has: � j ≤ n : ∂ v i � ⊆ X i � = 0 ∂ x j Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 4 / 17

  9. Computing Sparsity Patterns Determining the sparsity pattern Index domains [Griewank 2000]: X i ≡ { j ≤ n : j − n ≺ ∗ i } for i = 1 − n , . . . , l One has: � j ≤ n : ∂ v i � ⊆ X i � = 0 ∂ x j For sparse Hessians additionally nonlinear interaction domains ∂ 2 y � � j ≤ n : ⊆ N i � = 0 ∂ x i ∂ x j for i = 1 , . . . , n . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 4 / 17

  10. Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i if v i = ϕ i ( v j , v ˆ  ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ  else ∀ k ∈ X j : N k ← N k ∪ X i if v i linear in v ˆ  then ∀ k ∈ X ˆ  : N k ← N k ∪ X j else ∀ k ∈ X ˆ  : N k ← N k ∪ X i Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17

  11. Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i if v i = ϕ i ( v j , v ˆ  ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ  else ∀ k ∈ X j : N k ← N k ∪ X i if v i linear in v ˆ  then ∀ k ∈ X ˆ  : N k ← N k ∪ X j else ∀ k ∈ X ˆ  : N k ← N k ∪ X i Complexity ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17

  12. Computing Sparsity Patterns Algorithm II: Computation of X i and N i for i = 1 , . . . , n X i − n ← { i } , N i ← ∅ for i = 1 , . . . , l X i ← � j ≺ i X j (1) if ϕ i nonlinear then if v i = ϕ i ( v j ) then ∀ k ∈ X i : N k ← N k ∪ X i (2) if v i = ϕ i ( v j , v ˆ  ) then if v i linear in v j then ∀ k ∈ X j : N k ← N k ∪ X ˆ (3)  else ∀ k ∈ X j : N k ← N k ∪ X i (4) if v i linear in v ˆ  then ∀ k ∈ X ˆ  : N k ← N k ∪ X j (5) else ∀ k ∈ X ˆ  : N k ← N k ∪ X i (6) Complexity ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 5 / 17

  13. Computing Sparsity Patterns Theorem (Complexity result for Algorithm II) Let OPS ( NID ) denote the number of operations needed to generate all N i , 1 ≤ i ≤ n by Algorithm II. Then, the inequality l � � � ≤ 6 ( 1 + ˆ ¯ OPS NID n ) n i i = 1 is valid, where l is the number of elemental functions evaluated to compute the function value, ¯ n i = |X i | , and ˆ n = max 1 ≤ i ≤ n |N i | . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 6 / 17

  14. Computing Sparsity Patterns Theorem (Complexity result for Algorithm II) Let OPS ( NID ) denote the number of operations needed to generate all N i , 1 ≤ i ≤ n by Algorithm II. Then, the inequality l � � � ≤ 6 ( 1 + ˆ ¯ OPS NID n ) n i i = 1 is valid, where l is the number of elemental functions evaluated to compute the function value, ¯ n i = |X i | , and ˆ n = max 1 ≤ i ≤ n |N i | . Proof: Analyse set operations: j ≺ i X j ) ≤ 3 ¯ (1): OPS( X i = � n i (2): OPS( ∀ k ∈ X i : N k = N k ∪ X i ) ≤ ¯ n i ( 2 ˆ n + ˆ n ) = 3 ˆ n ¯ n i (3) - (6): same as (2) Details see [Walther 2007] Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 6 / 17

  15. Evaluating Hessian-matrix Products Second order adjoints computed with AD reverse forward ˙ x + ˙ y ⊤ f ′ ( x ) y ⊤ f ′′ ( x ) ˙ y ⊤ f ′ ( x ) x = ¯ ¯ x = ¯ ¯ ¯ y = f ( x ) diff diff Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 7 / 17

  16. Evaluating Hessian-matrix Products Second order adjoints computed with AD reverse forward ˙ x + ˙ y ⊤ f ′ ( x ) y ⊤ f ′′ ( x ) ˙ y ⊤ f ′ ( x ) x = ¯ ¯ x = ¯ ¯ ¯ y = f ( x ) diff diff Complexity [Griewank 2000]: TIME ( H ( x ) ˙ x ) ≤ ω soad TIME ( f ( x )) ω soad ∈ [ 7 , 10 ] with TIME ( H ( x ) S ) ≤ ω soad p TIME ( f ( x )) with ω soad ∈ [ 7 , 10 ] . Reduction possible ?? Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 7 / 17

  17. Evaluating Hessian-matrix Products Computation of H ( x ) ˙ x ∈ R n for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , v i − n x i , v i − n = 0 , v i − n = 0 for i = 1 , . . . , l ∂ � = ϕ i ( v j ) j ≺ i , ˙ ϕ i ( v j ) j ≺ i ˙ v i v i = v j , ∂ v j j ≺ i y = ˙ ˙ ¯ v l = ¯ y = v l , v l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ � ¯ v j += ¯ ϕ i ( v j ) j ≺ i ˙ v i v k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , x i = v i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17

  18. Evaluating Hessian-matrix Products Computation of H ( x ) ˙ x ∈ R n for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , v i − n x i , v i − n = 0 , v i − n = 0 for i = 1 , . . . , l ∂ � = ϕ i ( v j ) j ≺ i , ˙ ϕ i ( v j ) j ≺ i ˙ v i v i = v j , ∂ v j j ≺ i y = ˙ ˙ ¯ v l = ¯ y = v l , v l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ � ¯ v j += ¯ ϕ i ( v j ) j ≺ i ˙ v i v k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , x i = v i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17

  19. Evaluating Hessian-matrix Products Computation of H ( x ) ˙ X ∈ R n × p for i = 1 , . . . , n ˙ ˙ = ˙ ¯ ¯ v i − n = x i , V i − n X i , v i − n = 0 , V i − n = 0 for i = 1 , . . . , l ∂ ˙ ϕ i ( v j ) j ≺ i ˙ � v i = ϕ i ( v j ) j ≺ i , V i = V j , ∂ v j j ≺ i Y = ˙ ˙ ¯ v l = ¯ y = v l , V l , y for i = l , . . . , 1 ∂ v j += ¯ ¯ v i ϕ i ( v j ) j ≺ i for j ≺ i ∂ v j ∂ 2 ˙ ¯ ϕ i ( v j ) j ≺ i ˙ � V j += ¯ v i V k for j ≺ i ∂ v j ∂ v k k ≺ i for i = 1 , . . . , n ˙ ˙ ¯ ¯ ¯ ¯ x i = v i − n , X i = V i − n Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 8 / 17

  20. Evaluating Hessian-matrix Products Complexity ? TIME ( ˙ ¯ X ) ≤ ω soadp TIME ( f ( x )) with ω soadp ∈ [ 4 + 3 p , 4 + 6 p ] . Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 9 / 17

  21. Evaluating Hessian-matrix Products Complexity ? TIME ( ˙ ¯ X ) ≤ ω soadp TIME ( f ( x )) with ω soadp ∈ [ 4 + 3 p , 4 + 6 p ] . Reduction of run time complexity: From [ 7 p , 10 p ] to [ 4 + 3 p , 4 + 6 p ] due to reduced number of recalculations. For details see [Walther 2007] Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 9 / 17

  22. Numerical Results Implementation New drivers for AD-tool ADOL-C hess_pat(tag, n, x, P, option); Andrea Walther Computing Sparse Hessians with AD Hatfield, May 21 2007 10 / 17

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend