
Second Order Reverse Mode of AD: A Vertex Elimination Perspective



  1. Second Order Reverse Mode of AD: A Vertex Elimination Perspective
     Mu Wang, Alex Pothen and Paul Hovland
     Computer Science, Purdue University; MCS Division, Argonne National Lab
     Thanks: NSF, DOE, Intel
     October 10, 2016
     Wang et al. (Purdue University), Second Order Reverse AD, October 10, 2016

  2. Outline
     ◮ Second order reverse mode of Automatic Differentiation
     ◮ Vertex elimination for evaluating the gradient and the Hessian
     ◮ The correspondence between second order reverse mode and vertex elimination
     ◮ Discussion and board picture

  3. AD Fundamentals
     ◮ Automatic Differentiation (AD) is a technique that augments a computer program so that the augmented program computes the derivatives as well as the values of the function defined by the original program.
     ◮ Scalar objective function $f : \mathbb{R}^n \to \mathbb{R}^1$
     ◮ Implemented as a computer program
     ◮ The evaluation is a sequence of decomposed elemental functions:
       for $k = 1, 2, \cdots, l$: $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$


  5. AD Fundamentals (example)
     ◮ y = pow(pow(x*x, 2.0), x), (for $x > 0$, $y = x^{4x}$)
     ◮ $v_0 \mathrel{<<=} x$
     ◮ $v_1 = \varphi_1(v_0) = v_0 * v_0$
     ◮ $v_2 = \varphi_2(v_1) = \mathrm{pow}(v_1, 2.0)$
     ◮ $v_3 = \varphi_3(v_2, v_0) = \mathrm{pow}(v_2, v_0)$
     ◮ $v_3 \mathrel{>>=} y$
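The decomposition of y = pow(pow(x*x, 2.0), x) into elemental functions can be written out as a short Python sketch. The function name `evaluate_trace` and the choice to return the values as a list are illustrative assumptions, not part of the slides.

```python
import math

def evaluate_trace(x):
    """Evaluate y = pow(pow(x*x, 2.0), x) one elemental function at a time.

    Hypothetical helper: returns [v0, v1, v2, v3] so each intermediate
    value of the evaluation trace can be inspected. Requires x > 0.
    """
    v0 = x                    # v0 <<= x   (independent variable)
    v1 = v0 * v0              # phi_1(v0)
    v2 = math.pow(v1, 2.0)    # phi_2(v1)
    v3 = math.pow(v2, v0)     # phi_3(v2, v0)
    return [v0, v1, v2, v3]   # v3 >>= y   (dependent variable)

trace = evaluate_trace(1.5)
y = trace[-1]                 # agrees with x**(4*x) up to rounding
```

Keeping every $v_k$ rather than only $y$ mirrors what a reverse sweep needs: the local partial derivatives of each $\varphi_k$ are evaluated at these stored values.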

  6. AD Fundamentals (indexing convention)
     ◮ Independent variables: $v_{1-n}, \cdots, v_0$
     ◮ Intermediate variables: $v_1, \cdots, v_{l-1}$
     ◮ Dependent variable: $v_l$

  7. Second Order Reverse Mode: Story Line
     ◮ First proposed by Gower and Mello [1]
       ◮ Initially called Edge Pushing
       ◮ Derived from the closed form of the second order derivative for composite functions
     ◮ Wang, Gebremedhin, and Pothen provided a second perspective by adopting live variable analysis from compiler theory [2]
       ◮ Better complexity bound
       ◮ Correct implementation
       ◮ Further improved with preaccumulation
     ◮ The new proof can be extended to general higher orders.
     [1] Gower, Robert Mansel, and Margarida P. Mello. Hessian matrices via automatic differentiation. Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica, 2010.
     [2] Wang, Mu, Assefaw Gebremedhin, and Alex Pothen. "Capitalizing on live variables: new algorithms for efficient Hessian computation via automatic differentiation." Mathematical Programming Computation (2016): 1-41.



  10. Reverse Mode of AD
      ◮ Function evaluation: evaluate each elemental function
        for $k = 1, 2, \cdots, l$: $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$
      ◮ Reverse mode of AD: process the sequence of elemental functions in reverse order
        for $k = l, l-1, \cdots, 1$: do something with $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$
      ◮ Equivalent function $f_k(S_k)$: a function defined by the elemental functions $\varphi_l, \cdots, \varphi_k$ that have been processed at the end of step $k$ in reverse mode
      ◮ $f = \underbrace{\varphi_l \circ \cdots \circ \varphi_k}_{f_k(S_k)} \circ \varphi_{k-1} \circ \cdots \circ \varphi_1$
      ◮ The independent variables of $f_k$ are denoted by $S_k$.



  13. Reverse Mode of AD
      For $k = l, l-1, \cdots, 1$: do something with $v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}$
      ◮ $f_k(S_k) = f_{k+1}\left(S_{k+1} \setminus \{v_k\},\; v_k = \varphi_k(v_i)_{\{v_i : v_i \prec v_k\}}\right)$

  14. Reverse Mode of AD (equivalent functions before and after step $k$)
      $f = \overbrace{\varphi_l \circ \cdots \circ \varphi_{k+1}}^{f_{k+1}(S_{k+1})} \circ \varphi_k \circ \varphi_{k-1} \circ \cdots \circ \varphi_1$
      $f = \underbrace{\varphi_l \circ \cdots \circ \varphi_{k+1} \circ \varphi_k}_{f_k(S_k)} \circ \varphi_{k-1} \circ \cdots \circ \varphi_1$
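The recurrence $f_k(S_k) = f_{k+1}(S_{k+1} \setminus \{v_k\}, v_k = \varphi_k(\cdot))$ implies that the set $S_k$ is obtained from $S_{k+1}$ by removing $v_k$ and adding its predecessors. The following minimal sketch tracks these sets for the example y = pow(pow(x*x, 2.0), x). The dictionary `preds`, the function `live_sets`, and the starting set $S_{l+1} = \{v_l\}$ are assumptions made for illustration; the slides do not state the initialization.

```python
# Hypothetical encoding of the dependence relation for the example:
# preds[k] lists the indices i with v_i ≺ v_k.
preds = {1: [0], 2: [1], 3: [2, 0]}
l = 3  # index of the dependent variable v_l

def live_sets(preds, l):
    """Return {k: S_k} as sets of variable indices, for k = l+1 down to 1."""
    # Assumed initialization: before any step is processed, the
    # equivalent function depends only on v_l itself.
    S = {l + 1: {l}}
    for k in range(l, 0, -1):
        # Processing phi_k replaces v_k by the variables it depends on.
        S[k] = (S[k + 1] - {k}) | set(preds[k])
    return S

S = live_sets(preds, l)
# S[3] == {0, 2}, S[2] == {0, 1}, S[1] == {0}
```

At the end of the sweep only the independent variable remains in $S_1$, which matches $f_1 = f$.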

  15. Reverse Mode of AD (first order chain rule)
      ◮ First order chain rule: $\frac{\partial f_k}{\partial v_i} = \frac{\partial f_{k+1}}{\partial v_i} + \frac{\partial v_k}{\partial v_i} \frac{\partial f_{k+1}}{\partial v_k}$
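Applied at every step $k = l, \cdots, 1$, the first order chain rule is the familiar adjoint update of reverse mode. Below is a minimal sketch for the example y = pow(pow(x*x, 2.0), x), assuming hand-written local partial derivatives; the names `gradient`, `partials`, and `adj` are illustrative and do not come from the slides.

```python
import math

def gradient(x):
    """Return dy/dx for y = pow(pow(x*x, 2.0), x), x > 0, by a reverse sweep."""
    # Forward evaluation of the elemental functions.
    v0 = x
    v1 = v0 * v0
    v2 = math.pow(v1, 2.0)
    v3 = math.pow(v2, v0)

    # partials[k][i] holds the local derivative d v_k / d v_i for v_i ≺ v_k.
    partials = {
        3: {2: v0 * math.pow(v2, v0 - 1.0), 0: v3 * math.log(v2)},
        2: {1: 2.0 * v1},
        1: {0: 2.0 * v0},
    }

    # adj[i] stores d f_k / d v_i for the current equivalent function f_k.
    adj = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}
    for k in (3, 2, 1):
        for i, dvk_dvi in partials[k].items():
            # d f_k / d v_i = d f_{k+1} / d v_i + (d v_k / d v_i) * (d f_{k+1} / d v_k)
            adj[i] += dvk_dvi * adj[k]
    return adj[0]

g = gradient(1.5)
```

For $x > 0$ the result can be checked against the closed form $\frac{dy}{dx} = 4 x^{4x} (\ln x + 1)$, which follows from $y = x^{4x}$.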
