

Second Order Reverse Mode of AD : A Vertex Elimination Perspective

Mu Wang, Alex Pothen and Paul Hovland

Computer Science, Purdue University; MCS Division, Argonne National Laboratory

Thanks: NSF, DOE, Intel

October 10, 2016

Outline

◮ Second order reverse mode of Automatic Differentiation
◮ Vertex elimination for evaluating the Gradient and the Hessian
◮ The correspondence between second order reverse mode and vertex elimination
◮ Discussion and broad picture

AD Fundamentals

◮ Automatic Differentiation (AD) is a technique that augments a computer program so that the augmented program computes the derivatives as well as the values of the function defined by the original program.
◮ Scalar objective function f : R^n → R
  ◮ Implemented as a computer program
  ◮ The evaluation is a sequence of decomposed elemental functions:
      For k = 1, 2, · · · , l :  vk = ϕk(vi){vi : vi ≺ vk}


◮ Example: y = pow(pow(x*x, 2.0), x)   (for x > 0, y = x^(4x))
  ◮ v0 <<= x
  ◮ v1 = ϕ1(v0) = v0 ∗ v0
  ◮ v2 = ϕ2(v1) = pow(v1, 2.0)
  ◮ v3 = ϕ3(v2, v0) = pow(v2, v0)
  ◮ v3 >>= y
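As a sanity check on this decomposition, a minimal sketch in plain C++ (no AD library; the test point x = 1.7 is illustrative) evaluates the trace by hand and compares it with the closed form x^(4x):

#include <cmath>
#include <cstdio>

int main() {
    double x = 1.7;                 // arbitrary test point, x > 0
    double v0 = x;                  // v0 <<= x   (independent)
    double v1 = v0 * v0;            // v1 = phi1(v0) = v0 * v0
    double v2 = std::pow(v1, 2.0);  // v2 = phi2(v1) = pow(v1, 2.0)
    double v3 = std::pow(v2, v0);   // v3 = phi3(v2, v0) = pow(v2, v0)
    double y  = v3;                 // v3 >>= y   (dependent)

    std::printf("trace: %.12f   closed form: %.12f\n", y, std::pow(x, 4.0 * x));
    return 0;
}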

◮ Indexing convention:
  ◮ Independent variables: v_{1−n}, · · · , v0
  ◮ Intermediate variables: v1, · · · , v_{l−1}
  ◮ Dependent variable: vl

Second Order Reverse Mode : Story Line

◮ First proposed by Gower and Mello¹
  ◮ Called Edge Pushing initially
  ◮ Derived from the closed form of the second order derivative of composite functions
◮ Wang, Gebremedhin, and Pothen provided a second perspective by adopting live variable analysis² from compiler theory.
  ◮ Better complexity bound
  ◮ Correct implementation
  ◮ Further improved with preaccumulation
◮ The new proof can be extended to general high orders.

¹ Gower, Robert Mansel, and Margarida P. Mello. Hessian matrices via automatic differentiation. Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica, 2010.
² Wang, Mu, Assefaw Gebremedhin, and Alex Pothen. "Capitalizing on live variables: new algorithms for efficient Hessian computation via automatic differentiation." Mathematical Programming Computation (2016): 1-41.


Reverse Mode of AD

◮ Function evaluation : evaluate each elemental function
    for k = 1, 2, · · · , l :  vk = ϕk(vi){vi : vi ≺ vk}
◮ Reverse mode of AD : process the sequence of elemental functions in reverse order
    for k = l, l − 1, · · · , 1 :  do something with vk = ϕk(vi){vi : vi ≺ vk}
◮ Equivalent function fk(Sk) : the function defined by the elemental functions ϕl, · · · , ϕk that have been processed at the end of step k of the reverse mode,
    f = (ϕl ◦ · · · ◦ ϕk) ◦ ϕk−1 ◦ · · · ◦ ϕ1,  where  fk(Sk) = ϕl ◦ · · · ◦ ϕk
◮ The independent variables of fk are denoted by Sk.


◮ Processing vk substitutes it into the equivalent function:
    fk(Sk) = fk+1(Sk+1 \ {vk}, vk = ϕk(vi){vi : vi ≺ vk})

    f = (ϕl ◦ · · · ◦ ϕk+1) ◦ ϕk ◦ ϕk−1 ◦ · · · ◦ ϕ1,   where  ϕl ◦ · · · ◦ ϕk+1 = fk+1(Sk+1)
    f = (ϕl ◦ · · · ◦ ϕk+1 ◦ ϕk) ◦ ϕk−1 ◦ · · · ◦ ϕ1,   where  ϕl ◦ · · · ◦ ϕk+1 ◦ ϕk = fk(Sk)
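For the running example, the equivalent functions can be written out explicitly (a worked illustration, assuming x > 0):

    f3(S3) = ϕ3(v2, v0) = v2^v0,               S3 = {v0, v2}
    f2(S2) = ϕ3(ϕ2(v1), v0) = (v1²)^v0,        S2 = {v0, v1}
    f1(S1) = ϕ3(ϕ2(ϕ1(v0)), v0) = v0^(4·v0),   S1 = {v0}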

◮ First order chain rule : for all vi ≺ vk,
    ∂fk/∂vi = ∂fk+1/∂vi + (∂vk/∂vi) · (∂fk+1/∂vk)


◮ Second order chain rule : for all unordered pairs (vi, vj) with vi ≺ vk or vj ≺ vk,
    ∂²fk/∂vi∂vj = ∂²fk+1/∂vi∂vj + (∂vk/∂vi) · (∂²fk+1/∂vj∂vk) + (∂vk/∂vj) · (∂²fk+1/∂vi∂vk)
                + (∂vk/∂vi) · (∂vk/∂vj) · (∂²fk+1/∂vk∂vk) + (∂²vk/∂vi∂vj) · (∂fk+1/∂vk)
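At the first step of the reverse sweep (k = l = 3 in the running example), fk+1 is the identity map on v3, so its second derivatives vanish and ∂f4/∂v3 = 1; the update reduces to the second partials of ϕ3 (a worked illustration, assuming x > 0):

    ∂²f3/∂v0∂v0 = ∂²v3/∂v0²   = (log v2)² · v3
    ∂²f3/∂v0∂v2 = ∂²v3/∂v0∂v2 = (v3/v2) · (1 + v0 · log v2)
    ∂²f3/∂v2∂v2 = ∂²v3/∂v2²   = v0 · (v0 − 1) · v3/v2²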


◮ The first order update is carried by adjoint variables ¯vi :
    ¯vi += (∂vk/∂vi) · ¯vk
  ◮ ¯vi holds the value of ∂fk/∂vi after step k
  ◮ Incremental updates in the implementation (a minimal sketch follows)
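A minimal sketch of this incremental adjoint update applied to the running example, in plain C++ (the hand-unrolled trace and the test point x = 1.7 are illustrative; an AD tool records and replays the trace automatically):

#include <cmath>
#include <cstdio>

int main() {
    double x = 1.7;
    // Forward sweep: evaluate and store all intermediate values.
    double v0 = x, v1 = v0 * v0, v2 = std::pow(v1, 2.0), v3 = std::pow(v2, v0);

    // Reverse sweep: the adjoint of the dependent variable starts at 1.
    double b0 = 0.0, b1 = 0.0, b2 = 0.0, b3 = 1.0;

    // k = 3 : v3 = pow(v2, v0), predecessors v2 and v0.
    b2 += v0 * v3 / v2 * b3;        // dv3/dv2 = v0 * v2^(v0-1)
    b0 += std::log(v2) * v3 * b3;   // dv3/dv0 = log(v2) * v3
    // k = 2 : v2 = pow(v1, 2.0), predecessor v1.
    b1 += 2.0 * v1 * b2;            // dv2/dv1 = 2*v1
    // k = 1 : v1 = v0 * v0, predecessor v0.
    b0 += 2.0 * v0 * b1;            // dv1/dv0 = 2*v0

    // b0 now holds df/dx; compare with the closed form 4*x^(4x)*(1 + ln x).
    std::printf("reverse mode: %.10f   closed form: %.10f\n",
                b0, 4.0 * std::pow(x, 4.0 * x) * (1.0 + std::log(x)));
    return 0;
}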

◮ The second order implementation involves more details, for exploiting sparsity and symmetry.

◮ A general high order chain rule → a general high order reverse mode
◮ Taking advantage of symmetry becomes more important

Reverse Mode of AD : Implementation

◮ Second order reverse mode: initially implemented as LivarH in ADOL-C
  ◮ https://github.com/CSCsw/LivarH
◮ ReverseAD: an operator overloading implementation of general high order reverse mode in C++11
  ◮ https://github.com/wangmu0701/ReverseAD
  ◮ Available for experimentation
◮ Monotonic indexing for variables on the trace:
    vi ≺ vk  =⇒  index(vi) < index(vk)
  ◮ Not satisfied by ADOL-C
  ◮ An immature fix was provided for LivarH


Reverse Mode of AD : Performance

◮ The FeasNewt benchmark (T. S. Munson and P. D. Hovland, 2005)
◮ A mesh optimization problem with a sparse Hessian matrix
◮ Compared with the compression-and-recovery approach implemented in ADOL-C + ColPack

    n                       :   2,598    12,597    39,379
    #nnz in H               :  46,488   253,029   828,129
    Direct    #colors       :      54        62        65
              runtime (s)   :    3.77     39.34    137.07
    Indirect  #colors       :      31        30        31
              runtime (s)   :    3.56     31.07    119.04
    ReverseAD runtime (s)   :    0.51      3.37     12.40

From Analytical to Combinatorial

◮ The second (high) order reverse mode is derived from a purely analytical point of view.
  ◮ Same as the original derivation of Edge Pushing.
◮ There are combinatorial models for AD algorithms, based on the concept of the Computational Graph G of the objective function:
  ◮ Edge Elimination
  ◮ Vertex Elimination
  ◮ Face Elimination
◮ Closely related to the classical linear algebra problem of sparse Gaussian elimination.


Computational Graph

◮ Computational graph: G = (V, E)
  ◮ Variables are vertices: V = {vi | 1 − n ≤ i ≤ l}
  ◮ Precedence relations are directed edges: E = {vi → vk | vi ≺ vk, 1 − n ≤ i < k ≤ l}
  ◮ Edge weights: c(i, k) := w(vi, vk) = ∂vk/∂vi

◮ For the example v1 = ϕ1(v0) = v0 ∗ v0, v2 = ϕ2(v1) = pow(v1, 2.0), v3 = ϕ3(v2, v0) = pow(v2, v0):

  [Graph: v0 → v1 → v2 → v3, plus the edge v0 → v3]

    c(0, 1) = ∂v1/∂v0 = 2 · v0
    c(1, 2) = ∂v2/∂v1 = 2 · v1
    c(2, 3) = ∂v3/∂v2 = v0 · v3/v2
    c(0, 3) = ∂v3/∂v0 = log v2 · v3


Vertex Elimination

Repeat:
  ◮ Pick an intermediate vertex vj
  ◮ For all (i, k) such that i ≺ j ≺ k :  c(i, k) += c(i, j) ∗ c(j, k)
  ◮ Remove vj from V
Until V has no intermediate vertices

◮ Proposed by Griewank and Reese, and studied extensively by Naumann and students

Example (eliminating v2, then v1):

  [Initial graph: v0 → v1 → v2 → v3 and v0 → v3, with weights c(0, 1), c(1, 2), c(2, 3), c(0, 3)]

  [After eliminating v2: vertices v0, v1, v3; weights c(0, 1), c(1, 3) = c(1, 2) · c(2, 3), c(0, 3)]

  [After eliminating v1: vertices v0, v3; weight c(0, 3) + c(0, 1) · c(1, 2) · c(2, 3)]

◮ Any elimination order gives the same final result.
◮ The time complexity (the number of edge weights computed) varies with the ordering; minimizing the space complexity is also likely to be intractable.
  ◮ It is NP-hard to determine the optimal ordering.

A minimal sketch of the elimination loop on the example graph follows.
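The sketch is plain C++ (the map-of-edges representation and the test point x = 1.7 are illustrative choices, not the implementation discussed in the talk):

#include <cmath>
#include <cstdio>
#include <map>
#include <utility>

int main() {
    double x = 1.7;
    double v0 = x, v1 = v0 * v0, v2 = std::pow(v1, 2.0), v3 = std::pow(v2, v0);

    std::map<std::pair<int,int>, double> c;   // edge weights c(i, k) for i -> k
    c[{0, 1}] = 2.0 * v0;                     // dv1/dv0
    c[{1, 2}] = 2.0 * v1;                     // dv2/dv1
    c[{2, 3}] = v0 * v3 / v2;                 // dv3/dv2
    c[{0, 3}] = std::log(v2) * v3;            // dv3/dv0

    for (int j : {2, 1}) {                    // eliminate the intermediate vertices
        // For all (i, k) with i -> j and j -> k : c(i, k) += c(i, j) * c(j, k)
        std::map<std::pair<int,int>, double> added;
        for (const auto& in : c)      if (in.first.second == j)
            for (const auto& out : c) if (out.first.first == j)
                added[{in.first.first, out.first.second}] += in.second * out.second;
        for (const auto& e : added) c[e.first] += e.second;
        // Remove v_j and all of its incident edges.
        for (auto it = c.begin(); it != c.end(); )
            if (it->first.first == j || it->first.second == j) it = c.erase(it); else ++it;
    }

    // The surviving edge weight c(0, 3) is df/dx.
    std::printf("c(0,3) = %.10f   df/dx = %.10f\n",
                c[{0, 3}], 4.0 * std::pow(x, 4.0 * x) * (1.0 + std::log(x)));
    return 0;
}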

Vertex Elimination for Hessian

◮ Applying the vertex elimination algorithm to G gives the gradient ∇f.
◮ To evaluate the Hessian of f we need the computational graph of the gradient, Gg, i.e., the computational graph of evaluating ∇f.
◮ Gg can be constructed from the first order non-incremental reverse mode.

Function evaluation:
    for k = 1, 2, · · · , l :  vk = ϕk(vi){vi : vi ≺ vk}

First order (non-incremental) reverse mode:
    Initialize ¯vl = 1.0, ¯v_{l−1} = · · · = 0
    for i = l − 1, · · · , 1, 0, · · · , 1 − n :
        ¯vi = Σ_{vi ≺ vk} (∂vk/∂vi) · ¯vk,
    which can be read as one elemental function per adjoint:
        ¯vi = ¯ϕi(∪_{vi ≺ vk}({vj : vj ≺ vk} ∪ {¯vk}))
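For the running example, the non-incremental reverse sweep produces one such elemental function per adjoint; written out (a worked illustration):

    ¯v3 = 1
    ¯v2 = ¯ϕ2 = (∂v3/∂v2) · ¯v3 = (v0 · v3/v2) · ¯v3
    ¯v1 = ¯ϕ1 = (∂v2/∂v1) · ¯v2 = (2 · v1) · ¯v2
    ¯v0 = ¯ϕ0 = (∂v1/∂v0) · ¯v1 + (∂v3/∂v0) · ¯v3 = (2 · v0) · ¯v1 + (log v2 · v3) · ¯v3

These statements, together with the original evaluation, are exactly what the graph Gg below records.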


Computational Graph of the Gradient

  [Graph Gg for the example: the original vertices v0, v1, v2, v3 together with the adjoint vertices ¯v0, ¯v1, ¯v2, ¯v3]

◮ Gg is built from the function evaluation together with the first order non-incremental reverse mode above:
    Vg = V ∪ ¯V,   Eg = EG ∪ EḠ ∪ EC


◮ EG : the edges of the original computational graph
    (vi, vk) ∈ Eg  ⇐⇒  vi ≺ vk,   with weight c(i, k) = ∂vk/∂vi   (from vk = ϕk(vi){vi : vi ≺ vk})

◮ EḠ : the mirrored edges between adjoint vertices
    (¯vk, ¯vi) ∈ Eg  ⇐⇒  ¯vk ≺ ¯vi  ⇐⇒  vi ≺ vk,   with weight c(¯k, ¯i) = ∂vk/∂vi   (from ¯vi = Σ_{vi ≺ vk} (∂vk/∂vi) · ¯vk)

◮ EC : the edges connecting primal and adjoint vertices, carrying the second order terms
    (vi, ¯vj) ∈ Eg  ⇐⇒  ∃ vk s.t. vi, vj ≺ vk,   with weight c(i, ¯j) = Σ_{vi, vj ≺ vk} (∂²vk/∂vi∂vj) · ¯vk

  [In the example graph this adds the edges (v0, ¯v0), (v1, ¯v1), (v2, ¯v2), (v0, ¯v2) and (v2, ¯v0), alongside the EG edges c(0, 1), c(1, 2), c(2, 3), c(0, 3) and their mirrored EḠ counterparts]
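Written out for the example (a worked illustration; the weights follow directly from the second partials of ϕ1, ϕ2, ϕ3):

    c(0, ¯0) = (∂²v1/∂v0²) · ¯v1 + (∂²v3/∂v0²) · ¯v3 = 2 · ¯v1 + (log v2)² · v3 · ¯v3
    c(1, ¯1) = (∂²v2/∂v1²) · ¯v2 = 2 · ¯v2
    c(2, ¯2) = (∂²v3/∂v2²) · ¯v3 = v0 · (v0 − 1) · (v3/v2²) · ¯v3
    c(0, ¯2) = c(2, ¯0) = (∂²v3/∂v0∂v2) · ¯v3 = (v3/v2) · (1 + v0 · log v2) · ¯v3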

Equivalence

◮ Vertex elimination on the gradient graph Gg gives the Hessian (combinatorial approach).
◮ Second order reverse mode gives the Hessian (analytical approach).

Second order reverse mode (a minimal sketch follows below):
    Initialize ¯vl = 1.0, ¯v_{l−1} = · · · = 0
    for k = l, l − 1, · · · , 1 :
        for each unordered pair (vi, vj) :
            hk(vi, vj) = hk+1(vi, vj) + (∂vk/∂vi) · hk+1(vj, vk) + (∂vk/∂vj) · hk+1(vi, vk)
                       + (∂vk/∂vi) · (∂vk/∂vj) · hk+1(vk, vk) + (∂²vk/∂vi∂vj) · ¯vk

Vertex elimination on Gg:
    Repeat:
        Pick an intermediate vertex vj
        For all (i, k) such that i ≺ j ≺ k :  c(i, k) += c(i, j) ∗ c(j, k)
        Remove vj from V
    Until V has no intermediate vertices

Theorem
If vertex elimination is performed on Gg in a symmetric reverse topological ordering, i.e., (vk, ¯vk) are eliminated in pairs in the order k = l, l − 1, · · · , 1, then the two algorithms correspond step by step.
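A minimal dense sketch of the second order reverse mode update on the running example, in plain C++ (the dense 4×4 storage, index layout, and test point are illustrative; the actual implementations keep only the symmetric nonzeros):

#include <cmath>
#include <cstdio>

int main() {
    double x = 1.7;
    double v[4];
    v[0] = x; v[1] = v[0] * v[0]; v[2] = std::pow(v[1], 2.0); v[3] = std::pow(v[2], v[0]);

    double d[4][4] = {};      // d[k][i]     = dv_k / dv_i          (zero if v_i is not a predecessor)
    double dd[4][4][4] = {};  // dd[k][i][j] = d^2 v_k / dv_i dv_j  (symmetric in i, j)
    d[1][0] = 2.0 * v[0];                        dd[1][0][0] = 2.0;
    d[2][1] = 2.0 * v[1];                        dd[2][1][1] = 2.0;
    d[3][2] = v[0] * v[3] / v[2];
    d[3][0] = std::log(v[2]) * v[3];
    dd[3][2][2] = v[0] * (v[0] - 1.0) * v[3] / (v[2] * v[2]);
    dd[3][0][2] = dd[3][2][0] = (v[3] / v[2]) * (1.0 + v[0] * std::log(v[2]));
    dd[3][0][0] = std::log(v[2]) * std::log(v[2]) * v[3];

    double bar[4] = {0.0, 0.0, 0.0, 1.0};  // adjoints, bar(v_l) initialized to 1
    double H[4][4] = {};                   // holds h_{k+1}, starts at zero

    for (int k = 3; k >= 1; --k) {
        double Hn[4][4] = {};
        for (int i = 0; i < k; ++i)        // h_k(v_i, v_j) from the update rule above
            for (int j = 0; j < k; ++j)
                Hn[i][j] = H[i][j] + d[k][i] * H[j][k] + d[k][j] * H[i][k]
                         + d[k][i] * d[k][j] * H[k][k] + dd[k][i][j] * bar[k];
        for (int i = 0; i < k; ++i)
            for (int j = 0; j < k; ++j) H[i][j] = Hn[i][j];
        for (int j = 0; j < 4; ++j) { H[k][j] = 0.0; H[j][k] = 0.0; }   // v_k is eliminated
        for (int i = 0; i < k; ++i) bar[i] += d[k][i] * bar[k];         // first order adjoint update
    }

    // H[0][0] is d^2 f / dx^2; closed form: 4 x^(4x) (4 (1 + ln x)^2 + 1/x).
    std::printf("second order reverse: %.8f   closed form: %.8f\n", H[0][0],
                4.0 * std::pow(x, 4.0 * x) * (4.0 * std::pow(1.0 + std::log(x), 2.0) + 1.0 / x));
    return 0;
}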


Theorem

◮ The two algorithms perform the same computations, and thus maintain the same intermediate results after each step.
◮ This requires two minor tweaks of vertex elimination on Gg:
  ◮ Tweak one: parallel edges in EC
    ◮ Break the edge c(i, ¯j) = Σ_{vi, vj ≺ vk} (∂²vk/∂vi∂vj) · ¯vk
    ◮ into parallel edges ck(i, ¯j) = (∂²vk/∂vi∂vj) · ¯vk, one per vk
  ◮ Tweak two: a new set of edges EH
    ◮ Rule 1: all edges added during elimination go into EH
    ◮ Rule 2: after eliminating (vk, ¯vk), move all ck(i, ¯j) from EC to EH
◮ Claim: EH corresponds to the nonzeros of the Hessian of fk(Sk) after each step.


Discussion

◮ Second order reverse mode is equivalent to a special form of vertex elimination on the computational graph of the gradient, Gg.
◮ It may not be the optimal form of vertex elimination, due to the structure of Gg; but in practice it can be implemented with efficient storage and memory access.
  ◮ Second order reverse mode does not require the graph Gg to be formed.
  ◮ It can be implemented with a single reverse sweep.
  ◮ It can incorporate checkpointing to overcome memory/disk limits.
◮ There are possibilities for optimizing second order reverse mode by exploiting structural properties:
  ◮ Out-of-order processing of vk = ϕk(vi){vi : vi ≺ vk}
  ◮ The benefit must outweigh the optimization overhead.


Future Work : Broad Picture

◮ This work reveals the correspondence between analytical and combinatorial points of view of AD algorithms.
  ◮ First order forward/reverse mode corresponds to edge elimination on G with a specific elimination ordering.
  ◮ Second order reverse mode corresponds to vertex elimination on Gg with the reverse symmetric elimination ordering.
◮ Is there a generalization to high orders?
  ◮ The analytical form of the high order reverse mode is the implementation of the high order chain rule.
  ◮ What is the generalization of the combinatorial form of the high order reverse mode?
    ◮ What is the computational graph of the Hessian, GH?
    ◮ What is the elimination technique that we should perform on GH?


References

◮ Griewank, Andreas, and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, 2008.
◮ Griewank, Andreas, and Shawn Reese. On the calculation of Jacobian matrices by the Markowitz rule. In Andreas Griewank and George F. Corliss, editors, Automatic Differentiation of Algorithms: Theory, Implementation, and Application, pages 126-135. SIAM, Philadelphia, PA, 1991.
◮ Naumann, Uwe. Optimal Jacobian Accumulation is NP-complete. Mathematical Programming, 112(2):427-441, 2008.
◮ Gower, Robert Mansel, and Margarida P. Mello. Hessian matrices via automatic differentiation. Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica, 2010.
◮ Wang, Mu, Assefaw Gebremedhin, and Alex Pothen. Capitalizing on live variables: new algorithms for efficient Hessian computation via automatic differentiation. Mathematical Programming Computation (2016): 1-41.
◮ Wang, Mu, Alex Pothen, and Paul Hovland. Edge Pushing is Equivalent to Vertex Elimination for Computing Hessians. SIAM CSC16.
◮ Wang, Mu, and Alex Pothen. Evaluating High Order Derivative Tensors in Reverse Mode of Automatic Differentiation. AD2016.
◮ Wang, Mu, and Alex Pothen. High Order Reverse Mode of AD: Theory and Implementation. In preparation.

Backup Slides

Vertex Elimination as Gaussian Elimination

◮ We can build a matrix C = [c_ij], 1 − n ≤ i, j ≤ l, with
  ◮ c_ij = ∂vi/∂vj (the edge weight in G) when vj ≺ vi,
  ◮ c_ii = −1 on the diagonal,
  ◮ and all other elements zero.

◮ With block sizes n (independents), l − m (intermediates) and m (dependents):

               n      l − m     m
        n   [ −I        0       0  ]
  C  = l−m  [  B      L − I     0  ]
        m   [  R        T      −I  ]

◮ C is a block lower triangular matrix.
◮ The Jacobian ∇f = R + T · (I − L)⁻¹ · B is the Schur complement obtained by eliminating the intermediate block.
◮ It can be computed with a Gaussian elimination procedure, as sketched below.
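For the running example (n = 1, l = 3, m = 1) this becomes a concrete 4 × 4 matrix; a worked instance, assuming x > 0:

          [      −1          0         0        0 ]
    C  =  [     2·v0        −1         0        0 ]
          [       0         2·v1      −1        0 ]
          [  log v2 · v3     0     v0·v3/v2    −1 ]

    B = (2·v0, 0)ᵀ,   L = [ 0  0 ; 2·v1  0 ],   R = ( log v2 · v3 ),   T = ( 0, v0·v3/v2 )

    ∇f = R + T · (I − L)⁻¹ · B = log v2 · v3 + 4·v0²·v1·v3/v2 = 4·x^(4x) · (1 + log x)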

Adjacency Matrix for Gg

◮ The same construction for Gg gives a block matrix with block sizes n, l − m, m for the primal vertices and m, l − m, n for the adjoint vertices:

               n      l − m     m       m      l − m     n
        n   [ −I        0       0       0        0       0  ]
       l−m  [  B      L − I     0       0        0       0  ]
  H  =  m   [  R        T      −I       0        0       0  ]
        m   [  0        0       0      −I        0       0  ]
       l−m  [  Z        Y       0       T′     L′ − I    0  ]
        n   [  X        Z′      0       R′       B′     −I  ]

◮ The lower right block structure, C′, is the transpose of C along the antidiagonal.
◮ The Hessian is the Schur complement of X with respect to the rest of the matrix.