Certification of Derivatives Computed by Automatic Differentiation

Mauricio Araya Polo & Laurent Hascoët, Project TROPICS. WSEAS, Cancún, México, May 13, 2005.


  1. Certification of Derivatives Computed by Automatic Differentiation. Mauricio Araya Polo & Laurent Hascoët, Project TROPICS. WSEAS, Cancún, México, May 13, 2005.

  2. Plan
     • Introduction (Background)
     • Automatic Differentiation
       – Direct Mode
       – Reverse Mode
     • The Problem
       – Example
     • Our Approach
       – Description
       – Numerical Result
     • Conclusions
     • Future Work

  3. Introduction (Background)
     • Automatic Differentiation (A.D.): given a program that evaluates a function F, A.D. builds a new program that evaluates the derivatives of F.
     • Scientific applications: derivatives are useful in optimization, sensitivity analysis and inverse problems.
     • Non-differentiability: introduced in programs by conditional statements (tests). It may produce wrong derivatives.
     • Lack of validation: neither A.D. models nor A.D. tools include verification of the differentiability of the functions.
     • Novel A.D. model with validation: we evaluate the interval around the input data in which no non-differentiability problem arises; this information is propagated through the conditional statements.

  4. Automatic Differentiation
     Program structure: a program is a concatenated sequence of instructions $I_i$:
         $P = I_1; I_2; \ldots; I_{p-1}; I_p$
     but with control flow (flowgraph): depending on the inputs, the example program might be
         $P = I_1; T_1; I_2; I_4$  or  $P = I_1; T_1; I_3; I_4$
     where instruction $T_1$ represents the conditional statement (test).
     [Flowgraph: I_1 → T_1 → (I_2 or I_3) → I_4]
     Mathematical model: a composition of elementary functions $f_i$:
         $Y = F(X) = f_p \circ f_{p-1} \circ \ldots \circ f_2 \circ f_1$
     Program P evaluates the model F: for every function $f_i$ we have a computational representation $I_i$, in the right order.

  5. Automatic Differentiation (2)
     Direct Mode: directional derivatives.
         $Y' = F'(X) \cdot dX = f'_p(x_{p-1}) \cdot f'_{p-1}(x_{p-2}) \cdot \ldots \cdot f'_1(x_0) \cdot dX$
     with $x_i = f_i \circ \ldots \circ f_1$, and $f'_i()$ the Jacobians. The new program P' is then
         $P' = I'_1; I_1; I'_2; I_2; \ldots; I'_{p-1}; I_{p-1}; I'_p$
     with $I'_i$ corresponding to $f'_i()$.
     Flowgraph again: depending on the inputs, the differentiated example program might be
         $P' = I'_1; I_1; T_1; I'_2; I_2; I'_4; I_4$  or  $P' = I'_1; I_1; T_1; I'_3; I_3; I'_4; I_4$
     [Flowgraph: I'_1; I_1 → T_1 → (I'_2; I_2 or I'_3; I_3) → I'_4; I_4]
     The differentiated example program retains the control flow structure of the original program.

  6. Automatic Differentiation (3)
     Table 1: Example of Direct Mode of AD.

     Original Code

              subroutine sub1(x,y,o1)
         I1   x = y*x
         I2   o1 = x*x + y*y
         T1   if (o1 > 190) then
         I3     o1 = -o1*o1/2
              else
         I4     o1 = o1*o1*20
              endif
              end

     Direct Differentiated Code

              subroutine sub1_d(x, xd, y, yd, o1, o1d)
         I'1  xd = yd*x + y*xd
         I1   x = y*x
         I'2  o1d = 2*x*xd + 2*y*yd
         I2   o1 = x*x + y*y
         T1   if (o1 > 190) then
         I'3    o1d = -(o1d*o1)
         I3     o1 = -(o1*o1/2)
              else
         I'4    o1d = 40*o1d*o1
         I4     o1 = o1*o1*20
              endif
              end
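The direct-mode example can be tried out with a small Python transcription of the two subroutines. This is only a sketch: the Python function names, the return-value convention (the slides use Fortran in/out arguments) and the `central_difference` helper are assumptions made for illustration, not part of the presentation.

```python
def sub1(x, y):
    """Original program: returns o1."""
    x = y * x                      # I1
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1
        o1 = -o1 * o1 / 2          # I3
    else:
        o1 = o1 * o1 * 20          # I4
    return o1


def sub1_d(x, xd, y, yd):
    """Direct (tangent) program: returns (o1, o1d)."""
    xd = yd * x + y * xd           # I'1, uses x before I1 overwrites it
    x = y * x                      # I1
    o1d = 2 * x * xd + 2 * y * yd  # I'2
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1, same test as the original program
        o1d = -(o1d * o1)          # I'3
        o1 = -(o1 * o1 / 2)        # I3
    else:
        o1d = 40 * o1d * o1        # I'4
        o1 = o1 * o1 * 20          # I4
    return o1, o1d


def central_difference(x, y, xd, yd, h=1e-6):
    """Hypothetical helper: finite-difference check of the directional derivative."""
    return (sub1(x + h * xd, y + h * yd) - sub1(x - h * xd, y - h * yd)) / (2 * h)


o1, o1d = sub1_d(1.0, 1.0, 2.0, 1.0)   # point (1, 2), direction (1, 1)
print(o1, o1d, central_difference(1.0, 2.0, 1.0, 1.0))
```

Away from the switch at o1 = 190, the tangent value and the finite-difference estimate should agree closely; each derivative statement is placed before the instruction it differentiates because that instruction may overwrite a value the derivative needs.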

  7. Automatic Differentiation (3)
     Reverse Mode: adjoints, gradients.
         $\bar{X} = F'^{*}(X) \cdot \bar{Y} = f'^{*}_1(x_0) \cdot f'^{*}_2(x_1) \cdot \ldots \cdot f'^{*}_p(x_{p-1}) \cdot \bar{Y}$
     The new program $\bar{P}$ is then
         $\bar{P} = \overrightarrow{P}; \overleftarrow{P} = I_1; I_2; \ldots; I_{p-1}; I_p; \bar{I}_p; \bar{I}_{p-1}; \ldots; \bar{I}_2; \bar{I}_1$
     with $\bar{I}_i$ corresponding to $f'^{*}_i()$.
     Remark: the reverse sweep ($\overleftarrow{P}$) eventually needs some values of the forward sweep ($\overrightarrow{P}$), but $x_0$ and other $x_i$ might be modified by the forward sweep, so we have to store them, which for some programs leads to significant memory consumption.

  8. Automatic Differentiation (4)
     Table 2: Example of Reverse Mode of AD.

     Original Code

              subroutine sub1(x,y,o1)
         I1   x = y*x
         I2   o1 = x*x + y*y
         T1   if (o1 > 190) then
         I3     o1 = -o1*o1/2
              else
         I4     o1 = o1*o1*20
              endif
              end

     Reverse Differentiated Code

              subroutine sub1_b(x, xb, y, yb, o1, o1b)
              PUSH(x)
         I1   x = y*x
         I2   o1 = x*x + y*y
         T1   if (o1 > 190) then
         ←I3    o1b = -(o1*o1b)
              else
         ←I4    o1b = 40*o1*o1b
              endif
         ←I2  xb = xb + 2*x*o1b
              yb = yb + 2*y*o1b
              POP(x)
         ←I1  yb = yb + x*xb
              xb = y*xb
              end
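The reverse-mode table can be mirrored in Python as a rough sketch. The explicit `stack` list standing in for PUSH/POP, the default zero-initialised adjoints, and returning `(xb, yb)` instead of updating arguments in place are all assumptions of this illustration.

```python
def sub1_b(x, y, o1b, xb=0.0, yb=0.0):
    """Reverse (adjoint) program: returns the input adjoints (xb, yb)."""
    stack = []

    # Forward sweep: run the original instructions, saving what gets overwritten.
    stack.append(x)                # PUSH(x), since I1 overwrites x
    x = y * x                      # I1
    o1 = x * x + y * y             # I2

    # Reverse sweep: adjoint instructions in reverse order.
    if o1 > 190:                   # T1, same test as the original program
        o1b = -(o1 * o1b)          # adjoint of I3
    else:
        o1b = 40 * o1 * o1b        # adjoint of I4
    xb = xb + 2 * x * o1b          # adjoint of I2
    yb = yb + 2 * y * o1b
    x = stack.pop()                # POP(x), restores the input value of x
    yb = yb + x * xb               # adjoint of I1
    xb = y * xb
    return xb, yb


grad = sub1_b(1.0, 2.0, 1.0)       # gradient of o1 at (x, y) = (1, 2)
print(grad)
```

Seeding `o1b = 1` yields the whole gradient in one run, whereas the direct mode needs one run per input direction; the price is the stored value of `x`, which is the memory cost mentioned in the remark on the reverse sweep.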

  9. The Problem
     Motivation: the question of derivatives being valid only in a certain domain is a crucial problem of AD. If derivatives returned by AD are used outside their domain of validity, this can result in errors that are very hard to detect.
     Description:
     • Programs have a control flow structure, including conditional statements (tests). Some of the tests are introduced by intrinsic functions like abs, min, max, etc.
     • The differentiated program keeps the control flow structure of the given program. Sometimes the derivatives depend on the control flow structure.
     • When some input is too close to a switch of the control flow, the resulting derivative may be very different or wrong, to the point of being useless.

  10. The Problem (2)
     [Figure, left: evaluation of program P (o1 as a function of x and y). Right: evaluation of program P' with xd,yd = 1,1 (o1d as a function of x).]
     The plot on the left shows the evaluation of the example program, with its discontinuity problem. The plot on the right shows the evaluation of the differentiated example program along the input space direction (1,1).
     (x=3.64, o1d=1512117.125) and (x=3.65, o1d=-38513.449) !!!
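The jump in the derivative can be reproduced with a short Python sketch of the differentiated example. The slide does not state which y was used for the two quoted points; the sketch assumes y = x (sampling along the diagonal), which is a guess that happens to place the switch o1 = 190 between x = 3.64 and x = 3.65.

```python
def sub1_d(x, xd, y, yd):
    """Tangent version of the example program: returns (o1, o1d, branch)."""
    xd = yd * x + y * xd           # I'1
    x = y * x                      # I1
    o1d = 2 * x * xd + 2 * y * yd  # I'2
    o1 = x * x + y * y             # I2
    if o1 > 190:                   # T1
        branch = "then"
        o1d = -(o1d * o1)          # I'3
        o1 = -(o1 * o1 / 2)        # I3
    else:
        branch = "else"
        o1d = 40 * o1d * o1        # I'4
        o1 = o1 * o1 * 20          # I4
    return o1, o1d, branch


# Assumption: y = x, direction (xd, yd) = (1, 1) as on the slide.
for x0 in (3.64, 3.65):
    o1, o1d, branch = sub1_d(x0, 1.0, x0, 1.0)
    print(x0, branch, o1d)
```

Under that assumption the two evaluations take different branches of the test, and the derivative changes sign and drops by more than an order of magnitude over a step of 0.01 in x, which is the behaviour the slide highlights.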

  11. The Problem (3)
     Main cases of problems introduced by conditional statements (from B. Kearfott's paper).

  12. Our Approach
     • Every test $t_i$ is analyzed: under a small change in the input, the test must remain on the same "side" of the inequality. For example,
         if $t_i \ge 0$ then $\Delta t_i + t_i \ge 0$    (1)
     • The variation of the test, $\Delta t_i$, has to be expressed in terms of the intermediate variables $B_i$ (the variables used by the instructions needed to build the current test):
         $\Delta t_i = J(T_i) \cdot \Delta B_i$
     • The variation of the intermediate variables is
         $\Delta B_i = J(B_i; \ldots; B_0) \cdot \Delta X = J(B_i) \cdot \ldots \cdot J(B_0) \cdot \Delta X$
       where $\Delta X$ represents the variation of the input values.
     • Re-composing the expression $\Delta t_i + t_i \ge 0$ from (1):
         $\langle J(T_i) \cdot J(B_i) \cdot \ldots \cdot J(B_0) \cdot \Delta X \mid e_j \rangle \ge -\langle t_i \mid e_j \rangle$    (2)

  13. Our Approach (2)
     • We want to isolate $\Delta X$; a good way to do that is to transpose the Jacobians in (2):
         $\langle \Delta X \mid J(B_0)^{*} \cdot \ldots \cdot J(B_i)^{*} \cdot J(T_i)^{*} \cdot e_j \rangle \ge -\langle t_i \mid e_j \rangle$    (3)
     • We can use the reverse mode of AD to compute $J(B_0)^{*} \cdot \ldots \cdot J(B_i)^{*} \cdot J(T_i)^{*} \cdot e_j$ in (3).
     • Unfortunately, in real situations the number of tests is so large that this approach is not practical to compute.
     • Solutions:
       – combine the constraints (half-spaces) so as to propagate just one;
       – reduce the size of the problem: fewer tests, fewer inputs, or both.
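For a single test, constraint (3) reduces to a half-space in the input variations. The Python sketch below works this out for the example program's test o1 > 190, written as t = o1 - 190, using a hand-written reverse sweep to obtain the transposed-Jacobian product. The function names and the sign convention in `stays_on_same_side` (which also covers the case t < 0) are assumptions of this illustration, not the authors' implementation.

```python
def switch_value_and_gradient(x, y):
    """Return (t, g): t = o1 - 190 and g = gradient of t w.r.t. the inputs (x, y)."""
    x_in = x                       # saved input value (role of PUSH/POP)
    x = y * x                      # I1
    o1 = x * x + y * y             # I2
    t = o1 - 190.0                 # value whose sign decides the test T1

    tb = 1.0                       # seed e_j for this single test
    xb = 2.0 * x * tb              # adjoint of I2
    yb = 2.0 * y * tb
    yb = yb + x_in * xb            # adjoint of I1
    xb = y * xb
    return t, (xb, yb)


def stays_on_same_side(t, g, dx):
    """First-order check that t + <dx | g> has the same sign as t."""
    dt = g[0] * dx[0] + g[1] * dx[1]
    return (t >= 0) == (t + dt >= 0)


t, g = switch_value_and_gradient(3.64, 3.64)
print(t, g)
print(stays_on_same_side(t, g, (0.005, 0.005)))
print(stays_on_same_side(t, g, (0.01, 0.01)))
```

One reverse sweep per test gives the normal vector of that test's half-space, which is why the cost grows with the number of tests and why the slide proposes combining constraints or reducing the problem size.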

  14. Our Approach (3)
     • We analyze one test ($t_0$): under a small change in the input, the test must remain on the same "side" of the inequality.
         if $t_0 \ge 0$ then $\Delta t_0 + t_0 \ge 0$    (4)
     • The variation of the test, $\Delta t_0$, has to be expressed in terms of the intermediate variables $B_0$:
         $\Delta t_0 = J(T_0) \cdot \Delta B_0$
     • The variation of the intermediate variables is
         $\Delta B_0 = J(B_0) \cdot \beta \cdot \dot{X}$
       where $\beta \cdot \dot{X}$ represents the variation of the input values: $\beta$ is the magnitude and $\dot{X}$ the direction of the variation.
     • Re-composing the expression (4):
         $\beta \cdot J(T_0) \cdot J(B_0) \cdot \dot{X} \ge -t_0$

  15. Our Approach (4)
     The following expression gives us the magnitude of change of the input values that does not change the sign of the test:
         $\beta \ge \dfrac{-t_0}{J(T_0) \cdot J(B_0) \cdot \dot{X}}$    (5)
     To compute expression (5) we introduce a function call that propagates the effect of every test through the program, resulting in an interval of validity, as follows:

     Direct Differentiated Code

              subroutine sub1_d(x,xd,y,yd,o1,o1d)
         I'1  xd = yd*x + y*xd
         I1   x = y*x
         I'2  o1d = 2*x*xd + 2*y*yd
         I2   o1 = x*x + y*y
         T1   if (o1 > 190) then
         I'3    o1d = -(o1d*o1)
         I3     o1 = -(o1*o1/2)
              else
         I'4    o1d = 40*o1d*o1
         I4     o1 = o1*o1*20
              endif
              end

     Direct Differentiated Code with Validation

              subroutine sub1_dva(x,xd,y,yd,o1,o1d)
         I'1  xd = yd*x + y*xd
         I1   x = y*x
         I'2  o1d = 2*x*xd + 2*y*yd
         I2   o1 = x*x + y*y
         V1   CALL VALIDITY_TEST(o1 - 190, o1d)
         T1   if (o1 > 190) then
         I'3    o1d = -(o1d*o1)
         I3     o1 = -(o1*o1/2)
              else
         I'4    o1d = 40*o1d*o1
         I4     o1 = o1*o1*20
              endif
              end
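The slides do not show the body of the validity-test routine, so the following Python sketch is only one plausible reading of expression (5): each call receives the test value t and its directional derivative td, computes the first-order switching magnitude -t/td, and narrows an interval of admissible β. The `ValidityInterval` class, its `lo`/`hi` fields and the Python function signature are hypothetical.

```python
import math


class ValidityInterval:
    """Hypothetical accumulator for the interval of beta in which no test switches."""

    def __init__(self):
        self.lo = -math.inf
        self.hi = math.inf

    def validity_test(self, t, td):
        """Narrow the interval using one test value t and its tangent td."""
        if td == 0.0:
            return                 # to first order, this test does not move
        root = -t / td             # beta at which t + beta * td changes sign
        if root >= 0.0:
            self.hi = min(self.hi, root)
        if root <= 0.0:
            self.lo = max(self.lo, root)


def sub1_dva(x, xd, y, yd, interval):
    """Tangent program with validation: returns (o1, o1d) and updates interval."""
    xd = yd * x + y * xd           # I'1
    x = y * x                      # I1
    o1d = 2 * x * xd + 2 * y * yd  # I'2
    o1 = x * x + y * y             # I2
    interval.validity_test(o1 - 190.0, o1d)   # V1, placed just before T1
    if o1 > 190:                   # T1
        o1d = -(o1d * o1)          # I'3
        o1 = -(o1 * o1 / 2)        # I3
    else:
        o1d = 40 * o1d * o1        # I'4
        o1 = o1 * o1 * 20          # I4
    return o1, o1d


iv = ValidityInterval()
sub1_dva(3.64, 1.0, 3.64, 1.0, iv)   # assumed sample point x = y = 3.64, direction (1, 1)
print(iv.lo, iv.hi)
```

At the assumed point the upper bound comes out a little below 0.006, so a step of 0.01 along (1,1) leaves the reported interval. The bound is a first-order estimate only; it is exact when the test value varies linearly along the chosen direction.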
