Precimonious: Tuning Assistant for Floating-Point Precision
Ignacio Laguna, Harshitha Menon, Tristan Vanderbruggen (Lawrence Livermore National Laboratory)
Michael Bentley, Ian Briggs, Ganesh Gopalakrishnan (University of Utah)
Cindy Rubio-González (University of California at Davis)
http://fpanalysistools.org/
This work was supported by the X-Stack program funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research under collaborative agreement SC0008699, NSF grant 1750983, and a gift from Oracle.

Floating-Point Precision Tuning
● Floating-point (FP) arithmetic is used in a variety of domains
● Reasoning about FP programs is difficult
  ○ Large variety of numerical problems
  ○ Most programmers are not experts in FP
● Common practice: use the highest available precision
  ○ Disadvantage: more expensive!
● Goal: an automated technique to assist in tuning floating-point precision

Example: Arc Length
● Consider the problem of finding the arc length of the function
    $g(x) = x + \sum_{0 \le k \le 5} 2^{-k} \sin(2^k x)$
● Divide the interval $(0, \pi)$ into $n$ subintervals and sum over the points $x_k \in (0, \pi)$:
    $\sum_{k=0}^{n-1} \sqrt{h^2 + (g(x_{k+1}) - g(x_k))^2}$, with $h = \pi/n$ and $x_k = kh$

    #  Precision        Slowdown  Result
    1  double-double    20X       5.795776322412856
    2  double           1X        5.795776322413031
    3  mixed precision  < 2X      5.795776322412856

Example: Arc Length
    long double g(long double x) {
      int k, n = 5;
      long double t1 = x;
      long double d1 = 1.0L;
      for(k = 1; k <= n; k++) {
        ...
      }
      return t1;
    }

    int main() {
      int i, n = 1000000;
      long double h, t1, t2, dppi;
      long double s1;
      ...
      for(i = 1; i <= n; i++) {
        t2 = g(i * h);
        s1 = s1 + sqrt(h*h + (t2 - t1)*(t2 - t1));
        t1 = t2;
      }
      // final answer stored in variable s1
      return 0;
    }
[Callout on slide: Mixed Precision Program]

Precimonious
“Parsimonious or Frugal with Precision”
Dynamic Analysis for Floating-Point Precision Tuning
[Diagram: TEST INPUTS and SOURCE CODE (annotated with error threshold) → PRECIMONIOUS → TYPE CONFIGURATION (less precision) and MODIFIED PROGRAM (modified program in executable format; speedup)]

Challenges for Precision Tuning
● Searching efficiently over variable types and function implementations (automated)
  ○ Naïve approach -> exponential time
  ○ 19,683 configurations for arclength program (3^9)
  ○ 11 hours 5 minutes
  ○ Global minimum vs. local minimum
● Evaluating type configurations (automated)
  ○ Less precision not necessarily faster
  ○ Based on runtime, energy consumption, etc.
● Determining accuracy constraints (specified by the user)
  ○ How accurate must the final result be?
  ○ What error threshold to use?
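The size of the naïve search space is the number of candidate types raised to the number of tunable variables. A small sketch of that arithmetic follows; count_configurations is an illustrative name, not part of Precimonious.

```c
#include <stdint.h>

/* Number of type configurations when each of `vars` variables can take
 * any of `types` precisions: types^vars. */
uint64_t count_configurations(unsigned types, unsigned vars) {
    uint64_t total = 1;
    for (unsigned i = 0; i < vars; i++) {
        total *= types;
    }
    return total;
}
```

With three candidate precisions and nine variables, count_configurations(3, 9) gives the 19,683 configurations quoted for the arclength program.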

Precimonious Search Algorithm
● Based on the Delta Debugging algorithm (TSE’02)
● Our definition of a change
  ○ Lowering the precision of a floating-point variable in the program
  ○ Example: double x -> float x
● Main idea
  ○ We can do better than making one change at a time
  ○ Start by dividing the change set into two equally sized subsets
  ○ Narrow the search to the subset that satisfies the success criteria
  ○ Otherwise, increase the number of subsets
● Our success criteria
  ○ Resulting program produces an answer within the given error threshold
  ○ Resulting program is faster than the original program
● Find a local minimum
  ○ Lowering the precision of any one more variable violates the success criteria
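The following is a simplified sketch in the spirit of the search described on this slide, not the actual Precimonious implementation. It assumes each variable is either kept in double (0) or lowered to float (1). It also assumes the success criteria are supplied as a callback. The names dd_search, accept_fn and demo_accept are hypothetical. In a real run the callback would execute the transformed program and check both accuracy and speed; here demo_accept is a stand-in that rejects any configuration lowering a variable flagged as "must stay double".

```c
#include <stdlib.h>
#include <string.h>

/* Success criteria: returns nonzero if the configuration is acceptable.
 * lowered[i] == 1 means variable i has been lowered to float. */
typedef int (*accept_fn)(const int *lowered, int n, void *ctx);

/* Stand-in criterion for illustration only: ctx is an int array marking
 * the variables that must remain double. */
int demo_accept(const int *lowered, int n, void *ctx) {
    const int *must_stay_double = (const int *)ctx;
    for (int i = 0; i < n; i++) {
        if (lowered[i] && must_stay_double[i]) return 0;
    }
    return 1;
}

/* Simplified delta-debugging-style search. Returns the number of lowered
 * variables (or -1 on allocation failure) and fills lowered[0..n-1]. */
int dd_search(int n, int *lowered, accept_fn accept, void *ctx) {
    int *cand = malloc((size_t)(n > 0 ? n : 1) * sizeof *cand);
    if (cand == NULL) return -1;
    int count = n;                       /* candidates not yet lowered */
    for (int i = 0; i < n; i++) { lowered[i] = 0; cand[i] = i; }

    int div = 2;                         /* start with two subsets */
    while (count > 0) {
        if (div > count) div = count;
        int progress = 0;
        for (int c = 0; c < div && !progress; c++) {
            int begin = c * count / div;
            int end = (c + 1) * count / div;
            for (int j = begin; j < end; j++) lowered[cand[j]] = 1;
            if (accept(lowered, n, ctx)) {
                /* Keep the change and drop the subset from the candidates. */
                memmove(&cand[begin], &cand[end],
                        (size_t)(count - end) * sizeof *cand);
                count -= end - begin;
                progress = 1;
            } else {
                for (int j = begin; j < end; j++) lowered[cand[j]] = 0;
            }
        }
        if (progress) {
            div = 2;                     /* restart at coarse granularity */
        } else if (div == count) {
            break;                       /* no single variable can be lowered */
        } else {
            div = (2 * div < count) ? 2 * div : count;  /* more subsets */
        }
    }
    free(cand);
    return n - count;
}
```

The loop stops only when every remaining variable has been tried on its own and rejected. This matches the local-minimum condition on the slide: lowering any one more variable violates the success criteria.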

Searching for Type Configuration
[Animation over seven slides: candidate configurations between double precision and single precision are explored step by step; failed configurations are marked ✘ until a proposed configuration is reached.]

Applying Type Configuration
● Automatically generate program variants
  ○ Reflect type configurations produced by the algorithm
● Intermediate representation
  ○ LLVM IR
● Transformation rules for each LLVM instruction
  ○ alloca, load, store, fadd, fsub, fpext, fptrunc, etc.
  ○ Changes equivalent to modifying the program at the source level
  ○ Clang plugin to provide modified source code (not discussed today)
● Able to run resulting modified program
  ○ Evaluate type configuration: accuracy & performance
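To illustrate what "equivalent to modifying the program at the source level" can look like, here is a hypothetical before-and-after pair. The functions scale_original and scale_tuned are invented for this sketch and are not taken from the exercises. Lowering one local variable introduces a narrowing conversion where the value is stored and a widening conversion where it is used. These correspond to the fptrunc and fpext instructions in LLVM IR.

```c
/* Original: the intermediate product is kept in double precision. */
double scale_original(double a, double b) {
    double t = a * b;
    return t + 1.0;
}

/* Tuned variant: only the local variable t is lowered to float. */
double scale_tuned(double a, double b) {
    float t = (float)(a * b);  /* narrowing conversion: double -> float */
    return (double)t + 1.0;    /* widening conversion: float -> double */
}
```

The two versions agree when the product is exactly representable in float, for example a = 0.5 and b = 4.0. In other cases they differ by a small rounding error. The accuracy check has to decide whether that difference stays within the error threshold.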

Limitations
● Type configurations rely on the inputs tested
  ○ No guarantees for worse-conditioned inputs
  ○ Could be combined with input generation tools (e.g., S3FP)
● Getting trapped in a local minimum
● Analysis scalability
  ○ Approach does not scale well for long-running applications
  ○ Need to reduce the search space and the number of runs
  ○ Check out our follow-up work on Blame Analysis (ICSE’16)
● Analysis effectiveness
  ○ Approach does not exploit relationships among variables
  ○ Check out our follow-up work on HiFPTuner (ISSTA’18)

Source code available: https://github.com/corvette/precimonious
Questions?

Exercises

Exercises with Precimonious
1. Run Precimonious on sample program funarc
2. Run Precimonious on sample program simpsons
Directory Structure
/Module-Precimonious
|---/exercise-1
|---/exercise-2

Exercise 1

Step 1: Build Precimonious
● Open setup.sh file
● Precimonious uses LLVM and is built using scons
● Execute:
  ○ $ ./setup.sh
[Screenshot: success building and running tests]

Step 2: Annotate Program (already done)
● Execute:
  ○ $ cd exercise-1
  ○ $ ls
● Open funarc.c file
[Screenshot: the program we will tune, with callouts for accuracy logging & checking and for performance logging]
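The annotations in funarc.c are not reproduced in this transcript. As a rough idea of what accuracy checking and performance logging might involve, here is a hedged sketch. The helper names within_threshold and elapsed_seconds are hypothetical and are not the functions used in the exercise code.

```c
#include <math.h>
#include <time.h>

/* Accuracy check: returns 1 when `result` is within the relative error
 * `threshold` of `reference`. Falls back to absolute error when the
 * reference is zero. */
int within_threshold(double reference, double result, double threshold) {
    if (reference == 0.0) {
        return fabs(result) <= threshold;
    }
    return fabs((result - reference) / reference) <= threshold;
}

/* Performance logging: converts two clock() readings into seconds. */
double elapsed_seconds(clock_t start, clock_t end) {
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, within_threshold(5.795776322412856, 5.795776322413031, 1e-10) accepts the double-precision arc-length result from the earlier table. A threshold of 1e-15 would reject it.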

Step 3: Compile Program with Clang
● Execute:
  ○ $ make clean
  ○ $ make
● Creates an LLVM bitcode file and an optimized executable for later use

Step 4: Run Analysis on Program
● Execute:
  ○ $ ./run-analysis.sh funarc
[Screenshot: sample output. Type changes are listed for each explored configuration, followed by the suggested type configuration]

Step 4: Run Analysis – Configuration File
● Open config_funarc.json
● Original type configuration

Step 4: Run Analysis – Search File
● Open search_funarc.json
● Search space file
● To exclude functions, edit exclude.txt
● To exclude variables, edit exclude_local.txt
● Or you can edit the search file directly prior to analysis

Step 4: Run Analysis – Output Files
● Execute:
  ○ $ cd results
  ○ $ ls

Step 4: Run Analysis – Output Files
● Open dd2_valid_funarc.bc.json: suggested configuration file in JSON format
● Open dd2_diff_funarc.bc.json: summary of type changes

Step 5: Apply Result Configuration & Compare Performance
● Execute:
  ○ $ ./run-config.sh funarc
● Execute:
  ○ $ time ./original_funarc.out
  ○ $ time ./tuned_funarc.out

Exercise 2

Exercise 2: Run Precimonious on simpsons program
● Open exercise-2/simpsons.c to see the annotated program
● Execute:
  ○ $ cd ../exercise-2
  ○ $ make clean
  ○ $ make
  ○ $ ./run-analysis.sh simpsons
  ○ $ ./run-config.sh simpsons
● Open results/dd2_valid_simpsons.bc.json to see the configuration in JSON format
● Open results/dd2_diff_simpsons.bc.json to see the difference between the original program and the proposed configuration

Collaborators
● University of California, Berkeley: Cuong Nguyen, Diep Nguyen, Ben Mehne, James Demmel, William Kahan, Koushik Sen
● Lawrence Berkeley National Lab: Costin Iancu, David Bailey, Wim Lavrijsen
● Oracle: David Hough

