
Program Optimization
CSci 2021: Machine Architecture and Organization
April 6th-15th, 2020
Your instructor: Stephen McCamant
Based on slides originally by: Randy Bryant, Dave O'Hallaron
(Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, Third Edition)

Today
- Overview
- Generally Useful Optimizations
  - Code motion/precomputation
  - Strength reduction
  - Sharing of common subexpressions
  - Removing unnecessary procedure calls
- Optimization Blockers
  - Procedure calls
  - Memory aliasing
- Exploiting Instruction-Level Parallelism
- Dealing with Conditionals

Optimizing Compilers
- Provide efficient mapping of program to machine
  - register allocation
  - code selection and ordering (scheduling)
  - dead code elimination
  - eliminating minor inefficiencies
- Don't (usually) improve asymptotic efficiency
  - up to programmer to select best overall algorithm
  - big-O savings are (often) more important than constant factors
    - but constant factors also matter
- Have difficulty overcoming "optimization blockers"
  - potential memory aliasing
  - potential procedure side-effects

Performance Realities
- There's more to performance than asymptotic complexity
- Constant factors matter too!
  - Easily see 10:1 performance range depending on how code is written
  - Must optimize at multiple levels: algorithm, data representations, procedures, and loops
- Must understand system to optimize performance
  - How programs are compiled and executed
  - How modern processors + memory systems operate
  - How to measure program performance and identify bottlenecks
  - How to improve performance without destroying code modularity and generality

Limitations of Optimizing Compilers
- Operate under fundamental constraint
  - Must not cause any change in program behavior
  - Except, possibly, when the program makes use of nonstandard language features
  - Often prevents the compiler from making optimizations that would only affect behavior under pathological conditions
- Behavior that may be obvious to the programmer can be obfuscated by languages and coding styles
  - e.g., data ranges may be more limited than variable types suggest
- Most analysis is performed only within procedures
  - Whole-program analysis is too expensive in most cases
  - Newer versions of GCC do interprocedural analysis within individual files
  - But not between code in different files
- Most analysis is based only on static information
  - Compiler has difficulty anticipating run-time inputs
- When in doubt, the compiler must be conservative

Generally Useful Optimizations
- Optimizations that you or the compiler should do regardless of processor / compiler
- Code Motion
  - Reduce frequency with which computation is performed
    - If it will always produce the same result
    - Especially moving code out of a loop

Original:

    void set_row(double *a, double *b, long i, long n)
    {
        long j;
        for (j = 0; j < n; j++)
            a[n*i+j] = b[j];
    }

After code motion (function body):

        long j;
        long ni = n*i;
        for (j = 0; j < n; j++)
            a[ni+j] = b[j];
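The set_row transformation above can be checked side by side. The following is a minimal sketch, not the slides' own code: the names set_row_naive and set_row_hoisted are hypothetical, the const qualifiers and NULL checks are added for illustration, and ni is declared long on the assumption that it should match the types of n and i.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: the product n*i is written inside the loop body, so without
 * optimization it is recomputed on every iteration. */
void set_row_naive(double *a, const double *b, long i, long n)
{
    assert(a != NULL && b != NULL);
    for (long j = 0; j < n; j++)
        a[n * i + j] = b[j];
}

/* Code motion: n*i does not depend on j, so it is computed once
 * before the loop and reused. */
void set_row_hoisted(double *a, const double *b, long i, long n)
{
    assert(a != NULL && b != NULL);
    long ni = n * i;              /* loop-invariant, hoisted out of the loop */
    for (long j = 0; j < n; j++)
        a[ni + j] = b[j];
}
```

Both functions copy the n elements of b into row i of the n-by-n array a. At -O1 and above a compiler is likely to produce similar code for the two, so the hand-written version matters mostly when the compiler cannot prove the expression is invariant.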

Reduction in Strength
- Replace costly operation with simpler one
- Shift, add instead of multiply or divide

      16*x --> x << 4

  - Utility is machine dependent
  - Depends on cost of multiply or divide instruction
    - On Intel Nehalem, integer multiply requires 3 CPU cycles
- Most valuable when it can be done within a loop
  - "Induction variable" has value linear in loop execution count

Before:

    for (i = 0; i < n; i++) {
        int ni = n*i;
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
    }

After:

    int ni = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;
    }

Compiler-Generated Code Motion (-O1)

Source:

    void set_row(double *a, double *b, long i, long n)
    {
        long j;
        for (j = 0; j < n; j++)
            a[n*i+j] = b[j];
    }

Equivalent C for what the compiler generates (function body):

        long j;
        long ni = n*i;
        double *rowp = a+ni;
        for (j = 0; j < n; j++)
            *rowp++ = b[j];

Generated assembly:

    set_row:
        testq   %rcx, %rcx              # Test n
        jle     .L1                     # If n <= 0, goto done
        imulq   %rcx, %rdx              # ni = n*i
        leaq    (%rdi,%rdx,8), %rdx     # rowp = a + ni*8
        movl    $0, %eax                # j = 0
    .L3:                                # loop:
        movsd   (%rsi,%rax,8), %xmm0    # t = b[j]
        movsd   %xmm0, (%rdx,%rax,8)    # M[a+ni*8 + j*8] = t
        addq    $1, %rax                # j++
        cmpq    %rcx, %rax              # j:n
        jne     .L3                     # if !=, goto loop
    .L1:                                # done:
        rep ; ret

Share Common Subexpressions
- Reuse portions of expressions
- GCC will do this with -O1

Before (3 multiplications: i*n, (i-1)*n, (i+1)*n):

    /* Sum neighbors of i,j */
    up    = val[(i-1)*n + j  ];
    down  = val[(i+1)*n + j  ];
    left  = val[i*n     + j-1];
    right = val[i*n     + j+1];
    sum = up + down + left + right;

        leaq    1(%rsi), %rax    # i+1
        leaq    -1(%rsi), %r8    # i-1
        imulq   %rcx, %rsi       # i*n
        imulq   %rcx, %rax       # (i+1)*n
        imulq   %rcx, %r8        # (i-1)*n
        addq    %rdx, %rsi       # i*n+j
        addq    %rdx, %rax       # (i+1)*n+j
        addq    %rdx, %r8        # (i-1)*n+j

After (1 multiplication: i*n):

    long inj = i*n + j;
    up    = val[inj - n];
    down  = val[inj + n];
    left  = val[inj - 1];
    right = val[inj + 1];
    sum = up + down + left + right;

        imulq   %rcx, %rsi            # i*n
        addq    %rdx, %rsi            # i*n+j
        movq    %rsi, %rax            # i*n+j
        subq    %rcx, %rax            # i*n+j-n
        leaq    (%rsi,%rcx), %rcx     # i*n+j+n

Optimization Blocker #1: Procedure Calls
- Procedure to Convert String to Lower Case

      void lower(char *s)
      {
          size_t i;
          for (i = 0; i < strlen(s); i++)
              if (s[i] >= 'A' && s[i] <= 'Z')
                  s[i] -= ('A' - 'a');
      }

- Extracted from CMU 213 lab submissions, Fall 1998
- Similar pattern seen in UMN 2018 HA1

Lower Case Conversion Performance
- Time quadruples when string length doubles
- Quadratic performance

[Plot: CPU seconds (0 to 250) versus string length (0 to 500,000) for lower1]

Convert Loop To Goto Form

    void lower(char *s)
    {
        size_t i = 0;
        if (i >= strlen(s))
            goto done;
    loop:
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
        i++;
        if (i < strlen(s))
            goto loop;
    done: ;
    }

- strlen executed every iteration
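One way to avoid calling strlen on every iteration is to apply code motion by hand. The sketch below is an illustration rather than code from this excerpt: the name lower_hoisted is hypothetical, and it assumes the loop body never changes the string's length, which holds here because only letter case is altered and no NUL byte is written.

```c
#include <assert.h>
#include <string.h>

/* Lower-case conversion with the strlen call moved out of the loop.
 * The length is computed once, so the loop itself does a constant
 * amount of work per character. */
void lower_hoisted(char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);       /* one pass over the string, done once */
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');  /* 'A' - 'a' is negative, so this adds 32 in ASCII */
}
```

With the call hoisted, total work should grow linearly with string length instead of quadratically. A compiler generally cannot make this change on its own, because the loop writes to s and the compiler must be conservative about whether those writes could change what strlen returns.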
