Great Reality Great Reality CS 105 Tour of the Black Holes of - PowerPoint PPT Presentation

✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ Great Reality Great Reality CS 105 “Tour of the Black Holes of Computing” There’s more to performance than asymptotic complexity Constant factors matter too! Code Optimization and Performance Code Optimization and Performance Easily see 10:1 performance range depending on how code is written Must optimize at multiple levels: � Algorithm, data representations, procedures, and loops Must understand system to optimize performance How programs are compiled and executed How to measure program performance and identify bottlenecks How to improve performance without destroying code modularity, generality, readability CS 105 – 2 – Optimizing Compilers Optimizing Compilers Limitations of Optimizing Compilers Limitations of Optimizing Compilers Operate Under Fundamental Constraint Provide efficient mapping of program to machine Must not cause any change in program behavior under any possible condition Register allocation Often prevents making optimizations that would only affect behavior under pathological Code selection and ordering conditions Eliminating minor inefficiencies Behavior that may be obvious to the programmer can be obfuscated by languages and Don’t (usually) improve asymptotic efficiency coding styles E.g., data ranges may be more limited than variable types suggest Up to programmer to select best overall algorithm Big-O savings are (often) more important than constant factors Most analysis is performed only within procedures � But constant factors also matter Whole-program analysis is too expensive in most cases (gcc does quite a bit of interprocedural analysis—but not across files) Have difficulty overcoming “optimization blockers” Most analysis is based only on static information Potential memory aliasing Compiler has difficulty anticipating run-time inputs Potential procedure side effects When in doubt, the compiler must be conservative CS 105 CS 105 – 3 – – 4 –

✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ Generally Useful Optimizations Generally Useful Optimizations Compiler-Generated Code Motion (-O1) Compiler-Generated Code Motion (-O1) Optimizations you should do regardless of processor / compiler void set_row(double *a, double *b, long j; long i, long n) long ni = n*i; Code Motion { double *rowp = a+ni; long j; for (j = 0; j < n; j++) Reduce frequency with which computation performed for (j = 0; j < n; j++) rowp[j] = b[j]; a[n*i+j] = b[j]; � If it will always produce same result } � Especially moving code out of loop � Gcc often does this for you (so check assembly) set_row: testq %rcx, %rcx # Test n for (i = 0; i < n; i++) { jle .L1 # If 0, goto done for (i = 0; i < n; i++) int ni = n*i; imulq %rcx, %rdx # ni = n*i for (j = 0; j < n; j++) leaq (%rdi,%rdx,8), %rdx # rowp = A + ni*8 for (j = 0; j < n; j++) a[n*i + j] = b[j]; movl $0, %eax # j = 0 a[ni + j] = b[j]; .L3: # loop: } movsd (%rsi,%rax,8), %xmm0 # t = b[j] movsd %xmm0, (%rdx,%rax,8) # M[A+ni*8 + j*8] = t addq $1, %rax # j++ cmpq %rcx, %rax # j:n jne .L3 # if !=, goto loop .L1: # done: rep ; ret CS 105 CS 105 – 5 – – 6 – Strength Reduction Strength Reduction Share Common Subexpressions Share Common Subexpressions Replace costly operation with simpler one Reuse portions of expressions Shift, add instead of multiply or divide Gcc will do this with –O1 and up 16*x x << 4 � /* Sum neighbors of i,j */ long inj = i*n + j; � Utility is machine-dependent up = val[(i-1)*n + j ]; up = val[inj - n]; � Depends on cost of multiply or divide instruction down = val[(i+1)*n + j ]; down = val[inj + n]; left = val[i*n + j-1]; left = val[inj - 1]; � On Intel Nehalem, integer multiply requires 3 CPU cycles right = val[i*n + j+1]; right = val[inj + 1]; Recognize sequence of products sum = up + down + left + right; sum = up + down + left + right; Again, gcc often does it �� int ni = 0; for (i = 0; i < n; i++) for (i = 0; i < n; i++) { leaq 1(%rsi), %rax # i+1 imulq %rcx, %rsi # i*n for (j = 0; j < n; j++) leaq -1(%rsi), %r8 # i-1 addq %rdx, %rsi # i*n+j for (j = 0; j < n; j++) a[n*i + j] = b[j]; a[ni + j] = b[j]; imulq %rcx, %rsi # i*n movq %rsi, %rax # i*n+j imulq %rcx, %rax # (i+1)*n subq %rcx, %rax # i*n+j-n ni += n; } imulq %rcx, %r8 # (i-1)*n leaq (%rsi,%rcx), %rcx # i*n+j+n addq %rdx, %rsi # i*n+j addq %rdx, %rax # (i+1)*n+j addq %rdx, %r8 # (i-1)*n+j CS 105 CS 105 – 7 – – 8 –

✁ ✁ ✁ ✁ ✁ ✁ ✁ Optimization Blocker #1: Procedure Calls Optimization Blocker #1: Procedure Calls Lower-Case Conversion Performance Lower-Case Conversion Performance Procedure to Convert String to Lower Case Time quadruples when double string length Quadratic performance #include <ctype.h> void lower(char *s) 250 { size_t i; 200 for (i = 0; i < strlen(s); i++) if (isupper(s[i])) CPU seconds 150 s[i] = tolower(s[i]); lower1 } 100 50 Extracted from many student programs 0 0 50000 100000 150000 200000 250000 300000 350000 400000 450000 500000 String length CS 105 CS 105 – 9 – – 10 – Convert Loop To Goto Form Convert Loop To Goto Form Calling Strlen Calling Strlen /* My version of strlen */ void lower(char *s) size_t strlen(const char *s) { { size_t i = 0; size_t length = 0; if (i >= strlen(s)) while (*s != '\0') { goto done; s++; loop: length++; if (isupper(s[i])) } s[i] = tolower(s[i]); return length; i++; } if (i < strlen(s)) goto loop; Strlen performance done: } Only way to determine length of string is to scan its entire length, looking for NUL character. Overall performance, string of length N strlen executed every iteration N calls to strlen, each takes O(N) time Overall O(N 2 ) performance CS 105 CS 105 – 11 – – 12 –

✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ Improving Performance Improving Performance Lower-Case Conversion Performance Lower-Case Conversion Performance void lower(char *s) Time doubles when double string length { Linear performance of lower2 size_t i; size_t len = strlen(s); for (i = 0; i < len; i++) if (s[i] >= 'A' && s[i] <= 'Z') 250 s[i] -= ('A' - 'a'); } 200 CPU seconds 150 lower1 Move call to strlen outside of loop 100 Since result does not change from one iteration to another Form of code motion 50 lower2 0 0 50000 100000 150000 200000 250000 300000 350000 400000 450000 500000 String length CS 105 CS 105 – 13 – – 14 – Optimization Blocker: Procedure Calls Optimization Blocker: Procedure Calls Memory Matters Memory Matters /* Sum rows is of n X n matrix a Why couldn’t compiler move strlen out of inner loop? and store in vector b */ void sum_rows1(double *a, double *b, long n) { Procedure may have side effects long i, j; Might alter global state each time called for (i = 0; i < n; i++) { Function may not return same value for given arguments b[i] = 0; for (j = 0; j < n; j++) Depends on other parts of global state b[i] += a[i*n + j]; Procedure lower could interact with strlen } size_t lencnt = 0; } Warning: size_t strlen(const char *s) { Compiler treats procedure call as a black box # sum_rows1 inner loop size_t length = 0; .L4: Weak optimizations near them movsd (%rsi,%rax,8), %xmm0 # FP load while (*s != '\0') { addsd (%rdi), %xmm0 # FP add Remedies: s++; movsd %xmm0, (%rsi,%rax,8) # FP store length++; addq $8, %rdi Use inline functions cmpq %rcx, %rdi } � GCC does this with –O1 jne .L4 lencnt += length; » But only within single file return length; Code updates b[i] on every iteration Do your own code motion } Why couldn’t compiler optimize this away? CS 105 CS 105 – 15 – – 16 –

Great Reality Great Reality CS 105 Tour of the Black Holes of - PowerPoint PPT Presentation

Great Reality Great Reality CS 105 Tour of the Black Holes of Computing Theres more to performance than asymptotic complexity Constant factors

Chapter 105 General Permits for Chapter 105 General Permits for Stream and Wetland I mpacts

Virtual Reality & Interaction Virtual Reality Virtual Reality Input Devices Input Devices

Slide 1 / 105 Slide 2 / 105 Algebra Based Physics Electric Current & DC Circuits 2015-11-30

BUILDING INFORMATION: 103-105 GREENE STREET ADDRESS: 103-105 GREENE STREET AKA 101 GREENE

Study 105 Atazanavir + [Cobicistat or Ritonavir] + TDF-FTC (Phase 2) Study 105: Study Design

EXPERIENCE VIRTUAL REALITY VIRTUAL REALITY MARKET VR will be bigger than TV Virtual

Spatial Illusions: From Mirrors to Virtual Reality David Chalmers Virtual Reality Virtual

Rockem Reality Rockem Reality | Punch Detection Method Hand Acceleration Measurement

Elected Officials 1 Public Hearing Presentation: SH 105: From 10th 2/21/2019 Street to Business

Thermal Physics Slide 2 / 105 Topics to be covered Temperature and Thermal Equilibrium Kinetic

Bureau of Waterways Engineering Regulatory Proposal Chapter 105 Chapter 105 Dam Safety and

FAA Order 8110.105 FAA Order 8110.105 Simple And Complex Electronic Hardware Approval

Thermal Physics Temperature and Thermal Equilibrium Kinetic Theory Gas Laws Internal Energy

IETF Chair and IESG Report, IETF 105 This report, sent out before IETF 105 begins, is an effort to

CS 105 Perl: Completing the Toolbox Curtis Dunham March 4, 2013 Curtis Dunham CS 105 Perl:

Welcome to EE 105! Please fi nd a seat and fi ll out a name tent with the name you go by. EE 105

Two attacks on rank metric code-based Jean-Pierre Tillich schemes: RankSign and an IBE scheme

List Decoding of Concatenated Codes: Improved Performance Estimates Alexander Barg Andrew

Lecture 4: MIPS Instruction Set No class on Tuesday Todays topic: MIPS

Making code thread-safe Kyle J. Knoepfel 25 June 2019 LArSoft Workshop 2019 So youre going

An improvement to the Gilbert-Varshamov bound for permutation codes Yiting Yang Department of

Algorithmic Complexity II Department of Computer Science University of Maryland, College Park

Compiling for Parallelism & Locality Last time SSA and its uses Today

Be Beyond Poly olyhedral Analysis of of OpenStream Programs Nun uno Mi Migu guel l Nob

Great Reality Great Reality CS 105 Tour of the Black Holes of - PowerPoint PPT Presentation

Great Reality Great Reality CS 105 Tour of the Black Holes of Computing Theres more to performance than asymptotic complexity Constant factors

Chapter 105 General Permits for Chapter 105 General Permits for Stream and Wetland I mpacts

Virtual Reality &amp; Interaction Virtual Reality Virtual Reality Input Devices Input Devices

Slide 1 / 105 Slide 2 / 105 Algebra Based Physics Electric Current &amp; DC Circuits 2015-11-30

BUILDING INFORMATION: 103-105 GREENE STREET ADDRESS: 103-105 GREENE STREET AKA 101 GREENE

Study 105 Atazanavir + [Cobicistat or Ritonavir] + TDF-FTC (Phase 2) Study 105: Study Design

EXPERIENCE VIRTUAL REALITY VIRTUAL REALITY MARKET VR will be bigger than TV Virtual

Spatial Illusions: From Mirrors to Virtual Reality David Chalmers Virtual Reality Virtual

Rockem Reality Rockem Reality | Punch Detection Method Hand Acceleration Measurement

Elected Officials 1 Public Hearing Presentation: SH 105: From 10th 2/21/2019 Street to Business

Thermal Physics Slide 2 / 105 Topics to be covered Temperature and Thermal Equilibrium Kinetic

Bureau of Waterways Engineering Regulatory Proposal Chapter 105 Chapter 105 Dam Safety and

FAA Order 8110.105 FAA Order 8110.105 Simple And Complex Electronic Hardware Approval

Thermal Physics Temperature and Thermal Equilibrium Kinetic Theory Gas Laws Internal Energy

IETF Chair and IESG Report, IETF 105 This report, sent out before IETF 105 begins, is an effort to

CS 105 Perl: Completing the Toolbox Curtis Dunham March 4, 2013 Curtis Dunham CS 105 Perl:

Welcome to EE 105! Please fi nd a seat and fi ll out a name tent with the name you go by. EE 105

Two attacks on rank metric code-based Jean-Pierre Tillich schemes: RankSign and an IBE scheme

List Decoding of Concatenated Codes: Improved Performance Estimates Alexander Barg Andrew

Lecture 4: MIPS Instruction Set No class on Tuesday Todays topic: MIPS

Making code thread-safe Kyle J. Knoepfel 25 June 2019 LArSoft Workshop 2019 So youre going

An improvement to the Gilbert-Varshamov bound for permutation codes Yiting Yang Department of

Algorithmic Complexity II Department of Computer Science University of Maryland, College Park

Compiling for Parallelism &amp; Locality Last time SSA and its uses Today

Be Beyond Poly olyhedral Analysis of of OpenStream Programs Nun uno Mi Migu guel l Nob

Virtual Reality & Interaction Virtual Reality Virtual Reality Input Devices Input Devices

Slide 1 / 105 Slide 2 / 105 Algebra Based Physics Electric Current & DC Circuits 2015-11-30

Compiling for Parallelism & Locality Last time SSA and its uses Today