SPolly: Speculative Optimizations in the Polyhedral Model

slide-1
SLIDE 1

SPolly: Speculative Optimizations in the Polyhedral Model

Johannes Doerfert, Clemens Hammacher, Kevin Streit, Sebastian Hack

Saarland University, Germany

January 21, 2013

slide-2
SLIDE 2

The Problem

int A[256][256], B[256][256], C[256][256];

void matmul() {
  for (int i=0; i<256; i++)
    for (int j=0; j<256; j++)
      for (int k=0; k<256; k++)
        C[i][j] += A[k][i] * B[j][k];
}

2/16

slide-3
SLIDE 3

The Problem

int A[65536], B[65536], C[65536];

void matmul() {
  for (int i=0; i<256; i++)
    for (int j=0; j<256; j++)
      for (int k=0; k<256; k++)
        C[i*256+j] += A[k*256+i] * B[j*256+k];
}

2/16

slide-4
SLIDE 4

The Problem

void matmul(int* A, int* B, int* C) {
  for (int i=0; i<256; i++)
    for (int j=0; j<256; j++)
      for (int k=0; k<256; k++)
        C[i*256+j] += A[k*256+i] * B[j*256+k];
}

2/16

slide-5
SLIDE 5

The Problem

void matmul(int* A, int* B, int* C, int N) {
  for (int i=0; i<N; i++)
    for (int j=0; j<N; j++)
      for (int k=0; k<N; k++)
        C[i*N+j] += A[k*N+i] * B[j*N+k];
}
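The four variants on these slides compute the same product; only the way the accesses are written changes. The following is a minimal sketch, not taken from the slides, that checks that the 2D-array form and the pointer form with a runtime size agree. The names `matmul_2d`, `matmul_flat`, `variants_agree` and the small size `DIM` are assumptions chosen for illustration.

```c
#include <string.h>

enum { DIM = 8 };  /* illustrative size; the slides use 256 */

/* 2D-array form: every subscript is affine in i, j, k. */
static void matmul_2d(int A[DIM][DIM], int B[DIM][DIM], int C[DIM][DIM]) {
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                C[i][j] += A[k][i] * B[j][k];
}

/* Pointer form with runtime size n: the subscripts contain products
 * such as i*n, and the compiler cannot rule out that A, B, C alias. */
static void matmul_flat(int *A, int *B, int *C, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i * n + j] += A[k * n + i] * B[j * n + k];
}

/* Returns 1 if both forms produce the same result on a fixed input. */
static int variants_agree(void) {
    static int A[DIM * DIM], B[DIM * DIM], C1[DIM * DIM], C2[DIM * DIM];
    for (int i = 0; i < DIM * DIM; i++) {
        A[i] = i % 7;
        B[i] = 3 - i % 5;
    }
    memset(C1, 0, sizeof C1);
    memset(C2, 0, sizeof C2);
    /* View the same flat storage as DIM x DIM for the 2D form. */
    matmul_2d((int (*)[DIM])A, (int (*)[DIM])B, (int (*)[DIM])C1);
    matmul_flat(A, B, C2, DIM);
    return memcmp(C1, C2, sizeof C1) == 0;
}
```

Calling `variants_agree()` from a test driver should return 1: the semantics are identical, while the second form is the one that a static SCoP detection has to reject.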

2/16

slide-6
SLIDE 6

How urgent is this problem?

Valid Regions: 14.8%
Invalid Regions: 85.2%

3/16

slide-9
SLIDE 9

How urgent is this problem?

Setup

◮ Polly
  ◮ state-of-the-art polyhedral optimizer integrated in LLVM
◮ SPEC 2000
  ◮ industry standard benchmark suite
  ◮ nine real world programs:
    ammp, art, bzip2, crafty, equake, gzip, mcf, mesa, twolf
◮ Research questions
  ◮ number of Static Control Parts
    (SCoPs := code regions amenable to polyhedral optimizations)
  ◮ impact of individual rejection causes

4/16

slide-10
SLIDE 10

How urgent is this problem?

SCoP rejection causes found in 1862 regions

i  Rejection cause          A  B  C
0  Non-affine expressions
1  Aliasing
2  Non-affine loop bounds
3  Function call
4  Non-canonical indvars
5  Complex CFG
6  Unsigned comparison
7  Others

5/16

slide-11
SLIDE 11

How urgent is this problem?

SCoP rejection causes found in 1862 regions

Example for cause 0, non-affine expressions (the subscript i*N):

for (i = 0; i < N; i++)
  A[i*N] += B[i];

5/16

slide-12
SLIDE 12

How urgent is this problem?

SCoP rejection causes found in 1862 regions

Example for cause 1, aliasing (the pointer parameters int* A and int* B):

void f(int* A, int* B) {
  A[0] = B[5];
}

5/16

slide-13
SLIDE 13

How urgent is this problem?

SCoP rejection causes found in 1862 regions

Example for cause 2, non-affine loop bounds (the bound N*M):

for (i = 0; i < N*M; i++)
  A[i] += B[i];

5/16

slide-14
SLIDE 14

How urgent is this problem?

SCoP rejection causes found in 1862 regions

Example for cause 3, function call (the call g(i)):

for (i = 0; i < N; i++)
  A[i] += g(i);

5/16

slide-15
SLIDE 15

How urgent is this problem?

SCoP rejection causes found in 1862 regions

Example for cause 4, non-canonical induction variables (the increment i+=i/2+1):

for (i=0; i<N; i+=i/2+1)
  A[i] += A[i+1];

5/16

slide-21
SLIDE 21

How urgent is this problem?

SCoP rejection causes found in 1862 regions

i  Rejection cause          A            B            C
0  Non-affine expressions   1230 (66%)    84 ( 5%)     84 ( 5%)
1  Aliasing                 1093 (59%)   207 (11%)    510 (27%)
2  Non-affine loop bounds    840 (45%)     6 ( 0%)    660 (35%)
3  Function call             532 (29%)    72 ( 4%)    928 (50%)
4  Non-canonical indvars     384 (21%)     0 ( 0%)   1174 (63%)
5  Complex CFG               253 (14%)    31 ( 2%)   1387 (74%)
6  Unsigned comparison       199 (11%)     0 ( 0%)   1586 (85%)
7  Others                      1 ( 0%)     0 ( 0%)   1587 (85%)

A  #regions where condition i is violated.
B  #regions where only condition i is violated.
C  #regions where only conditions 0 to i are violated.
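One way to read the three column definitions is as counts over per-region sets of violated conditions. The sketch below assumes each region is summarized as a bitmask in which bit i is set when condition i is violated; this representation, the type `CauseStats` and the function `count_causes` are hypothetical and only serve to make the definitions of A, B and C concrete.

```c
#include <stddef.h>

enum { NUM_CAUSES = 8 };  /* conditions 0..7 from the table */

typedef struct {
    unsigned a[NUM_CAUSES];  /* condition i is violated */
    unsigned b[NUM_CAUSES];  /* only condition i is violated */
    unsigned c[NUM_CAUSES];  /* only conditions 0..i are violated */
} CauseStats;

/* regions[r] is a bitmask of violated conditions; 0 means a valid SCoP. */
static CauseStats count_causes(const unsigned *regions, size_t n) {
    CauseStats s = {{0}, {0}, {0}};
    for (size_t r = 0; r < n; r++) {
        unsigned mask = regions[r];
        if (mask == 0)
            continue;  /* valid regions are not counted in any column */
        for (int i = 0; i < NUM_CAUSES; i++) {
            unsigned bit = 1u << i;
            unsigned up_to_i = (bit << 1) - 1u;  /* bits 0..i */
            if (mask & bit)
                s.a[i]++;
            if (mask == bit)
                s.b[i]++;
            if ((mask & ~up_to_i) == 0)
                s.c[i]++;
        }
    }
    return s;
}
```

Under this reading column C is cumulative: C for condition 0 equals B for condition 0, and C for condition 7 is the total number of rejected regions, which matches the pattern in the table (84 and 1587).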

5/16

slide-24
SLIDE 24

How urgent is this problem?

Conclusion

Valid Regions: 14.8%
Targeted Regions: 35.4%
Invalid Regions: 49.8%

6/16

slide-26
SLIDE 26

How to allow more polyhedral optimizations?

Example

1. speculatively assume properties (e.g., constant parameters)

void f(int* A, int* B) {
  for (int i=0; i < 2048; i++)
    A[i] += B[i];
}

7/16

slide-27
SLIDE 27

How to allow more polyhedral optimizations?

Example

1. speculatively assume properties (e.g., constant parameters)
2. derive specialized versions

void f_spec(int* restrict A, int* restrict B) {
  for (int i=0; i < 2048; i++)
    A[i] += B[i];
}

7/16

slide-28
SLIDE 28

How to allow more polyhedral optimizations?

Example

1. speculatively assume properties (e.g., constant parameters)
2. derive specialized versions
3. apply polyhedral optimizations

void f_opt(int* restrict A, int* restrict B) {
  parfor (int j=0; j < 2048; j+=32)
    for (int i=j; i < 32 + j; i++)
      A[i] += B[i];
}

7/16

slide-29
SLIDE 29

How to allow more polyhedral optimizations?

Example

1. speculatively assume properties (e.g., constant parameters)
2. derive specialized versions
3. apply polyhedral optimizations
4. add runtime dispatcher

void f_dispatcher(int* A, int* B) {
  if (overlap(A, B, 2048))
    f(A, B);
  else
    f_opt(A, B);
}
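The four steps can be put together in plain C. This is a minimal sketch under stated assumptions rather than SPolly's generated code: the body of `overlap` is a hypothetical address-range check (the slides only show its call), the `parfor` of the slide is written as an ordinary sequential loop, and `f_dispatcher` returns which version ran so that the choice can be observed.

```c
#include <stdint.h>

enum { LEN = 2048 };

/* Hypothetical helper: 1 if the n-element ranges starting at a and b
 * share at least one element, 0 otherwise. */
static int overlap(const int *a, const int *b, int n) {
    uintptr_t a_lo = (uintptr_t)a, a_hi = (uintptr_t)(a + n);
    uintptr_t b_lo = (uintptr_t)b, b_hi = (uintptr_t)(b + n);
    return a_lo < b_hi && b_lo < a_hi;
}

/* Original version: correct for any pair of pointers. */
static void f(int *A, int *B) {
    for (int i = 0; i < LEN; i++)
        A[i] += B[i];
}

/* Specialized version: only valid if A and B do not overlap.
 * The outer loop is the one marked parfor on the slide. */
static void f_opt(int *restrict A, int *restrict B) {
    for (int j = 0; j < LEN; j += 32)
        for (int i = j; i < j + 32; i++)
            A[i] += B[i];
}

/* Runtime dispatcher: returns 1 if the specialized version was used. */
static int f_dispatcher(int *A, int *B) {
    if (overlap(A, B, LEN)) {
        f(A, B);
        return 0;
    }
    f_opt(A, B);
    return 1;
}
```

With two separate arrays the dispatcher takes the `f_opt` path; with `B` pointing one element into `A` it falls back to `f`, so a wrong speculation costs only the check.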

7/16

slide-38
SLIDE 38

How to allow more polyhedral optimizations?

Implementation

[Figure: implementation overview. Polly pipeline: LLVM-IR, SCoP Detection, Polyhedral Optimizations, Code Generation. SPolly additions: sSCoP Detection, Region Speculation. Region classes: Valid SCoPs, Invalid SCoPs, Valid sSCoPs. Further elements: Program, Specialized Versions, Runtime Dispatcher, Profiling Versions, Profiling Information, JIT-Environment.]

8/16

slide-42
SLIDE 42

Does it work?

Almost.

9/16

slide-43
SLIDE 43

Does it work?

Applicability on SPEC 2000

Valid SCoPs: 14.8%
Additional sSCoPs: 19.3%
Invalid sSCoPs: 65.9%

10/16

slide-44
SLIDE 44

Does it work?

Applicability on SPEC 2000

Number of valid regions

Benchmark  Polly  SPolly
ammp          22      50
art            7      28
bzip2         30      37
crafty        41      58
equake        10      24
gzip          29      31
mcf            1       1
mesa         106     283
twolf         29     123
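The per-benchmark counts can be checked against the percentages on the previous slide. The sketch below assumes that the first nine numbers on this slide belong to Polly and the next nine to SPolly, and that the total of 1862 regions from the rejection-cause slides is the right denominator; the names `polly_valid`, `spolly_valid`, `sum_regions` and `share_percent` are illustrative.

```c
enum { NUM_BENCH = 9, TOTAL_REGIONS = 1862 };

/* Order: ammp, art, bzip2, crafty, equake, gzip, mcf, mesa, twolf. */
static const int polly_valid[NUM_BENCH]  = {22, 7, 30, 41, 10, 29, 1, 106, 29};
static const int spolly_valid[NUM_BENCH] = {50, 28, 37, 58, 24, 31, 1, 283, 123};

/* Sum of n region counts. */
static int sum_regions(const int *v, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Share of `count` regions among all regions, in percent. */
static double share_percent(int count) {
    return 100.0 * count / TOTAL_REGIONS;
}
```

Under these assumptions Polly accepts 275 regions (about 14.8%), SPolly accepts 635, the 360 additional regions are about 19.3%, and the remaining 1227 are about 65.9%, which agrees with the three percentages shown for the applicability on SPEC 2000.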

11/16

slide-47
SLIDE 47

Does it work?

Runtime Results on SPEC 2000

[Bar chart: speedup relative to Polly (y-axis from 1 to 2) of Polly and SPolly for ammp, art, bzip2, crafty, equake, gzip, mcf, mesa, twolf; bar values not recoverable. Annotations on the chart: "Polly crashes" (once), "Polly crashes with additional information" (three times), "SPolly crashes" (twice).]

12/16

slide-48
SLIDE 48

Does it work?

Case Study – Setup

Algorithm: 2D derivative computation (basic image processing block)
Inputs are given in 2 different resolutions
Evaluated speedup of SPolly normalized against Polly

13/16

slide-49
SLIDE 49

Does it work?

Case Study – Results

[Bar chart: speedup relative to clang (y-axis from 0.0 to 3.5) of Polly and SPolly for the input image sizes 512x4096 and 4096x512; bar values not recoverable.]

14/16

slide-56
SLIDE 56

Does it work?

Runtime Results on Polybench

[Bar chart: speedup relative to clang (y-axis from 1 to 12) for SPolly 1st run and SPolly 2nd run; bar values not recoverable.]

Benchmarks: bicg, syrk, jacobi-1d-imper, trmm, symm, syr2k, cholesky, fdtd-apml, 3mm, lu, doitgen, seidel-1d, gemm, adi, gesummv, floyd-warshall, ludcmp, covariance, jacobi-2d-imper, atax, fdtd-2d, trisolv, 2mm, mvt, gemver, reg_detect, correlation, gramschmidt, dynprog, durbin

Geomean: [1st run] 1.134 [2nd run] 1.481

15/16

slide-59
SLIDE 59

Summary

Regions: 14.8% valid, 85.2% invalid
With SPolly: 14.8% valid SCoPs, 19.3% additional sSCoPs, 65.9% invalid sSCoPs

[Recap figures: implementation overview of Polly and SPolly; Polybench speedup relative to clang for SPolly 1st run and SPolly 2nd run.]

16/16

slide-63
SLIDE 63

Does it work?

Case Study – Setup continued

◮ Convolution kernel of size 3x3
◮ Applied to all channels of an RGBA image (e.g., png)
◮ Measured on an Intel(R) Core(TM) i5 CPU M 560

Image source:

https://sonnati.wordpress.com/2010/10/06/flash-h-264-h-264-squared-%E2%80%93-part-iii/
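The slides do not show the case-study kernel itself. As a rough illustration of the kind of code meant here, the following sketch applies a 3x3 convolution to every channel of an interleaved 8-bit RGBA image. The function name `convolve3x3`, the `divisor` parameter, the border handling and the clamping are all assumptions; what it shares with the setup above is that source and destination are plain pointers and all subscripts are linearized with runtime sizes, so a static analysis can prove neither affine accesses nor the absence of aliasing.

```c
#include <stddef.h>

enum { CHANNELS = 4 };  /* R, G, B, A */

/* Applies the 3x3 kernel k (scaled by 1/divisor) to all channels of a
 * width x height RGBA image. Border pixels are copied unchanged. */
static void convolve3x3(const unsigned char *src, unsigned char *dst,
                        int width, int height,
                        const int k[3][3], int divisor) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            for (int c = 0; c < CHANNELS; c++) {
                size_t idx = ((size_t)y * width + x) * CHANNELS + c;
                if (y == 0 || x == 0 || y == height - 1 || x == width - 1) {
                    dst[idx] = src[idx];
                    continue;
                }
                int acc = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        size_t n = ((size_t)(y + dy) * width + (x + dx))
                                   * CHANNELS + c;
                        acc += k[dy + 1][dx + 1] * src[n];
                    }
                acc /= divisor;
                if (acc < 0) acc = 0;      /* clamp to the 8-bit range */
                if (acc > 255) acc = 255;
                dst[idx] = (unsigned char)acc;
            }
        }
    }
}
```

A derivative filter would pass a kernel such as a horizontal difference; an identity kernel (a single 1 in the centre, divisor 1) leaves the image unchanged and is a convenient sanity check.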

slide-67
SLIDE 67

Concept

Alias tests

for (i = 0; i < N; i++) {
  for (j = 0; j < N; j++) {
    // I1
    C[i][j] = 0;
    for (k = 0; k < N; k++) {
      // I2 I3 I4
      C[i][j] += A[i][k] * B[k][j];
    }
  }
}

Acc        bp  ma  Ma
I1 and I2  C   0   N*N-1
I3         A   0   N*N-1
I4         B   0   N*N-1

bool ab = B[N*N-1] < A[0] || B[0] > A[N*N-1];
bool ac = C[N*N-1] < A[0] || C[0] > A[N*N-1];
bool bc = B[N*N-1] < C[0] || B[0] > C[N*N-1];
bool noaliasfound = ab && ac && bc;
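The alias tests on this slide are written in shorthand: the comparisons are meant between the addresses of the first and last accessed elements, not between their values. The sketch below spells this out under a few assumptions: `bp` is read as the base pointer and `ma`/`Ma` as the minimal and maximal accessed offset, the arrays are treated as flat `int` buffers of N*N elements, and the names `AccessRange`, `disjoint` and `no_alias_found` are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the access table: base pointer plus the minimal and
 * maximal element offset accessed through it. */
typedef struct {
    const int *bp;
    size_t ma;
    size_t Ma;
} AccessRange;

/* True if the two accessed address ranges do not intersect. */
static bool disjoint(AccessRange x, AccessRange y) {
    uintptr_t x_lo = (uintptr_t)(x.bp + x.ma);
    uintptr_t x_hi = (uintptr_t)(x.bp + x.Ma);
    uintptr_t y_lo = (uintptr_t)(y.bp + y.ma);
    uintptr_t y_hi = (uintptr_t)(y.bp + y.Ma);
    return x_hi < y_lo || x_lo > y_hi;
}

/* Pairwise checks for the three arrays of the loop nest (requires n >= 1):
 * all accesses lie in the offsets 0 .. n*n-1 of each base pointer. */
static bool no_alias_found(const int *A, const int *B, const int *C, size_t n) {
    AccessRange a = {A, 0, n * n - 1};
    AccessRange b = {B, 0, n * n - 1};
    AccessRange c = {C, 0, n * n - 1};
    bool ab = disjoint(b, a);
    bool ac = disjoint(c, a);
    bool bc = disjoint(b, c);
    return ab && ac && bc;
}
```

For three arrays this needs three pairwise checks, each with two comparisons, all evaluated once before the loop nest is entered.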
