DEALING WITH ALIASING USING CONTRACTS



SLIDE 1

DEALING WITH ALIASING USING CONTRACTS

BEATING FORTRAN'S PERFORMANCE

Gábor Horváth, PhD Student, Eötvös Loránd University
xazax.hun@gmail.com

SLIDE 4

ALIASING

Some parameters might alias! Type based alias analysis

An int& and a float& cannot refer to the same object, so the compiler returns the constant 2:

int f(int &a, float &b) {
    a = 2;
    b = 3;
    return a;
}

define i32 @f(i32* %a, float* %b) {
    store i32 2, i32* %a
    store float 3, float* %b
    ret i32 2
}

Two int& parameters may refer to the same object, so a has to be reloaded after the store to b:

int f(int &a, int &b) {
    a = 2;
    b = 3;
    return a;
}

define i32 @f(i32* %a, i32* %b) {
    store i32 2, i32* %a
    store i32 3, i32* %b
    %tmp = load i32, i32* %a
    ret i32 %tmp
}
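To make the reload visible from the outside, here is a small sketch that calls the two-int version once with aliased arguments and once with distinct ones. The names store_both, call_aliased and call_distinct are made up for this illustration and do not appear in the talk.

```cpp
#include <cassert>

// Same shape as the slide's second f: both parameters have type int&,
// so the compiler must assume they may refer to the same object.
int store_both(int &a, int &b) {
    a = 2;
    b = 3;
    return a;  // must be reloaded: b = 3 may have overwritten a
}

// Result when both parameters refer to one object.
int call_aliased() {
    int x = 0;
    return store_both(x, x);
}

// Result when the parameters refer to two distinct objects.
int call_distinct() {
    int x = 0, y = 0;
    return store_both(x, y);
}
```

With aliased arguments the function returns 3, with distinct ones it returns 2, which is why the constant cannot be folded into the return.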

SLIDE 6

WHY DOES ALIASING MATTER?

LATENCY NUMBERS

L1 cache reference        0.5 ns
Branch mispredict         5 ns
L2 cache reference        7 ns      14x L1 cache
Mutex lock/unlock         25 ns
Main memory reference     100 ns    20x L2 cache, 200x L1 cache

OPTIMIZATIONS

SLIDE 7

WHY DOES ALIASING MATTER?

LORE: LOop Repository for the Evaluation of compilers
Numbers from P1296R0

        Loops    Sped up      Mean speedup    Slowed      Mean slowdown
GCC     1939     734 (38%)    2.39x           155 (8%)    0.766
ICC     1861     843 (45%)    2.59x           94 (5%)     0.61

In some cases __restrict__ provides a ~40x performance improvement.

SLIDE 9

FORTRAN

  • Procedure arguments and variables may not alias
  • Designed when CPU time was expensive
  • To convince people not to write in assembly...
  • ...you need to generate blazing fast code

C++

  • No standard way (other than types) to give aliasing-related hints.

SLIDE 11

NOT VECTORIZED

// num is an int behind a reference, so a store through a[i] may change it
void f(int *a, int *b, const int& num) {
    for (int i = 0; i < num; ++i) {
        a[i] = b[i] * b[i] + 1;
    }
}

VECTORIZED

// num is passed by value, so no store in the loop can change it
void f(int *a, int *b, int num) {
    for (int i = 0; i < num; ++i) {
        a[i] = b[i] * b[i] + 1;
    }
}
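When the signature cannot be changed (for example in generic code), one possible workaround is to read the bound once into a local variable. This is only a sketch; the names square_plus_one and third_result are hypothetical, and whether a given compiler then vectorizes the loop still depends on how it handles the possible overlap of a and b.

```cpp
#include <cassert>

// Hypothetical variant of the slide's f: the bound still arrives by
// const reference, but it is read exactly once before the loop.
void square_plus_one(int *a, const int *b, const int &num) {
    const int n = num;  // local copy: stores through a[i] cannot change n
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] * b[i] + 1;
    }
}

// Helper for illustration: runs the loop on a small input.
int third_result() {
    int in[4] = {1, 2, 3, 4};
    int out[4] = {0, 0, 0, 0};
    int count = 4;
    square_plus_one(out, in, count);
    return out[2];  // 3 * 3 + 1
}
```

Note that the local copy changes the meaning of the function if a really does point at num, so it is only valid when that case is excluded by the function's contract.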

SLIDE 14

WHO WRITES CODE LIKE THAT?

Ring any bells?

template<typename T, ...>
void foo(..., const T&) {
    ...
}

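SLIDE 16

JASON'S EXAMPLE

The loop is versioned: the large unrolled code is emitted twice.

// std::uint8_t is typically unsigned char, which may alias any object,
// so src and dst have to be treated as possibly overlapping
void extend(std::uint8_t *src, std::uint32_t *dst) {
    for (int i = 0; i < 16; ++i) {
        dst[i] = src[i];
    }
}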

SLIDE 18

JASON'S EXAMPLE

Only the vectorized version is emitted.

enum struct Data : std::uint8_t {};

void extend(Data *src, std::uint32_t *dst) {
    for (int i = 0; i < 16; ++i) {
        dst[i] = (std::uint8_t)src[i];
    }
}
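A self-contained version of the enum trick could look like the sketch below. The const qualifier on src, the static_cast spelling and the widened_sum helper are additions for illustration and are not part of the slide.

```cpp
#include <cassert>
#include <cstdint>

// Distinct type with the same size as std::uint8_t. Unlike unsigned char,
// a user-defined scoped enumeration gets no special permission to alias
// objects of other types.
enum struct Data : std::uint8_t {};

void extend(const Data *src, std::uint32_t *dst) {
    for (int i = 0; i < 16; ++i) {
        dst[i] = static_cast<std::uint8_t>(src[i]);
    }
}

// Helper for illustration: widens 0, 10, ..., 150 and sums the results.
std::uint32_t widened_sum() {
    Data in[16];
    std::uint32_t out[16] = {};
    for (int i = 0; i < 16; ++i) in[i] = static_cast<Data>(i * 10);
    extend(in, out);
    std::uint32_t sum = 0;
    for (std::uint32_t v : out) sum += v;
    return sum;
}
```

The behavior is the same as with std::uint8_t; only the information available to type based alias analysis differs.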

SLIDE 19

IS IT ALWAYS POSSIBLE TO UTILIZE THE TYPE BASED ALIASING RULES?

SLIDE 21

NOT VECTORIZED

void g(int *result, int **matrix, int height, int width) {
    for (int i = 0; i < height; ++i)
        for (int j = 0; j < width; ++j)
            result[i] += matrix[i][j];
}

VECTORIZED

void g(int * restrict result, int * restrict * matrix, int height, int width) {
    for (int i = 0; i < height; ++i)
        for (int j = 0; j < width; ++j)
            result[i] += matrix[i][j];
}

SLIDE 22

restrict

During each execution of a block in which a restricted pointer P is declared, if some object that is accessible through P (directly or indirectly) is modified, by any means, then all accesses to that object (both reads and writes) in that block must occur through P (directly or indirectly), otherwise the behavior is undefined.
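In C++ the closest available spelling is a compiler extension. The sketch below assumes GCC or Clang, where the keyword is written __restrict__ (MSVC uses __restrict); it also uses a flat, one-dimensional matrix instead of the slide's int** to keep the example short. The names row_sums and second_row_sum are hypothetical.

```cpp
#include <cassert>

// Assumption: __restrict__ is a GCC/Clang extension, not standard C++.
// The caller promises that result and matrix do not overlap.
void row_sums(int *__restrict__ result, const int *__restrict__ matrix,
              int height, int width) {
    for (int i = 0; i < height; ++i)
        for (int j = 0; j < width; ++j)
            result[i] += matrix[i * width + j];
}

// Helper for illustration: sums the rows of a 2x3 matrix.
int second_row_sum() {
    const int matrix[6] = {1, 2, 3, 4, 5, 6};
    int sums[2] = {0, 0};  // separate storage, so the promise holds
    row_sums(sums, matrix, 2, 3);
    return sums[1];        // 4 + 5 + 6
}
```

Passing overlapping storage for result and matrix would break the promise quoted above and make the behavior undefined.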

SLIDE 24

LET'S JUST ADD RESTRICT TO C++?

How to annotate the code below?

void g(vector<int> &result, vector<vector<int>> &matrix) {
    for (int i = 0; i < matrix.size(); ++i)
        for (int j = 0; j < matrix[0].size(); ++j)
            result[i] += matrix[i][j];
}

What would these mean?

vector<int restrict>
vector<int> restrict

SLIDE 25

ADDING restrict TO C++

  • Many failed attempts, lots of unanswered questions
  • Should restrict change the overload sets?
  • Should restrict participate in name mangling?
  • restrict was never designed to work with the class abstraction
  • How should restrict be carried through templates?
  • Members, lambda captures, unions, ...
  • C2X, n2260, clarifying restrict

SLIDE 30

WHAT DO YOU THINK ABOUT THIS CODE?

void f(int * restrict x, int * restrict y);

void g() {
    int x;
    f(&x, &x);
}

  • Adding restrict to f makes it harder to use.
  • It is now the caller's responsibility to ensure no aliasing is happening.
  • Restrict is a precondition!
  • If only we had a way to describe preconditions in C++...
  • Contracts were voted into C++20 in June (Rapperswil meeting)

SLIDE 31

CONTRACTS TO THE RESCUE?

EXPLORING THE DESIGN SPACE

SLIDE 32

SIMPLE PRECONDITIONS

int f(int &a, int &b) [[expects axiom: &a != &b]] {
    a = 2;
    b = 3;
    return a;
}

  • f(x, x); is undefined
  • The precondition is documented
  • We have two mitigations:
      • Runtime checks (with axiom removed)
      • Static analysis
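The [[expects]] syntax on the slide is not accepted by compilers in their standard configuration, so the runtime-check mitigation can only be approximated today. The sketch below uses a plain assert as a stand-in; f_checked and call_with_distinct_objects are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for [[expects: &a != &b]]: the precondition is
// checked at run time with assert. Unlike an axiom, this gives the
// optimizer no aliasing information once NDEBUG disables the check.
int f_checked(int &a, int &b) {
    assert(&a != &b && "precondition: a and b must not alias");
    a = 2;
    b = 3;
    return a;
}

// Helper for illustration: a call that satisfies the precondition.
int call_with_distinct_objects() {
    int x = 0, y = 0;
    return f_checked(x, y);
}
```

A call such as f_checked(x, x) would trip the assertion in a debug build, which is the kind of mitigation the slide refers to.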

SLIDE 33

SIMPLE PRECONDITIONS (LAMBDAS)

auto f = [](int &a, int &b) [[expects axiom: &a != &b]] {
    a = 2;
    b = 3;
    return a;
};

SLIDE 38

ARRAYS

int *merge(int *a, int *b, int num) [[expects: ???]];

Extend the language?

int *merge(int *a, int *b, int num) [[expects: __disjoint(a, b, num)]];

__disjoint(a, b, c, ..., num)?

int *merge(int *a, int *b, int num) [[expects: __distinct(a) && __distinct(b)]];

SLIDE 40

POSSIBLE IMPLEMENTATION FOR __disjoint?

// From: P1296R0
template<typename T, typename U>
bool __disjoint(const T *pt, const U *pu, size_t n) {
    intptr_t bt = (intptr_t)pt, et = (intptr_t)(pt + n);
    intptr_t bu = (intptr_t)pu, eu = (intptr_t)(pu + n);
    return (et <= bu) || (eu <= bt);
}

  • Are we sure this is well defined?
  • Compilers might want to have intrinsics instead.
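For experimenting with the check, a compilable counterpart might look like the following sketch. The name ranges_disjoint is an assumption (identifiers containing a double underscore are reserved for the implementation), and the code relies on a flat address space where converting pointers to integers preserves their order, which is exactly the point the slide questions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical runnable counterpart of the slide's __disjoint.
template <typename T, typename U>
bool ranges_disjoint(const T *pt, const U *pu, std::size_t n) {
    // The pointer-to-integer mapping is implementation-defined; this
    // assumes integer order matches address order.
    const auto bt = reinterpret_cast<std::uintptr_t>(pt);
    const auto et = reinterpret_cast<std::uintptr_t>(pt + n);
    const auto bu = reinterpret_cast<std::uintptr_t>(pu);
    const auto eu = reinterpret_cast<std::uintptr_t>(pu + n);
    return (et <= bu) || (eu <= bt);
}

// Elements [0,4) and [4,8) of one array touch but do not overlap.
bool halves_are_disjoint() {
    int buf[8] = {};
    return ranges_disjoint(buf, buf + 4, 4);
}

// Elements [0,4) and [2,6) of one array overlap.
bool shifted_ranges_are_disjoint() {
    int buf[8] = {};
    return ranges_disjoint(buf, buf + 2, 4);
}
```

Adjacent halves are reported as disjoint and shifted ranges as overlapping, matching the half-open interval test in the original.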

SLIDE 43

USER DEFINED TYPES

int f(S a, S b) [[expects: __disjoint(a.member, b.member)]];
int f(S a, S b) [[expects: __disjoint(a.method(), b.method())]];
int f(S a, S b) [[expects: __disjoint(a.method(???), b.method(???))]];

  • What if we need arguments?
  • Use dummy symbols?
  • Existentially or universally quantified?

SLIDE 44

VIEWS TO THE RESCUE?

SLIDE 45

NON-ALIASING VIEW EXAMPLE

template <typename ... >
class unique_span {
    unique_span(...) [[expects: ???]];
    reference operator[](index_type idx) const
        [[ensures x: __distinct(x, this, idx)]];
};

f(unique_span(vec), unique_span(vec2));

SLIDE 47

BACK TO THE MATRIX EXAMPLE

void g(unique_span<int> result, vector<unique_span<int>> &matrix) {
    for (int i = 0; i < matrix.size(); ++i)
        for (int j = 0; j < matrix[0].size(); ++j)
            result[i] += matrix[i][j];
}

Note that in real code we may want a multidimensional view or a one-dimensional matrix representation to avoid copying at the call site.
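The talk leaves the constructor precondition of unique_span open, so the following is only a rough sketch of how such a view could be used today. Everything in it is an assumption made for illustration: the reduced unique_span interface, the views_disjoint helper and the assert that stands in for a contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified unique_span: a pointer plus a length.
// It only names the intent; it does not prove uniqueness by itself.
template <typename T>
class unique_span {
public:
    explicit unique_span(std::vector<T> &v) : data_(v.data()), size_(v.size()) {}
    T &operator[](std::size_t idx) const { return data_[idx]; }
    std::size_t size() const { return size_; }
    T *data() const { return data_; }
private:
    T *data_;
    std::size_t size_;
};

// Assumed helper: true when the two views cover non-overlapping storage
// (relies on a flat address space, like the __disjoint sketch in the talk).
template <typename T>
bool views_disjoint(unique_span<T> a, unique_span<T> b) {
    const auto ba = reinterpret_cast<std::uintptr_t>(a.data());
    const auto ea = reinterpret_cast<std::uintptr_t>(a.data() + a.size());
    const auto bb = reinterpret_cast<std::uintptr_t>(b.data());
    const auto eb = reinterpret_cast<std::uintptr_t>(b.data() + b.size());
    return ea <= bb || eb <= ba;
}

// Adds src element-wise into dst; the asserts play the role of preconditions.
void accumulate_into(unique_span<int> dst, unique_span<int> src) {
    assert(dst.size() == src.size());
    assert(views_disjoint(dst, src) && "precondition: views must not overlap");
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] += src[i];
}

// Helper for illustration: two separate vectors never overlap.
int accumulate_example() {
    std::vector<int> a = {1, 2, 3};
    std::vector<int> b = {10, 20, 30};
    accumulate_into(unique_span<int>(a), unique_span<int>(b));
    return a[2];  // 3 + 30
}
```

In this form the type documents the requirement and allows a debug check, but it does not by itself pass any no-alias information to the optimizer.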

SLIDE 48

A NEW TYPE? ISN'T THAT HEAVYWEIGHT?

SLIDE 49

ARE THESE FUNCTIONS THE SAME?

double my_sqrt(double x) {
    return sqrt(x);
}

double my_sqrt(double x) {
    if (x < 0) return 0;
    return sqrt(x);
}

double my_sqrt(double x) {
    if (x < 0) throw ...;
    return sqrt(x);
}

SLIDE 50

ARE THESE FUNCTIONS THE SAME?

double my_sqrt(double x);
double my_sqrt(double x) [[expects: x >= 0]];
double my_sqrt(double x) [[expects: x >= 0]] [[ensures ret: ret >= 0]];
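As a rough illustration of what the third declaration promises, the pre- and postcondition can be emulated with assertions. The name my_sqrt_checked is hypothetical, and assert is only a stand-in for the proposed contract syntax.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical emulation of the third declaration on the slide:
// [[expects: x >= 0]] becomes an assert on entry and
// [[ensures ret: ret >= 0]] becomes an assert on the result.
double my_sqrt_checked(double x) {
    assert(x >= 0 && "precondition: x must be non-negative");
    const double ret = std::sqrt(x);
    assert(ret >= 0 && "postcondition: result must be non-negative");
    return ret;
}
```

The emulation makes the difference between the three declarations concrete: the body is identical in all of them, but callers of the annotated versions know which inputs are valid and what they can rely on afterwards.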

SLIDE 51

ARE THESE TYPES THE SAME?

unique_span<int>
span<int>

SLIDE 55

Exercise: how different are these types?

  • Hint: How many methods need to be annotated?
  • Hint 2: How many other things need to be annotated? Iterators?
  • Is it feasible to do all that inline?

SLIDE 56

It might be a lot of work to create such types, but...

  • These can be vocabulary types
  • We should use such classes sparingly, as they impose a burden on the caller
  • Those methods/functions are now screaming that they are special and error prone
  • We can do overloads!

SLIDE 57

WE ALREADY HAVE TO REASON ABOUT ALIASING

  • std::copy*
  • memcpy vs memmove
  • We would get mitigations for existing UB!
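The memcpy/memmove pair is a familiar instance of such an unwritten precondition: memcpy requires non-overlapping buffers, memmove does not. A minimal sketch (the helper name shift_right_by_two is made up for this example):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Shifts the first five characters of "abcdefgh" right by two positions.
// Source [0,5) and destination [2,7) overlap, so memmove is required;
// calling memcpy with these arguments would be undefined behavior.
std::string shift_right_by_two() {
    char buf[] = "abcdefgh";
    std::memmove(buf + 2, buf, 5);
    return std::string(buf);
}
```

With a disjointness precondition on memcpy, a checked build or a static analyzer could flag the overlapping call instead of leaving it as silent undefined behavior.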

SLIDE 58

RELATED WORK

  • p0856r0: Restrict as a library feature
  • n3635, n4150: Annotating alias sets
  • P1296R0: Very similar design, cooperating with the authors
  • The malloc attribute of GCC, noalias attribute of Clang
  • All major compilers have restrict-like features as extensions
  • IBM XL's #pragma disjoint

SLIDE 59

P1296R0

  • std::disjoint only
  • Discussed at San Diego meeting
  • In early stages, no way to get into C++20

SLIDE 60

ALIAS SETS

void * [[alias_set()]] malloc(size_t);
int * [[alias_set(Foo)]] p1 = ...;
int * [[alias_set(Bar), alias_set(Baz)]] p2 = ...;
int * p3 = ...;

SLIDE 61

THANKS FOR YOUR ATTENTION!

 