Optimization part 1
Changelog
Changes made in this version not seen in first lecture:
29 Feb 2018: loop unrolling performance: remove bogus instruction cache overhead remark
29 Feb 2018: spatial locality in Akj: correct reference to B k+1
[Figure: cycles per multiply/add vs. N, unblocked vs. blocked. Panels: less optimized loop (N = 100 to 500); optimized loop (N = 200 to 1000).]
[Figure: plot vs. N (N = 2000 to 10000), annotated "matrix in L3 cache".]
[Figure: cycles/count vs. N (N = 200 to 1000).]