Changelog
Changes made in this version not seen in first lecture:
15 November: vector addr picture: make order of result consistent with order of inputs
15 November: correct square to matmul on several vector slides
15 November: correct mixups
[Figure: vector addr picture. Arrays A and B are shown from index 3 to 17; the vector result holds A[8] + B[8] through A[15] + B[15], in the same order as the inputs.]
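The picture above can be read as one 8-element vector add. A minimal sketch in plain C follows; it is not taken from the slides. The names add_block8 and vector_add and the int element type are assumptions for illustration, and the inner loop only stands in for what a single 8-lane vector instruction would do at once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides): adds one block of 8
   adjacent elements, the work one 8-lane vector add would do at once. */
static void add_block8(const int *a, const int *b, int *result) {
    for (int lane = 0; lane < 8; ++lane) {
        /* Result lane i comes from input lane i, so the order of the
           result matches the order of the inputs. */
        result[lane] = a[lane] + b[lane];
    }
}

/* Adds two arrays one 8-element block at a time.
   Assumption for this sketch: n is a multiple of 8. */
void vector_add(const int *a, const int *b, int *result, size_t n) {
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        add_block8(a + i, b + i, result + i);
    }
}
```

With 16-element arrays, the second call to add_block8 covers indices 8 through 15, which is the block drawn in the picture.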
[Figure: time (ns) per sample, log scale from 10^1 to 10^8, against sample # from 0 to 1000000.]
[Figure: sequence of programs: loop.exe, ssh.exe, firefox.exe, loop.exe, ssh.exe.]
[Figure: plot against N from 0 to 10000, vertical axis 0.0 to 1.0, annotated "matrix in L3 cache".]
[Figure: cycles per multiply/add [less optimized loop], unblocked vs. blocked, N from 100 to 500, vertical axis 0.0 to 2.0.]
[Figure: cycles per multiply/add [optimized loop], unblocked vs. blocked, N from 200 to 1000, vertical axis 0.0 to 0.5.]
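For readers without the slides, here is a rough sketch of what "unblocked" and "blocked" matrix multiplies can look like in C. It is not the code that was measured in the plots. The function names, the row-major double layout, the loop order, and the BLOCK value of 4 are all assumptions for illustration.

```c
#include <stddef.h>

/* Assumed block size for this sketch; a real value would be tuned so
   the blocks being worked on fit in cache. */
#define BLOCK 4

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Unblocked: C += A * B for n x n row-major matrices. */
void matmul_unblocked(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k)
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Blocked: same arithmetic, but done one BLOCK x BLOCK tile at a time
   so each tile is reused while it is likely still cached. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c) {
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                /* min_size handles n that is not a multiple of BLOCK */
                for (size_t i = ii; i < min_size(ii + BLOCK, n); ++i)
                    for (size_t k = kk; k < min_size(kk + BLOCK, n); ++k)
                        for (size_t j = jj; j < min_size(jj + BLOCK, n); ++j)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

Both functions perform the same n * n * n multiply/adds, so dividing a measured cycle count by that number gives a cycles per multiply/add figure of the kind plotted above.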
[Figure: cycles/count, vertical axis 0.0 to 3.0, against N from 200 to 1000.]