PACT’02
Jaewook Shin, Jacqueline Chame and Mary Hall
September 23, 2002
University of Southern California (USC)
Motivation
Multimedia applications are becoming …
[Figure: superword register file SR0 through SR31, each register 128 bits wide; one superword holds sixteen 8-bit operands, eight 16-bit operands, or four 32-bit operands.]
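As an illustration of the register figure, the sketch below models one 128-bit superword register as a C union whose three members overlay the same 16 bytes as sixteen 8-bit, eight 16-bit, or four 32-bit operands. The type `superword`, the macro `LANES`, and the helper `load_words` are hypothetical names chosen for this sketch, not part of any vendor API; real code would use the target's own vector types.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit superword register:
   three views of the same 16 bytes. */
typedef union {
    uint8_t  b[16]; /* sixteen 8-bit operands */
    uint16_t h[8];  /* eight 16-bit operands  */
    uint32_t w[4];  /* four 32-bit operands   */
} superword;

_Static_assert(sizeof(superword) == 16, "a superword is 128 bits");

/* Number of operands (lanes) in one view of the register. */
#define LANES(view) ((int)(sizeof(view) / sizeof((view)[0])))

/* Fill a superword with four 32-bit operands. */
static superword load_words(const uint32_t src[4]) {
    superword sr;
    memcpy(sr.w, src, sizeof sr.w);
    return sr;
}
```

Reading a lane through a different member than the one last written depends on the host byte order, so the sketch only reads values back through the member it wrote.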
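Scalar loops:
    for (i = 0; i < N; i++) ... A[i] ...
    for (i = 0; i < N; i++) ... A[i], A[i+2] ...
Superword loop, where A[i:i+3] denotes the superword holding A[i] through A[i+3]:
    for (i = 0; i < N; i += 4) ... A[i:i+3] ...
[Figure: A[i], A[i+1], A[i+2] and A[i+3] in one superword; the scalar references A[i] and A[i+2] both fall within it.]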
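To make this reuse concrete, the following sketch counts aligned superword loads for a superword loop that steps by four elements. It assumes both scalar references are vectorized, so the loop touches A[i:i+3] and A[i+2:i+5], and that A starts on a superword boundary. A[i+2:i+5] straddles two aligned superwords, and A[i:i+3] reaches each of them in this or the next iteration (except at the end of the loop), so keeping loaded superwords in registers removes most of the loads. `count_superword_loads`, `SWS`, and `MAX_SW` are illustrative names, and the reuse mode assumes an unlimited number of superword registers.

```c
#include <string.h>

#define SWS 4      /* four elements per superword, as in A[i:i+3] */
#define MAX_SW 256 /* illustrative bound on tracked superwords */

/* Aligned superword that holds element e (assumes e >= 0 and that A
   starts on a superword boundary). */
static int superword_of(int e) { return e / SWS; }

/* Count aligned superword loads for
     for (i = 0; i < n; i += SWS) ... A[i+off[r] : i+off[r]+SWS-1] ...
   reuse == 0: every reference reloads each aligned superword it spans.
   reuse != 0: each aligned superword is loaded once and assumed to stay
   in a superword register (unlimited registers, fewer than MAX_SW
   superwords). Offsets must be non-negative. */
static int count_superword_loads(int n, const int off[], int nrefs, int reuse) {
    unsigned char seen[MAX_SW];
    int loads = 0;
    memset(seen, 0, sizeof seen);
    for (int i = 0; i < n; i += SWS) {
        for (int r = 0; r < nrefs; r++) {
            int first = superword_of(i + off[r]);
            int last = superword_of(i + off[r] + SWS - 1);
            for (int s = first; s <= last; s++) {
                if (!reuse) {
                    loads++;
                } else if (s < MAX_SW && !seen[s]) {
                    seen[s] = 1;
                    loads++;
                }
            }
        }
    }
    return loads;
}
```

For n = 16 with offsets {0, 2}, the sketch counts 12 loads without reuse (4 for A[i:i+3], 8 for the straddling A[i+2:i+5]) and 5 with reuse (aligned superwords 0 through 4).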
[Figure: a superword in memory, with A[i+0] at the low address and successive elements toward the high address.]
SWS (SuperWord Size): the number of data elements that fit in a superword register.
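A sketch of how SWS follows from the element type, assuming 16-byte (128-bit) superwords as in the register figure, together with a strip-mined loop that steps by SWS and finishes the last n % SWS elements in a scalar remainder loop. `SUPERWORD_BYTES`, `SWS_OF`, and `add_arrays` are hypothetical; the inner k-loop stands in for a single superword add instruction.

```c
#include <stdint.h>

#define SUPERWORD_BYTES 16 /* assumed 128-bit superword register */

/* SWS for an element type: how many elements fit in one superword. */
#define SWS_OF(type) ((int)(SUPERWORD_BYTES / sizeof(type)))

/* c = a + b, strip-mined by SWS. The inner k-loop stands in for one
   superword add; the trailing loop handles the last n % SWS elements.
   Returns the number of superword iterations executed. */
static int add_arrays(float *c, const float *a, const float *b, int n) {
    const int sws = SWS_OF(float);
    int i = 0, steps = 0;
    for (; i + sws <= n; i += sws, steps++)
        for (int k = 0; k < sws; k++)
            c[i + k] = a[i + k] + b[i + k];
    for (; i < n; i++) /* scalar remainder loop */
        c[i] = a[i] + b[i];
    return steps;
}
```

With 4-byte floats SWS is 4, so an array of 10 elements takes two superword iterations plus two scalar ones.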
[Figure: number of memory accesses (0 to 3.5E+09) as a function of the i-loop unroll amount and the j-loop unroll amount (each from 1 to 31).]
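The plot relates memory accesses to the unroll amounts of the i- and j-loops. As a rough, hypothetical illustration of that trade-off (not the authors' cost model), the sketch below estimates memory accesses for a matrix multiply C[i][j] += A[i][k] * B[k][j] whose i-loop is unrolled `ui` times and whose superword j-loop is unrolled `uj` times. It then picks the pair with the fewest estimated accesses whose ui*uj accumulators, ui replicated A values, and uj B superwords fit in 32 superword registers (SR0 through SR31). `est_accesses`, `select_unroll`, and `NUM_SUPERWORD_REGS` are invented names.

```c
#define NUM_SUPERWORD_REGS 32 /* SR0 through SR31 */

/* Hypothetical estimate of memory accesses for
     for i, for j (superwords of sws elements), for k:
       C[i][j] += A[i][k] * B[k][j]
   with the i-loop unrolled ui times and the j-loop unrolled uj times.
   Each k iteration loads ui A values (replicated into superwords) and
   uj B superwords; the ui*uj C superwords stay in registers and are
   loaded and stored once per tile. */
static double est_accesses(int n, int sws, int ui, int uj) {
    double tiles = ((double)n / ui) * ((double)n / ((double)sws * uj));
    return tiles * n * (ui + uj)   /* A and B loads inside the k-loop */
         + tiles * 2.0 * ui * uj;  /* C loads and stores per tile */
}

/* Exhaustively pick the unroll amounts with the lowest estimate whose
   register demand (ui*uj + ui + uj) fits in the superword register file. */
static void select_unroll(int n, int sws, int *best_ui, int *best_uj) {
    double best = -1.0;
    for (int ui = 1; ui <= NUM_SUPERWORD_REGS; ui++) {
        for (int uj = 1; uj <= NUM_SUPERWORD_REGS; uj++) {
            if (ui * uj + ui + uj > NUM_SUPERWORD_REGS)
                continue; /* would spill superword registers */
            double cost = est_accesses(n, sws, ui, uj);
            if (best < 0.0 || cost < best) {
                best = cost;
                *best_ui = ui;
                *best_uj = uj;
            }
        }
    }
}
```

Under this model the estimate falls in proportion to 1/ui + 1/uj, so the search settles on unroll amounts 4 and 5: 4*5 + 4 + 5 = 29 registers fit, while 5*5 + 5 + 5 = 35 would not.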
Packing through memory:
    w = *((float *)&a + 0);   /* scalar loads of element 0 of a, b, c, d */
    x = *((float *)&b + 0);
    y = *((float *)&c + 0);
    z = *((float *)&d + 0);
    *((float *)&p + 0) = w;   /* scalar stores into the memory image of p */
    *((float *)&p + 1) = x;
    *((float *)&p + 2) = y;
    *((float *)&p + 3) = z;

Packing in registers:
    temp1 = replicate(a, 0);  /* copy element 0 of a into every lane */
    temp2 = replicate(b, 0);
    temp3 = replicate(c, 0);
    temp4 = replicate(d, 0);
    p = shift_and_load(p, temp1);  /* shift p by one element, insert a[0] */
    p = shift_and_load(p, temp2);
    p = shift_and_load(p, temp3);
    p = shift_and_load(p, temp4);

– Alignment, non-unit stride array references
[Figure: from superword a = a[0] a[1] a[2] a[3], replicate(a, 0) gives a[0] a[0] a[0] a[0]; successive shift_and_load calls build p up as a[0], then a[0] b[0], then a[0] b[0] c[0], and finally a[0] b[0] c[0] d[0].]
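The two packing sequences can be checked against each other with a portable emulation. This sketch assumes four floats per superword and models `replicate` and `shift_and_load` as plain C functions. The semantics of `shift_and_load` (move every lane of p one position toward lane 0 and put lane 0 of the second operand in the last lane) is an assumption, chosen because four calls then yield p = a[0] b[0] c[0] d[0], the final state shown on the slide. The type `sw4` and the two wrapper functions are hypothetical.

```c
/* Portable stand-in for a superword of four floats. */
typedef struct { float e[4]; } sw4;

/* replicate(v, k): copy element k of v into every lane. */
static sw4 replicate(sw4 v, int k) {
    sw4 r;
    for (int j = 0; j < 4; j++)
        r.e[j] = v.e[k];
    return r;
}

/* shift_and_load(p, t), assumed semantics: move lanes 1..3 of p to
   lanes 0..2 and put lane 0 of t in lane 3. */
static sw4 shift_and_load(sw4 p, sw4 t) {
    sw4 r;
    for (int j = 0; j < 3; j++)
        r.e[j] = p.e[j + 1];
    r.e[3] = t.e[0];
    return r;
}

/* Packing through memory: scalar loads, then scalar stores into p. */
static sw4 pack_through_memory(sw4 a, sw4 b, sw4 c, sw4 d) {
    sw4 p;
    float w = a.e[0], x = b.e[0], y = c.e[0], z = d.e[0];
    p.e[0] = w;
    p.e[1] = x;
    p.e[2] = y;
    p.e[3] = z;
    return p;
}

/* Packing in registers: broadcast, then shift each value into p. */
static sw4 pack_in_registers(sw4 p, sw4 a, sw4 b, sw4 c, sw4 d) {
    sw4 temp1 = replicate(a, 0), temp2 = replicate(b, 0);
    sw4 temp3 = replicate(c, 0), temp4 = replicate(d, 0);
    p = shift_and_load(p, temp1);
    p = shift_and_load(p, temp2);
    p = shift_and_load(p, temp3);
    p = shift_and_load(p, temp4);
    return p;
}
```

Both functions produce the same superword; the first goes through memory with scalar loads and stores, while the second keeps every intermediate value in superword registers.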
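[Figure: compiler flow. A C/Fortran program is translated into a superword-instruction-extended C program, which is compiled into a G4 executable; one step of the flow is "Select unroll amounts".]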
[Figure: two charts for the benchmarks VMM, FIR, YUV, MMM, SWIM and TOMCATV, with y-axes from 10 to 100.]
[Figure: chart for VMM, FIR, YUV, MMM, SWIM and TOMCATV comparing four versions: SLP, Unroll-and-Jam + SLP, Superword Replacement, and Packing in Registers.]
Locality in caches: Wolfe (89), Ferrante et al. (91), Lam et al. (91), Wolf (92), Esseghir (93), Temam et al. (93, 95), Carr et al. (94), Coleman and McKinley (95), Ghosh et al. (97, 98), Chame and Moon (99), Rivera and Tseng (99), Sarkar and Megiddo (00), Chatterjee (01), ...
Superword-level parallelism: Cheong and Lam (97), Larsen and Amarasinghe (00), Sreraman and Govindarajan (00), commercial products
– No data locality at superword register level
Locality in scalar registers: Wolf (92), Carr and Kennedy (94), Jimenez (99)
– Scalar registers, which do not have spatial locality