TSLP Throttling Automatic Vectorization: When Less is More
Vasileios Porpodas and Timothy M. Jones
University of Cambridge
LLVM Developer’s Meeting 2015
slide 1 of 16 www.cl.cam.ac.uk/∼vp331/
Why SIMD Vectorization?
[Figure: scalar functional units (four FUs) with a scalar register file, extended with a vector unit and a vector register file]
slide 2 of 16 www.cl.cam.ac.uk/∼vp331/
slide 3 of 16 www.cl.cam.ac.uk/∼vp331/
Scalar Code
1. Find vectorization seed instructions (consecutive stores, reductions)
2. Generate graph of isomorphic scalar groups
3. Calculate Scalar Cost and Vector Cost
4. If Vector Cost < Scalar Cost: YES, go to step 5; NO, DONE
5. Vectorize groups & emit vectors, then DONE
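The cost check in steps 3 and 4 can be illustrated with a minimal sketch. This is not LLVM's implementation: the `Group` class, the `should_vectorize` helper and all cost numbers are hypothetical, chosen only to show the all-or-nothing comparison of total vector cost against total scalar cost.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One node of the graph: a group of isomorphic scalar instructions."""
    name: str
    scalar_cost: int  # assumed cost of keeping every lane scalar
    vector_cost: int  # assumed cost of the vector form, including packing


def should_vectorize(groups):
    """Steps 3 and 4: compare total vector cost with total scalar cost."""
    scalar = sum(g.scalar_cost for g in groups)
    vector = sum(g.vector_cost for g in groups)
    return vector < scalar


# Hypothetical two-lane graph grown from a pair of consecutive stores.
graph = [
    Group("store", scalar_cost=2, vector_cost=1),
    Group("add", scalar_cost=2, vector_cost=1),
    Group("load", scalar_cost=2, vector_cost=4),  # e.g. non-adjacent loads
]

print(should_vectorize(graph))      # False: vector 6 vs scalar 6
print(should_vectorize(graph[:2]))  # True: vector 2 vs scalar 4
```

In this toy example a single expensive group makes the whole graph unprofitable, even though the store and add groups alone would be worth vectorizing.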
slide 4 of 16 www.cl.cam.ac.uk/∼vp331/
slide 5 of 16 www.cl.cam.ac.uk/∼vp331/
slide 6 of 16 www.cl.cam.ac.uk/∼vp331/
slide 7 of 16 www.cl.cam.ac.uk/∼vp331/
Scalar IR
1. Find seed instructions for vectorization
2. Generate the SLP graph
3. Calculate all valid cuts
4. Throttle (cut) the SLP graph
5. Calculate cost of vectorization
6. Save cut with best cost
7. Tried all cuts? NO: go back to step 4; YES: go to step 8
8. cost < threshold? YES: go to step 9; NO: DONE
9. Replace scalars with vectors, then DONE
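The throttling loop (steps 3 to 8) can be sketched as a brute-force search over cuts. Everything here is an assumption made for illustration, not the authors' implementation: the `GRAPH` layout, the per-node cost deltas, the `GATHER_COST` charged at each cut edge, and the rule that a valid cut is a connected subgraph containing the seed.

```python
from itertools import combinations

# Hypothetical SLP graph: node -> (vector cost minus scalar cost, operands).
# Negative deltas are savings; positive deltas mean the vector form is slower.
GRAPH = {
    "store": (-1, ["add"]),
    "add": (-1, ["load_a", "load_b"]),
    "load_a": (-1, []),
    "load_b": (+4, []),  # e.g. non-consecutive loads
}
ROOT = "store"   # the seed group
GATHER_COST = 1  # assumed cost of packing scalars into a vector at a cut edge


def reachable(kept):
    """Nodes of `kept` reachable from the seed through kept nodes only."""
    seen, stack = set(), [ROOT]
    while stack:
        node = stack.pop()
        if node in seen or node not in kept:
            continue
        seen.add(node)
        stack.extend(GRAPH[node][1])
    return seen


def valid_cuts():
    """Step 3: every connected subgraph that contains the seed."""
    nodes = list(GRAPH)
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            kept = set(combo)
            if ROOT in kept and reachable(kept) == kept:
                yield frozenset(kept)


def cost(kept):
    """Step 5: node deltas plus a gather for each operand left scalar."""
    total = 0
    for node in kept:
        delta, operands = GRAPH[node]
        total += delta
        total += sum(GATHER_COST for op in operands if op not in kept)
    return total


def throttle(threshold=0):
    """Steps 4 to 8: try all cuts, keep the best, then apply the threshold."""
    best = min(valid_cuts(), key=cost)
    best_cost = cost(best)
    return (best if best_cost < threshold else None), best_cost


best, best_cost = throttle()
print(sorted(best), best_cost)  # ['add', 'load_a', 'store'] -2
print(cost(frozenset(GRAPH)))   # 1: the full graph is not profitable
```

With these made-up costs the full graph fails the threshold check, while the cut that leaves `load_b` scalar passes it. The exhaustive enumeration is only practical for tiny graphs.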
slide 8 of 16 www.cl.cam.ac.uk/∼vp331/
slide 9 of 16 www.cl.cam.ac.uk/∼vp331/
[Figure: step-by-step build-up of a graph and its subgraphs; nodes are labeled L, S, + and *]
slide 10 of 16 www.cl.cam.ac.uk/∼vp331/
[Figure: subgraphs between nodes X and Y, with the check "subgraphs > T ?" and its NO and YES branches]
slide 11 of 16 www.cl.cam.ac.uk/∼vp331/
1. All loop, SLP and TSLP vectorizers disabled (O3)
2. O3 + SLP enabled (SLP)
3. O3 + TSLP enabled (TSLP)
slide 12 of 16 www.cl.cam.ac.uk/∼vp331/
[Figure: Normalized Time (axis 0.60 to 1.10) per benchmark and GMean; series: O3, SLP, TSLP]
slide 13 of 16 www.cl.cam.ac.uk/∼vp331/
[Figure: Normalized cost (axis 0.60 to 1.10) per benchmark and GMean; series: Scalar, SLP, TSLP]
slide 14 of 16 www.cl.cam.ac.uk/∼vp331/
[Figure: three charts comparing SLP and TSLP; axis labels not recoverable]
slide 15 of 16 www.cl.cam.ac.uk/∼vp331/
slide 16 of 16 www.cl.cam.ac.uk/∼vp331/