Is vectorization easy? Is vectorization enough?

Sébastien Ponce Florian Lemaitre

Plan

1. Introduction: What is SIMD? How is vectorization done?
2. Matrix-Vector product: example; impact of other optimizations on vectorization; let's vectorize; performance
3. Batch processing: Array of Structure; Structure of Array
4. Hand-made Vectorization
5. Check vectorization: Assembly; Callgrind
6. Conclusion & Guidelines

S. Ponce – F. Lemaitre, Vectorization: Easy? Enough?, 11/12/2017

What is SIMD? Single Instruction Multiple Data

[figure: a SIMD addition adds the four lanes of registers {x0, x1, x2, x3} and {y0, y1, y2, y3} in one instruction, producing {x0+y0, x1+y1, x2+y2, x3+y3}]

Available on Intel architectures since 2000. Processing 4, 8, ... floats takes the same time as processing 1 in regular arithmetic.

How is vectorization done?

Algorithm (C = A + B, scalar)
    input  : A, B    // n-vectors
    output : C       // n-vector
    for i = 0 : n do
        C[i] ← A[i] + B[i]

Algorithm (C = A + B, vector)
    input  : A, B    // n-vectors
    output : C       // n-vector
    C[:] ← A[:] + B[:]

Algorithm (C = A + B, SIMD)
    input  : A, B    // n-vectors
    output : C       // n-vector
    for i = 0 : 4 : n do
        C[i : i+4] ← A[i : i+4] + B[i : i+4]

Vector code: what you can write using matlab or numpy, without matrices.

Vectorization is done in 3 steps:
1. Detect the pattern (e.g. a simple loop)
2. Convert the pattern into abstract vector code
3. Convert the vector code into fixed-width vector code
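To make these steps concrete, here is a sketch in C++ (function names are illustrative, not from the slides): the scalar loop the compiler detects, and the same computation rewritten by hand the way the vectorizer conceptually rewrites it, as a stride-4 loop plus a scalar tail.

```cpp
#include <cstddef>

// Scalar loop: the pattern the compiler detects. With -O2/-O3 (and e.g.
// -mavx2) gcc and clang typically turn this into SIMD code on their own.
void add_scalar(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// The same computation written the way the vectorizer rewrites it:
// a stride-4 loop over packs of 4 floats, plus a remainder loop.
void add_simd_style(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)               // C[i:i+4] <- A[i:i+4] + B[i:i+4]
        for (std::size_t k = 0; k < 4; ++k)
            c[i + k] = a[i + k] + b[i + k];
    for (; i < n; ++i)                        // remainder (n not multiple of 4)
        c[i] = a[i] + b[i];
}
```

Both functions compute the same result; only the shape of the loop differs.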



Matrix-Vector product

Algorithm (Y = A · X)
    input  : A    // n×n matrix
    input  : X    // n vector
    output : Y    // n vector
    temp   : s    // scalar accumulator
    for i = 0 : n do
        s ← 0
        for j = 0 : n do
            s ← s + A[i, j] · X[j]
        Y[i] ← s

A simple algorithm, used a lot (e.g. for changes of basis in ROOT).
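The algorithm above transcribes directly into C++ (a minimal sketch; the row-major layout and the function name are assumptions, not from the slides):

```cpp
#include <cstddef>

// Scalar ij Matrix-Vector product: Y = A * X.
// A is a row-major n x n matrix; X and Y are n-vectors.
void matvec_ij(const double* A, const double* X, double* Y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;                  // scalar accumulator
        for (std::size_t j = 0; j < n; ++j)
            s += A[i * n + j] * X[j];
        Y[i] = s;
    }
}
```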

Impact of other optimizations: small loop unrolling

Algorithm (Y = A · X)
    input  : A    // n×n matrix
    input  : X    // n vector
    output : Y    // n vector
    temp   : s    // scalar accumulator
    for i = 0 : n do
        s ← 0
        for j = 0 : n do
            s ← s + A[i, j] · X[j]
        Y[i] ← s

Algorithm (Y = A · X, unwinded)
    input  : A    // 3×3 matrix
    input  : X    // 3 vector
    output : Y    // 3 vector
    Y[0] ← A[0, 0]·X[0] + A[0, 1]·X[1] + A[0, 2]·X[2]
    Y[1] ← A[1, 0]·X[0] + A[1, 1]·X[1] + A[1, 2]·X[2]
    Y[2] ← A[2, 0]·X[0] + A[2, 1]·X[1] + A[2, 2]·X[2]

Complete unrolling is called unwinding. Compilers are able to unroll small loops if it is considered worth it. The loop version is easier to understand, for a human and for a compiler too. The unrolled version makes vectorization hard: the pattern is not recognized.


Impact of other optimizations: loop order

Algorithm (Y = A · X, scalar ij)
    temp : s    // scalar accumulator
    for i = 0 : n do
        s ← 0
        for j = 0 : n do
            s ← s + A[i, j] · X[j]
        Y[i] ← s

Algorithm (Y = A · X, scalar ji)
    temp : x    // scalar
    for i = 0 : n do
        Y[i] ← 0
    for j = 0 : n do
        x ← X[j]
        for i = 0 : n do
            Y[i] ← Y[i] + A[i, j] · x

The loop order can be changed. It changes the way elements are accessed and processed, so vectorization will not be applied the same way. In ij order, the elements of A are accessed in row-major order; in ji order, in column-major order.
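The ji order also transcribes directly (a sketch, with an assumed row-major layout so the stride change is visible; the function name is illustrative). It computes exactly the same Y as the ij version:

```cpp
#include <cstddef>

// Scalar ji variant: j is the outer loop, so A is walked column by column.
// A is still stored row-major here, which means consecutive accesses to
// A[i * n + j] stride by n; with column-major storage they would be
// contiguous, which is the point made on the next slides.
void matvec_ji(const double* A, const double* X, double* Y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        Y[i] = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        const double x = X[j];
        for (std::size_t i = 0; i < n; ++i)
            Y[i] += A[i * n + j] * x;
    }
}
```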



Vectorize ij: vector reduce

Algorithm (Y = A · X, vector reduce)
    for i = 0 : n do
        // Inner product
        Y[i] ← A[i, :] · X[:]

Algorithm (Y = A · X, SIMD reduce)
    temp : s    // SIMD register: 4 elements
    for i = 0 : 1 : n do
        s ← {0, 0, 0, 0}
        for j = 0 : 4 : n do
            s[0 : 4] ← s[0 : 4] + A[i, j : j+4] · X[j : j+4]
        // Reduction (expensive)
        Y[i] ← s[0] + s[1] + s[2] + s[3]

[figure: A · X computed row by row, each Y[i] being the inner product of row i of A with X]

The vector-reduce version is the usual way to express the Matrix-Vector product (ij order). The reduction is slow: it requires at least 4 operations, and shuffling operations are usually not parallelizable. The s accumulator implies a dependency chain.
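The reduce pattern can be sketched in plain C++ without intrinsics (function name illustrative; n is assumed to be a multiple of 4 to keep the sketch short): 4 independent partial sums play the role of the lanes of s, followed by the explicit horizontal reduction.

```cpp
#include <cstddef>

// Inner product written the way the SIMD-reduce version computes it:
// 4 partial sums (the lanes of the register s), then an explicit
// horizontal reduction at the end -- the step that is expensive on
// real hardware. Assumes n is a multiple of 4.
double dot_reduce4(const double* a, const double* x, std::size_t n) {
    double s[4] = {0.0, 0.0, 0.0, 0.0};
    for (std::size_t j = 0; j < n; j += 4)
        for (std::size_t k = 0; k < 4; ++k)
            s[k] += a[j + k] * x[j + k];
    // Reduction (expensive)
    return s[0] + s[1] + s[2] + s[3];
}
```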



Vectorize ji: vector splat

Algorithm (Y = A · X, vector splat)
    Y[:] ← 0
    for j = 0 : n do
        // X[j] is a scalar value
        Y[:] ← Y[:] + A[:, j] · X[j]

Algorithm (Y = A · X, SIMD splat)
    temp : x    // SIMD register: 4 elements
    for i = 0 : 4 : n do
        Y[i : i+4] ← {0, 0, 0, 0}
    for j = 0 : 1 : n do
        // Broadcast (cheap)
        x ← {X[j], X[j], X[j], X[j]}
        for i = 0 : 4 : n do
            Y[i : i+4] ← Y[i : i+4] + A[i : i+4, j] · x[0 : 4]

[figure: A · X computed column by column, as the sum over j of column j of A multiplied element-wise with a broadcast of X[j]]

Vector version using ji order: better with column-major storage. The broadcast is fast, usually one single shuffling operation.
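A sketch of the splat version on a column-major matrix (layout and name are assumptions for illustration): the inner i loop now reads a contiguous column and writes the contiguous Y, which is the access pattern this version vectorizes well.

```cpp
#include <cstddef>

// Splat-style Matrix-Vector product on a column-major matrix:
// A_colmajor[j * n + i] holds A[i, j], so the inner loop walks a
// contiguous column of A and the contiguous Y -- no shuffles needed
// beyond broadcasting the scalar X[j].
void matvec_splat(const double* A_colmajor, const double* X,
                  double* Y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        Y[i] = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        const double x = X[j];               // conceptually broadcast
        for (std::size_t i = 0; i < n; ++i)  // contiguous, vectorizable
            Y[i] += A_colmajor[j * n + i] * x;
    }
}
```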


Performance

version        time (µs)
unwinded       5161
scalar ij      5192
scalar ji      5186
vectorized ij  10033
vectorized ji  5879

All scalar versions have the same performance: the loop versions are unwinded by the compiler. WARNING: the vectorized versions are slower. There is not enough data to process, and out-of-order execution performs better than SIMD on such small data. Vectorized ji is much faster than vectorized ij: broadcast is faster than reduction.



Batch processing

We need to process several matrices at a time in order to take advantage of the machine parallelism.

Algorithm (Y = A · X, scalar ji, batch)
    input  : As    // list of n×n matrices
    input  : Xs    // list of n vectors
    output : Ys    // list of n vectors
    temp   : x     // scalar
    for u = 0 : N do
        for i = 0 : n do
            Ys[u][i] ← 0
        for j = 0 : n do
            x ← Xs[u][j]
            for i = 0 : n do
                Ys[u][i] ← Ys[u][i] + As[u][i, j] · x

Batch: grouping similar objects in order to process them in one go. This allows instruction reordering, which increases computation parallelism.

µs/matrix      ×1    ×2    ×4    ×8    ×16   ×32
unwinded       5161  2944  1740  1619  1713  1677
scalar ij      4799  2638  1501  1328  1434  1596
scalar ji      4957  2789  1957  1899  1929  1888
vectorized ij  –     –     –     –     –     –
vectorized ji  –     –     –     –     1114  993
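The batch algorithm can be sketched as follows (a minimal transcription; the flat back-to-back memory layout and the function name are assumptions):

```cpp
#include <cstddef>

// Batched scalar-ji product: N independent n x n systems in one call.
// As holds N row-major matrices back to back, Xs and Ys hold N
// n-vectors back to back; the u iterations are fully independent.
void matvec_batch(const double* As, const double* Xs, double* Ys,
                  std::size_t N, std::size_t n) {
    for (std::size_t u = 0; u < N; ++u) {
        const double* A = As + u * n * n;
        const double* X = Xs + u * n;
        double*       Y = Ys + u * n;
        for (std::size_t i = 0; i < n; ++i)
            Y[i] = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            const double x = X[j];
            for (std::size_t i = 0; i < n; ++i)
                Y[i] += A[i * n + j] * x;
        }
    }
}
```

Because the u iterations are independent, the hardware (and the compiler) can overlap them, which is where the speedups in the table come from.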


Structure of Array (SoA)

Algorithm (Y = A · X, scalar ij AoS)
    input  : As    // list of n×n matrices
    input  : Xs    // list of n vectors
    output : Ys    // list of n vectors
    temp   : x     // scalar
    for u = 0 : N do
        for i = 0 : n do
            Ys[u][i] ← 0
        for j = 0 : n do
            x ← Xs[u][j]
            for i = 0 : n do
                Ys[u][i] ← Ys[u][i] + As[u][i, j] · x

Algorithm (Y = A · X, vector ij SoA)
    input  : As    // n×n matrix of lists
    input  : Xs    // n vector of lists
    output : Ys    // n vector of lists
    temp   : x     // list
    for i = 0 : n do
        Ys[i][:] ← 0
    for j = 0 : n do
        x ← Xs[j][:]
        for i = 0 : n do
            Ys[i][:] ← Ys[i][:] + As[i, j][:] .∗ x[:]

A matrix of lists rather than a list of matrices. This helps vectorization: the vector code is identical to the scalar code, and no shuffling is required. It is possible to go even further (Unroll&Jam).

The same batch in SoA layout, scalar form:

Algorithm (Y = A · X, scalar ij SoA)
    input  : As    // n×n matrix of lists
    input  : Xs    // n vector of lists
    output : Ys    // n vector of lists
    temp   : x     // scalar
    for u = 0 : N do
        for i = 0 : n do
            Ys[i][u] ← 0
        for j = 0 : n do
            x ← Xs[j][u]
            for i = 0 : n do
                Ys[i][u] ← Ys[i][u] + As[i, j][u] · x

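The AoS-versus-SoA distinction can be sketched with a batch of 3-vectors (types and names are illustrative, not from the slides): in the SoA form each component is its own contiguous array, so per-component loops over the batch are plain stride-1 loops the compiler can vectorize directly.

```cpp
#include <cstddef>
#include <vector>

// AoS: a list of structures -- components of one object are adjacent,
// components of the batch are interleaved (bad for SIMD).
struct Vec3AoS { double x, y, z; };

// SoA: a structure of lists -- each component of the whole batch is
// contiguous (good for SIMD).
struct Vec3SoA {
    std::vector<double> x, y, z;
    explicit Vec3SoA(std::size_t N) : x(N), y(N), z(N) {}
};

// Example kernel on the SoA batch: each loop is stride-1 over N
// elements, exactly the simple pattern auto-vectorizers recognize.
void scale(Vec3SoA& v, double f) {
    const std::size_t N = v.x.size();
    for (std::size_t u = 0; u < N; ++u) v.x[u] *= f;
    for (std::size_t u = 0; u < N; ++u) v.y[u] *= f;
    for (std::size_t u = 0; u < N; ++u) v.z[u] *= f;
}
```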


Hand-made vectorization

Results with hand-written SIMD:

µs/matrix       ×1    ×2    ×4    ×8    ×16
unwinded        4541  2330  2270  2258  2549
best            3699  1939  2153  885   586
VectorClass ij  9189  4994  3697  3704  3710
VectorClass ji  8079  4772  2418  1450  1478

WARNING: hand-written SIMD is much slower! SIMD code is less optimized by the compiler than scalar code; it overrides compiler heuristics and requires a deep knowledge of the architecture. Vectorization libraries like VC don't help here: they only provide a better and more abstract interface.

Hand-made vectorization

Do not vectorize yourself! Unless you have checked that:
- the code is critical for performance
- the compiler has not vectorized it
- you understand why
- there is no way to make it vectorized by the compiler
- you know how to do it efficiently
- your SIMD code is actually faster than scalar code


Check the assembly

Command:
    objdump -dC --no-show-raw-insn binary

Non vectorized:
    vmovsd      p+0x8 , %xmm4
    vmovsd      p     , %xmm3
    vmulsd      t+0x48, %xmm4, %xmm0
    vmulsd      t+0x28, %xmm4, %xmm1
    vmulsd      t+0x8 , %xmm4, %xmm4
    vmovsd      p+0x10, %xmm2
    vfmadd231sd t+0x40, %xmm3, %xmm0
    vfmadd231sd t+0x50, %xmm2, %xmm0
    vaddsd      t+0x58, %xmm0, %xmm0
    vfmadd231sd t+0x20, %xmm3, %xmm1
    vfmadd231sd t+0x30, %xmm2, %xmm1
    vaddsd      t+0x38, %xmm1, %xmm1
    vfmadd132sd t     , %xmm4, %xmm3
    vfmadd132sd t+0x10, %xmm3, %xmm2
    vaddsd      t+0x18, %xmm2, %xmm2
    vmovsd      %xmm0, p+0x10
    vmovsd      %xmm1, p+0x8
    vmovsd      %xmm2, p
    retq

Vectorized:
    vbroadcastsd p+0x10, %ymm0
    vbroadcastsd p+0x8 , %ymm1
    vbroadcastsd p     , %ymm2
    vmovapd      t+0x60, %ymm3
    vfmadd132pd  t     , %ymm3, %ymm2
    vfmadd132pd  t+0x20, %ymm2, %ymm1
    vfmadd132pd  t+0x40, %ymm1, %ymm0
    vmovapd      %ymm0, p
    vzeroupper
    retq

%xmm and %ymm designate SIMD registers; %ymm registers are twice as big (available only with AVX). WARNING: floating-point scalar code also uses %xmm registers. Look for the sd (scalar double) and pd (packed double) suffixes (ss and ps for single precision); they appear only on memory and arithmetic operations.


Using Callgrind

Custom patch to collect floating point instructions.

Generic usage:
    valgrind --tool=callgrind original command line

Useful flags:
- --dump-instr=yes enables assembler instruction granularity
- --instr-atstart=no does not start profiling from the beginning
- --cache-sim=yes enables cache simulation (aka cachegrind)
- --branch-sim=yes enables misprediction counting
- --collect-fp=no disables floating point instruction collection

Location: /cvmfs/lhcbdev.cern.ch/tools/valgrind/3.12.0/x86_64-centos7/bin/valgrind

Using Callgrind with Gaudi

- use --instr-atstart=no to avoid spending hours in the geometry parsing
- start instruction profiling in the code using the CallgrindProfile algorithm (in MiniBrunel, just set Profile to True)
- use a command line like this:
    Brunel/run valgrind ... python \
        $(Brunel/run which gaudirun.py) options.py

Command:
    kcachegrind callgrind.out.##pid##


Conclusion & Guidelines

Vectorization is hard, and vectorizing a code is usually not enough to get maximum performance. The goal is to help the compiler vectorize:
- understand how vectorization works
- think about your code as abstract vector code
- design the code for batch processing
- check vectorization using existing tools: Callgrind, VTune, objdump for the brave ones, ...

Thank you for your attention!

Going Further? Unroll&Jam!

Matrix-Vector product:

Algorithm (Y = A · X, SIMD ij SoA)
    input  : As    // n×n matrix of lists
    input  : Xs    // n vector of lists
    output : Ys    // n vector of lists
    temp   : x0    // SIMD register
    for u = 0 : 4 : N do
        for i = 0 : n do
            Ys[i][u+0 : u+4] ← {0, 0, 0, 0}
        for j = 0 : n do
            x0[0 : 4] ← Xs[j][u+0 : u+4]
            for i = 0 : n do
                Ys[i][u+0 : u+4] ← Ys[i][u+0 : u+4] + As[i, j][u+0 : u+4] · x0[0 : 4]

Unroll&Jam interleaves independent iterations of a loop. It hides latencies, leaving performance limited by instruction throughput rather than by dependency chains, but it needs more objects to process.

The unroll&jam ×4 version interleaves four u-iterations:

Algorithm (Y = A · X, SIMD ij SoA with unroll&jam ×4)
    input  : As    // n×n matrix of lists
    input  : Xs    // n vector of lists
    output : Ys    // n vector of lists
    temp   : x0, x1, x2, x3    // SIMD registers
    for u = 0 : 16 : N do
        for i = 0 : n do
            Ys[i][u+0  : u+4 ] ← {0, 0, 0, 0}
            Ys[i][u+4  : u+8 ] ← {0, 0, 0, 0}
            Ys[i][u+8  : u+12] ← {0, 0, 0, 0}
            Ys[i][u+12 : u+16] ← {0, 0, 0, 0}
        for j = 0 : n do
            x0[0 : 4] ← Xs[j][u+0  : u+4 ]
            x1[0 : 4] ← Xs[j][u+4  : u+8 ]
            x2[0 : 4] ← Xs[j][u+8  : u+12]
            x3[0 : 4] ← Xs[j][u+12 : u+16]
            for i = 0 : n do
                Ys[i][u+0  : u+4 ] ← Ys[i][u+0  : u+4 ] + As[i, j][u+0  : u+4 ] · x0[0 : 4]
                Ys[i][u+4  : u+8 ] ← Ys[i][u+4  : u+8 ] + As[i, j][u+4  : u+8 ] · x1[0 : 4]
                Ys[i][u+8  : u+12] ← Ys[i][u+8  : u+12] + As[i, j][u+8  : u+12] · x2[0 : 4]
                Ys[i][u+12 : u+16] ← Ys[i][u+12 : u+16] + As[i, j][u+12 : u+16] · x3[0 : 4]
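The same idea can be shown on a simpler kernel than the matrix product (this is not the slides' algorithm, just an illustration of the multiple-accumulator principle behind unroll&jam): a reduction with four independent accumulators breaks the single dependency chain, so the four chains can run in parallel on an out-of-order core and map naturally onto SIMD lanes.

```cpp
#include <cstddef>

// Sum of an array with 4 independent accumulators. A single-accumulator
// loop forms one long add dependency chain; here the 4 chains are
// independent, so their latencies overlap. Assumes n is a multiple of 4.
double sum_unroll4(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (std::size_t i = 0; i < n; i += 4) {
        s0 += a[i + 0];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    // Final combine, done once outside the loop.
    return (s0 + s1) + (s2 + s3);
}
```

Note that with floating point this changes the order of additions, so the compiler will not do it for you unless relaxed FP rules (e.g. -ffast-math) allow it.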