ADVANCED DATABASE SYSTEMS Vectorized Execution @ Andy_Pavlo // - PowerPoint PPT Presentation

Lect ure # 20 ADVANCED DATABASE SYSTEMS Vectorized Execution @ Andy_Pavlo // 15- 721 // Spring 2019

CMU 15-721 (Spring 2019) 2 Background Hardware Vectorized Algorithms (Columbia)

CMU 15-721 (Spring 2019) 3 VECTO RIZATIO N The process of converting an algorithm's scalar implementation that processes a single pair of operands at a time, to a vector implementation that processes one operation on multiple pairs of operands at once.

CMU 15-721 (Spring 2019) 4 WH Y TH IS M ATTERS Say we can parallelize our algorithm over 32 cores. Each core has a 4-wide SIMD registers. Potential Speed-up: 32x × 4x = 128x

CMU 15-721 (Spring 2019) 5 M ULTI- CO RE CPUS Use a small number of high-powered cores. → Intel Xeon Skylake / Kaby Lake → High power consumption and area per core. Massively superscalar and aggressive out-of- order execution → Instructions are issued from a sequential stream. → Check for dependencies between instructions. → Process multiple instructions per clock cycle.

CMU 15-721 (Spring 2019) 6 M AN Y IN TEGRATED CO RES (M IC) Use a larger number of low-powered cores. → Intel Xeon Phi → Low power consumption and area per core. → Expanded SIMD instructions with larger register sizes. Knights Ferry (Columbia Paper) → Non-superscalar and in-order execution → Cores = Intel P54C (aka Pentium from the 1990s). Knights Landing (Since 2016) → Superscalar and out-of-order execution. → Cores = Silvermont (aka Atom)

CMU 15-721 (Spring 2019) 8 S IN GLE I N STRUCTIO N, M ULTIPLE D ATA A class of CPU instructions that allow the processor to perform the same operation on multiple data points simultaneously. All major ISAs have microarchitecture support SIMD operations. → x86 : MMX, SSE, SSE2, SSE3, SSE4, AVX, AVX2, AVX512 → PowerPC : Altivec → ARM : NEON

CMU 15-721 (Spring 2019) 9 SIM D EXAM PLE X + Y = Z 8 7 x 1 y 1 x 1 +y 1 6 X 5 x 2 y 2 x 2 +y 2 + = 4 Z ⋮ ⋮ ⋮ 3 x n y n x n +y n 2 SISD 1 + 9 1 for (i=0; i<n; i++) { 1 Z[i] = X[i] + Y[i]; 1 } Y 1 1 1 1 1

CMU 15-721 (Spring 2019) 9 SIM D EXAM PLE X + Y = Z 8 7 x 1 y 1 x 1 +y 1 6 X 5 x 2 y 2 x 2 +y 2 + = 4 Z ⋮ ⋮ ⋮ 3 x n y n x n +y n 2 SISD 1 + 9 8 7 6 5 4 3 2 1 for (i=0; i<n; i++) { 1 Z[i] = X[i] + Y[i]; 1 } Y 1 1 1 1 1

CMU 15-721 (Spring 2019) 9 SIM D EXAM PLE 128-bit SIMD Register X + Y = Z 8 7 x 1 y 1 x 1 +y 1 8 7 6 5 6 X 5 x 2 y 2 x 2 +y 2 + = 4 Z ⋮ ⋮ ⋮ 3 x n y n x n +y n 2 SIMD + 1 1 for (i=0; i<n; i++) { 1 1 1 1 1 Z[i] = X[i] + Y[i]; 1 128-bit SIMD Register } Y 1 1 1 1 1

CMU 15-721 (Spring 2019) 9 SIM D EXAM PLE 128-bit SIMD Register X + Y = Z 8 7 x 1 y 1 x 1 +y 1 8 7 6 5 6 X 5 x 2 y 2 x 2 +y 2 + = 4 Z ⋮ ⋮ ⋮ 3 x n y n x n +y n 2 SIMD + 1 9 8 7 6 1 for (i=0; i<n; i++) { 128-bit SIMD Register 1 1 1 1 1 Z[i] = X[i] + Y[i]; 1 128-bit SIMD Register } Y 1 1 1 1 1

CMU 15-721 (Spring 2019) 9 SIM D EXAM PLE X + Y = Z 8 7 x 1 y 1 x 1 +y 1 6 X 5 x 2 y 2 x 2 +y 2 + = 4 Z ⋮ ⋮ ⋮ 3 4 3 2 1 x n y n x n +y n 2 SIMD + 1 9 8 7 6 5 4 3 2 1 for (i=0; i<n; i++) { 1 Z[i] = X[i] + Y[i]; 1 } Y 1 1 1 1 1 1 1 1 1

CMU 15-721 (Spring 2019) 10 STREAM IN G SIM D EXTEN SIO N S (SSE) SSE is a collection SIMD instructions that target special 128-bit SIMD registers. These registers can be packed with four 32-bit scalars after which an operation can be performed on each of the four elements simultaneously. First introduced by Intel in 1999.

CMU 15-721 (Spring 2019) 11 SIM D IN STRUCTIO NS (1) Data Movement → Moving data in and out of vector registers Arithmetic Operations → Apply operation on multiple data items (e.g., 2 doubles, 4 floats, 16 bytes) → Example: ADD , SUB , MUL , DIV , SQRT , MAX , MIN Logical Instructions → Logical operations on multiple data items → Example: AND , OR , XOR , ANDN , ANDPS , ANDNPS

CMU 15-721 (Spring 2019) 12 SIM D IN STRUCTIO NS (2) Comparison Instructions → Comparing multiple data items ( == , < , <= , > , >= , != ) Shuffle instructions → Move data in between SIMD registers Miscellaneous → Conversion: Transform data between x86 and SIMD registers. → Cache Control: Move data directly from SIMD registers to memory (bypassing CPU cache).

CMU 15-721 (Spring 2019) 13 IN TEL SIM D EXTEN SIO N S Width Integers Single-P Double-P 1997 MMX 64 bits ✔ 1999 SSE 128 bits ✔ (×4) ✔ 2001 SSE2 128 bits ✔ (×2) ✔ ✔ 2004 SSE3 128 bits ✔ ✔ ✔ 2006 SSSE 3 128 bits ✔ ✔ ✔ 2006 SSE 4.1 128 bits ✔ ✔ ✔ 2008 SSE 4.2 128 bits ✔ ✔ ✔ 2011 AVX 256 bits ✔ (×8) ✔ (×4) ✔ 2013 AVX2 256 bits ✔ ✔ ✔ 2017 AVX-512 512 bits ✔ (×16) ✔ (×8) ✔ Source: James Reinders

CMU 15-721 (Spring 2019) 14 VECTO RIZATIO N Choice #1: Automatic Vectorization Choice #2: Compiler Hints Choice #3: Explicit Vectorization Source: James Reinders

CMU 15-721 (Spring 2019) 14 VECTO RIZATIO N Ease of Use Choice #1: Automatic Vectorization Choice #2: Compiler Hints Choice #3: Explicit Vectorization Programmer Control Source: James Reinders

CMU 15-721 (Spring 2019) 15 AUTO M ATIC VECTO RIZATIO N The compiler can identify when instructions inside of a loop can be rewritten as a vectorized operation. Works for simple loops only and is rare in database operators. Requires hardware support for SIMD instructions.

CMU 15-721 (Spring 2019) 16 AUTO M ATIC VECTO RIZATIO N This loop is not legal to void add ( int *X , automatically vectorize. int *Y , int *Z ) { for ( int i=0; i<MAX; i++) { Z[i] = X[i] + Y[i]; } }

CMU 15-721 (Spring 2019) 16 AUTO M ATIC VECTO RIZATIO N This loop is not legal to void add ( int *X , automatically vectorize. int *Y , *Z=*X+1 int *Z ) { for ( int i=0; i<MAX; i++) { Z[i] = X[i] + Y[i]; } } These might point to the same address!

CMU 15-721 (Spring 2019) 16 AUTO M ATIC VECTO RIZATIO N This loop is not legal to void add ( int *X , automatically vectorize. int *Y , *Z=*X+1 int *Z ) { for ( int i=0; i<MAX; i++) { The code is written such that the Z[i] = X[i] + Y[i]; addition is described as being } } done sequentially. These might point to the same address!

CMU 15-721 (Spring 2019) 17 CO M PILER H IN TS Provide the compiler with additional information about the code to let it know that is safe to vectorize. Two approaches: → Give explicit information about memory locations. → Tell the compiler to ignore vector dependencies.

CMU 15-721 (Spring 2019) 18 CO M PILER H IN TS The restrict keyword in C++ void add ( int * restrict X , tells the compiler that the arrays int * restrict Y , int * restrict Z ) { are distinct locations in memory. for ( int i=0; i<MAX; i++) { Z[i] = X[i] + Y[i]; } }

CMU 15-721 (Spring 2019) 19 CO M PILER H IN TS This pragma tells the compiler to void add ( int *X , ignore loop dependencies for the int *Y , int *Z ) { vectors. #pragma ivdep for ( int i=0; i<MAX; i++) { It’s up to you make sure that this Z[i] = X[i] + Y[i]; } is correct. }

CMU 15-721 (Spring 2019) 20 EXPLICIT VECTO RIZATIO N Use CPU intrinsics to manually marshal data between SIMD registers and execute vectorized instructions. Potentially not portable.

CMU 15-721 (Spring 2019) 21 EXPLICIT VECTO RIZATIO N Store the vectors in 128-bit SIMD void add ( int *X , int *Y , registers. int *Z ) { __mm128i *vecX = (__m128i*)X; Then invoke the intrinsic to add __mm128i *vecY = (__m128i*)Y; __mm128i *vecZ = (__m128i*)Z; together the vectors and write for ( int i=0; i<MAX /4 ; i++) { them to the output location. _mm_store_si128(vecZ++, ⮱ _mm_add_epi32(*vecX++, ⮱ *vecY++)) ; } }

CMU 15-721 (Spring 2019) 22 VECTO RIZATIO N DIRECTIO N Approach #1: Horizontal 0 1 2 3 → Perform operation on all elements together 6 SIMD Add within a single vector. Approach #2: Vertical 0 1 2 3 → Perform operation in an elementwise manner on elements of each vector. 1 2 3 4 SIMD Add 1 1 1 1 Source:

CMU 15-721 (Spring 2019) 23 EXPLICIT VECTO RIZATIO N Linear Access Operators → Predicate evaluation → Compression Ad-hoc Vectorization → Sorting → Merging Composable Operations → Multi-way trees → Bucketized hash tables Source: Orestis Polychroniou

CMU 15-721 (Spring 2019) 24 VECTO RIZED DBM S ALGO RITH M S Principles for efficient vectorization by using fundamental vector operations to construct more advanced functionality. → Favor vertical vectorization by processing different input data per lane. → Maximize lane utilization by executing different things per lane subset. RETHINKING SIMD VECTORIZATION FOR IN IN- MEMORY DATABASES SIGMOD 2015

CMU 15-721 (Spring 2019) 25 FUN DAM EN TAL O PERATIO N S Selective Load Selective Store Selective Gather Selective Scatter

ADVANCED DATABASE SYSTEMS Vectorized Execution @ Andy_Pavlo // - PowerPoint PPT Presentation

Lect ure # 20 ADVANCED DATABASE SYSTEMS Vectorized Execution @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721 (Spring 2019) 2 Background Hardware Vectorized Algorithms (Columbia) CMU 15-721 (Spring 2019) 3 VECTO RIZATIO N The process

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

Redelivery Management of Assets, Airlines and Lessors Enda Clarke, Chief Technical Officer,

Stormwater Charge and Credit Program Multi-Residential Credit Program Seminar December 7 th ,

Airports of Thailand Plc. Airports of Thailand Plc. Corporate Presentation for Q1 of Fiscal Year

Compromised Social Network Accounts Detection and Incentives Manuel Egele Dept. of Electrical

Fault Attacks on Supersingular Isogeny Cryptosystems Yan Bo Ti Department of Mathematics,

Course Overview Day 1: Fundamentals accelerator architectures, review of shared-memory

LANDER EXAMPLE Fundamentals of Computer Science I Outline Approach Find: Objects

LLVM for the future of Supercomputing Hal Finkel hfinkel@anl.gov 2017-03-27 2017 European LLVM

ADVANCED DATABASE SYSTEMS Vectorized Execution @ Andy_Pavlo // - PowerPoint PPT Presentation

Lect ure # 20 ADVANCED DATABASE SYSTEMS Vectorized Execution @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721 (Spring 2019) 2 Background Hardware Vectorized Algorithms (Columbia) CMU 15-721 (Spring 2019) 3 VECTO RIZATIO N The process

Database Utilities 10/17/2007 DC/Win Database Utilities Opening Database Utilities From File on

Advanced Database CS 525: Organization? Advanced Database =Database Implementation

Advanced Database Management Systems Database Management Systems Alvaro A A Fernandes School of

DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016-2017

DATABASE SYSTEMS Database programming in a web environment Database System Course AGENDA FOR

Lect ure # 11 ADVANCED DATABASE SYSTEMS System Catalogs and Database Compression @

NEBC Database Course 2008 Database Servers Database Interfaces Tim Booth : tbooth@ceh.ac.uk

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

DATABASE SECURITY CS4750 Database Systems Prof. Nada Basit Email: basit@virginia.edu Fall

Database Systems Database Systems 1 Creating a Database System Design Construction

Overview of Database Systems CS3860 - Jay Urbain, PhD Introduction to Database Systems 1 2 UFR

ADVANCED DATABASE SYSTEMS Database Compression @ Andy_Pavlo // 15- 721 // Spring 2019 CMU

National Address Database National Address Database What is a National Address Database?

CSc 337 LECTURE 24: CREATING A DATABASE AND MORE JOINS Creating a database In the command line

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction &amp; History of Database Systems

ADVANCED DATABASE SYSTEMS Self-Driving Database Management Systems @ Andy_Pavlo // 15- 721 //

Redelivery Management of Assets, Airlines and Lessors Enda Clarke, Chief Technical Officer,

Stormwater Charge and Credit Program Multi-Residential Credit Program Seminar December 7 th ,

Airports of Thailand Plc. Airports of Thailand Plc. Corporate Presentation for Q1 of Fiscal Year

Compromised Social Network Accounts Detection and Incentives Manuel Egele Dept. of Electrical

Fault Attacks on Supersingular Isogeny Cryptosystems Yan Bo Ti Department of Mathematics,

Course Overview Day 1: Fundamentals accelerator architectures, review of shared-memory

LANDER EXAMPLE Fundamentals of Computer Science I Outline Approach Find: Objects

LLVM for the future of Supercomputing Hal Finkel hfinkel@anl.gov 2017-03-27 2017 European LLVM

Lect ure # 01 ADVANCED DATABASE SYSTEMS Course Introduction & History of Database Systems