Overview on Hardware Optimizations for Database Engines
Annett Ungethüm, Dirk Habich, Tomas Karnagel, Sebastian Haas, Eric Mier, Gerhard Fettweis, Wolfgang Lehner
BTW 2017, Stuttgart, Germany, 2017-03-09
Interaction DB-Engine and Hardware
[Figure: memory (KByte, log scale from 10 to 1e+07) and #cores (2 to 10) over time, 1970 to 2020]
[Figure: #transistors (x1000, log scale from 1 to 1e+07) and process (nm) over time, 1970 to 2020. Source: http://engineering.nyu.edu/garg/node/31]
Control-Plane
Development of instruction set extensions with Tensilica tools
Synthesis of RTL code
Primitives
Bitmap Compression and Processing (AND, OR, XOR): WAH, PLWAH, COMPAX
Hashing: Hash + Lookup, Hash + Insert, Hash Keys, Hash Sampling, CityHash32
Sorted Set Operations: Merge Sort, Intersection, Union, Difference, Sort-Merge Join, Sort-Merge Aggregation (SUM)
[Figure: processor architecture]
Instruction sets: Basic RISC Instruction Set, Application-Specific Instruction Set
State: Application-Specific States, Application-Specific Registers, Basic Registers
Units: Instruction fetch, Load-Store Unit 0, Load-Store Unit 1, Data Prefetcher, Interconnect
Memories: Local Instruction Memory, Local Data Memory 0, Local Data Memory 1
Interface widths: 64 bit, 128 bit, 128 bit
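Table T with bitmap index on X

OID  X   b1 (=0)  b2 (=1)  b3 (=2)  b4 (=3)
1    0   1        0        0        0
2    1   0        1        0        0
3    3   0        0        0        1
4    2   0        0        1        0
5    3   0        0        0        1
6    3   0        0        0        1
7    1   0        1        0        0
8    3   0        0        0        1

Query: select * from T where X < 2
Evaluation: bit-wise OR of b1 and b2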
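The query evaluation above can be sketched in C. This is a minimal illustration, not the engine's code: the function names `build_bitmap_index` and `select_less_than` are hypothetical, one `uint32_t` holds a whole bitmap (so at most 32 rows), and bit i stands for the row with OID i+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VALUES 4  /* X takes the values 0..3, one bitmap per value */

/* Build one bitmap per distinct value of X.
 * Bit i of bitmaps[v] is set when row i (OID i+1) has X == v.
 * Assumption for this sketch: at most 32 rows. */
void build_bitmap_index(const uint8_t *x, size_t n, uint32_t bitmaps[NUM_VALUES])
{
    assert(n <= 32);
    for (size_t v = 0; v < NUM_VALUES; v++)
        bitmaps[v] = 0;
    for (size_t i = 0; i < n; i++) {
        assert(x[i] < NUM_VALUES);
        bitmaps[x[i]] |= (uint32_t)1 << i;
    }
}

/* Evaluate "X < bound" by OR-ing the bitmaps of all qualifying values.
 * The set bits of the result mark the selected rows. */
uint32_t select_less_than(const uint32_t bitmaps[NUM_VALUES], unsigned bound)
{
    uint32_t result = 0;
    for (unsigned v = 0; v < bound && v < NUM_VALUES; v++)
        result |= bitmaps[v];
    return result;
}
```

For the table above, `select_less_than(bitmaps, 2)` combines b1 and b2 and marks the rows with OID 1, 2 and 7.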
WAH compression of the bitmaps b1 and b2 (32-bit words, in hex)

b1:      40000380 00000000 00000000 001FFFFF
WAH b1:  40000380 80000002 001FFFFF
         (Literal, 0-fill, Literal)

b2:      7FFFFFFF 7FFFFFFF 7C0001E0 3FE00000
WAH b2:  C0000002 7C0001E0 3FE00000
         (1-fill, Literal, Literal)

Fill words: 10<runlength> encodes a run of 00000000 words, 11<runlength> encodes a run of 7FFFFFFF words.

Bit-wise OR of WAH b1 and WAH b2:
1) Load WAH word(s)
2) Calculate output (Fill-Fill, Literal-Fill, Literal-Literal)
3) Combine output
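The encoding shown above can be reproduced with a short C sketch. It assumes, as in the example, that each uncompressed word carries 31 payload bits with the top bit cleared; the names `wah_compress` and `wah_decompress` and the macros are hypothetical and not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAH_FILL_FLAG 0x80000000u  /* top bit set: fill word           */
#define WAH_ZERO_FILL 0x80000000u  /* 10<runlength>                    */
#define WAH_ONE_FILL  0xC0000000u  /* 11<runlength>                    */
#define WAH_ALL_ONES  0x7FFFFFFFu  /* a group with all 31 bits set     */
#define WAH_MAX_RUN   0x3FFFFFFFu  /* largest run length in a fill word */

/* Compress n 31-bit groups into WAH words.
 * out must have room for n words; returns the number of words written. */
size_t wah_compress(const uint32_t *groups, size_t n, uint32_t *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t g = groups[i];
        assert((g & WAH_FILL_FLAG) == 0);  /* input groups use 31 bits only */
        if (g == 0 || g == WAH_ALL_ONES) {
            uint32_t fill = (g == 0) ? WAH_ZERO_FILL : WAH_ONE_FILL;
            /* Extend the previous fill word if it has the same type
             * and its run length counter has not saturated. */
            if (n_out > 0 &&
                (out[n_out - 1] & WAH_ONE_FILL) == fill &&
                (out[n_out - 1] & WAH_MAX_RUN) < WAH_MAX_RUN) {
                out[n_out - 1]++;
            } else {
                out[n_out++] = fill | 1u;
            }
        } else {
            out[n_out++] = g;  /* mixed group: stored as a literal */
        }
    }
    return n_out;
}

/* Expand WAH words back into 31-bit groups.
 * cap is the capacity of out; returns the number of groups written. */
size_t wah_decompress(const uint32_t *wah, size_t n, uint32_t *out, size_t cap)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t w = wah[i];
        if (w & WAH_FILL_FLAG) {
            uint32_t value = (w & 0x40000000u) ? WAH_ALL_ONES : 0u;
            for (uint32_t r = w & WAH_MAX_RUN; r > 0; r--) {
                assert(n_out < cap);
                out[n_out++] = value;
            }
        } else {
            assert(n_out < cap);
            out[n_out++] = w;
        }
    }
    return n_out;
}
```

Applied to the four words of b1, `wah_compress` yields 40000380 80000002 001FFFFF; applied to b2 it yields C0000002 7C0001E0 3FE00000.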
Software implementation of the OR with its three cases (Fill-Fill, Literal-Fill, Literal-Literal):

    while (Xidx != Xsize && Yidx != Ysize) {
        // new X or Y? Calculate new fill count
        …
        if (XisFill == 1 && YisFill == 1) {
            // Fill-Fill: emit one fill covering the shorter of the two runs
            if (XfillWords < YfillWords)
                min = XfillWords;
            else
                min = YfillWords;
            writeFill(comprResultBI, &Zidx, X[Xidx] | Y[Yidx], min);
            XfillWords -= min;
            YfillWords -= min;
        } else if ((XisFill == 1 && YisFill == 0) || (XisFill == 0 && YisFill == 1)) {
            // Literal-Fill: a 1-fill dominates the OR, a 0-fill passes the literal through
            if (XisFill == 1) {
                XfillWords--;
                if ((X[Xidx] & 0xC0000000) == 0xC0000000)
                    writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
                else {
                    comprResultBI[Zidx] = Y[Yidx];
                    Zidx++;
                }
            }
            if (YisFill == 1) {
                YfillWords--;
                if ((Y[Yidx] & 0xC0000000) == 0xC0000000)
                    writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
                else {
                    comprResultBI[Zidx] = X[Xidx];
                    Zidx++;
                }
            }
        } else {
            // Literal-Literal: OR the words and turn an all-ones or all-zeros result into a fill
            result = X[Xidx] | Y[Yidx];
            if ((result & 0x7FFFFFFF) == 0x7FFFFFFF)
                writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
            else if ((result & 0x7FFFFFFF) == 0)
                writeFill(comprResultBI, &Zidx, 0x80000000, 1);
            else {
                comprResultBI[Zidx] = result;
                Zidx++;
            }
        }
    }
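Since the loop above elides how the fill counters and indices advance, a complete version may help. The following is a sketch under assumptions, not the code from the slides: `wah_cursor`, `cursor_refill`, `append_groups` and `wah_or` are hypothetical names, and the three cases are folded into one by treating a literal as a run of length 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read position in a WAH stream, with the current word already decoded. */
typedef struct {
    const uint32_t *words;
    size_t size;
    size_t idx;      /* next word to decode                       */
    uint32_t run;    /* 31-bit groups left in the current word    */
    uint32_t value;  /* group value represented by the current word */
} wah_cursor;

/* Decode the next word once the current one is used up; returns 0 at the end. */
static int cursor_refill(wah_cursor *c)
{
    while (c->run == 0) {
        if (c->idx == c->size)
            return 0;
        uint32_t w = c->words[c->idx++];
        if (w & 0x80000000u) {                      /* fill word */
            c->value = (w & 0x40000000u) ? 0x7FFFFFFFu : 0u;
            c->run = w & 0x3FFFFFFFu;
        } else {                                    /* literal word */
            c->value = w;
            c->run = 1;
        }
    }
    return 1;
}

/* Append count groups of the given value, merging adjacent fills of one type. */
static void append_groups(uint32_t *out, size_t *n_out, uint32_t value, uint32_t count)
{
    if (value == 0u || value == 0x7FFFFFFFu) {
        uint32_t fill = value ? 0xC0000000u : 0x80000000u;
        if (*n_out > 0 && (out[*n_out - 1] & 0xC0000000u) == fill) {
            uint32_t room = 0x3FFFFFFFu - (out[*n_out - 1] & 0x3FFFFFFFu);
            uint32_t add = count < room ? count : room;
            out[*n_out - 1] += add;                 /* increase the fill counter */
            count -= add;
        }
        if (count > 0)
            out[(*n_out)++] = fill | count;
    } else {
        assert(count == 1);                         /* mixed groups come one at a time */
        out[(*n_out)++] = value;
    }
}

/* OR two WAH streams. out needs room for x_size + y_size words;
 * returns the number of words written. */
size_t wah_or(const uint32_t *x, size_t x_size,
              const uint32_t *y, size_t y_size, uint32_t *out)
{
    wah_cursor cx = { x, x_size, 0, 0, 0 };
    wah_cursor cy = { y, y_size, 0, 0, 0 };
    size_t n_out = 0;
    while (cursor_refill(&cx) && cursor_refill(&cy)) {
        uint32_t min = cx.run < cy.run ? cx.run : cy.run;
        append_groups(out, &n_out, cx.value | cy.value, min);
        cx.run -= min;
        cy.run -= min;
    }
    return n_out;
}
```

For the example streams WAH b1 (40000380 80000002 001FFFFF) and WAH b2 (C0000002 7C0001E0 3FE00000), `wah_or` produces C0000002 7C0001E0 3FFFFFFF.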
[Figure: processing steps of the WAH instruction]
Phases: Preprocessing, Operation, Postprocessing
Stages: Initial Load, Load, Prepare, Store
Data sources and sinks: Memory 0, Memory 1, application-specific states
Example memory content (hex): 0000000F 00000003 40000380 80000002 001FFFFF C0000002 7C0001E0 3FE00000
Steps shown in the figure:
Align to 128-bit lines
Is word fill or literal?
Perform operation OR (00000000.. OR 11111111.. => 111111..)
Write to output stream with increased fill counter
Buffer result
Proceed to next word (4x)
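The same bit-wise OR with the instruction set extension:

    do {
        ldXstream();    // load the next words of input stream X
        ldYstream();    // load the next words of input stream Y
        WAHinst();
        WAHinst();
        WAHinst();
    } while (WAHinst());    // four WAH instructions per iteration; the last result ends the loop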
Processor Extension
Primitives:
Bitmap Compression and Processing (AND, OR, XOR): WAH, PLWAH, COMPAX
Hashing: Hash + Lookup, Hash + Insert, Hash Keys, Hash Sampling, CityHash32
Sorted Set Operations: Merge Sort, Intersection, Union, Difference, Sort-Merge Join, Sort-Merge Aggregation (SUM)
Marks per processor:
BitiX: X X X X
HASHI: X X X X X
Titan3D: X X X X X X X
Tomahawk DBA: X X X X X X
Comparison

Tomahawk without DBA
  Description: Basic Xtensa LX5 without instruction set extensions, 1 LSU, 32-bit memory interface
  Technology [nm]: 28
  A_total [mm²]: 15.92
  f_MAX [GHz]: 0.555
  P_MAX [W] @ f_MAX: 0.7

Tomahawk with DBA
  Description: Set of different DB-Extensions for WAH-Compression, Hashing and Sorted-Set Operations
  Technology [nm]: 28
  A_total [mm²]: 18
  f_MAX [GHz]: 0.5
  P_MAX [W] @ f_MAX: 0.753

Intel i7-6500U
  Description: Low-power Intel 2-core processor based
  Technology [nm]: 14
  A_total [mm²]: 99*
  f_MAX [GHz]: 3.1
  P_MAX [W] @ f_MAX: 25
Problem: Many round-trips for key lookups
Approach: "Teach B-trees to the memory controller"
[Figure: local memories and cache; latency labels tAN, tNMc, tNA, tMcN, tMcM, tMMc, tAPP; memory content: 0xCCA 1 0x00B 2 0x0FA 3 0x1FD 4 0xDE1 5 0x0ED 6 0x00E 7 0xD0A]
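Why key lookups cause many round-trips can be illustrated with a plain B-tree search. This sketch is an assumption for illustration only: the node layout, `MAX_KEYS` and `btree_lookup` are hypothetical, and one visited node is counted as one dependent memory round-trip, because the address of the next node is known only after the current node has arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_KEYS 3  /* hypothetical fan-out, kept small for the example */

typedef struct btree_node {
    size_t num_keys;
    uint32_t keys[MAX_KEYS];                    /* sorted ascending        */
    struct btree_node *children[MAX_KEYS + 1];  /* all NULL in a leaf node */
} btree_node;

/* Search for key starting at root.
 * Returns 1 if the key is found, 0 otherwise.
 * *round_trips receives the number of nodes that had to be fetched. */
int btree_lookup(const btree_node *root, uint32_t key, unsigned *round_trips)
{
    assert(round_trips != NULL);
    const btree_node *node = root;
    *round_trips = 0;
    while (node != NULL) {
        (*round_trips)++;  /* fetching this node is one dependent memory access */
        size_t i = 0;
        while (i < node->num_keys && key > node->keys[i])
            i++;
        if (i < node->num_keys && node->keys[i] == key)
            return 1;
        node = node->children[i];  /* next address is known only now */
    }
    return 0;
}
```

Each tree level adds one such access when the core walks the tree itself. If the traversal ran next to the memory, as the approach above proposes, only the request and the final result would have to cross the interconnect.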
Vision (and first simulations)
[Figure: local memories and cache; latency labels tCN, tNC, tMcM, tMMc, tNP, tPN, tPMc, tMcP; memory layout: 0xCC6 1 0x000 2 0x0F0 3 0x1FD 4 0xDE1 5 0x0ED 6 0x00E 7 0xD0A]
Implementation (not yet in silicon)