Q100: The Architecture and Design of a DATABASE PROCESSING UNIT
Lisa Wu, Andrea Lottarini, Tim Paine, Martha Kim, and Ken Ross Columbia University, NYC
1 Thursday, March 6, 2014
DPUs are analogous to GPUs
02/27/14 Columbia University
Graphics workloads: CPU + GPU
Database workloads: CPU + DPU
Transactional Processing: the sale of an airline ticket
Analytic Processing: sales projection
Relational Operators: JOIN, AGGREGATE, SORT, SELECT
Query

    SELECT s_season, SUM(s_qty) as sum_qty
    FROM sales
    WHERE s_shipdate >= '2013-01-01'
    GROUP BY s_season
    ORDER BY s_season

Plan: Nodes = Relational Operators, Edges = Data Dependencies

[Figure: query plan graph over the SALES table. Nodes include Col Select, Bool Gen, Col Filter, Stitch, Partition, Agg, and Append, with intermediate columns Col1-Col13 and tables Table1-Table7, ending in the Final Answer.]
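The "nodes = relational operators, edges = data dependencies" view of a plan can be sketched as a small dependency graph. This is only an illustration: the node names and the simplified shape of the graph below are assumptions, not the exact plan from the slide, and the ordering uses Python's standard `graphlib` module rather than anything specific to the Q100.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified plan for the example query.
# Each key is an operator node; its value is the set of nodes it depends on.
plan = {
    "ColSelect(s_shipdate)": {"SALES"},
    "ColSelect(s_season)": {"SALES"},
    "ColSelect(s_qty)": {"SALES"},
    "BoolGen(>= 2013-01-01)": {"ColSelect(s_shipdate)"},
    "ColFilter(s_season)": {"ColSelect(s_season)", "BoolGen(>= 2013-01-01)"},
    "ColFilter(s_qty)": {"ColSelect(s_qty)", "BoolGen(>= 2013-01-01)"},
    "Stitch": {"ColFilter(s_season)", "ColFilter(s_qty)"},
    "Partition(s_season)": {"Stitch"},
    "Agg(sum s_qty)": {"Partition(s_season)"},
    "Append": {"Agg(sum s_qty)"},
}

# Any topological order respects every data dependency (edge) in the plan.
order = list(TopologicalSorter(plan).static_order())
position = {node: i for i, node in enumerate(order)}

for node in order:
    print(node)
```

Nodes with no edge between them (for example the three column selects) are independent and could run side by side; the order printed here is just one valid sequence.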
Query Plan -> Q100 Program

[Figure: the query plan mapped onto the Q100 Device; labels: MEMORY, INTERCONNECT, MEMORY]
Q100 Program: Spatial Instructions and Temporal Instructions

[Figure: the query plan graph divided into spatial and temporal instructions]
[Figure: MEMORY holding the SALES TABLE, Partitioned Tables, Temp Table, and Temp Columns]
Pipeline Parallelism
Data Parallelism
Minimize Spills/Fills
Use coarse-grain hardware primitives that operate on coarse-grain data
Read datum … perform multiple …
[Figure: Q100 device; labels: MEMORY, INTERCONNECT, MEMORY]
How do we generate these query plans?
How do we implement these …
How many tiles should there be and …
What kind of interconnect should we use? Bandwidth needs on- and …
How do we schedule the plans?
Is the Q100 performance and energy efficient?
How do we implement these …
[Figure: BOOLGEN and COLUMN FILTER tiles; labels: IN 0, IN 1, IN 2, ==, EN, OUT]
Example: WHERE s_shipdate >= '2013-01-01'
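A minimal software sketch of how a boolean generator and a column filter could cooperate on the WHERE clause above. The function names, the generator-based streaming style, and the sample values are assumptions made for illustration; the slide only shows the tile diagram.

```python
import operator

def bool_gen(in0, in1, op=operator.ge):
    """Compare two equal-length columns element-wise; yield one boolean per record."""
    for a, b in zip(in0, in1):
        yield op(a, b)

def column_filter(data, enable):
    """Pass a data value through only when its enable bit is true."""
    for value, en in zip(data, enable):
        if en:
            yield value

# Hypothetical sample columns (ISO dates compare correctly as strings).
s_shipdate = ["2013-02-01", "2012-11-30", "2013-06-15"]
s_qty = [10, 4, 7]
cutoff = ["2013-01-01"] * len(s_shipdate)  # constant column for the predicate

kept_qty = list(column_filter(s_qty, bool_gen(s_shipdate, cutoff)))
print(kept_qty)
```

The same boolean stream can drive several column filters, so each column of a table is filtered by one shared predicate result.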
[Figure: Aggregator tile (AGG, sum); labels: GRP, DATA, EN, OUT; shown together with a SORT tile with labels GRP, DATA]
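One way to read the aggregator diagram is as a streaming sum over records that arrive already ordered by group. The sketch below assumes exactly that (records of the same group are adjacent, for instance because a sort ran first); the function name and sample data are hypothetical.

```python
def aggregate_sum(groups, data):
    """Streaming sum per group; assumes equal group ids are adjacent in the input."""
    results = []
    current, total, started = None, 0, False
    for g, d in zip(groups, data):
        if started and g != current:
            # Group boundary: emit the finished group and reset the accumulator.
            results.append((current, total))
            total = 0
        current, started = g, True
        total += d
    if started:
        results.append((current, total))
    return results

# Hypothetical (group, data) records, sorted by group before aggregation.
pairs = sorted(zip(["winter", "spring", "winter", "summer"], [10, 6, 7, 3]))
grp = [g for g, _ in pairs]
dat = [d for _, d in pairs]

totals = aggregate_sum(grp, dat)
print(totals)
```

Because only one running total is kept at a time, the state needed is independent of the number of groups, which is why ordering the input first is convenient.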
[Figure: Sorter tile (SORT); labels: GRP, DATA]
Limitation: number of records
[Figure: a PARTITION tile shown with two SORT tiles]
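The record-count limitation suggests splitting the input before sorting. The sketch below assumes a sorter with a fixed capacity and a range partitioner that keeps buckets in key order; the capacity value, split points, and function names are all hypothetical.

```python
from bisect import bisect_right

SORTER_CAPACITY = 4  # hypothetical limit on records a single sort can accept

def sort_tile(records):
    """Sort a batch, refusing batches larger than the assumed capacity."""
    if len(records) > SORTER_CAPACITY:
        raise ValueError("sorter capacity exceeded")
    return sorted(records)

def partition_tile(records, split_points):
    """Range-partition records into len(split_points) + 1 ordered buckets."""
    buckets = [[] for _ in range(len(split_points) + 1)]
    for r in records:
        buckets[bisect_right(split_points, r)].append(r)
    return buckets

def partitioned_sort(records, split_points):
    """Partition first, sort each bucket, then append buckets in range order."""
    out = []
    for bucket in partition_tile(records, split_points):
        out.extend(sort_tile(bucket))
    return out

records = [42, 7, 19, 88, 3, 55, 61, 23]  # too many for one sort_tile call
sorted_records = partitioned_sort(records, split_points=[30])
print(sorted_records)
```

Since the buckets cover disjoint, increasing key ranges, appending the sorted buckets yields a fully sorted result without a merge step.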
Tiles: Aggregator, ALU, Boolean Generator, Column Filter, Joiner, Partitioner, Sorter
Tiles: Table Appender, Column Selector, Column Concatenator, Column Stitcher
Tile Characterization Methodology: Verilog implementation for each tile, synthesized, placed, and routed using Synopsys 32nm Generic Libraries
[Figure: Area (mm2), Power (mW), and Critical Path (ns) for each tile: AGG, ALU, BOOLGEN, COLFILTER, JOIN, PART, SORT, APPEND, COLSELECT, CONCAT, STITCH]
Max Freq 315 MHz
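As a quick arithmetic check, a maximum frequency corresponds to a minimum clock period. The snippet assumes the usual relation between the two (frequency is the reciprocal of the period) and that the 315 MHz figure is set by the longest critical path among the tiles; the slide does not state which tile that is.

```python
max_freq_mhz = 315.0

# period [ns] = 1000 / frequency [MHz], since 1 MHz corresponds to a 1000 ns period
clock_period_ns = 1e3 / max_freq_mhz

print(f"{clock_period_ns:.2f} ns")
```

Under that assumption, a 315 MHz clock leaves a little over 3 ns for the slowest tile to complete one cycle of work.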
How many tiles should there be and …
[Figure: design space grid. For each tile type (Aggregator, ALU, Boolean Generator, Column Filter, Joiner, Partitioner, Sorter, Table Appender, Column Selector, Column Concatenator, Column Stitcher), candidate counts of 1, 2, 3, ..., 12, …]
[Figure: TPC-H Queries. Query runtime relative to 1 ALU (y-axis) versus number of ALUs from 1 to 10 (x-axis), for Q1-Q8, Q10-Q12, and Q14-Q21]
2.9 Million Designs!!
Explore tiles that consume >= 5 mW
150 Designs
[Figure: Power (Watts) versus TPC-H Runtime (milliseconds) for the explored designs]
Low Power: 1 ALU, 1 Partitioner, 1 Sorter
Pareto: 4 ALUs, 2 Partitioners, 1 Sorter
High Perf: 5 ALUs, 3 Partitioners, 6 Sorters
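Picking designs from a runtime-versus-power scatter plot is typically done by keeping only the points that no other point beats on both axes. The sketch below shows that Pareto filter in general terms; the design names and numbers are hypothetical and are not taken from the slide.

```python
def pareto_frontier(designs):
    """Return names of designs not dominated in (runtime, power); lower is better."""
    frontier = []
    for name, runtime, power in designs:
        dominated = any(
            (r <= runtime and p <= power) and (r < runtime or p < power)
            for _, r, p in designs
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical (name, runtime in ms, power in W) points, for illustration only.
designs = [
    ("A", 10.0, 0.20),
    ("B", 5.0, 0.35),
    ("C", 3.0, 0.60),
    ("D", 6.0, 0.50),  # slower and hungrier than B, so it is dominated
]

frontier = pareto_frontier(designs)
print(frontier)
```

The quadratic scan is fine for a few hundred candidates, such as a pruned design space; larger spaces would call for sorting by one axis first.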
Bandwidth needs on- and …
[Figure: Runtime Normalized to IDEAL versus NoC BW Limit (GB/s) for the Low Power, Pareto, and High Perf designs]
NoC Limit @ 6.3 GB/s, scaled down from Intel TeraFlop
[Figure: Read and Write memory Bandwidth (GB/s) per TPC-H query for the Low Power, Pareto, and High Perf designs]
BW Write Limit @ 10 GB/s
BW Read Limit @ 20 or 30 GB/s
Is the Q100 performance and energy efficient?
[Figure: Relative Runtime, Relative Power, and Relative Energy for the LowPower, Pareto, and HighPerf designs, including results at 100X Input Data Size]
37X-70X Better Performance
1/1000th Energy Consumption
10X Performance
< 1/100th Energy Consumption
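The performance and energy callouts can be related through energy = average power x runtime. The sketch below assumes that relation and assumes both callouts are measured against the same baseline on the same workload; neither assumption is stated on the slide, so the derived power figures are only a rough consistency check.

```python
def relative_energy(relative_runtime, relative_power):
    """Energy ratio, assuming energy = average power * runtime."""
    return relative_runtime * relative_power

def implied_relative_power(speedup, rel_energy):
    """Power ratio implied by a speedup and an energy ratio (runtime ratio = 1 / speedup)."""
    return rel_energy * speedup

low = implied_relative_power(37, 1 / 1000)   # for a 37X speedup
high = implied_relative_power(70, 1 / 1000)  # for a 70X speedup

print(f"implied relative power: {low:.1%} to {high:.1%}")
```

Under these assumptions, a 37X-70X speedup at 1/1000th of the energy would correspond to average power of a few percent of the baseline.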
accelerator for analytical database workloads
device gets exceptional performance and energy efficiency