FineStream: Fine-Grained Window-Based Stream Processing on CPU-GPU Integrated Architectures


SLIDE 1

USENIX ATC’20 2020 USENIX Annual Technical Conference JULY 15–17, 2020

FineStream: Fine-Grained Window-Based Stream Processing on CPU-GPU Integrated Architectures

Feng Zhang, Lin Yang, Shuhao Zhang, Bingsheng He, Wei Lu, Xiaoyong Du
Renmin University of China, Technische Universität Berlin, National University of Singapore


SLIDE 2

Outline

  • 1. Background
  • 2. Motivation
  • 3. Challenges
  • 4. FineStream
  • 5. Evaluation
  • 6. Conclusion


SLIDE 3
  • 1. Background
  • Bulk-synchronous parallel model: query granularity
  • Continuous operator model: operator granularity (this paper)

[Figure: in both models, a query (operator 1, operator 2, …, operator n) runs across the CPU and the GPU; CPU and GPU can concurrently execute in both cases, only the granularity differs.]

Related work: Saber [SIGMOD'16], window-based hybrid stream processing for heterogeneous architectures.

SLIDE 4
  • 2. Integrated Architectures
  • 2011, Jan: AMD APU
  • 2012, Jan: Intel Ivy Bridge
  • 2014, Apr: Nvidia Tegra

Benefits:
  • No PCI-e transfer overhead
  • Shared global memory
  • High energy efficiency

SLIDE 5
  • 1. Background
  • Integrated architectures vs. discrete architectures

                   Integrated architectures       Discrete architectures
Architecture       A10-7850K    Ryzen5 2400G      GTX 1080Ti    V100
#cores             512+4        704+4             3584          5120
TFLOPS             0.9          1.7               11.3          14.1
Bandwidth (GB/s)   25.6         38.4              484.4         900
Price ($)          209          169               1100          8999
TDP (W)            95           65                250           300

SLIDE 6
  • 3. Stream Processing with SQL
  • Data stream
  • Window
  • Operator
  • Query

  • Batch

[Figure: a data stream of tuples is partitioned into windows w1, w2, … defined by a window size and a window slide; a query (operator 1, operator 2, …, operator n) runs over each window and emits results.]
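The window semantics above (a size and a slide, here counted in tuples) can be sketched in a few lines of Python; the function name and the count-based windows are illustrative assumptions, not FineStream's implementation:

```python
def count_based_windows(stream, size, slide):
    """Yield successive windows of `size` tuples, advancing by `slide`.

    When slide < size, consecutive windows overlap (sliding window);
    when slide == size, windows are disjoint (tumbling window).
    """
    for start in range(0, len(stream) - size + 1, slide):
        yield stream[start:start + size]

# A stream of 6 tuples, window size 4, slide 2 -> windows w1 and w2:
ws = list(count_based_windows([1, 2, 3, 4, 5, 6], size=4, slide=2))
# ws == [[1, 2, 3, 4], [3, 4, 5, 6]]
```

With slide equal to size the same function yields the tumbling-window special case.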

SLIDE 7

Outline

  • 1. Background
  • 2. Motivation
  • 3. Challenges
  • 4. FineStream
  • 5. Evaluation
  • 6. Conclusion


SLIDE 8
  • 2. Motivation
  • Varying Operator-Device Preference

[Figure: a two-operator query (operator 1: group-by, operator 2: aggregation) scheduled on CPU and GPU queues over time; the query takes 18.2 ms on the CPU and 6.7 ms on the GPU, with per-operator timings of 5.2 ms and 5.8 ms shown in the figure.]

SLIDE 9
  • 2. Motivation
  • Performance (tuples/s) of operators on the CPU and the GPU of the integrated architecture.

Operator      CPU only   GPU only   Device choice
Projection    14.2       14.3       GPU
Selection     13.1       14.1       GPU
Aggregation   14.7       13.5       CPU
Group-by      8.1        12.4       GPU
Join          0.7        0.1        CPU
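The per-operator device choice in the table follows directly from the profiled throughputs. A minimal sketch (the numbers are taken from the table; the dictionary layout and function name are illustrative):

```python
# Profiled throughput per operator (values from the table above).
profile = {
    "projection":  {"CPU": 14.2, "GPU": 14.3},
    "selection":   {"CPU": 13.1, "GPU": 14.1},
    "aggregation": {"CPU": 14.7, "GPU": 13.5},
    "group-by":    {"CPU": 8.1,  "GPU": 12.4},
    "join":        {"CPU": 0.7,  "GPU": 0.1},
}

def preferred_device(op):
    """Choose the device with the higher profiled throughput."""
    rates = profile[op]
    return max(rates, key=rates.get)

choices = {op: preferred_device(op) for op in profile}
# choices matches the table's "Device choice" column.
```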

SLIDE 10
  • 2. Motivation
  • Fine-Grained Stream Processing
  • A fine-grained stream processing method that considers both integrated-architecture characteristics and operator features should achieve better performance.
  • Memory bandwidth limit
  • Operators have preferred devices
  • Both the CPU and the GPU deliver good performance
  • Consider the interplay of operator features and architectural differences.
SLIDE 11

Outline

  • 1. Background
  • 2. Motivation
  • 3. Challenges
  • 4. FineStream
  • 5. Evaluation
  • 6. Conclusion


SLIDE 12
  • 3. Challenges
  • Challenge 1: Application topology combined with architectural characteristics

[Figure: an integrated CPU-GPU system: CPU cores with a CPU cache and GPU cores with a GPU cache share a memory management unit and system DRAM; query operators (OP2, OP3, OP5, OP6, OP7, OP9, OP10, OP11) are placed across both devices.]

SLIDE 13
  • 3. Challenges
  • Challenge 2: SQL query plan optimization with shared main memory

[Figure: CPU and GPU queues over time for three plans: CPU only takes 18.2 ms, GPU only 6.7 ms, and a naive CPU-GPU co-run 22.4 ms; with shared main memory, co-running can be slower than a single device, so the query plan must be optimized.]

SLIDE 14
  • 3. Challenges
  • Challenge 3: Adjustment for dynamic workload

[Figure: a three-operator DAG in which the workload split between OP2 and OP3 shifts at runtime from 90%/10% to 10%/90%.]

SLIDE 15

Outline

  • 1. Background
  • 2. Motivation
  • 3. Challenges
  • 4. FineStream
  • 5. Evaluation
  • 6. Conclusion


SLIDE 16
  • 4. FineStream
  • Overview

[Figure: FineStream overview: stream batches and the SQL query feed online profiling, which builds a performance model; the model yields an operator-to-device mapping, and a dispatcher executes the mapped operators and emits results.]
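The overview above pairs a performance model with a dispatcher. A toy sketch of those two stages under simplifying assumptions (per-operator cost tables, the function names, and the example numbers are all illustrative, not FineStream's actual API):

```python
def build_mapping(profiled_cost):
    """Performance model: map each operator to its cheaper (faster) device."""
    return {op: min(costs, key=costs.get) for op, costs in profiled_cost.items()}

def dispatch(batch, plan, mapping, run):
    """Dispatcher: run each operator of the plan on its mapped device."""
    for op in plan:
        batch = run(op, batch, mapping[op])
    return batch

# Toy per-batch costs (ms) as online profiling might report them.
profiled_cost = {"group-by":    {"CPU": 9.0, "GPU": 4.0},
                 "aggregation": {"CPU": 4.0, "GPU": 6.0}}
mapping = build_mapping(profiled_cost)
# mapping == {"group-by": "GPU", "aggregation": "CPU"}
```

The `run` callback stands in for actual operator execution on a device; in the real system this is where the CPU or GPU kernel would be invoked.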

SLIDE 17
  • 4. FineStream
  • Topology

[Figure: an operator DAG (OP1–OP11) with three branches; the critical path runs through the longest branch.]
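The critical path of the topology (the costliest chain of dependent operators) bounds end-to-end latency. A sketch of computing it; the DAG, the per-operator costs, and the function name below are illustrative, not the slide's exact topology:

```python
def critical_path_cost(dag, cost):
    """Longest-path cost in a DAG given as {op: [successor, ...]}."""
    memo = {}
    def longest_from(op):
        if op not in memo:
            succs = dag.get(op, [])
            memo[op] = cost[op] + max((longest_from(s) for s in succs), default=0)
        return memo[op]
    return max(longest_from(op) for op in dag)

# Three branches joining at OP4: the critical path is the slowest branch.
dag = {"OP1": ["OP4"], "OP2": ["OP4"], "OP3": ["OP4"], "OP4": []}
cost = {"OP1": 2, "OP2": 5, "OP3": 3, "OP4": 1}
# critical_path_cost(dag, cost) == 6  (OP2 -> OP4)
```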

SLIDE 18
  • 4. FineStream
  • Optimization 1: Branch Co-Running

[Figure: (a) branch parallelism: branches 1–3 execute across stages t_stage1, t_stage2, t_stage3; (b) branch scheduling optimization: reordering branch execution across stages shortens the total time.]

SLIDE 19
  • 4. FineStream
  • Optimization 2: Batch Pipeline

[Figure: (a) phase partitioning: the operator DAG is split into phase 1 and phase 2 (PHi: phase i, Bi: batch i); (b) batch pipeline: over time, PH1 of batch 2 overlaps with PH2 of batch 1, and so on.]
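The batch pipeline above overlaps the two phases across batches. A timing sketch under the simplifying assumption that each phase takes one time step per batch (the function name and the step model are illustrative):

```python
def pipeline_schedule(num_batches):
    """Return (time_step, phase, batch) triples for a 2-phase pipeline."""
    events = []
    for t in range(num_batches + 1):
        if t < num_batches:
            events.append((t, "PH1", t + 1))  # phase 1 works on batch t+1
        if t >= 1:
            events.append((t, "PH2", t))      # phase 2 finishes batch t
    return events

# Two batches finish in 3 steps instead of the 4 a serial schedule needs:
# [(0, 'PH1', 1), (1, 'PH1', 2), (1, 'PH2', 1), (2, 'PH2', 2)]
```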

SLIDE 20
  • 4. FineStream
  • Optimization 3: Handling Dynamic Workload
  • Light-Weight Resource Reallocation
  • Query Plan Adjustment

[Figure: on the integrated architecture with shared memory, CPU and GPU compute units (CUs) are re-divided between OP2 and OP3 as the workload shifts: (a) 90% of the workload goes to OP2; (b) 90% goes to OP3.]
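Light-weight resource reallocation can be pictured as re-dividing compute units in proportion to each operator's current workload share; the CU count, split rule, and function name below are illustrative assumptions:

```python
def reallocate(total_cus, workload_share):
    """Split compute units proportionally to each operator's workload share."""
    alloc = {op: int(total_cus * share) for op, share in workload_share.items()}
    # Give any rounding remainder to the most loaded operator.
    busiest = max(workload_share, key=workload_share.get)
    alloc[busiest] += total_cus - sum(alloc.values())
    return alloc

# When 90% of the workload goes to OP2:
# reallocate(10, {"OP2": 0.9, "OP3": 0.1}) == {"OP2": 9, "OP3": 1}
```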

SLIDE 21
  • 4. FineStream

  • Execution flow

[Figure: execution flow: stream batches are processed by CPU/GPU threads according to the default DAG's per-operator CPU%/GPU% mapping, using branch co-running and batch pipelining; dataflow monitoring detects dynamic workload and triggers resource reallocation; if performance is still low, operator migration and query plan adjustment produce a new DAG.]
SLIDE 22

Outline

  • 1. Background
  • 2. Motivation
  • 3. Challenges
  • 4. FineStream
  • 5. Evaluation
  • 6. Conclusion


SLIDE 23
  • 5. Evaluation
  • Platforms
  • AMD A10- 7850K
  • Ryzen 5 2400G
  • Datasets
  • Google compute cluster monitoring
  • Anomaly detection in smart grids
  • Linear road benchmark
  • Synthetically generated dataset
  • Benchmarks
  • Nine queries

Example - Q1 (Google compute cluster monitoring):

select timestamp, category, sum(cpu) as totalCPU
from TaskEvents [range 256 slide 1]
group by category
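The grouped aggregation in Q1 can be sketched over one window's batch of tuples; the tuple fields follow the query (timestamp, category, cpu), while the function name and sample values are illustrative:

```python
from collections import defaultdict

def q1_window(tuples):
    """sum(cpu) per category over one window of TaskEvents tuples."""
    total_cpu = defaultdict(float)
    for t in tuples:
        total_cpu[t["category"]] += t["cpu"]
    return dict(total_cpu)

window = [{"timestamp": 1, "category": "A", "cpu": 0.5},
          {"timestamp": 2, "category": "B", "cpu": 0.25},
          {"timestamp": 3, "category": "A", "cpu": 0.25}]
# q1_window(window) == {"A": 0.75, "B": 0.25}
```

In the streaming setting this computation repeats for every window produced by the [range 256 slide 1] specification.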

SLIDE 24
  • 5. Evaluation
  • Throughput: FineStream achieves the best performance in most cases.

[Figure: throughput (1E5 tuples/s) of Single, Saber, and FineStream on Q1–Q9, on A10-7850K and Ryzen5 2400G.]

SLIDE 25
  • 5. Evaluation
  • Latency: Low latency in most cases.

[Figure: latency (s) of Single, Saber, and FineStream on Q1–Q9, on A10-7850K and Ryzen5 2400G.]

SLIDE 26
  • 5. Evaluation
  • Throughput vs. latency
  • Queries with high throughput usually have low latency, and vice versa.

[Figure: throughput (1E5 tuples/s) vs. latency (s) for FineStream and Saber on A10-7850K and Ryzen5 2400G.]

SLIDE 27
  • 5. Evaluation
  • Utilization
  • FineStream utilizes the GPU device better on the integrated architecture.

[Figure: CPU and GPU utilization (%) of Saber and FineStream on Q1–Q9.]

SLIDE 28
  • 5. Evaluation
  • Comparison with Discrete Architectures
  • Throughput: The discrete GPUs exhibit 1.8x to 5.7x higher throughput than the integrated architectures, due to the greater computational power of discrete GPUs.

  • Latency:
  • Discrete GPUs: T_total = T_PCIe_transmit + T_compute
  • Integrated GPUs: T_total = T_compute


SLIDE 29
  • 5. Evaluation
  • Comparison with Discrete Architectures
  • High Price-Throughput Ratio

[Figure: price-performance ratio (performance/USD) on Q1–Q9 for GTX 1080Ti, V100, A10-7850K, and Ryzen5 2400G.]

SLIDE 30
  • 5. Evaluation
  • Comparison with Discrete Architectures
  • High Energy Efficiency

[Figure: energy efficiency (performance/Watt) on Q1–Q9 for GTX 1080Ti, V100, A10-7850K, and Ryzen5 2400G.]

SLIDE 31

Outline

  • 1. Background
  • 2. Motivation
  • 3. Challenges
  • 4. FineStream
  • 5. Evaluation
  • 6. Conclusion


SLIDE 32
  • 6. Conclusion
  • The first fine-grained window-based relational stream processing engine on CPU-GPU integrated architectures.
  • Lightweight query plan adaptations for handling dynamic workloads.
  • Evaluation of FineStream on a set of stream queries.


fengzhang@ruc.edu.cn, yanglin2330@ruc.edu.cn, shuhao.zhang@tu-berlin.de, hebs@comp.nus.edu.sg, lu-wei@ruc.edu.cn, duyong@ruc.edu.cn

Feng Zhang, Lin Yang, Shuhao Zhang, Bingsheng He, Wei Lu, Xiaoyong Du
Renmin University of China, Technische Universität Berlin, National University of Singapore

USENIX ATC’20 2020 USENIX Annual Technical Conference JULY 15–17, 2020