Low-Latency Transaction Execution on Graphics Processors: Dream or Reality?
Iya Arefyeva, Gabriel Campero Durand, Marcus Pinnecke, David Broneske, Gunter Saake
Workgroup Databases and Software Engineering, University of Magdeburg
Summit Supercomputer, Oak Ridge
Caldera Architecture [5]
SM structure of Nvidia's Pascal GP100 [9]
Experiments with GPUTx [6]
batch collection
GPU is used), only the necessary data is transferred
thread
several cases need to be considered.
[Figure: clients 1 to N send requests to the transaction manager, which runs batch processing and replies to the clients.]
[Figure, three steps: a new request (write 8, key 5) arrives; it is appended to the collected batch for writes (write 1 to write 7 on keys 22, 4, 19, 8, 10, 1, 56); the batch is full and goes to batch processing.]
[Figure, three steps: a new request (write 8, key 4) arrives while the collected batch for writes (write 1 to write 6 on keys 22, 4, 19, 8, 10, 1) already holds a write on key 4; the collected writes are flushed to batch processing; the new request starts a new collected batch for writes.]
[Figure: a new request (read 5, key 4) arrives while the collected batch for writes (write 1 to write 6 on keys 22, 4, 19, 8, 10, 1) holds a write on key 4 and the collected batch for reads holds read 1 to read 4 on keys 7, 13, 32, 25; the writes are flushed to batch processing.]
[Figure: a new request (write 5, key 4) arrives while the collected batch for reads (read 1 to read 7 on keys 22, 4, 19, 8, 10, 1, 56) holds a read on key 4 and the collected batch for writes holds write 1 to write 4 on keys 7, 13, 32, 25; the reads are flushed to batch processing.]
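The batch-collection cases illustrated above (batch full, write after write on the same key, read after write, write after read) can be summarized in a minimal Python sketch. This is an illustration only, not the authors' implementation: the class name `BatchCollector`, its methods, and the batch size of 8 are assumptions chosen to mirror the figures, and "processing" a batch here simply records it instead of launching GPU work.

```python
from dataclasses import dataclass, field


@dataclass
class BatchCollector:
    """Hypothetical collector keeping separate pending batches for reads and writes."""
    batch_size: int = 8  # assumed value; the figures show a write batch filling at 8
    reads: list = field(default_factory=list)      # pending (op_id, key) reads
    writes: list = field(default_factory=list)     # pending (op_id, key) writes
    processed: list = field(default_factory=list)  # log of flushed batches

    def _flush(self, kind):
        """Hand the pending batch of the given kind to batch processing."""
        batch = self.reads if kind == "read" else self.writes
        if batch:
            # Stand-in for transferring the batch and processing it on the device.
            self.processed.append((kind, list(batch)))
            batch.clear()

    def submit(self, kind, op_id, key):
        """Add one request, flushing first if it conflicts with pending work."""
        write_keys = {k for _, k in self.writes}
        read_keys = {k for _, k in self.reads}
        if kind == "write":
            if key in write_keys:   # write after write on the same key
                self._flush("write")
            if key in read_keys:    # write after read on the same key
                self._flush("read")
            target = self.writes
        else:
            if key in write_keys:   # read after write on the same key
                self._flush("write")
            target = self.reads
        target.append((op_id, key))
        if len(target) >= self.batch_size:  # batch full
            self._flush(kind)


# Usage: six writes are collected, then a write on an already pending key arrives.
collector = BatchCollector(batch_size=8)
for op_id, key in enumerate([22, 4, 19, 8, 10, 1], start=1):
    collector.submit("write", op_id, key)
collector.submit("write", 8, 4)
print(collector.processed)  # the first six writes, flushed as one batch
print(collector.writes)     # the new request, waiting in a fresh batch
```

Keeping reads and writes in separate batches means a flush is only forced when a key overlaps, so independent requests keep accumulating until the batch is full.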
YCSB client architecture [7]
100k read operations: all fields of a tuple are read; Zipfian distribution of requests.
1 million update operations: only one field is updated; Zipfian distribution of requests.
100k read/update operations (50% reads and 50% updates): 80% of operations access the last entries (20% of tuples).
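To make the request distributions concrete, here is a rough Python sketch of how such key sequences could be generated. It is not YCSB's generator: the function names, the skew parameter `theta=0.99`, and the reading of "80% of operations access the last 20% of tuples" as a uniform pick within a hot tail are all assumptions made for illustration.

```python
import random
from itertools import accumulate


def zipfian_keys(num_keys, num_ops, theta=0.99, seed=0):
    """Draw num_ops keys where the key at rank r has weight 1 / r**theta (assumed skew)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** theta) for rank in range(1, num_keys + 1)]
    cumulative = list(accumulate(weights))
    return rng.choices(range(num_keys), cum_weights=cumulative, k=num_ops)


def hot_tail_keys(num_keys, num_ops, hot_fraction=0.2, hot_prob=0.8, seed=0):
    """With probability hot_prob pick a key from the last hot_fraction of the key space."""
    rng = random.Random(seed)
    first_hot = int(num_keys * (1.0 - hot_fraction))
    keys = []
    for _ in range(num_ops):
        if rng.random() < hot_prob:
            keys.append(rng.randrange(first_hot, num_keys))  # hot, most recent entries
        else:
            keys.append(rng.randrange(0, first_hot))         # remaining, colder entries
    return keys


# Usage: a small skewed read workload and a mixed read/update workload.
read_keys = zipfian_keys(num_keys=1000, num_ops=10_000)
mixed = [(random.Random(i).choice(["read", "update"]), k)
         for i, k in enumerate(hot_tail_keys(num_keys=1000, num_ops=10))]
print(read_keys[:10])
print(mixed)
```

Skewed key choices like these matter for batching because popular keys are more likely to collide with requests that are already pending.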
Goal: Evaluate performance on independent reads or writes to find the impact of batch size.
Goal: What is the impact of concurrency control? Do stale reads improve performance?
[Plots: results with concurrency control (CC) and without CC.]
1. He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q. and Sander, P.V., 2009. Relational query coprocessing on graphics processors. ACM Transactions on Database Systems (TODS), 34(4), p.21.
2. Breß, S. and Saake, G., 2013. Why it is time for a HyPE: A hybrid query processing engine for efficient GPU coprocessing in DBMS. Proceedings of the VLDB Endowment, 6(12), pp.1398-1403.
3. Breß, S., 2014. The design and implementation of CoGaDB: A column-oriented GPU-accelerated
4. Heimel, M., Saecker, M., Pirk, H., Manegold, S. and Markl, V., 2013. Hardware-oblivious parallelism for in-memory column-stores. Proceedings of the VLDB Endowment, 6(9), pp.709-720.
5. Appuswamy, R., Karpathiotakis, M., Porobic, D. and Ailamaki, A., 2017. The Case For Heterogeneous HTAP. In 8th Biennial Conference on Innovative Data Systems Research (No. EPFL-CONF-224447).
6. He, B. and Yu, J.X., 2011. High-throughput transaction executions on graphics processors. Proceedings of the VLDB Endowment, 4(5), pp.314-325.
7. Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R. and Sears, R., 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, pp.143-154. ACM.
8. MapD Product Website: https://www.mapd.com/
9. Soyata, T., 2018. GPU Parallel Program Development Using CUDA. Chapman and Hall/CRC.
10. Top 500 news: https://www.top500.org/news/new-gpu-accelerated-supercomputers-change-the-balance-of-power-on-the-top500/
11. Appuswamy, R., Anadiotis, A.C., Porobic, D., Iman, M.K. and Ailamaki, A., 2017. Analyzing the impact of system architecture on the scalability of OLTP engines for high-contention workloads. Proceedings of the VLDB Endowment, 11(2), pp.121-134.
Source: https://blog.acolyer.org/2016/02/24/a-critique-of-ansi-sql-isolation-levels/
State-of-the-art PM and TxSM design for GPUs [6]
[Figure: GPU memory hierarchy. Each thread has its own registers and local memory; the threads of a block (Block 1, Block 2) share that block's shared memory; global, constant and texture memory are accessible from all blocks and connect to the CPU.]
[Figure: a table with columns A, B, C and rows (a1, b1, c1), (a2, b2, c2), (a3, b3, c3), shown twice.]