

SLIDE 1

Eliminating the Bandwidth Bottleneck of Central Query Dispatching Through TCP Connection Hand-Over

Stefan Klauck¹, Max Plauth¹, Sven Knebel¹, Marius Strobl², Douglas Santry², Lars Eggert²

¹ Hasso Plattner Institute, University of Potsdam, Germany
² NetApp

March 2019

Image: wolfro54 CC BY-NC-ND 2.0

SLIDE 2

Motivation

In scale-out database systems, queries must be routed to individual servers.

SLIDE 3

Motivation

In scale-out database systems, queries must be routed to individual servers.

Central Dispatcher
+ Simple clients / dynamic backends
– Central dispatcher is a potential bottleneck

Direct Communication
+ Low latency
– Requires smart clients or static backends

[Diagrams: (a) clients 1…m communicating directly with DB backends 1…n; (b) clients 1…m connecting to DB backends 1…n through a central dispatcher]

SLIDE 4

Motivation – Use Cases for Central Dispatching

■ Horizontal partitioning / sharded database
■ Partially replicated database system
  □ Maximize throughput by balancing the load evenly while minimizing the memory footprint (toy sketch below)

[Diagram: clients 1…m send queries through a dispatcher to a sharded database, e.g. fragments 1–4 on shard 1 and fragments 5–8 on shard 2]

[Diagram: scale-out to a partially replicated database with four replicas holding subsets of fragments 1–10; queries q1–q5 (workload shares 10%, 15%, 25%, 30%, 20%) are split across replicas, e.g. q1 (100%) and q5 (50%) to replica 1, q2 (100%) and q5 (33.3%) to replica 2, q3 (100%) to replica 3, q4 (100%) and q5 (16.6%) to replica 4, so that each replica carries 25% of the load]

Rabl and Jacobsen. Query Centric Partitioning and Allocation for Partially Replicated Database Systems. SIGMOD 2017.
Klauck and Schlosser. Workload-Driven Fragment Allocation for Partially Replicated Databases Using Linear Programming. ICDE 2019.
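To make the balanced-assignment idea concrete, here is a toy Python sketch that splits workload shares across replicas so each carries an equal load. The query names and shares mirror the figure; everything else is illustrative, and the cited papers solve this with linear programming (also minimizing the memory footprint, which this greedy fill ignores):

# Toy sketch only, not the cited linear-programming approach.
def assign(shares: dict[str, float], n_replicas: int):
    target = sum(shares.values()) / n_replicas   # equal load per replica
    plan = [[] for _ in range(n_replicas)]       # (query, fraction) pairs
    load = [0.0] * n_replicas
    r = 0
    for q, share in shares.items():
        remaining = share
        while remaining > 1e-9:
            take = min(target - load[r], remaining)
            plan[r].append((q, take / share))    # fraction of q routed to replica r
            load[r] += take
            remaining -= take
            if load[r] >= target - 1e-9:
                r += 1                           # this replica is full
    return plan

# Workload shares from the figure; each of the 4 replicas ends up with 25%.
print(assign({"q1": 10, "q2": 15, "q3": 25, "q4": 30, "q5": 20}, 4))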

SLIDE 5

Motivation – Central Dispatching from a Network Perspective

■ Logical view

>>> import psycopg2
>>> conn = psycopg2.connect("dbname='tpch' host='dispatcher'")

[Diagram: clients (client1:65140, client2:65144) connect to dispatcher:5432; the dispatcher opens separate connections (dispatcher:65228, dispatcher:65231) to database1:5432 and database2:5432]

SLIDE 6

Motivation – Central Dispatching from a Network Perspective

■ Logical view
■ Physical view

>>> import psycopg2
>>> conn = psycopg2.connect("dbname='tpch' host='dispatcher'")

[Diagrams: the logical view as on the previous slide; the physical view additionally shows the switch through which all client–dispatcher and dispatcher–backend connections actually pass]

SLIDE 7

Motivation

■ Whether the dispatcher becomes a bottleneck depends on the workload
  □ Number and size of queries/messages
  □ Ratio of processed tuples to result set size

SLIDE 8

Motivation

■ Whether the dispatcher becomes a bottleneck depends on the workload
  □ Number and size of queries/messages
  □ Ratio of processed tuples to result set size
■ “Transferring a large amount of data out of a database system to a client program is a common task.”
  □ Needed for statistical analyses or machine learning in clients (see the example below)
  □ Main bottleneck is network bandwidth

Raasveldt and Mühleisen. Don’t Hold My Data Hostage – A Case For Client Protocol Redesign. VLDB 2017.
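Continuing the psycopg2 snippet from the earlier slides, a minimal sketch of such a bulk transfer (the table name is illustrative, assuming the TPC-H schema behind dbname='tpch'):

>>> cur = conn.cursor()
>>> cur.execute("SELECT * FROM lineitem")  # large result set
>>> rows = cur.fetchall()                  # transfer time dominated by network bandwidth

With a central dispatcher in the path, every result byte traverses the dispatcher's link on the way in and again on the way out, which is exactly the bottleneck this talk addresses.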

SLIDE 9

Research Goals

■ Integration of TCP connection hand-over into a database by means of a reprogrammable network switch
■ Comparison of query-based dispatching approaches in terms of
  □ Throughput scaling
  □ Processing flexibility

SLIDE 10

Dispatcher Implementations

■ Traditional architecture with two separate TCP connections: client ↔ dispatcher ↔ database (see the sketch below)
  1. HAProxy – free and open-source TCP/HTTP load balancer
  2. Hyrise dispatcher (https://github.com/hyrise)
■ Using a reprogrammable switch to perform TCP connection hand-over
  3. Prism: exchange most packets directly between client and backend
     Hayakawa et al. Prism: A Proxy Architecture for Datacenter Networks. SoCC 2017.
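As a rough illustration of why the traditional architecture funnels all traffic through one machine, here is a minimal user-space TCP relay in Python. It is a toy stand-in, not the Hyrise dispatcher or HAProxy, and the backend host names are illustrative:

# Toy round-robin dispatcher: one TCP connection per side, so every byte of
# both the query and the result is copied through this process.
import socket
import threading

BACKENDS = [("database1", 5432), ("database2", 5432)]  # illustrative hosts
next_backend = 0

def pump(src: socket.socket, dst: socket.socket) -> None:
    # Relay bytes until one side closes; all payload crosses this host's NIC.
    while data := src.recv(4096):
        dst.sendall(data)
    dst.close()

def handle(client: socket.socket) -> None:
    global next_backend
    backend = socket.create_connection(BACKENDS[next_backend])
    next_backend = (next_backend + 1) % len(BACKENDS)
    threading.Thread(target=pump, args=(client, backend), daemon=True).start()
    pump(backend, client)  # result traffic also traverses the dispatcher

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", 5432))
server.listen()
while True:
    conn, _ = server.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()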

SLIDE 11

Dispatcher Implementations – Prism

■ Client query is initially sent/routed to the Prism Controller
■ Prism Controller hands the connection over to an appropriate backend and reprograms the switch
■ Backend processes the query and sends the result directly to the client (bypassing the Prism Controller)
■ Backend hands the connection back to the Prism Controller

[Diagram: client, Prism switch, and DB backend; the switch logic rewrites packet information by looking up transform rules keyed on (Src IP, Src TCP Port, Dst IP, Dst Port); unmatched packets go to the Prism Controller; connections are handed off to and handed back from the backend via the Prism Interface (a Python sketch of this match-action step follows below)]
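The switch logic in the diagram boils down to a match-action lookup on the TCP 4-tuple. A hedged Python sketch of that control flow; the data structures and helper functions are assumptions for illustration, not Prism's actual code:

# Packets matching an installed transform rule are rewritten and forwarded
# directly between client and backend; unmatched packets are escalated to
# the Prism Controller, which installs rules at hand-over time.
from dataclasses import dataclass

@dataclass(frozen=True)
class FourTuple:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Installed by the controller on hand-off, removed again on hand-back.
transform_rules: dict[FourTuple, FourTuple] = {}

def forward_rewritten(new_header: FourTuple, payload: bytes) -> None:
    print(f"forwarded with rewritten header {new_header}")    # placeholder

def send_to_controller(header: FourTuple, payload: bytes) -> None:
    print(f"no rule for {header}, sent to Prism Controller")  # placeholder

def process_packet(header: FourTuple, payload: bytes) -> None:
    rule = transform_rules.get(header)       # Lookup(Src IP, Src Port, Dst IP, Dst Port)
    if rule is not None:
        forward_rewritten(rule, payload)     # bypasses the controller
    else:
        send_to_controller(header, payload)  # unmatched packet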

SLIDE 12

Experimental Evaluation

■ 10 GbE and 40 GbE experiments
  □ Hyrise with a stored procedure (https://github.com/hyrise)
  □ wrk – HTTP benchmarking tool (https://github.com/wg/wrk)
  □ mSwitch – software switch
    Honda et al. mSwitch: A Highly-Scalable, Modular Software Switch. SOSR 2015.

[Diagrams: two testbed setups, each with clients 1 and 2 running wrk and DB backends 1 and 2 running Hyrise. Left: a switch with mSwitch in learning-bridge mode plus a separate load balancer running the Hyrise dispatcher or HAProxy. Right: mSwitch running the Prism switch module together with the Prism Controller]

SLIDE 13

Experimental Evaluation with Two Clients and Backends

■ 10 GbE results
  □ TCP hand-over outperforms the traditional approaches for large payloads

[Plot: throughput (Gb/s, logarithmic axis) over payload sizes from 1 B to 32 MiB for Prism, Dispatcher, and HAProxy. Annotations: Dispatcher and HAProxy are limited by the bandwidth of the central dispatcher; Prism scales up to min(Σ client bandwidth, Σ backend bandwidth), worked out below]
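For intuition, assuming each host attaches with a single 10 GbE link (consistent with this testbed): the hand-over path can approach min(2 × 10 Gb/s, 2 × 10 Gb/s) = 20 Gb/s of aggregate throughput, whereas any central dispatcher is capped at the 10 Gb/s of its own link.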

SLIDE 14

Experimental Evaluation with Two Clients and Backends

■ 10 GbE results
  □ TCP hand-over outperforms the traditional approaches for large payloads
  □ The Hyrise dispatcher performs best for small payload sizes up to 4 KiB

[Plot: as on the previous slide, with an annotation of the throughput at 512 B payload – Prism: 50 Mb/s, Dispatcher: 63 Mb/s, HAProxy: 42 Mb/s]

SLIDE 15

Other Uses of Software Defined Networking in Databases

■ Implement transaction ordering inside the network switch (published @ SOSP 2017)
  Li, Michael, and Ports. Eris: Coordination-Free Consistent Transactions Using In-Network Concurrency Control. SOSP 2017.
■ Offload full SQL query segments onto a programmable dataplane (published @ CIDR 2019)
  Lerner, Hussein, and Cudre-Mauroux. The Case for Network-Accelerated Query Processing. CIDR 2019.

[The slide shows the first pages of both papers]

SLIDE 16

Summary

■ Scale-out database systems use central query dispatchers to hide backend complexity, but the dispatcher may become a bandwidth bottleneck
■ We compared dispatching architectures for database systems
  □ The traditional dispatcher performs best for small payload sizes
  □ Prism's connection hand-over overhead pays off for larger payloads
  → Hybrid approach with on-demand connection hand-over for large results

SLIDE 17

Thanks

Stefan Klauck
stefan.klauck@hpi.de
http://epic.hpi.de