SLIDE 1

University of Minnesota

Scaling Up The Performance of Distributed Key-Value Stores Using Emerging Technologies for Big Data Applications

Hebatalla Eldakiky
Advisor: Prof. David H. C. Du
Department of Computer Science and Engineering, University of Minnesota, USA
January 22nd, 2020

SLIDE 2

Talk Outline

  • Introduction
  • Background & Motivation
  • Completed Work

❑ TurboKV: Scaling Up the Performance of Distributed Key-value Stores with In-Switch Coordination
❑ Key-value Pairs Allocation Strategy for Kinetic Drives

  • Proposed Work

❑ TransKV: A Networking Support for Transaction Processing in Distributed Key-value Stores (Proposed Project)

  • Conclusion
  • Future Plan

SLIDE 3

The Big Data Era (1/2)

We live in the digital era, where data is generated everywhere:
bridge monitoring, environment controls, elder care monitoring, forest management, soil monitoring, the Internet of Things, social media, smart phones, and more (e.g., 6,000 tweets/sec; 4 PB of new data/day).

© 2017, Effective Business Intelligence with QuickSight

SLIDE 4

The Big Data Era (2/2)

NoSQL databases have become a competitive alternative to relational databases for storing and processing this data.

NoSQL DB categories: Document DB, Graph DB, Column DB, Key-Value Store (e.g., RAMCloud).

SLIDE 5

Big Data & Storage Challenges (1/2)

  • Storage infrastructure is vital for solving big data problems.
  • An enormous amount of data is distributed across several storage nodes connected by network switches.
  • Network latency plays a critical role in the efficient access of data in this distributed environment.
  • Software-defined networking (SDN) provides efficient resource allocation and flexibility for maximum network performance.
  • Network switches are also becoming intelligent enough to perform some computational tasks in-network.

How can SDN be used to manage the distributed storage nodes intelligently?

SLIDE 6

Big Data & Storage Challenges (2/2)

Data movement problem: with data-intensive applications, the amount of data shipped from storage drives to be processed by the host is very large. The goal is to reduce the amount of data shipped between storage and compute.

Conventional architecture: the host (CPU + DRAM) reads the data from the storage device over the host interface, the data is returned to the host, and the query executes on the host.

In-storage computing architecture: the host sends the query to the storage device; the device's own CPU (an ARM processor) and device DRAM execute the query locally and return only the query results, giving lower latency and less energy spent on data transfer (see the sketch below).
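To make the comparison concrete, here is a minimal Python sketch of the two query paths; it is purely illustrative (the Device class, method names, and record counts are invented), but it shows why the in-storage path ships far less data across the host interface.

```python
# Illustrative only: conventional (ship all data to the host) vs. in-storage
# (push the query down to the device) execution of the same filter query.

class Device:
    def __init__(self, records):
        self.records = records            # data resident on the drive

    def read_all(self):
        return list(self.records)         # conventional: everything crosses the interface

    def execute(self, predicate):
        # in-storage computing: the device's own CPU evaluates the query locally
        return [r for r in self.records if predicate(r)]

def host_side_query(device, predicate):
    shipped = device.read_all()           # large transfer to the host
    return [r for r in shipped if predicate(r)], len(shipped)

def in_storage_query(device, predicate):
    results = device.execute(predicate)   # only the results cross the interface
    return results, len(results)

dev = Device(range(1_000_000))
wanted = lambda r: r % 100_000 == 0
_, moved_conventional = host_side_query(dev, wanted)
_, moved_in_storage = in_storage_query(dev, wanted)
print(f"records shipped: conventional={moved_conventional}, in-storage={moved_in_storage}")
```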

SLIDE 7

Programmable Networks → In-Network Computing

“This is how I want the network to behave and how to switch packets…” (the user / controller makes the rules)

Figure: P4 workflow. The network demands are expressed as a P4 program and deployed through the switch OS, run-time API, and driver onto a P4 programmable device, with feedback returned to the user/controller.

P4 is a high-level language for programming protocol-independent packet processors, designed to achieve three goals:

  • Protocol independence.
  • Target independence.
  • Re-configurability in the field.

Think programming rather than protocols…

SLIDE 8

What is PISA?

PISA: Programmable Parser → Programmable Match-Action Pipeline → Programmable Deparser

  • Programmable parser: the programmer declares the headers that should be recognized and their order in the packet.
  • Programmable match-action pipeline: the programmer defines the tables and the exact processing algorithm.
  • Programmable deparser: the programmer declares how the output packet will look on the wire.

  • Packet is parsed into individual headers.
  • Headers and intermediate results are used for matching and actions.
  • Headers can be modified, added or removed in match-action processing.
  • Packet is deparsed.
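As a rough mental model of the four steps above (this is plain Python, not real P4, and the 6-byte header layout is invented for illustration), a packet is parsed into a header, the header drives a match-action lookup, and the possibly modified header is deparsed back onto the wire.

```python
import struct

# Toy PISA-style pipeline: parse -> match-action -> deparse.
# The header layout (op: 1 byte, key: 4 bytes, ttl: 1 byte) is made up.
HDR = struct.Struct("!BIB")

def parse(packet: bytes):
    op, key, ttl = HDR.unpack(packet[:HDR.size])
    return {"op": op, "key": key, "ttl": ttl}, packet[HDR.size:]

def match_action(hdr, table):
    # exact match on the key field; the action modifies header/metadata fields
    action, action_data = table.get(hdr["key"], ("drop", None))
    if action == "forward":
        hdr["ttl"] -= 1
        hdr["egress_port"] = action_data
    return action, hdr

def deparse(hdr, payload):
    return HDR.pack(hdr["op"], hdr["key"], hdr["ttl"]) + payload

table = {42: ("forward", 3)}              # key 42 -> send out of port 3
pkt = HDR.pack(1, 42, 64) + b"value"
hdr, payload = parse(pkt)
action, hdr = match_action(hdr, table)
if action == "forward":
    print("egress port", hdr["egress_port"], "packet", deparse(hdr, payload))
```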

SLIDE 9

Match-Action Processing

  • Tables are the fundamental unit in the match-action pipeline.
  • Each table contains one or more entries.
  • An entry contains: a specific key to match on, a single action, and action data (see the sketch at the end of this slide).

Example hardware targets: a programmable switch ASIC with 6.5 Tbps bandwidth and < 1 µs processing delay, and the NetFPGA SUME with 4x10 Gbps bandwidth.

Systems using programmable switches

  • NetCache [ SOSP’ 17 ]
  • On-switch cache for Load Balancing (LB).
  • NetChain [ NSDI’ 18]
  • on-switch KV store for small data.
  • DistCache [ FAST’ 19]
  • multiple racks on-switch cache for LB
  • iSwitch [ ISCA ’19]
  • on-switch aggregation for distributed RL
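A minimal Python model of the table structure described at the top of this slide: each entry holds a match key, exactly one action, and the action data that parameterizes it; unmatched packets fall through to a default action. The names and the routing-style actions are assumptions for illustration, not taken from any of the systems listed above.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Entry:
    key: Any                 # specific key to match on
    action: Callable         # a single action
    action_data: Any         # action data passed to that action

def set_egress(hdr, port):   # example action: choose an output port
    hdr["egress_port"] = port

def mark_to_drop(hdr, _):    # default action when nothing matches
    hdr["drop"] = True

class MatchActionTable:
    def __init__(self, entries, default_action=mark_to_drop):
        self.entries = {e.key: e for e in entries}
        self.default_action = default_action

    def apply(self, hdr):
        entry = self.entries.get(hdr["dst"])
        if entry is None:
            self.default_action(hdr, None)
        else:
            entry.action(hdr, entry.action_data)
        return hdr

table = MatchActionTable([Entry("10.0.0.1", set_egress, 1),
                          Entry("10.0.0.2", set_egress, 2)])
print(table.apply({"dst": "10.0.0.2"}))   # matched: egress_port = 2
print(table.apply({"dst": "10.9.9.9"}))   # no match: dropped by default action
```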

SLIDE 10

Kinetic Drive → In-Storage Computing

Kinetic Stack

  • Active KV storage device developed by Seagate.
  • Accessible over an Ethernet connection.
  • Has a CPU and RAM with a built-in LevelDB.
  • Handles device-to-device data migration through P2P copy commands.
  • Applications communicate with the drive using the Kinetic Protocol over the TCP network.
  • Simple API (get, put, delete); see the sketch at the end of this slide.

Model No.: ST4000NK0001
Transfer rate: up to 60 MB/s
Capacity: 4 TB
Key size: up to 4 KB
Value size: up to 1 MB

Kinetic Drives Research

  • Kinetic Action [ICPADS’ 17]
  • Performance evaluation of KD characteristics.
  • Data Allocation [BigDataService’ 17]
  • 4 data allocation approaches for KD.
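The "simple API" bullet above can be pictured with the following Python sketch. It is a stand-in model of a drive (an in-memory dict plays the role of the on-drive LevelDB), not the real Kinetic Protocol or Seagate's client library.

```python
# Conceptual model of a Kinetic drive: an Ethernet-attached KV device with a
# get/put/delete API and device-to-device (P2P) copy. Names are illustrative.

class KineticDriveModel:
    MAX_KEY = 4 * 1024              # up to 4 KB keys (per the spec table above)
    MAX_VALUE = 1024 * 1024         # up to 1 MB values

    def __init__(self, ip):
        self.ip = ip
        self._store = {}            # stands in for the drive's built-in LevelDB

    def put(self, key: bytes, value: bytes):
        assert len(key) <= self.MAX_KEY and len(value) <= self.MAX_VALUE
        self._store[key] = value

    def get(self, key: bytes):
        return self._store.get(key)

    def delete(self, key: bytes):
        self._store.pop(key, None)

    def p2p_copy(self, key_range, target: "KineticDriveModel"):
        # device-to-device migration: push keys in the range to another drive
        lo, hi = key_range
        for k in [k for k in self._store if lo <= k <= hi]:
            target.put(k, self._store[k])

d1, d2 = KineticDriveModel("10.0.0.11"), KineticDriveModel("10.0.0.12")
d1.put(b"user:0001", b"alice")
d1.p2p_copy((b"user:0000", b"user:9999"), d2)
print(d2.get(b"user:0001"))         # b'alice': migrated without a host in the path
```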

SLIDE 11

Our Mission

  • Improve data access performance for distributed KV stores when applications access storage through the network.
  • Reduce the amount of data shipped from storage devices to be processed by the host in data-intensive applications.
  • Completed Work
  • TurboKV: Scaling Up the Performance of Distributed Key-value Stores with In-Switch Coordination
  • Key-value Pairs Allocation Strategy for Kinetic Drives
  • Proposed Work
  • TransKV: Networking Support for Transaction Processing in Distributed Key-value Stores

SLIDE 12

Completed Work (1/2)

TurboKV: Scaling Up the Performance of Distributed Key-value Stores with In-Switch Coordination[1]

[1] Hebatalla Eldakiky, David H. C. Du, and Eman Ramadan, “TurboKV: Scaling Up the Performance of Distributed Key-value Stores with In-Switch Coordination”, under submission to ACM Transactions on Storage (TOS).

SLIDE 13

Problem Definition

  • In a distributed key-value store, data is partitioned across several nodes.
  • Partition management and query routing are handled in three different ways: server-driven coordination, client-driven coordination, and master-node coordination.

Server-driven Coordination
  (1) Request sent to a random instance. (2) Redirected to the right storage node. (3) Reply sent to the client.
  × Increases query response time.  ✓ Client doesn’t need to link any code related to the KV store.

Master-node Coordination
  (1) Request sent to the master node. (2) Request directed to the right instance. (3) Reply sent to the client.
  × Increases query response time.  × Single point of failure.  ✓ Client doesn’t need to link any code related to the KV store.

Client-driven Coordination
  (1) Request sent to the target storage node. (2) Reply sent to the client.
  × Periodic pulling of updated directory info.  × Client needs to link code related to the used KV store.  ✓ Decreases query response time.

SLIDE 14

Why Switch-driven Coordination?

  • Requests pass through network switches to arrive at their target.
  • Switch-driven coordination can carry out
  • Partition management
  • Query routing
  inside the network switches.

Result: requests reach the data in 2 hops instead of 4 hops → higher throughput and lower read/write latency.

SLIDE 15

Objectives

  • Design an in-switch indexing scheme to manage the directory information records.
  • Adapt the scheme to the match-action pipeline in the programmable switches.
  • Utilize switches as a monitoring system for data popularity and storage node load.
  • Scale up the scheme to multiple racks inside the data center network.

Design Issues

  • Data Partitioning
  • Data Replication
  • Index Table Design
  • Network Protocol
  • Key-value Operations Processing
  • Load Balancing
  • Failure Handling
  • Scaling up to the data center networks.

SLIDE 16

TurboKV Overview

Programmable Switches

  • Match-action tables store the directory information.
  • Manage key-based routing.
  • Provide query statistics reports to the controller.

System Controller

  • Load balancing between the storage nodes.
  • Updating match-action tables with the new location of data.
  • Handling failures.

Storage Nodes

  • Server library to translate TurboKV packets into calls on the underlying key-value store.

System Clients

  • Client library to construct TurboKV request packets.

SLIDE 17

TurboKV Data plane Design (1/3)

Figure: logical view of the TurboKV data plane pipeline, which supports range partitioning, hash partitioning, and chain replication (a generic chain-replication sketch follows).
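Since the pipeline supports chain replication, the following short Python sketch shows the usual chain-replication pattern (writes enter at the head and propagate to the tail; reads are served by the tail). It is a generic illustration under those assumptions, not TurboKV's exact protocol.

```python
# Generic chain replication: one sub-range is served by an ordered chain of
# storage nodes; writes flow head -> tail, reads are answered by the tail.

class Node:
    def __init__(self, name):
        self.name, self.store, self.next = name, {}, None

    def write(self, key, value):
        self.store[key] = value
        if self.next:                       # propagate down the chain
            self.next.write(key, value)

    def read(self, key):
        return self.store.get(key)

def make_chain(names):
    nodes = [Node(n) for n in names]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes

chain = make_chain(["SN1", "SN2", "SN3"])   # replica chain for one sub-range
chain[0].write("k21", "v")                  # client write goes to the head
print(chain[-1].read("k21"))                # client read is served by the tail
```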

SLIDE 18

TurboKV Data plane Design (2/3)

On-Switch Index Table (see the lookup sketch below; SN = storage node)

Sub-range     Storage Nodes (replica chain)
Sub-range1    SN1, SN2, SN3
Sub-range2    SN2, SN3, SN4
Sub-range3    SN3, SN4, SN1
Sub-range4    SN4, SN1, SN2
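A compact Python sketch of the index table above: each sub-range maps to its replica chain, and the switch forwards a request to a node in that chain. Routing writes to the head and reads to the tail is a plausible policy under chain replication, assumed here for illustration; the exact TurboKV forwarding rules may differ.

```python
import bisect

# Directory kept by the switch: sub-range -> ordered replica chain.
class IndexTable:
    def __init__(self, boundaries, chains):
        self.boundaries = boundaries        # sorted upper bound of each sub-range
        self.chains = chains                # chains[i] serves sub-range i

    def chain_for(self, key):
        return self.chains[bisect.bisect_left(self.boundaries, key)]

    def route(self, op, key):
        chain = self.chain_for(key)
        return chain[0] if op == "PUT" else chain[-1]   # head for writes, tail for reads

table = IndexTable(
    boundaries=["k030", "k080", "k120", "k999"],
    chains=[["SN1", "SN2", "SN3"], ["SN2", "SN3", "SN4"],
            ["SN3", "SN4", "SN1"], ["SN4", "SN1", "SN2"]])

print(table.route("PUT", "k045"))   # -> SN2 (head of the chain for sub-range 2)
print(table.route("GET", "k045"))   # -> SN4 (tail of the same chain)
```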

Network Protocol

SLIDE 19

TurboKV Data plane Design (3/3)

Key-value Operations Processing

Figure: example processing of PUT(K, value), GET(K), and RANGE(L21, L211) requests against sub-ranges [L1–L30], [L31–L80], and [L80–L120]. Each pipeline pass compares the request key against a sub-range boundary; range queries that span multiple sub-ranges are recirculated, and packets are emitted at the egress pipeline.
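The recirculation behavior in the figure can be approximated with the Python sketch below: each pass through the pipeline resolves the portion of a RANGE request that falls in one sub-range, and the remainder is "recirculated" for another pass. The sub-range boundaries and key format are invented for the example.

```python
# Toy model of RANGE processing with recirculation.
SUB_RANGES = [("k001", "k030", "SN1"), ("k031", "k080", "SN2"),
              ("k081", "k120", "SN3")]

def next_key(k):
    return "k" + str(int(k[1:]) + 1).zfill(3)

def process_range(lo, hi):
    targets, passes = [], 0
    while lo <= hi:                          # recirculate until the range is covered
        passes += 1
        for start, end, node in SUB_RANGES:
            if start <= lo <= end:
                targets.append((node, lo, min(hi, end)))
                lo = next_key(end)           # continue from the next sub-range
                break
        else:
            break                            # key falls outside every sub-range
    return targets, passes

print(process_range("k021", "k111"))
# splits the request across SN1, SN2, SN3 using three pipeline passes
```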

SLIDE 20

TurboKV Control plane Design

Query Statistics and Load Balancing

  • Switches count the requests directed to each storage node to estimate its load.
  • Controller
  • pulls monitoring information from switches.
  • makes migration decisions.
  • updates switches’ match-action tables.
  • sends data migration commands to storage nodes.

Storage Failure Handling

  • The controller reconfigures the chains of all sub-ranges on the failed storage node (see the sketch below):
  • removes the failed storage node from all chains.
  • the predecessor of the failed node is followed by its successor.
  • distributes the data of the failed node, in sub-range units, among the other functional nodes.
  • adds new nodes at the end of the sub-ranges’ chains.
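A small Python sketch of the failure-handling steps above: the failed node is removed from every chain it belongs to (which splices its predecessor to its successor), and a replacement node is appended at the tail of each affected chain. The data-migration commands are deliberately left out, and the node names are illustrative.

```python
# Controller-side chain reconfiguration after a storage node failure.

def handle_failure(chains, failed, spares):
    """chains: {sub_range: [head, ..., tail]}; spares: iterator of new nodes."""
    for sub_range, chain in chains.items():
        if failed not in chain:
            continue
        chain.remove(failed)        # predecessor is now followed by the successor
        chain.append(next(spares))  # new node joins at the end of the chain
        # (data is redistributed in sub-range units via migration commands, not shown)
    return chains

chains = {"sub1": ["SN1", "SN2", "SN3"],
          "sub2": ["SN2", "SN3", "SN4"],
          "sub3": ["SN3", "SN4", "SN1"]}
print(handle_failure(chains, failed="SN3", spares=iter(["SN5", "SN6", "SN7"])))
```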

SLIDE 21

Scaling Up TurboKV to the Data Center Network

  • Hierarchical indexing directory.
  • Top-level switches maintain aggregate information from their connected switches.
  • Bottom-level switches (ToR) maintain detailed records of their local storage nodes.

Controller

  • keeps track of each index record and its related records on the other switches.
  • propagates any record’s update to all affected switches.
  • guarantees consistency between the switches to reflect any data migration or storage node failures.

SLIDE 22

Simulation Results (1/2)

Plots: throughput vs. skewness (read-only workload); impact of write ratio on system throughput.

  • TurboKV performs like ideal client-driven coordination (C.C.) while removing the management load from the client side.
  • TurboKV outperforms server-driven coordination (S.C.) by 33%–42%.
  • TurboKV outperforms ideal C.C. in high write-ratio workloads.
  • TurboKV outperforms S.C. by 30%–38% in the uniform workload, and by 14%–42% in the skewed workload.

SLIDE 23

Simulation Results (2/2)

Figures: key-value operation latency for the uniform workload and for the zipf-1.2 workload. Annotated latency reductions per operation type: Avg 16.3% / 99th 19.2%; Avg 30% / 99th 49%; Avg 11% / 99th 12.3%; Avg 29% / 99th 48%; Avg 18.3% / 99th 24.7%; Avg 15.4% / 99th 19%; within 7–10% of C.C.

SLIDE 24

Completed Work (2/2)

Key-value Pairs Allocation Strategy for Kinetic Drives[2]

[2] Hebatalla Eldakiky and David H. C. Du, “Key-Value Pairs Allocation Strategy for Kinetic Drives,” 2018 IEEE Fourth International Conference on Big Data Computing Service and Applications (BigDataService), Bamberg, 2018, pp. 17-24, doi: 10.1109/BigDataService.2018.00012.

SLIDE 25

Traditional KV Store Communication Model

(1) The client sends the key to the storage server. (2) The storage server processes the request and fetches the data from one of the connected drives. (3) The storage server sends the data back to the client.

All requests are sent to the server (queuing on the server due to its limited bandwidth).

Server Bottleneck → Performance Degradation

SLIDE 26

Kinetic Drive KV Store Communication Model

(1) The client sends the key to the metadata server. (2) The metadata server sends the IP of the associated drive back to the client. (3) The client contacts the drive at that IP and sends it the key. (4) The drive processes the request locally and sends the data back to the client.

Each KD is a small, independent KV store, so we can exploit parallelism across multiple KDs to overcome the server bottleneck (see the sketch below).
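The four numbered steps above, rendered as a small Python sketch. The metadata server is just an in-memory map from key ranges to drive IPs, and the drive is a trivial local store; both are stand-ins to show the flow, not real components.

```python
# Kinetic communication model: the metadata server only resolves key -> drive IP;
# the client then talks to that drive directly.

class MetadataServer:
    def __init__(self, ranges):
        self.ranges = ranges                       # [(lo, hi, drive_ip), ...]

    def lookup(self, key):                         # steps (1)/(2)
        for lo, hi, ip in self.ranges:
            if lo <= key <= hi:
                return ip

class DriveStub:
    def __init__(self):
        self.store = {}
    def get(self, key):                            # step (4): served on the drive
        return self.store.get(key)

drives = {"10.0.0.11": DriveStub(), "10.0.0.12": DriveStub()}
drives["10.0.0.12"].store["user:42"] = b"bob"
meta = MetadataServer([("user:00", "user:39", "10.0.0.11"),
                       ("user:40", "user:99", "10.0.0.12")])

ip = meta.lookup("user:42")                        # client asks the metadata server
print(drives[ip].get("user:42"))                   # step (3): client contacts that drive
```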

SLIDE 27

Motivation

We take advantage of the Kinetic drive being an independent, active device that can carry out all key-value operations on its own. Why is our work different from others?

  • We deal with data popularity and the limited drive bandwidth, which may lead to a performance bottleneck on the drive.
  • We minimize the number of drives to reduce the cost of building the distributed Kinetic-based key-value store.

Goal

Build a low-cost Kinetic-based key-value store, with its indexing table, that exploits parallelism in satisfying user requests and improves the performance of the storage system.

SLIDE 28

Problem Definition and Challenges (1/2)

Problem Statement

Allocate data to the minimum number of Kinetic drives, accessible by applications, while satisfying the data size and bandwidth requirements.

Challenges

  • Each Kinetic drive has limited size and limited bandwidth.
  • It can only hold a certain amount of key-value pairs.
  • It can only serve a limited number of requests concurrently.
  • User requests are not uniformly distributed across all key ranges (hot key ranges, cold key ranges).
  • Hot key: searched by users frequently (high bandwidth requirement).
  • Cold key: not searched frequently (low bandwidth requirement).

SLIDE 29

Problem Definition and Challenges (2/2)

  • The number of key-value pairs is not uniformly distributed across all key ranges (dense key ranges, scarce key ranges).
  • Dense key range: lots of key-value pairs (high size requirement).
  • Scarce key range: few key-value pairs (low size requirement).
  • Because of the 80/20 rule in data science, only 20% of the data is accessed 80% of the time, and vice versa.
  • The metadata server may become a bottleneck if searching for the drive IP takes a long time.

Cold (dense) key range → KD:
  • Wastes drive bandwidth
  • Consumes drive capacity

Hot (scarce) key range → KD:
  • Consumes drive bandwidth
  • Wastes drive capacity

SLIDE 30

Our Approach

Problem Input

  • A set of Kinetic drives, each of size S and bandwidth B.
  • A set of key ranges KR_1, KR_2, ....., KR_N, each with a bandwidth requirement (C_j) and a size requirement (T_j).
  • Each of T_j and C_j is expressed as a ratio of the drive bandwidth and size.

Theoretical Lower Bounds

  • O_C = ⌈Σ_{j=1..N} C_j⌉ (bandwidth bound) and O_T = ⌈Σ_{j=1..N} T_j⌉ (size bound), since the C_j and T_j are fractions of a single drive's bandwidth and size.
  • Minimum number of drives = max(O_C, O_T).

  • We modeled the problem as the multi-capacity bin packing problem.
  • Each drive represents a bin with multiple capacities (S, B, no. of KRs per drive).
  • Each KR represents an item with multiple requirements (size, bandwidth).
  • Since the problem is NP-complete, we developed a heuristic approach (sketched below) to allocate the KRs into a near-optimal number of drives:
  • Key-range preprocessing to merge some consecutive ranges.
  • Key-range sorting with a weighted sorting function.
  • Key-range allocation with our proposed best-candidate criteria.
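A condensed Python sketch of the allocation heuristic described above (the preprocessing/merge step is omitted; the sorting weight and the tightest-fit "best candidate" rule below are placeholders standing in for the weighted sorting function and criteria defined in the paper). Sizes and bandwidths are expressed as fractions of one drive's capacity, matching the ratio formulation above.

```python
# First-fit style sketch of the multi-capacity bin-packing heuristic:
# key ranges carry (size, bandwidth) requirements; drives have capacities (S, B).

def allocate(key_ranges, S=1.0, B=1.0, w=0.5):
    # Step 2: sort by a weighted combination of the two requirements (placeholder weight).
    ranges = sorted(key_ranges,
                    key=lambda r: w * r["size"] + (1 - w) * r["bw"], reverse=True)
    drives = []                                    # each drive tracks remaining S and B
    for r in ranges:
        # Step 3: pick the "best candidate" among open drives; here, the one whose
        # leftover capacities fit the key range most tightly (illustrative criterion).
        fits = [d for d in drives if d["S"] >= r["size"] and d["B"] >= r["bw"]]
        if fits:
            best = min(fits, key=lambda d: (d["S"] - r["size"]) + (d["B"] - r["bw"]))
        else:
            best = {"S": S, "B": B, "ranges": []}  # open a new drive
            drives.append(best)
        best["S"] -= r["size"]
        best["B"] -= r["bw"]
        best["ranges"].append(r["id"])
    return drives

krs = [{"id": "KR1", "size": 0.6, "bw": 0.1}, {"id": "KR2", "size": 0.2, "bw": 0.7},
       {"id": "KR3", "size": 0.3, "bw": 0.2}, {"id": "KR4", "size": 0.5, "bw": 0.4}]
for i, d in enumerate(allocate(krs), 1):
    print(f"drive {i}: {d['ranges']}")
```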

SLIDE 31

Experimental Results (1/2)

  • Using the parameters of the current Kinetic drive model, ST4000NK0001, with a storage capacity of 4 TB and a transfer rate of up to 60 MB/s.
  • Testing the algorithm under different KV pair sizes.
  • Performance metrics
  • The total number of drives used.
  • The size of the index table.
  • We compare our approach with the theoretical lower bound on the number of drives used and with the starting size of the index table.

SLIDE 32

Experimental Results (2/2)

  • The number of drives used is closer to the lower bound when the KV pair size is small.
  • The proposed algorithm's results aren’t affected by the workload characteristics.
  • Our approach achieves a reduction of up to 57% in the size of the index table.

SLIDE 33

Proposed Project

TransKV: Networking Support for Transaction Processing in Distributed Key-value Stores

SLIDE 34

Key-value Stores & Transactions

  • Key-value stores are popular for their simple API, unbounded scalability, and predictable low latency.
  • Some applications built on these key-value stores employ non-trivial concurrent transactions from multiple clients.

Example: tens of millions of requests resulting in over 3 million checkouts in a single day.

ACID properties (Atomicity, Consistency, Isolation, Durability) → a correct database state. The concern with KV stores: preserving scalability and predictable performance, and the cost of supporting transactions.

© 2019, FAST’19 [Doug Terry, Keynote]

SLIDE 35

State-of-the-Art Solution (DynamoDB)

Figure: latency vs. scale for Get/Put, TransactGetItem, and TransactWriteItem operations.

  • Group multiple actions together and submit them as a single all-or-nothing operation:
  • TransactWriteItems.
  • TransactGetItems.
  • Increased latency: all communications are carried out through network switches → more forwarding steps.

TurboKV → TransKV

SLIDE 36

Proposed Solution (TransKV) (1/2)

  • Programmable Switch
  • Routing requests to target storage nodes.
  • Transaction coordinator to decide whether a transaction can be pushed for completion or aborted in the network.
  • System Controller
  • Updating cache and indexing information.
  • Log management for failure recovery.
  • Transaction coordinator for non-cached key-value pairs.

SLIDE 37

Proposed Solution (TransKV) (2/2)

  • Timestamp-ordering concurrency control in the switches, managed by the controller (see the sketch below).
  • Each transactional operation is cloned, and the switch sends a copy to the controller for log management and failure recovery.
  • Transaction management is based on the hottest key-value pairs cached in the switches’ data plane, due to the space limitation.
  • Transactions span multiple storage nodes with a set of operations (read set, write set).
  • Hierarchical caching to scale up to the data center network.

Match   Action                       Action Data
Key1    test-tranx-for-processing    TS-index = 1, val
Key2    test-tranx-for-processing    TS-index = 2, val
Key3    test-tranx-for-processing    TS-index = 3, val
Key4    test-tranx-for-processing    TS-index = 4, val
Key5    test-tranx-for-processing    TS-index = 5, val
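A minimal Python sketch of the timestamp-ordering check behind the first bullet: each cached key keeps the largest read and write timestamps seen so far, and an operation either proceeds and advances them or forces the transaction to abort. The in-switch layout (the TS-index entries in the table above) is only modeled, not reproduced.

```python
# Basic timestamp-ordering concurrency control, applied per cached key.

class TSOEntry:
    def __init__(self):
        self.read_ts = 0       # largest timestamp that has read this key
        self.write_ts = 0      # largest timestamp that has written this key

def check_op(entry, op, ts):
    """Return True if the operation may proceed, False if the txn must abort."""
    if op == "read":
        if ts < entry.write_ts:                 # would read a value from its "future"
            return False
        entry.read_ts = max(entry.read_ts, ts)
    else:  # write
        if ts < entry.read_ts or ts < entry.write_ts:
            return False                        # a later transaction already used the key
        entry.write_ts = ts
    return True

cache = {"Key1": TSOEntry(), "Key2": TSOEntry()}
print(check_op(cache["Key1"], "write", ts=5))   # True: proceeds
print(check_op(cache["Key1"], "read",  ts=3))   # False: abort (3 < write_ts 5)
print(check_op(cache["Key2"], "read",  ts=7))   # True
print(check_op(cache["Key2"], "write", ts=6))   # False: abort (read_ts is already 7)
```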

SLIDE 38

Conclusion

  • Improve data access performance for distributed key-value stores when applications access storage through the network. (In-Network Computing)
  • Reduce the amount of data shipped from storage drives to be processed by the host in data-intensive applications. (In-Storage Computing)
  • Completed Work
  • TurboKV: Scaling Up the Performance of Distributed Key-value Stores with In-Switch Coordination (In-Network Computing)
  • Key-value Pairs Allocation Strategy for Kinetic Drives (In-Storage Computing)
  • Proposed Work
  • TransKV: Networking Support for Transaction Processing in Distributed Key-value Stores (In-Network Computing)

SLIDE 39

Future Plan

  • Design and Implementation of TransKV.
  • December, 2020: Dissertation.
  • January, 2021: Defense.

SLIDE 40

Thank You

Questions