SLIDE 1 STI-BT: A Scalable Transactional Index
Nuno Diegues and Paolo Romano
34th International Conference on Distributed Computing Systems (ICDCS)
SLIDE 2
Distributed Key-Value (DKV) stores are rising in popularity:
SLIDE 3 Distributed Key-Value (DKV) stores are rising in popularity:
- scalability
- fault-tolerance
- elasticity
SLIDE 4 Distributed Key-Value (DKV) stores are rising in popularity:
- scalability
- fault-tolerance
- elasticity
Recent trend:
- cloud adoption, large/elastic scaling
- move towards strong consistency
- easy and transparent APIs
SLIDE 5 We focus on two main disadvantages:
- typically embrace weak consistency
- key-value access is too simplistic
- mainly an index for primary key
SLIDE 6 Providing a secondary index is non-trivial
We focus on two main disadvantages:
- typically embrace weak consistency
- key-value access is too simplistic
- mainly an index for primary key
SLIDE 7 Providing a secondary index is non-trivial. But it is desirable!
We focus on two main disadvantages:
- typically embrace weak consistency
- key-value access is too simplistic
- mainly an index for primary key
SLIDE 8 State of the art solutions either…:
- did not provide strongly consistent transactions — more difficult
- were not fully decentralised — not scalable
- required several hops to access the index — more latency
Providing a secondary index is non-trivial. But it is desirable!
We focus on two main disadvantages:
- typically embrace weak consistency
- key-value access is too simplistic
- mainly an index for primary key
SLIDE 9
In this work we present STI-BT: a Scalable Transactional Index
SLIDE 10
In this work we present STI-BT: a Scalable Transactional Index
- Serializable distributed transactions
- Secondary indexes via a distributed B+Tree implementation
- Index accesses/changes obey transactions’ semantics
SLIDE 11
Provide strong consistency + scalable indexing
In this work we present STI-BT: a Scalable Transactional Index
- Serializable distributed transactions
- Secondary indexes via a distributed B+Tree implementation
- Index accesses/changes obey transactions’ semantics
SLIDE 12
- Background on DKV stores
- STI-BT
- Evaluation
- Related Work
Outline
SLIDE 13
Infinispan DKV store by Red Hat
Background on DKV store
SLIDE 14
Distributed vector-clock based protocol:
GMU [ICDCS’12]
Infinispan DKV store by Red Hat
Background on DKV store
SLIDE 15
Distributed vector-clock based protocol:
GMU [ICDCS’12]
Infinispan DKV store by Red Hat
Read-only transactions do not abort
Background on DKV store
Multi-versioned
SLIDE 16
Distributed vector-clock based protocol:
GMU [ICDCS’12]: Update Serializability
Infinispan DKV store by Red Hat
Read-only transactions do not abort
Background on DKV store
Update transactions
Multi-versioned
SLIDE 17
Distributed vector-clock based protocol:
GMU [ICDCS’12]: Update Serializability
Infinispan DKV store by Red Hat
Read-only transactions do not abort
Background on DKV store
Update transactions
Multi-versioned
Genuine
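To make the multi-versioning point concrete, here is a minimal, illustrative sketch of version selection (not GMU's actual protocol, which orders versions with per-node vector clocks): a read-only transaction fixes a snapshot once and always finds a consistent version to read, which is why it never aborts.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative multi-versioned cell: a reader picks the newest committed version
// that is not newer than its snapshot, so read-only transactions never block or
// abort. GMU orders versions with per-node vector clocks rather than the single
// scalar timestamp used here; this only sketches the multi-versioning idea.
class VersionedCell<V> {
    private static final class Version<T> {
        final long commitTs; final T value;
        Version(long commitTs, T value) { this.commitTs = commitTs; this.value = value; }
    }

    private final List<Version<V>> versions = new ArrayList<>();

    synchronized void commit(long commitTs, V value) {
        versions.add(new Version<>(commitTs, value));          // appended in commit order
    }

    synchronized V readAt(long snapshotTs) {
        V visible = null;
        for (Version<V> v : versions) {
            if (v.commitTs <= snapshotTs) visible = v.value;   // newest version inside the snapshot
            else break;                                        // too recent for this snapshot
        }
        return visible;
    }
}
```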
SLIDE 19 [Diagram: data set spread across servers by a consistent hash function, replication degree 2]
GMU
SLIDE 20 [Diagram: data set spread across servers by a consistent hash function, replication degree 2]
GMU
SLIDE 21 [Diagram: data set spread across servers by a consistent hash function, replication degree 2]
GMU
SLIDE 22
- No central component
- Transactions require only the machines holding the data they use
GMU: genuine partial replication
SLIDE 23
- No central component
- Transactions require only the machines holding the data they use
read/write
GMU: genuine partial replication
SLIDE 24
- No central component
- Transactions require only the machines holding the data they use
commit tx
GMU: genuine partial replication
SLIDE 25
- No central component
- Transactions require only the machines holding the data they use
commit tx: consensus for commit
GMU: genuine partial replication
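A hedged sketch of the placement rule behind these slides (ring positions and names are assumptions for illustration, not Infinispan's actual hashing): every key is owned by the next two distinct servers found clockwise on a consistent-hash ring, and a genuine transaction only ever contacts the servers this function returns for the keys it touches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of consistent hashing with replication degree 2: a key is owned by the
// first `degree` servers found clockwise from its position on the ring. GMU is
// genuine: a transaction contacts only the servers ownersOf() returns for the
// data it reads or writes, and there is no central component.
class ConsistentHashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private final int degree;

    ConsistentHashRing(List<String> servers, int degree) {
        this.degree = degree;
        for (String s : servers) ring.put(hash(s), s);
    }

    List<String> ownersOf(String key) {
        List<String> owners = new ArrayList<>();
        for (String server : ring.tailMap(hash(key)).values()) {   // clockwise from the key
            if (owners.size() == degree) break;
            owners.add(server);
        }
        for (String server : ring.values()) {                      // wrap around if needed
            if (owners.size() == degree) break;
            if (!owners.contains(server)) owners.add(server);
        }
        return owners;
    }

    private static int hash(String s) { return s.hashCode() & 0x7fffffff; }
}
```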
SLIDE 26
- Background on DKV stores
- STI-BT
- Evaluation
- Related Work
Outline
- maximizing data locality
- hybrid replication
- elastic scaling
- concurrency enhancements
SLIDE 27 Starting point:
- consider a distributed B+Tree built on the DKV
The need for data locality of the index
SLIDE 28 Starting point:
- consider a distributed B+Tree built on the DKV
consistent hash function
S1 S2 S3 S4
The need for data locality of the index
SLIDE 29 Starting point:
- consider a distributed B+Tree built on the DKV
consistent hash function
S1 S2 S3 S4
The need for data locality of the index
- tree nodes placed with random consistent hash
SLIDE 30 [Diagram: B+Tree nodes scattered over servers S1 to S4; leaves holding keys P and Z]
Current problems with data locality
Problems with consistent hashing data placement:
SLIDE 31
- One index access entails several hops
Current problems with data locality
Problems with consistent hashing data placement:
SLIDE 32 [Diagram: deleting key Z starts a traversal from the root]
- One index access entails several hops
Current problems with data locality
Problems with consistent hashing data placement:
SLIDE 33 [Diagram: deleting key Z traverses nodes on servers S1, S3 and S2, one hop per level]
- One index access entails several hops
Current problems with data locality
Problems with consistent hashing data placement:
SLIDE 34 [Diagram: deleting key Z traverses nodes on servers S1, S3 and S2]
- One index access entails several hops
- Some servers receive more load than others
Current problems with data locality
Problems with consistent hashing data placement:
SLIDE 35 [Diagram: per-server load is uneven under random placement]
- One index access entails several hops
- Some servers receive more load than others
Current problems with data locality
Problems with consistent hashing data placement:
SLIDE 36
- One index access entails several hops
- Some servers receive more load than others
- Range scan operations are also inefficient
Current problems with data locality
Problems with consistent hashing data placement:
SLIDE 37 [Diagram: a range scan from P to Z crosses leaves spread over several servers]
- One index access entails several hops
- Some servers receive more load than others
- Range scan operations are also inefficient
Current problems with data locality
Problems with consistent hashing data placement:
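To make the several-hops problem concrete, here is a hedged sketch of what a lookup costs under plain consistent-hash placement (the Dkv and Node interfaces are assumptions for illustration): every level of the descent is a remote get(), so an operation such as the delete of Z above pays roughly one network hop per tree level.

```java
// Naive traversal of a B+Tree stored in a DKV: every child pointer is just a
// key in the store, so each tree level typically costs one remote get() to
// whichever server the consistent hash happened to assign that node to.
interface Dkv { Node get(String nodeKey); }          // one remote round trip per call

interface Node {
    boolean isLeaf();
    String childKeyFor(long searchKey);              // inner node: store key of the next child
    String valueFor(long searchKey);                 // leaf node: indexed value, or null
}

final class NaiveTraversal {
    static String lookup(Dkv store, String rootKey, long searchKey) {
        String nodeKey = rootKey;
        while (true) {
            Node node = store.get(nodeKey);           // remote hop, level by level
            if (node.isLeaf()) {
                return node.valueFor(searchKey);
            }
            nodeKey = node.childKeyFor(searchKey);    // next node likely lives on another server
        }
    }
}
```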
SLIDE 38
Partial replication of the index:
- poor locality
- poor load balancing
Where typical solutions fall short
SLIDE 39
Partial replication of the index:
- poor locality
- poor load balancing
Where typical solutions fall short
Full replication of the index:
- consensus on updates is too expensive
- prevents scaling out storage
SLIDE 40
STI-BT: Maximizing data locality of the index
SLIDE 41
C (cut-off level)
full replication / partial replication
Hybrid replication:
- top nodes are more accessed but less modified
- better load balancing, rare cost for expensive consensus
STI-BT: Maximizing data locality of the index
SLIDE 42 C (cut-off level)
full replication / partial replication
S1 S2 S3 S4
Co-located data placement:
- groups of sub-trees, reduce network hops
- migrate transaction to exploit co-location
Hybrid replication:
- top nodes are more accessed but less modified
- better load balancing, rare cost for expensive consensus
STI-BT: Maximizing data locality of the index
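A minimal sketch of the rule the cut-off level C implies, assuming depth is counted from the root (class and method names are illustrative, not the paper's API): nodes above the cut-off are fully replicated and read locally on every server, while nodes at or below it remain under the ordinary partial replication of the DKV.

```java
// Sketch of the hybrid replication rule implied by the cut-off level C: nodes
// above the cut-off (depth counted from the root) are fully replicated, so they
// are read locally everywhere but need cluster-wide agreement to update; nodes
// at or below the cut-off stay under ordinary partial replication.
enum ReplicationMode { FULL, PARTIAL }

final class HybridPlacement {
    private volatile int cutOffLevel;                 // C, adjusted on elastic scaling

    HybridPlacement(int cutOffLevel) { this.cutOffLevel = cutOffLevel; }

    ReplicationMode modeFor(int nodeDepth) {          // depth 0 = root
        return nodeDepth < cutOffLevel ? ReplicationMode.FULL : ReplicationMode.PARTIAL;
    }

    void lowerCutOff() { cutOffLevel++; }             // "lowering" the cut-off pushes it one level deeper
}
```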
SLIDE 43 K
C
S1 S2 S3 S4
full partial
Transaction migration driven by data co-location
SLIDE 44 K
C
S1 S2 S3 S4 S1 S2 S3 S4
full partial
Transaction migration driven by data co-location
SLIDE 45 K
C
S1 S2 S3 S4 S1 S2 S3 S4
full partial
Lookup K
Transaction migration driven by data co-location
SLIDE 46 K
C
S1 S2 S3 S4 S1 S2 S3 S4
full partial
Lookup K (step 1)
Transaction migration driven by data co-location
SLIDE 47 K
C
S1 S2 S3 S4 S1 S2 S3 S4
full partial
Lookup K (step 2: local search)
Transaction migration driven by data co-location
SLIDE 48 K
C
S1 S2 S3 S4 S1 S2 S3 S4
full partial
Lookup K (step 2: local search; step 3: migrate tx)
Transaction migration driven by data co-location
SLIDE 49 K
C
S1 S2 S3 S4 S1 S2 S3 S4
full partial
Lookup K (step 2: local search; step 3: migrate tx; step 4: local search)
Transaction migration driven by data co-location
SLIDE 50 K
C
S1 S2 S3 S4 S1 S2 S3 S4
full partial
Lookup K (step 2: local search; step 3: migrate tx; step 4: local search)
Transaction migration driven by data co-location
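A hedged sketch of the four steps above (the Tx interface, migrateTo and the helper methods are assumptions for illustration, not Infinispan or STI-BT API): the fully replicated top is searched locally, then the transaction, rather than the data, is shipped to the server that co-locates the target sub-tree, where the remaining descent is again local.

```java
import java.util.function.Supplier;

// Sketch of the migration-driven lookup: (1) the request starts on any server,
// (2) the fully replicated top is searched locally down to the cut-off,
// (3) the transaction is migrated to the server co-locating the target sub-tree,
// (4) the remaining descent is local there.
interface Tx {
    <T> T migrateTo(String server, Supplier<T> continuation);  // run continuation on `server`
}

final class MigratingLookup {
    String lookup(Tx tx, long key) {
        String subTreeRoot = descendReplicatedTop(key);         // (2) local: top nodes are on every server
        String owner = ownerOfSubTree(subTreeRoot);             // consistent hash of the sub-tree root
        return tx.migrateTo(owner,                              // (3) ship the transaction, not the data
                () -> descendSubTree(subTreeRoot, key));        // (4) local search at the owner
    }

    // Placeholder bodies so the sketch is self-contained; real logic lives in the tree.
    private String descendReplicatedTop(long key) { return "subtree-" + (key % 4); }
    private String ownerOfSubTree(String subTreeRoot) { return "S" + (1 + (subTreeRoot.hashCode() & 0x7fffffff) % 4); }
    private String descendSubTree(String subTreeRoot, long key) { return "value-of-" + key; }
}
```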
SLIDE 51 Still rely on consistent hashing:
- preserve fully decentralized design and quick lookup of data
Exploit knowledge over structure of the indexed data
- general purpose data placement is agnostic of the data
- but we know how it will be structured
Grouping index in sub-trees
SLIDE 52 Still rely on consistent hashing:
- preserve fully decentralized design and quick lookup of data
Exploit knowledge over structure of the indexed data
- general purpose data placement is agnostic of the data
- but we know how it will be structured
[Diagram: consistent hash function maps a key to a server; ku: unique key]
Grouping index in sub-trees
SLIDE 53 Still rely on consistent hashing:
- preserve fully decentralized design and quick lookup of data
Exploit knowledge over structure of the indexed data
- general purpose data placement is agnostic of the data
- but we know how it will be structured
[Diagram: consistent hash function maps the key to a server; the entry is then found by a local map lookup on ku, the unique key]
Grouping index in sub-trees
SLIDE 54 Still rely on consistent hashing:
- preserve fully decentralized design and quick lookup of data
Exploit knowledge over structure of the indexed data
- general purpose data placement is agnostic of the data
- but we know how it will be structured
[Diagram: consistent hash function maps the key to a server; the entry is then found by a local map lookup on ku, the unique key]
key = { ku , kcl }
kcl: co-location identifier
Grouping index in sub-trees
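A minimal sketch of the composite key above (field and method names are assumptions for illustration): only kcl feeds the placement hash, so every node of a sub-tree shares the same owners, while ku keeps entries distinct inside the owning server's local map.

```java
import java.util.Objects;

// Sketch of the two-part key from the slides: kcl (shared by a whole sub-tree)
// drives the consistent hash, so all nodes of the sub-tree land on the same
// servers; ku then distinguishes entries within that group in the local map.
final class IndexKey {
    final String ku;    // unique key of this tree node / entry
    final String kcl;   // co-location identifier: the sub-tree this entry belongs to

    IndexKey(String ku, String kcl) { this.ku = ku; this.kcl = kcl; }

    // Routing hash: only kcl matters, so the whole sub-tree is co-located.
    int routingHash() { return kcl.hashCode(); }

    // Equality still uses both parts, so the local map can tell entries apart.
    @Override public boolean equals(Object o) {
        return o instanceof IndexKey
                && ku.equals(((IndexKey) o).ku)
                && kcl.equals(((IndexKey) o).kcl);
    }
    @Override public int hashCode() { return Objects.hash(ku, kcl); }
}
```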
SLIDE 55
Algorithms in the paper
SLIDE 56 Algorithms in the paper
including load balancing of sub-trees between different servers
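The load-balancing algorithm itself is in the paper; purely to illustrate the idea, here is a hypothetical sketch that hands one sub-tree from the busiest to the idlest server when the imbalance crosses a threshold (all names and the policy are assumptions, not the paper's algorithm).

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sub-tree load balancing: if one server is much
// busier than another, delegate one of its sub-trees to the idler server.
// moveSubTree() stands in for transferring ownership of the co-located sub-tree.
final class SubTreeBalancer {
    static void rebalance(Map<String, Integer> loadPerServer,
                          Map<String, List<String>> subTreesPerServer,
                          double imbalanceThreshold) {
        String busiest = null, idlest = null;
        for (String server : loadPerServer.keySet()) {
            if (busiest == null || loadPerServer.get(server) > loadPerServer.get(busiest)) busiest = server;
            if (idlest == null || loadPerServer.get(server) < loadPerServer.get(idlest)) idlest = server;
        }
        if (busiest == null || busiest.equals(idlest)) return;

        boolean imbalanced = loadPerServer.get(busiest) > imbalanceThreshold * loadPerServer.get(idlest);
        if (imbalanced && !subTreesPerServer.get(busiest).isEmpty()) {
            String victim = subTreesPerServer.get(busiest).remove(0);   // pick one sub-tree to move
            moveSubTree(victim, busiest, idlest);
            subTreesPerServer.get(idlest).add(victim);
        }
    }

    private static void moveSubTree(String subTreeRoot, String from, String to) {
        // placeholder: in STI-BT this would reassign ownership of the whole co-located sub-tree
    }
}
```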
SLIDE 57
Management of the cut-off
There is a trade-off in the cut-off:
SLIDE 58
- as high as possible, to keep the fully replicated part as small as possible and avoid costly consensus upon updates
Management of the cut-off
There is a trade-off in the cut-off:
SLIDE 59
- as high as possible, to keep the fully replicated part as small as possible and avoid costly consensus upon updates
Management of the cut-off
- but deep enough to create enough sub-trees to load balance across all machines
There is a trade-off in the cut-off:
SLIDE 60
- as high as possible, to keep the fully replicated part as small as possible and avoid costly consensus upon updates
Management of the cut-off
- but deep enough to create enough sub-trees to load balance across all machines
Dynamic problem: changes with elastic scaling of the machines
There is a trade-off in the cut-off:
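One way to read this trade-off, stated as a hedged heuristic (the paper's actual policy may differ): pick the smallest cut-off level whose number of sub-trees, roughly arity^C, is enough to spread across all machines, and re-evaluate it whenever machines join or leave.

```java
// Hedged sketch of a cut-off heuristic consistent with the trade-off above:
// choose the smallest C whose ~arity^C sub-trees can be spread over all
// machines (times a small factor for balance). The paper's actual policy may
// differ; this only illustrates the direction of the trade-off.
final class CutOffHeuristic {
    static int chooseCutOff(int arity, int machines, int subTreesPerMachine) {
        long target = (long) machines * subTreesPerMachine;
        int level = 0;
        long subTrees = 1;                       // one sub-tree if we cut at the root
        while (subTrees < target) {
            subTrees *= arity;                   // one level deeper => arity times more sub-trees
            level++;
        }
        return level;                            // smallest C with enough sub-trees
    }
}
// e.g. chooseCutOff(8, 60, 2) == 3, since 8^3 = 512 >= 120 sub-trees
```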
SLIDE 61 C
S1 S2 S3 S4
full partial
S1 S2 S3 S4
Elastic scaling of the index
SLIDE 62 C
S1 S2 S3 S4
full partial
S1 S2 S3 S4 S5
?
Elastic scaling of the index
SLIDE 63 C
S1 S2 S3 S4
full partial
S1 S2 S3 S4 S5
?
Elastic scaling of the index
lower cut-off
SLIDE 64 C
S1 S2 S3 S4
full partial
S1 S2 S3 S4 S5 S5
Elastic scaling of the index
SLIDE 65 C
S1 S2 S3 S4
full partial
S1 S2 S3 S4 S5 S5
More sub-trees than needed!
Elastic scaling of the index
SLIDE 66 C
S1 S2 S3 S4
full partial
S1 S2 S3 S4 S5 S5
More sub-trees than needed!
Elastic scaling of the index
fine-grained change
SLIDE 67 C
S1 S2 S3 S4
full partial
S1 S2 S3 S4 S5 S5
Elastic scaling of the index
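A heavily hedged sketch of the fine-grained alternative shown above (helper names and the policy are hypothetical; the precise algorithm is in the paper): instead of lowering the cut-off everywhere and creating more sub-trees than needed, split only as many existing sub-trees as the joining server actually needs to take over.

```java
import java.util.List;

// Hypothetical illustration of the "fine-grained change" from the slides: when a
// machine joins, push the cut-off down only for the sub-trees that will be
// delegated to it, instead of lowering it globally.
final class FineGrainedScaleOut {
    static void onMachineJoin(List<String> subTreeRoots, String newServer, int subTreesNeeded) {
        int taken = 0;
        for (String root : subTreeRoots) {
            if (taken == subTreesNeeded) break;            // the newcomer already has enough load
            for (String child : splitOneLevel(root)) {     // lower the cut-off for this sub-tree only
                if (taken == subTreesNeeded) break;
                reassign(child, newServer);                // delegate just these children
                taken++;
            }
        }
    }

    private static List<String> splitOneLevel(String subTreeRoot) {
        return List.of(subTreeRoot + "/0", subTreeRoot + "/1");   // placeholder children
    }
    private static void reassign(String subTreeRoot, String server) {
        // placeholder: transfer ownership of the sub-tree to `server`
    }
}
```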
SLIDE 68
We prove that this scheme is memory efficient and does not hamper scalability.
Elastic scaling of the index
Fully replicated part adds memory overhead as the cluster scales.
SLIDE 69
Avoid aborts of transactions mutating the index. More details in the paper.
Concurrency enhancements
SLIDE 70 Evaluation
Built on top of Infinispan
- open-source DKV from Red Hat
- YCSB
- inserted data is used as a secondary index
- operations made transactional
- Up to 100 VMs in a cloud cluster (FutureGrid)
- Partial replication degree: 2
SLIDE 71 [Bar chart: throughput (1000 txs/sec) per workload: Balanced, Read-Dominated, Read-Only, Read-Latest, Scan-Heavy, RMW Balanced; series: STI-BT]
STI-BT:
- Each transaction performs an average of 2 remote requests
Evaluating each contribution ( 1 / 5 )
60 machines
SLIDE 72 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, STI-BT]
Baseline:
- Simple B+Tree on top of Infinispan/GMU
- None of the improvements of STI-BT
Evaluating each contribution ( 2 / 5 )
SLIDE 73 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, STI-BT]
Baseline:
- Simple B+Tree on top of Infinispan/GMU
- None of the improvements of STI-BT
Evaluating each contribution ( 2 / 5 )
- 10 to 32 average remote requests per transaction
SLIDE 74 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, Sub-trees, STI-BT]
Sub-Trees:
- Sub-trees co-located and transaction migration
- But no hybrid replication
- Machines replicating top tree nodes are over-loaded
Evaluating each contribution ( 3 / 5 )
SLIDE 75 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, Sub-trees, STI-BT]
Sub-Trees:
- Sub-trees co-located and transaction migration
- But no hybrid replication
- Machines replicating top tree nodes are over-loaded
Evaluating each contribution ( 3 / 5 )
- reduced average remote requests to 2.5 per transaction
- 6.6x speedup over Baseline
SLIDE 76 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, Sub-trees, Dirty, STI-BT]
Dirty:
- Concurrency enhancements to reduce tx aborts
- But no smart co-location of data
Evaluating each contribution ( 4 / 5 )
SLIDE 77 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, Sub-trees, Dirty, STI-BT]
Dirty:
- Concurrency enhancements to reduce tx aborts
- But no smart co-location of data
Evaluating each contribution ( 4 / 5 )
- 2.5x speedup over Baseline
SLIDE 78 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, Sub-trees, Dirty, TopFull, STI-BT]
TopFull:
- Hybrid replication with the cut-off level
- But none of the other improvements
Evaluating each contribution ( 5 / 5 )
SLIDE 79 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, Sub-trees, Dirty, TopFull, STI-BT]
TopFull:
- Hybrid replication with the cut-off level
- But none of the other improvements
Evaluating each contribution ( 5 / 5 )
- Reduced average remote requests from 16 to 9
- 1.9x speedup over Baseline
SLIDE 80 [Bar chart: throughput (1000 txs/sec) per workload; series: Baseline, Sub-trees, Dirty, TopFull, STI-BT]
TopFull:
- Hybrid replication with the cut-off level
- But none of the other improvements
Evaluating each contribution ( 5 / 5 )
- Reduced average remote requests from 16 to 9
- 1.9x speedup over Baseline
SLIDE 81 Assessing Scalability
[Line chart: throughput (1000 txs/sec) vs #machines, from 2 to 100 machines]
SLIDE 82 90% of ops in 6ms or less
Assessing Scalability
[Line chart: throughput (1000 txs/sec) vs #machines, from 2 to 100 machines]
SLIDE 83 90% of ops in 6ms or less
Performance is unlocked by the combination of the mechanisms. Similar outcome in other workloads.
Assessing Scalability
[Line chart: throughput (1000 txs/sec) vs #machines, from 2 to 100 machines]
SLIDE 84 90% of ops in 6ms or less
Performance is unlocked by the combination of the mechanisms. Similar outcome in other workloads.
Assessing Scalability
[Line chart: throughput (1000 txs/sec) vs #machines, from 2 to 100 machines]
60 machines
SLIDE 85 Adapting the cut-off level vs Static heuristics
- AllInner: fully replicate all inner nodes
- B+Tree rebalancing causes costly updates
- FixedAt2: cut-off fixed at level 2
- poor load balancing as more machines join
SLIDE 86 Adapting the cut-off level vs Static heuristics
- AllInner: fully replicate all inner nodes
- B+Tree rebalancing causes costly updates
- FixedAt2: cut-off fixed at level 2
- poor load balancing as more machines join
[Chart: slowdown relative to STI-BT vs #machines, from 10 to 100]
YCSB workload A: 50% lookups, 50% modifications
SLIDE 87 Adapting the cut-off level vs Static heuristics
- AllInner: fully replicate all inner nodes
- B+Tree rebalancing causes costly updates
- FixedAt2: cut-off fixed at level 2
- poor load balancing as more machines join
[Chart: slowdown relative to STI-BT vs #machines, from 10 to 100; series: AllInner, FixedAt2, STI-BT]
YCSB workload A: 50% lookups, 50% modifications
SLIDE 88
- low arity => deeper trees, more accesses to DKV
- stable performance for non-minimal arity
Varying the arity of the B+Tree
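A back-of-the-envelope check of the first bullet (numbers are illustrative, not measurements from the paper): with tree depth roughly ceil(log_arity(N)), a small arity means a noticeably deeper tree and therefore more DKV accesses per lookup, while beyond a moderate arity the depth barely changes, matching the stable-performance observation.

```java
// Back-of-the-envelope: tree depth ~ ceil(log_arity(N)). With N = 10 million
// indexed keys, arity 8 gives depth 8 while arity 64 gives depth 4; growing the
// arity much further wins little, which matches the "stable performance" bullet.
final class ArityDepth {
    static int depth(long keys, int arity) {
        return (int) Math.ceil(Math.log(keys) / Math.log(arity));
    }

    public static void main(String[] args) {
        System.out.println(depth(10_000_000L, 8));   // 8
        System.out.println(depth(10_000_000L, 64));  // 4
    }
}
```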
SLIDE 89
- Read dominated workload in YCSB
- With up to 60 machines
- Latency remains stable
Scaling the Data Indexed
SLIDE 90 Sinfonia tree [VLDB’08]
- all inner nodes are fully replicated
Related Work
[Chart: throughput (1000 txs/sec) vs #machines, from 20 to 100; series: Baseline, Minuet Emulation, STI-BT]
SLIDE 91 Sinfonia tree [VLDB’08]
- all inner nodes are fully replicated
Global index [VLDB’10]
- could be integrated with STI-BT
Related Work
[Chart: throughput (1000 txs/sec) vs #machines, from 20 to 100; series: Baseline, Minuet Emulation, STI-BT]
SLIDE 92 Sinfonia tree [VLDB’08]
- all inner nodes are fully replicated
Global index [VLDB’10]
- could be integrated with STI-BT
YCSB Read-Latest workload
Related Work
Minuet [VLDB’12]
- lack of data placement
- snapshot creation for read-only transactions is expensive
[Chart: throughput (1000 txs/sec) vs #machines, from 20 to 100; series: Baseline, Minuet Emulation, STI-BT]
SLIDE 93 STI-BT: A Scalable Transactional Index
Nuno Diegues and Paolo Romano
Thank you!
Questions?