SLIK: Scalable Low-Latency Indexes for a Key-Value Store


SLIDE 1

SLIK: Scalable Low-Latency Indexes for a Key-Value Store

Ankita Kejriwal

(With Arjun Gopalan, Ashish Gupta, Greg Hill, Zhihao Jia, Stephen Yang and John Ousterhout)

PlatformLab

SLIDE 2

Hypothesis

A key-value store can support highly consistent secondary indexes while operating at low latency and large scale.

SLIDE 3

Introduction

  • SLIK: Scalable Low-latency Indexes for a Key-value Store
    § Enables multiple secondary keys for each object
    § Allows lookups and range queries on these keys
  • Key design features:
    § Scalability using independent partitioning
    § Consistency with minimal performance overheads using an ordered write approach
  • Performance
    § 11-13 µs indexed reads
    § 29-37 µs writes/overwrites of objects with one indexed attribute
    § Linear throughput increase with increasing number of partitions
  • Feedback welcome!

SLIDE 4

Talk Outline

  • Motivation
  • Data Model and API
  • Design
  • Performance
  • Related Work
  • Summary


SLIDE 6

Motivation

[Diagram: Traditional RDBMSs and NoSQL Systems compared, with SLIK placed among existing systems]

Labels in extraction order: + scalability; data models; consistency; + consistency; + data models; + low latency
Systems shown: MySQL, H-Base, Espresso, PNUTS, Tao, Megastore, Spanner, H-Store, HyperDex, MongoDB, SLIK


SLIDE 8

Object Format

[Diagram: Tables hold objects. Object layout: Key, Value Blob]

SLIDE 9

Object Format

[Diagram: Tables hold objects. Object layout: Num Keys, Key[0] (Primary Key), Key[1], Key[2], …., Value Blob]

SLIDE 10

Object Format and API

createIndex(tableId, indexId, indexType)
dropIndex(tableId, indexId)
write(tableId, keys, value)
IndexLookup(tableId, indexId, keyRange)
⇒ objects in a sorted order via streaming interface

[Diagram: Tables hold objects. Object layout: Num Keys, Key[0] (Primary Key), Key[1], Key[2], …., Value Blob]
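
The four operations above can be made concrete with a small in-memory model. This is an illustrative sketch only, not RAMCloud's client API: the class `ToyStore`, the snake_case method names, the inclusive `first_key`/`last_key` range, and the convention that `keys[0]` is the primary key and index `i` covers `keys[i]` are all assumptions made for the example. Everything runs in one process, so it shows the data model and not the distributed design.

```python
import bisect
from typing import Dict, Iterator, List, Tuple

Obj = Tuple[List[str], bytes]  # (keys, value); keys[0] is the primary key (assumed)


class ToyStore:
    """Single-process toy model of tables with secondary indexes (hypothetical)."""

    def __init__(self) -> None:
        # table_id -> {primary key -> object}
        self.tables: Dict[int, Dict[str, Obj]] = {}
        # (table_id, index_id) -> sorted list of (secondary key, primary key)
        self.indexes: Dict[Tuple[int, int], List[Tuple[str, str]]] = {}

    def create_index(self, table_id: int, index_id: int, index_type: str = "") -> None:
        # Assumption: index i covers keys[i]; index_type is accepted but ignored here.
        table = self.tables.get(table_id, {})
        self.indexes[(table_id, index_id)] = sorted(
            (keys[index_id], pk)
            for pk, (keys, _) in table.items()
            if index_id < len(keys)
        )

    def drop_index(self, table_id: int, index_id: int) -> None:
        self.indexes.pop((table_id, index_id), None)

    def write(self, table_id: int, keys: List[str], value: bytes) -> None:
        table = self.tables.setdefault(table_id, {})
        pk = keys[0]
        old = table.get(pk)
        table[pk] = (list(keys), value)
        for (tid, index_id), entries in self.indexes.items():
            if tid != table_id:
                continue
            if old is not None and index_id < len(old[0]):
                entries.remove((old[0][index_id], pk))  # drop the superseded entry
            if index_id < len(keys):
                bisect.insort(entries, (keys[index_id], pk))

    def index_lookup(self, table_id: int, index_id: int,
                     first_key: str, last_key: str) -> Iterator[Obj]:
        """Stream objects whose indexed key lies in [first_key, last_key], in key order."""
        entries = self.indexes[(table_id, index_id)]
        start = bisect.bisect_left(entries, (first_key, ""))
        for sk, pk in entries[start:]:
            if sk > last_key:
                break
            yield self.tables[table_id][pk]
```

A caller would create an index once, write objects with a list of keys, and then iterate over `index_lookup(...)` to receive matching objects in sorted secondary-key order, which mirrors the streaming interface named on the slide.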


SLIDE 12

Design Goals

  • Scalable distributed system
  • Consistency expected from a centralized system (with minimal performance overheads)


SLIDE 14

Index Partitioning

Colocation Approach

  • Colocate index entries and objects
  • One of the keys used to partition the table's objects and indexes
  • No particular association between index partitions and index key ranges

[Diagram: three tablets, each paired with an indexlet that holds the index entries for that tablet's objects]

SLIDE 15

Index Partitioning

Colocation Approach

  • Colocate index entries and objects
  • One of the keys used to partition the table's objects and indexes
  • No particular association between index partitions and index key ranges

Example query: Objects with "age" between 11 and 14

Not Scalable!

[Diagram: three tablets, each paired with an indexlet; matching entries can sit in any indexlet]

SLIDE 16

Index Partitioning

Independent Partitioning

  • Partition each index and table independently
  • Partition each index according to sort order for that index

[Diagram: three tablets and two indexlets; each indexlet holds a contiguous range of the sorted index keys]

SLIDE 17

Index Partitioning

Independent Partitioning

  • Partition each index and table independently
  • Partition each index according to sort order for that index

Example query: Objects with "age" between 11 and 14

Scalable!

[Diagram: three tablets and two indexlets; each indexlet holds a contiguous range of the sorted index keys]
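
The difference between the two approaches can be sketched by counting which partitions a range query has to contact. The code below is a toy model under stated assumptions: the `(pk, age)` pairs reuse numbers from the diagram but their assignment to primary keys is made up, objects are spread over servers by `pk % num_servers`, and `split_points` (the first key of every indexlet after the first) is a hypothetical way to describe range partitioning. None of the function names come from SLIK.

```python
import bisect
from typing import List, Set, Tuple

# (primary key, age) pairs; the pairing is invented for illustration.
AGES = [55, 2, 13, 1, 20, 23, 89, 14, 5, 31, 60, 3, 9, 11, 15, 24, 45]
OBJECTS: List[Tuple[int, int]] = list(enumerate(AGES))


def colocated_lookup(objects: List[Tuple[int, int]], num_servers: int,
                     lo: int, hi: int) -> Tuple[List[int], Set[int]]:
    """Index entries live next to their objects, so every server must be asked."""
    contacted: Set[int] = set()
    hits: List[int] = []
    for server in range(num_servers):
        contacted.add(server)
        local = [(pk, age) for pk, age in objects if pk % num_servers == server]
        hits.extend(pk for pk, age in local if lo <= age <= hi)
    return sorted(hits), contacted


def indexlet_for(split_points: List[int], key: int) -> int:
    """Indexlet i holds keys in [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, key)


def independent_lookup(objects: List[Tuple[int, int]], split_points: List[int],
                       lo: int, hi: int) -> Tuple[List[int], Set[int]]:
    """The index is range-partitioned, so only overlapping indexlets are asked."""
    first = indexlet_for(split_points, lo)
    last = indexlet_for(split_points, hi)
    contacted = set(range(first, last + 1))
    hits = [pk for pk, age in objects
            if indexlet_for(split_points, age) in contacted and lo <= age <= hi]
    return sorted(hits), contacted


if __name__ == "__main__":
    print(colocated_lookup(OBJECTS, 3, 11, 14))
    print(independent_lookup(OBJECTS, [15], 11, 14))
```

Both functions return the same objects for the "age between 11 and 14" query, but the colocated version contacts all three servers while the independent version contacts only the single indexlet whose key range overlaps the query. In this model the number of servers touched by a small range query grows with cluster size only under colocation.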

SLIDE 18

Index Partitioning

[Figure: Lookup Latency (µs) vs. Number of Servers]

Series: Colocation size 1, Colocation size 10, Independent size 1, Independent size 10
Data labels on the plot (series assignment not recoverable from the extraction): 8.3, 26.7, 87.3, 16.2, 89.7, 15.2, 16.7, 12.7, 22.3, 28.8

Latency for IndexLookup: single table with one index with varying number of indexlets
Each object: pk 30 bytes, sk 30 bytes, val 100 bytes

SLIDE 19

Index Partitioning

[Figure: Throughput (10^3 lookups/sec) vs. Number of Indexlets]

Number of Indexlets: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Independent Partitioning: 580, 1127, 1619, 2197, 2655, 3199, 3629, 4248, 4629, 5069
Colocation: 461, 423, 463, 447, 457, 447, 357, 441, 418, 435

Throughput for IndexLookup: single table with one index with varying number of indexlets
Queried via multiple clients
Each object: pk 30 bytes, sk 30 bytes, val 100 bytes
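
As a quick check of the scaling trend, the data labels from this plot can be turned into speedups relative to a single indexlet. The snippet assumes the first ten labels belong to the Independent Partitioning series and the last ten to Colocation, in the order of the legend; the helper name `speedup` is made up for the example.

```python
# Throughput in 10^3 lookups/sec for 1..10 indexlets, read off the plot labels.
INDEPENDENT = [580, 1127, 1619, 2197, 2655, 3199, 3629, 4248, 4629, 5069]
COLOCATION = [461, 423, 463, 447, 457, 447, 357, 441, 418, 435]


def speedup(series):
    """Throughput of each configuration relative to the one-indexlet case."""
    return [round(value / series[0], 2) for value in series]


if __name__ == "__main__":
    for n, (ind, col) in enumerate(zip(speedup(INDEPENDENT), speedup(COLOCATION)), start=1):
        print(f"{n:2d} indexlets: independent x{ind:.2f}, colocation x{col:.2f}")
```

Under that reading, ten indexlets give roughly 8.7 times the single-indexlet throughput with independent partitioning, while the colocation series stays near its starting value.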

SLIDE 20

Design Goals

  • Scalable distributed system:
    • Use independent partitioning
    • But: indexed object writes become distributed operations
  • Consistency expected from a centralized system (with minimal performance overheads):
    • If an object contains a given secondary key, then an index lookup with that key will return the object
    • If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range

SLIDE 21

Consistency

  • Consistency properties:
    • If an object contains a given secondary key, then an index lookup with that key will return the object
    • If an object is returned by index lookup, then this object contains a secondary key for that index within the specified range
  • Solution:
    • Longer index lifespan (via ordered writes)
    • Object data is ground truth and index entries serve as hints

[Timeline diagram, time axis left to right]
Object: Foo (pk): Bob (sk)
Object: Foo (pk): Sam (sk)
Index Entry: Bob -> Foo
Index Entry: Sam -> Foo
Events: write object, modify object, remove object (each marked as a commit point)
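
One way to read the timeline is as a three-step write: add the new index entry, write the object (the commit point), then remove the stale entry, with lookups checking every index hit against the object itself. The sketch below models that reading in a single process. The step order is an assumption drawn from the "longer index lifespan" bullet and the diagram, the class and method names are hypothetical, and RPCs, failures and concurrency are left out entirely.

```python
from typing import Dict, Iterator, Optional, Set, Tuple


class ToyIndexedTable:
    """Objects are ground truth; index entries are hints that may be stale."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[str, bytes]] = {}  # pk -> (sk, value)
        self.index: Set[Tuple[str, str]] = set()         # (sk, pk) entries

    # The three steps are exposed separately so intermediate states can be inspected.
    def add_entry(self, sk: str, pk: str) -> None:
        self.index.add((sk, pk))

    def commit_object(self, pk: str, sk: str, value: bytes) -> Optional[str]:
        """Write the object (commit point) and return the previous secondary key."""
        old = self.objects.get(pk)
        self.objects[pk] = (sk, value)
        return old[0] if old is not None else None

    def remove_entry(self, sk: str, pk: str) -> None:
        self.index.discard((sk, pk))

    def write(self, pk: str, sk: str, value: bytes) -> None:
        self.add_entry(sk, pk)                      # 1. index entry exists first
        old_sk = self.commit_object(pk, sk, value)  # 2. commit point
        if old_sk is not None and old_sk != sk:
            self.remove_entry(old_sk, pk)           # 3. stale entry removed last

    def remove(self, pk: str) -> None:
        old = self.objects.pop(pk, None)            # commit point
        if old is not None:
            self.remove_entry(old[0], pk)           # entry outlives the object

    def lookup(self, first: str, last: str) -> Iterator[str]:
        for sk, pk in sorted(self.index):
            if first <= sk <= last:
                obj = self.objects.get(pk)
                # Treat the entry as a hint: return the object only if it still carries sk.
                if obj is not None and obj[0] == sk:
                    yield pk
```

Because each index entry is created before the object that needs it and deleted after the object that needed it, an entry always exists while its object is visible, which corresponds to the first property. The check inside `lookup` filters out entries that are early or stale, which corresponds to the second.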


SLIDE 23

Performance

Implemented SLIK in RAMCloud

  • Distributed in-memory key-value storage system
  • Designed for large-scale applications
  • Optimized to operate at lowest possible latency

SLIDE 24

Performance

Questions:

  • Does SLIK meet the low latency goal?
  • Does SLIK meet the scalability goal?
  • How does the performance of indexing with SLIK compare to other state-of-the-art systems?

SLIDE 25

Performance

Systems we compared:

  • HyperDex:
    § Spaces containing objects
    § Objects have primary key and multiple attributes
    § Data (and indexes) partitioned using hyperspace hashing
    § Each index contains all object data
  • H-Store:
    § Main memory database
    § SQL+ACID
    § Data (and indexes) partitioned based on specified attribute
    § Many parameters for tuning
      • Got assistance from developers to tune for each test
      • Examples: txn_incoming_delay, partitioning column

SLIDE 26

Lookup Latency

Single table with one index having a single partition
Each object: pk 30 bytes, sk 30 bytes, val 100 bytes

[Figure (a): Lookup Latency (µs, log scale) vs. Size of Index (# objects)]

Size of Index (# objects): 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6
H-Store SK Partitioned: 173.10, 147.71, 187.69, 196.65, 186.66, 204.40, 203.70
H-Store PK Partitioned: 963.82, 929.22, 937.06, 989.51, 1024.35, 987.00, 941.17
HyperDex: 155.9, 166.3, 181.8, 184.4, 185.0, 239.2, 263.7
SLIK TCP: 45.2, 44.9, 42.7, 48.5, 49.7, 45.6, 54.5
SLIK: 11.0, 10.2, 11.7, 11.6, 12.7, 12.8, 13.1

SLIDE 27

Overwrite Latency

Single table with one index having a single partition
Each object: pk 30 bytes, sk 30 bytes, val 100 bytes

[Figure (c): Overwrite Latency (µs, log scale) vs. Size of Index (# objects)]

Size of Index (# objects): 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6
H-Store SK Partitioned: 179.53, 160.16, 195.06, 202.07, 192.46, 207.10, 209.37
H-Store PK Partitioned: 987.25, 939.84, 961.17, 1019.82, 1048.86, 1010.92, 968.72
HyperDex: 768.5, 782.8, 782.9, 785.7, 789.4, 949.4, 870.0
SLIK TCP: 124.4, 135.8, 126.5, 125.9, 124.3, 123.8, 129.7
SLIK: 31.4, 32.7, 34.2, 34.3, 35.2, 35.2, 37.0

SLIDE 28

Multiple Secondary Indexes

Single table with varying number of indexes, each having a single partition
Each object: pk 30 bytes, sk 30 bytes, val 100 bytes

[Figure: Latency (µs, log scale) vs. Number of Indexes (1 to 10)]

Data labels per series, in extraction order:
H-Store via Sk1: 268.82, 248.51, 271.58, 241.77, 255.70
H-Store via Pk: 1620.92, 1343.23, 1360.54, 1455.67, 1448.46
H-Store via SkX: (no data labels in the extraction)
SLIK TCP: 138.5, 139.1, 156.2, 165.2, 164.3, 175.3, 175.7, 179.1, 182.0, 184.6
SLIK: 33.0, 35.3, 39.8, 39.0, 42.7, 42.4, 46.4, 47.3, 49.5, 51.2
HyperDex: 893.6, 978.5, 1061.5, 1116.3, 1267.8, 836.4

SLIDE 29

Scalability

Single table with one index having varying number of partitions
Each object: pk 30 bytes, sk 30 bytes, val 100 bytes

[Figure: Throughput (10^3 lookups/sec) vs. Number of Indexlets (1 to 10)]

Series: H-Store, HyperDex, SLIK TCP, SLIK
Data labels in extraction order (series assignment not fully recoverable): 38.49, 40.70, 55.59, 70.15, 82.75, 98.09, 115.40, 130.27, 149.25, 190.20, 267, 275, 371, 430, 653, 794, 1001, 1199, 1352, 1445, 1663, 1807, 220, 580, 1127, 1619, 2197, 2655, 3199, 3629, 4248, 4629, 5069

SLIDE 30

Scalability

Single table with one index having varying number of partitions
Each object: pk 30 bytes, sk 30 bytes, val 100 bytes

[Figure: Average Latency per Lookup (µs) vs. Number of Indexlets]

Number of Indexlets: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
H-Store: 119.41, 152.61, 179.68, 199.22, 215.65, 243.45, 240.38, 267.46, 269.84, 267.23
SLIK TCP: 49.93, 55.63, 58.22, 69.15, 81.61, 90.68, 91.07, 112.76, 114.47, 113.32
SLIK: 13.13, 13.36, 13.99, 13.78, 14.45, 14.74, 14.53, 14.45, 14.68, 14.75
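
The effect of adding indexlets on per-lookup latency can be quantified from the data labels of this plot. The snippet below computes the relative increase from one to ten indexlets for each series. It assumes each series has one label per indexlet count, in order; the helper `relative_increase` is a name made up for the example.

```python
# Average latency per lookup (µs) for 1..10 indexlets, read off the plot labels.
LATENCY_US = {
    "H-Store": [119.41, 152.61, 179.68, 199.22, 215.65, 243.45, 240.38, 267.46, 269.84, 267.23],
    "SLIK TCP": [49.93, 55.63, 58.22, 69.15, 81.61, 90.68, 91.07, 112.76, 114.47, 113.32],
    "SLIK": [13.13, 13.36, 13.99, 13.78, 14.45, 14.74, 14.53, 14.45, 14.68, 14.75],
}


def relative_increase(series):
    """Fractional latency growth between the first and last configuration."""
    return (series[-1] - series[0]) / series[0]


if __name__ == "__main__":
    for name, series in LATENCY_US.items():
        print(f"{name}: {relative_increase(series) * 100:.1f}% increase from 1 to 10 indexlets")
```

With these numbers SLIK's latency grows by about 12% across the range (from 13.13 µs to 14.75 µs), while the H-Store and SLIK TCP series more than double.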


SLIDE 32

Related Work

Data storage system

    § Scale (spectrum from local machine to datacenter)
    § Data model (spectrum from key-value to relational)
    § Consistency (spectrum from eventual to strong)
    § Performance: latency and/or throughput

SLIDE 33

Current Web Scale Datastores

[Chart: Read / write latency (approx), from 1µs to 10s, vs. Consistency Level (Eventual; Causal, SI, "Define your own"; Strong)]

Systems shown: PNUTS, Cassandra, CouchDB, Tao, MegaStore, H-Store, HyperDex, MongoDB, Spanner, Espresso, H-Base

SLIDE 34

Current Web Scale Datastores

[Chart: Read / write latency (approx), from 1µs to 10s, vs. Consistency Level (Eventual; Causal, SI, "Define your own"; Strong)]

Systems shown: PNUTS, Cassandra, CouchDB, Tao, MegaStore, H-Store, HyperDex, MongoDB, Spanner, Espresso, H-Base, SLIK


SLIDE 36

Summary

A key-value store can support highly consistent secondary indexes while operating at low latency and large scale.

  • Lookups and range queries on secondary keys
  • 11-13 µs indexed reads
  • 29-37 µs writes/overwrites of indexed objects
  • Ordered writes and indexes as hints
  • With increasing number of partitions:
    • Linear throughput increase
    • Minimal latency impact

SLIDE 37

Thank you!

I can be reached at: ankitak@cs.stanford.edu, @ankitaak
Code available open source: github.com/PlatformLab/RAMCloud
Papers and other information at: ramcloud.stanford.edu