

SLIDE 1

Search Lookaside Buffer: Efficient Caching for Index Data Structures

Xingbo Wu, Fan Ni, Song Jiang

SLIDE 2

Background

  • Large-scale in-memory applications.

  ○ In-memory databases
  ○ In-memory NoSQL stores and caches
  ○ Software routing tables

  • They rely on index data structures to access their data.

[Figure: a B+-tree and a hash table as example index data structures]

SLIDE 3

Background

  • Large-scale in-memory applications.

  ○ In-memory databases
  ○ In-memory NoSQL stores and caches
  ○ Software routing tables

  • They rely on index data structures to access their data.
  • “hash index (i.e., hash table) accesses are the most significant single source of runtime overhead, constituting 14–94% of total query execution time.” [Kocberber et al., MICRO-46]

SLIDE 4

CPU Cache is Not Effectively Used

  • Indices are too large to fit in CPU cache.

  ○ In-memory databases: “55% of the total memory” [Zhang et al., SIGMOD’16]
  ○ In-memory KV caches: 20–40% of the memory [Atikoglu et al., Sigmetrics’12]

  • Access locality has potential to address the problem.

  ○ Facebook’s Memcached workload study: “All workloads exhibit the expected long-tail distributions, with a small percentage of keys appearing in most of the requests...”

  • However, data locality is compromised during index search.


SLIDE 5

Case Study: Search in a B+-tree-indexed Store

Setup: store size 10 GB; 8 B keys, 64 B values; Zipfian workload; 40 MB CPU cache

Accessed data set: 10 GB → 10 M ops/sec

SLIDE 6

Case Study: Search in a B+-tree-indexed Store

Setup: store size 10 GB; 8 B keys, 64 B values; Zipfian workload; 40 MB CPU cache

Accessed data set: 10 GB → 10 M ops/sec; 10 MB → 12.5 M ops/sec

SLIDE 7

Case Study: Search in a B+-tree-indexed Store

Setup: store size 10 GB; 8 B keys, 64 B values; Zipfian workload; 40 MB CPU cache

Accessed data set: 10 GB → 10 M ops/sec; 10 MB → 12.5 M ops/sec

If we remove the index and put the same data set in an array: 382 M ops/sec

SLIDE 8

A Look at Index Traversal

  • Index search in B+-tree: binary search at each node


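As the bullet above says, each lookup performs one binary search per node along the root-to-leaf path. A minimal sketch of that traversal in C, with a hypothetical node layout (fanout, field names, and key type invented for illustration; not the code from the talk):

#include <stddef.h>
#include <stdint.h>

#define FANOUT 32

typedef struct node {
    int      is_leaf;
    int      nkeys;
    uint64_t keys[FANOUT];
    union {
        struct node *children[FANOUT + 1]; /* interior node */
        void        *values[FANOUT];       /* leaf node */
    };
} node;

/* Binary search within one node: the number of keys <= key.
 * Each probe can land on a different cache line of the node. */
static int node_upper(const node *n, uint64_t key)
{
    int lo = 0, hi = n->nkeys;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (n->keys[mid] <= key) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Root-to-leaf traversal: one binary search per level. Every node on
 * the path is touched on every lookup of this key, which is why the
 * intermediate entries become hot (next slide). */
void *btree_get(const node *root, uint64_t key)
{
    const node *n = root;
    while (!n->is_leaf)
        n = n->children[node_upper(n, key)];
    int i = node_upper(n, key);
    return (i > 0 && n->keys[i - 1] == key) ? n->values[i - 1] : NULL;
}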

SLIDE 11

A Look at Index Traversal

  • The intermediate entries on the path become hot.


slide-12
SLIDE 12

False Temporal Locality

  • The intermediate entries on the path become hot.
  • The purpose of index search is to find the target entry.

[Figure: only the target entry is needed, yet every intermediate entry on the path becomes hot — false temporal locality]

SLIDE 13

False Spatial Locality

  • Each hot intermediate entry occupies a whole cache line.
  • Touched cache lines ≫ entries required in the search.

[Figure: each hot intermediate entry pinning a whole 64-byte cache line — false spatial locality]
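To get a feel for the gap between cache lines touched and entries needed, here is a small counting sketch; the array size, key width, and 64 B line size are illustrative assumptions:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    size_t n = 128UL * 1024 * 1024;   /* 1 GiB of sorted 8-byte keys */
    size_t target = n - 1;            /* pretend the key is the last one */
    size_t lo = 0, hi = n, lines = 0, prev = SIZE_MAX;

    while (lo < hi) {                 /* classic binary search */
        size_t mid = lo + (hi - lo) / 2;
        size_t line = mid * sizeof(uint64_t) / 64;
        if (line != prev) { lines++; prev = line; }  /* new 64 B line */
        if (mid < target) lo = mid + 1; else hi = mid;
    }
    /* ~27 probes, nearly as many distinct cache lines, 1 useful entry */
    printf("cache lines touched: %zu\n", lines);
    return 0;
}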

SLIDE 14

False Localities on a Hash Table

  • Chains or open addressing lead to false temporal locality.
  • False spatial locality is significant even with short chains.

[Figure: bucket chain traversal toward the target entry]

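A sketch of why even a hash table suffers: with chaining, every hop follows a pointer to a fresh cache line. The layout and names below are hypothetical, not the tables evaluated in the talk:

#include <stdint.h>
#include <string.h>
#include <stddef.h>

typedef struct entry {
    struct entry *next;   /* following this is usually a cache miss */
    uint64_t      hash;
    const char   *key;
    void         *value;
} entry;

typedef struct {
    entry **buckets;
    size_t  nbuckets;     /* power of two */
} htable;

void *ht_get(const htable *t, const char *key, uint64_t hash)
{
    entry *e = t->buckets[hash & (t->nbuckets - 1)];
    for (; e != NULL; e = e->next)
        /* every entry visited on the chain is pulled into cache,
         * though only the final match is wanted: false localities */
        if (e->hash == hash && strcmp(e->key, key) == 0)
            return e->value;
    return NULL;
}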

SLIDE 15

A Closer Look at Your CPU Cache

  • Cache space is occupied by index entries of false localities.

[Figure: cache largely filled with intermediate entries, leaving little room for target entries]

SLIDE 16

Existing Efforts on Improving Index Search

  • Redesigning the data structure: cuckoo hashing, Masstree, etc.

  ○ Must be an expert on the data structure
  ○ Optimizations are specific to certain data structures
  ○ May add overhead to other operations (e.g., expensive insertions)

  • Hardware accelerators: Widx, MegaKV, etc.

  ○ High design cost
  ○ Hard to adapt to new index data structures
  ○ High latency for out-of-core accelerators (e.g., GPUs, FPGAs)

SLIDE 17

The Issue of Virtual Address Translation

Use of page tables shares the same challenges as index search:

  • Large index: every process has a page table.
  • Frequently accessed: consulted in every memory access.
  • False temporal locality: tree-structured tables.
  • False spatial locality: intermediate page-table directories.


SLIDE 18

Fast Address Translation with TLB

The TLB directly caches page table entries (PTEs) for translation:

➔ Bypasses page table walking
➔ Covers a large memory area with a small cache

[Figure: the TLB holding PTEs, short-circuiting the page-table walk]

SLIDE 19

Our Solution: Search Lookaside Buffer

  • Pure software library
  • Easy integration with any index data structure
  • Negligible overhead even in the worst case


SLIDE 20

Index Search with SLB

Every lookup first consults the SLB.

[Figure: SLB_GET misses; the search falls through to the index]

X = SLB_GET(key)
if X:
    return X
X = INDEX_GET(key)
if X:
    SLB_EMIT(key, X)
    return X
return NULL

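To make the flow concrete, here is a toy stand-in for the SLB as a small direct-mapped cache of target entries. The slot layout, hash, and sizes are invented for this sketch; only the get/emit call pattern mirrors the slide, and index_get() stands for any existing index search:

#include <stdint.h>
#include <stddef.h>

#define SLB_SLOTS 4096                /* small, cache-resident table */

typedef struct {
    uint64_t key;                     /* 8-byte keys, as in the case study */
    void    *value;                   /* pointer to the target entry */
    int      valid;
} slb_slot;

static slb_slot slb[SLB_SLOTS];

static inline size_t slb_idx(uint64_t k)
{
    return (k * 0x9E3779B97F4A7C15ULL) % SLB_SLOTS;
}

static void *slb_get(uint64_t key)
{
    slb_slot *s = &slb[slb_idx(key)];
    return (s->valid && s->key == key) ? s->value : NULL;
}

static void slb_emit(uint64_t key, void *value)
{
    slb_slot *s = &slb[slb_idx(key)];
    s->key = key; s->value = value; s->valid = 1;  /* overwrite = evict */
}

void *index_get(uint64_t key);        /* any index: B+-tree, hash table, ... */

void *store_get(uint64_t key)
{
    void *v = slb_get(key);
    if (v) return v;                  /* SLB hit: index traversal skipped */
    v = index_get(key);
    if (v) slb_emit(key, v);          /* cache the target entry only */
    return v;
}

The point of the pattern is that a hit serves the request from a single, likely-cached slot instead of a multi-node traversal — the TLB analogy from the previous slides.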

SLIDE 21

Index Search with SLB

Emits the target entry after a successful index search.


X = SLB_GET(key)
if X:
    return X
X = INDEX_GET(key)
if X:
    SLB_EMIT(key, X)
    return X
return NULL

SLIDE 22

Index Search with SLB

A hit in the SLB completes the search.

[Figure: SLB_GET returns the KV item directly]

X = SLB_GET(key)
if X:
    return X
X = INDEX_GET(key)
if X:
    SLB_EMIT(key, X)
    return X
return NULL

SLIDE 23

Design Challenges

❖ Tracking KV temperatures can pollute the CPU cache
  ➢ Cache-line-local access counters for cached items.
  ➢ Approximate access logging for uncached items.
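One way to read “cache-line-local”: keep each item’s counter in the same 64-byte line as the item’s tag, so updating hotness never touches a line the lookup did not already load. The layout below is a guess for illustration, not the structure from the paper:

#include <stdint.h>

/* One cache line holding 4 cached items plus their counters. */
struct __attribute__((aligned(64))) slb_line {
    uint32_t tags[4];     /* 16 B: hash tags of the 4 items */
    uint8_t  counts[4];   /*  4 B: access counters, co-located */
    uint8_t  pad[12];     /* 12 B: padding */
    void    *items[4];    /* 32 B: pointers to the target entries */
};                        /* 64 B total: exactly one cache line */

/* A hit bumps a counter inside the line the lookup already loaded,
 * so temperature tracking adds no extra cache-line traffic. */
static void *line_get(struct slb_line *l, uint32_t tag)
{
    for (int i = 0; i < 4; i++) {
        if (l->tags[i] == tag) {
            if (l->counts[i] < 255)   /* saturating counter */
                l->counts[i]++;
            return l->items[i];
        }
    }
    return NULL;
}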

SLIDE 24

Design Challenges

❖ Tracking temperatures of items can pollute the CPU cache
  ➢ Cache-line-local access counters for cached items.
  ➢ Approximate access logging for uncached items.

❖ Frequent replacement hurts index performance

➢ Adaptive logging throttling for uncached items.

❖ More details in the paper...


SLIDE 25

Experimental Setup

  • B+-tree, Skip list, and hash tables
  • Filled with 10⁸ KVs (8 B keys, 64 B values)
  • Store size: ~10GB
  • Zipfian workload
  • Accessed data set: 10 MB → 10 GB
  • SLB size: 16/32/64 MB
  • Uses one NUMA node (16 cores)


SLIDE 26

B+-tree and Skip List

  • Significant improvements for ordered data structures

  ○ Substantial false localities caused by index traversal

[Figure: throughput improvements — up to 15× for B+-tree and 2.5× for skip list]

SLIDE 27

Hash Tables

  • Chaining hash table: average chain length ≤ 1

  ○ The index has no false temporal locality.
  ○ Throughput improves by up to 28% by removing false spatial locality.

[Figure: throughput improvements — up to +50% for cuckoo hashing and +28% for chaining]

SLIDE 28

High-performance KV Server

  • An RDMA port of MICA [Lim et al., NSDI’14]

  ○ In-memory KV store
  ○ Bulk-chaining partitioned hash tables
  ○ Batch processing
  ○ Lock-free accesses

SLIDE 29

MICA over 100 Gbps InfiniBand

  • GET: Limited improvements due to network bandwidth.

  ○ 10.7 GB/s, ~90% of the network bandwidth

  • PROBE: only returns True/False
  ○ Improves by 20%–66%

[Figure: GET and PROBE throughput]

SLIDE 30

Conclusion

  • We identify the issue of false temporal/spatial locality in index search.
  • We propose SLB, a general software solution that improves search for any index data structure by removing the false localities.
  • SLB improves index search for workloads with strong locality, and imposes negligible overhead with weak locality.


SLIDE 31

Thank You!

☺ Questions?


SLIDE 32

Backup slides


SLIDE 33

Replaying Facebook KV Workloads

Five key-value traces collected on production Memcached servers [Atikoglu et al., Sigmetrics’12]

SLIDE 34

Replaying Facebook KV Workloads

  • USR: GET-dominant
  • Less skewed; working set ≫ cache
  • No improvement


SLIDE 35

Replaying Facebook KV Workloads

APP & ETC: More skewed Working set fits the cache 10%-30% DELETE frequent invalidations in SLB Improvement < 20%


SLIDE 36

Replaying Facebook KV Workloads

  • SYS & VAR: GET & UPDATE; working set fits the cache
  • Improvement > 43%