SLIDE 1

Optimizing Redis for Locality and Capacity

Kevin C., Yoongu K., Lavanya S. 15-799 Project Presentation, 12/4/2013

SLIDE 2

Goals of Our Project

  • Leverage DRAM and dataset characteristics to improve the performance of an in-memory database

  • Locality: Exploit DRAM's internal row buffers
  • Capacity: Exploit redundancy in the dataset

SLIDE 3

DRAM System Organization

[Diagram: CPU connected to DRAM over a memory bus]

SLIDE 4

DRAM System Organization

[Diagram: CPU connected over a bus to a DRAM system made up of multiple banks]

Banks can be accessed in parallel

SLIDE 5

DRAM Bank Organization

  • Row buffer serves as a fast cache in a bank

– A row buffer miss transfers an entire row of data into the row buffer
– A row buffer hit serves accesses to the same row (reducing latency by up to 2x)

[Diagram: a DRAM bank organized as 8KB rows and columns, with a row buffer at the bottom of the bank]

SLIDE 6

Row-Buffer Locality (RBL) in In-Memory Databases

  • Idea: Map hot data to a few DRAM rows
  • Hot data: Data with high temporal correlation
  • Examples of temporally correlated data:

– Records touched around the same time
– Query terms searched together often

SLIDE 7

Challenge

  • How are data mapped to DRAM? Which bank? Which row?

[Diagram: a virtual address (virtual page number + offset) is translated to a physical address (physical page number + offset), which the memory controller then maps onto a DRAM bank and row]

Unexposed to the system: determined by the hardware (memory controller)
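One half of this chain, virtual page number to physical page number, can be observed from user space on Linux through /proc/self/pagemap (the physical-to-DRAM half still requires the reverse engineering described next). A minimal sketch, not the project's kernel-module approach, assuming Linux and root privileges, since recent kernels hide frame numbers from unprivileged readers:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Look up the physical address backing a virtual address via /proc/self/pagemap.
 * Each pagemap entry is 8 bytes: bit 63 = page present, bits 0-54 = page frame
 * number (PFN).  Recent kernels report a PFN of 0 to unprivileged processes. */
uint64_t virt_to_phys(const void *virt)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t vpn = (uint64_t)virt / (uint64_t)page_size;
    uint64_t entry = 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("open pagemap"); exit(1); }
    if (pread(fd, &entry, sizeof(entry), (off_t)(vpn * sizeof(entry))) != sizeof(entry)) {
        perror("pread pagemap"); exit(1);
    }
    close(fd);

    if (!(entry & (1ULL << 63)))            /* page not present in RAM */
        return 0;
    uint64_t pfn = entry & ((1ULL << 55) - 1);
    return pfn * (uint64_t)page_size + (uint64_t)virt % (uint64_t)page_size;
}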

SLIDE 8

Task 1: Find the Mapping to DRAM

  • Approach: a kernel module with assembly code to observe access latency to different addresses


Input: addr1 & addr2

  1. Load addr1                 // Fill TLB for addr1
  2. Load addr2                 // Fill TLB for addr2
  3. Flush the cache lines of addr1 and addr2
  4. Load addr1
  5. Read CPU cycle counter     // Tstart for addr2
  6. Load addr2
  7. Read CPU cycle counter     // Tend for addr2

Possible outcomes for the timed addr2 access, distinguished by latency:
  1. Cache hit
  2. Cache miss – row hit
  3. Cache miss – row miss

Courtesy: the backbone kernel module was obtained from Hyoseung Kim (Prof. Rajkumar's group)
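The project implements this loop inside the kernel module with inline assembly; a user-space analogue is sketched below, assuming x86-64, GCC intrinsics, and that addr1 and addr2 are already mapped (in practice the measurement is repeated many times to average out noise):

#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

/* Time a dependent pair of loads.  addr1 and addr2 are first touched to warm
 * the TLB, both cache lines are flushed, addr1 is reloaded (opening its DRAM
 * row), and the latency of the load to addr2 is measured.  With the flush in
 * place the timed load is a row hit or a row miss depending on how the two
 * addresses map to DRAM; skipping step 3 gives the cache-hit baseline. */
static inline uint64_t time_addr2(volatile char *addr1, volatile char *addr2)
{
    unsigned int aux;

    (void)*addr1;                         /* 1. load addr1: fill TLB */
    (void)*addr2;                         /* 2. load addr2: fill TLB */
    _mm_clflush((const void *)addr1);     /* 3. flush both cache lines */
    _mm_clflush((const void *)addr2);
    _mm_mfence();
    (void)*addr1;                         /* 4. load addr1 (opens its row) */
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);      /* 5. Tstart for addr2 */
    (void)*addr2;                         /* 6. load addr2 */
    uint64_t end = __rdtscp(&aux);        /* 7. Tend for addr2 */
    return end - start;
}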

SLIDE 9

Task 1: Find the Mapping to DRAM

  • Experimental setup: 3.4GHz Haswell CPU, 2GB DRAM DIMM (8 banks)
  • With an exhaustive selection of addr1 and addr2, we discover the mapping to be:

[Diagram: physical address bit fields. Bits [12:0] are the byte offset within an 8KB row; bits [15:13] XORed with bits [18:16] select one of the 8 banks; the bits from 16 upward select the row]
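Written out as code, the discovered mapping amounts to the following; the bit positions are those shown above, and taking the row index from bit 16 upward is our reading of the diagram:

#include <stdint.h>

/* The mapping discovered above, written out explicitly.
 * offset = bits [12:0]                    (byte within an 8KB row)
 * bank   = bits [15:13] XOR bits [18:16]  (8 banks)
 * row    = bits from 16 upward            (our reading of the diagram) */
static inline unsigned bank_of(uint64_t paddr)
{
    return (unsigned)(((paddr >> 13) ^ (paddr >> 16)) & 0x7);
}

static inline uint64_t row_of(uint64_t paddr)
{
    return paddr >> 16;
}

static inline uint64_t offset_in_row(uint64_t paddr)
{
    return paddr & 0x1FFF;   /* 8KB row: 13 offset bits */
}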

SLIDE 10

Task 1: Find the Mapping to DRAM

[Diagram: the physical address space divided into 8KB units P0 (0x0000), P1 (0x2000), P2 (0x4000), and so on; P0 maps to bank 0, P1 to bank 1, ..., P7 to bank 7, while P8, P9, ... fill the next row of each bank in an order permuted by the XOR rule above]

SLIDE 11

Task 1: Find the Mapping to DRAM


  • Measurement:
  • The cache hit latency includes the overhead of the extra assembly instructions

  • Under investigation: why does a row hit in a different bank incur extra latency?

Request Type                    Approximate Latency (CPU cycles)
Cache hit                       30
Row hit in the same bank        170
Row hit in a different bank     220
Row miss                        270

Row miss vs. row hit in the same bank: ~60% latency increase

SLIDE 12

Task 2: Microbenchmark


  • Kernel module: allocates a 128KB memory region (guaranteed to be contiguous physical pages)

[Diagram: within the 128KB region, the 8KB block at Base maps to row X of bank Y; the block at Base + (9 * 8KB) maps to row X+1 of the same bank Y]

Test 1: Striding within a row
  • > Results in row hits

Test 2: Zigzag between 2 rows in the same bank
  • > Results in row misses (see the sketch below)
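A user-space sketch of the two access patterns follows. The project runs them on the kernel-allocated 128KB region; the clflush calls (to force each access out to DRAM) and the 9-row stride for the same-bank case are assumptions carried over from the Task 1 mapping:

#include <stddef.h>
#include <x86intrin.h>   /* _mm_clflush */

#define ROW_SIZE   (8 * 1024)   /* one 8KB DRAM row */
#define CACHE_LINE 64

/* Test 1: stride through the cache lines of a single row.
 * The first access opens the row; the rest are row hits. */
static void stride_within_row(volatile char *base)
{
    for (size_t off = 0; off < ROW_SIZE; off += CACHE_LINE) {
        _mm_clflush((const void *)(base + off));  /* keep the access out of the cache */
        (void)base[off];
    }
}

/* Test 2: zigzag between two rows that fall in the same bank.
 * Every access closes one row and opens the other, so every access is a row
 * miss.  Per the Task 1 mapping, base and base + 9*8KB share a bank but sit
 * in adjacent rows (an assumption carried over from that result). */
static void zigzag_same_bank(volatile char *base)
{
    volatile char *row_a = base;
    volatile char *row_b = base + 9 * ROW_SIZE;

    for (size_t off = 0; off < ROW_SIZE; off += CACHE_LINE) {
        _mm_clflush((const void *)(row_a + off));
        _mm_clflush((const void *)(row_b + off));
        (void)row_a[off];
        (void)row_b[off];
    }
}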
SLIDE 13

Why Understand Mapping to DRAM?

  • Enables mapping application data to exploit locality

  • Pages mapped to rows:

– Data accesses to the same row incur low latency
– Colocate frequently accessed data in the same row

  • Next cache line prefetched:

– Accessing the next cache line incurs low latency
– Map data accessed together to adjacent cache lines (see the sketch below)
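One way to act on this, sketched below, is a hypothetical row-sized arena: values that are accessed together are packed into the same 8KB-aligned region so they share a DRAM row and occupy adjacent cache lines. This is illustrative only, not part of the project's code, and physical contiguity of the region is an assumption (see the mmap-based plan under Next Steps).

#include <stdlib.h>
#include <string.h>

#define ROW_SIZE (8 * 1024)   /* one DRAM row */

/* Hypothetical row-sized arena: values pushed together end up in the same
 * 8KB-aligned region, so they are likely to share a DRAM row and to occupy
 * adjacent cache lines. */
struct row_arena {
    char  *base;
    size_t used;
};

static int arena_init(struct row_arena *a)
{
    a->used = 0;
    return posix_memalign((void **)&a->base, ROW_SIZE, ROW_SIZE);
}

/* Copy a value into the arena; returns its new location, or NULL when the
 * row is full and the caller should start a new arena. */
static void *arena_put(struct row_arena *a, const void *val, size_t len)
{
    if (a->used + len > ROW_SIZE)
        return NULL;
    void *dst = a->base + a->used;
    memcpy(dst, val, len);
    a->used += len;
    return dst;
}

Hot keys identified from the access trace would be stored through the same arena so that their values become neighbors in memory as well as in access order.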

SLIDE 14

Data Mapping Benefits in Redis

  • Is memory access the bottleneck?
  • Profiling using Performance API (PAPI)

– An interface to hardware performance counters

  • Profile the set-key and get-key functions (instrumentation sketch below)

– Determine what fraction of total cycles is spent in set and get

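A minimal sketch of that instrumentation, using PAPI's low-level API; cycles_in_call and the fn wrapper are hypothetical names, not Redis or project code:

#include <papi.h>

/* Hypothetical helper (not the project's actual instrumentation): count the
 * CPU cycles spent inside one call with PAPI's low-level API.  Summing these
 * counts over a run and dividing by the run's total cycles gives the
 * "fraction of cycles" reported on the next slides. */
static long long cycles_in_call(void (*fn)(void))
{
    int evset = PAPI_NULL;
    long long cycles = 0;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return -1;                        /* in real code, initialize once per process */
    if (PAPI_create_eventset(&evset) != PAPI_OK) return -1;
    if (PAPI_add_event(evset, PAPI_TOT_CYC) != PAPI_OK) return -1;

    PAPI_start(evset);
    fn();                                 /* wrapper around the set or get handler under test */
    PAPI_stop(evset, &cycles);

    PAPI_cleanup_eventset(evset);
    PAPI_destroy_eventset(&evset);
    return cycles;
}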

SLIDE 15

Data Mapping Benefits in Redis

[Plot: fraction of cycles spent in set and get (y-axis, 0.05 to 0.35) vs. number of random queries (x-axis); series: Set Cycle Fraction, Get Cycle Fraction]

Memory is not a significant bottleneck in Redis

SLIDE 16

Sensitivity to Payload Size

[Plot: fraction of cycles spent in set (y-axis, 0.05 to 0.4) vs. payload size in bytes (x-axis, 2 to 65536); series: Set Fraction]

Memory is still not a significant bottleneck in Redis

SLIDE 17

Next Steps

  • Row-hit vs. row-miss behavior in Redis:

– Use mmap to allocate data contiguously within a page (allocation sketch below)
– Microbenchmarks that access the same and different rows/pages

[Diagram: accesses alternating between row X and row X+1 of the same bank Y]
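A sketch of the planned allocation path; MAP_POPULATE and the huge-page remark are assumptions about how contiguity might be achieved, not a confirmed design choice:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define ROW_SIZE (8 * 1024)

/* Reserve a region with mmap and fault it in up front (MAP_POPULATE) so the
 * benchmark data is backed by physical frames before timing starts.  Adding
 * MAP_HUGETLB (2MB pages) would further make several consecutive 8KB rows
 * physically contiguous; both flags are assumptions to validate. */
static void *alloc_benchmark_region(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }
    return p;
}

int main(void)
{
    char *region = alloc_benchmark_region(16 * ROW_SIZE);
    region[0] = 1;   /* ... place keys/values and run same-row vs. different-row accesses ... */
    return 0;
}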

SLIDE 18

More Potential for Data Mapping?

  • Single-node databases
  • Mainframe transaction processing systems
  • Data analytics systems

SLIDE 19

Dataset

  • Could not find a suitable in-memory dataset
  • We constructed our own dataset based on the English Wikipedia corpus

1. XML dump of current revisions for all English articles

  • 43GB (uncompressed)
  • 11/04/2013
  • http://dumps.wikimedia.org/enwiki/20131104/enwiki-20131104-pages-articles.xml.bz2

2. Article hit-count log (one hour; parsing sketch below)

  • 307MB (uncompressed)
  • Last hour of 11/04/2013
  • http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-11/pagecounts-20131105-000001.gz
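For reference, each line of an hourly pagecounts file is a space-separated record of project code, page title, view count, and bytes transferred. A hypothetical parsing sketch (ours, not the project's), assuming the file has been decompressed and keeping only English Wikipedia ("en") entries:

#include <stdio.h>
#include <string.h>

/* Each line of an hourly pagecounts file has the form
 *   <project> <page_title> <view_count> <bytes_transferred>
 * We keep only the "en" (English Wikipedia) records; titles still contain
 * URI-style %xx escapes that the sanitization step has to undo.
 * Assumes the .gz file has already been decompressed. */
static void load_hit_counts(const char *path)
{
    char project[64], title[2048];
    unsigned long views;
    unsigned long long bytes;

    FILE *f = fopen(path, "r");
    if (!f) { perror("fopen"); return; }

    while (fscanf(f, "%63s %2047s %lu %llu", project, title, &views, &bytes) == 4) {
        if (strcmp(project, "en") != 0)
            continue;
        /* ... accumulate (title -> views) to identify hot articles ... */
    }
    fclose(f);
}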

SLIDE 20

Dataset (cont’d)

  • Sanitization was unexpectedly non-trivial...

– Spam and/or invalid user queries
– ASCII vs. UTF-8 vs. ISO/IEC 8859-1
– URI escape characters, HTML escape characters
– Running out of memory

  • Sanitized dataset

– 141K key-value pairs: (title, article)
– 3.6GB (uncompressed)
