
SLIDE 1

What the heck is an In-Memory Data Grid?

@addisonhuddy

SLIDE 2

How are we going to answer this question?

1. Tell you about my first introduction to IMDGs
2. See some real-world use cases
3. Design an IMDG
4. Implement Use Cases

SLIDE 3

Definition

IMDGs provide a lightweight, distributed, scale-out in-memory object store — the data grid. Multiple applications can concurrently perform transactional and/or analytical operations in the low-latency data grid, thus minimizing access to high-latency, hard-disk-drive-based or solid-state-drive-based data storage.¹

Gartner

¹ https://www.gartner.com/reviews/market/in-memory-data-grids

SLIDE 4

My First Thought

SLIDE 5

My Second Thought

SLIDE 6

Two Examples

  • 5,700 train stations
  • 4.5 million tickets per day
  • 20 million daily users
  • 1.4 billion page views per day
  • 40,000 visits per second

China Railway Corporation

  • 70+ cities
  • 4,000 daily flights
  • 706 aircraft
  • Largest airline website by visitors

Southwest Airlines

SLIDE 7

When Not To Use An IMDG

  • Small Amounts of Data
  • Low Latency Isn’t Mission-Critical
  • Not a Total Replacement for an RDBMS
SLIDE 8

Let’s Make an IMDG

SLIDE 9

Design Goals

  • Extremely Low Latency
  • High Throughput
  • Durability
  • Large Datasets
  • Consistency?
SLIDE 10
  • Memory First
  • Horizontal Scalability / Elasticity
  • Data Aware Routing
  • Serialization / Deserialization

Design Goals

  • Extremely Low Latency
  • High Throughput
  • Durability
  • Large Datasets
  • Consistency
SLIDE 11

https://github.com/apache/geode

SLIDE 12

Memory First

SLIDE 13

Latency Comparison

Latency Comparison Numbers

  • L1 cache reference                         0.5 ns
  • Branch mispredict                            5 ns
  • L2 cache reference                           7 ns  (14x L1 cache)
  • Mutex lock/unlock                           25 ns
  • Main memory reference                      100 ns  (20x L2 cache, 200x L1 cache)
  • Compress 1K bytes with Zippy             3,000 ns  (3 us)
  • Send 1K bytes over 1 Gbps network       10,000 ns  (10 us)
  • SSD seek                               100,000 ns  (100 us)
  • Read 4K randomly from SSD*             150,000 ns  (150 us; ~1 GB/sec SSD)
  • Read 1 MB sequentially from memory     250,000 ns  (250 us)
  • Round trip within same datacenter      500,000 ns  (500 us)
  • Read 1 MB sequentially from SSD*     1,000,000 ns  (1 ms; ~1 GB/sec SSD, 4x memory)
  • Disk seek                           10,000,000 ns  (10 ms; 20x datacenter roundtrip)
  • Read 1 MB sequentially from disk    20,000,000 ns  (20 ms; 80x memory, 20x SSD)
  • Send packet CA -> Netherlands -> CA 150,000,000 ns  (150 ms)

¹ Credit: Jeff Dean, Peter Norvig, and Jonas Bonér

SLIDE 14

Why Memory?

Read 1 MB Comparison

  Hardware   True Time       Scaled Time
  Memory     250,100 ns      2 days
  SSD        1,100,000 ns    9 days
  Disk       30,000,000 ns   8 months
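The scaled-time column follows from a single multiplier: stretch the 250,100 ns memory read out to 2 days, then apply the same factor to the SSD and disk numbers. A quick plain-Java sanity check (class and method names here are just for illustration):

```java
// Derive the scale factor from "250,100 ns of memory read = 2 days"
// and apply the same factor to the SSD and disk latencies.
public class ScaledLatency {
    // seconds of scaled time per second of true time
    static final double SCALE = (2 * 24 * 3600.0) / (250_100e-9);

    static double scaledDays(double nanos) {
        return nanos * 1e-9 * SCALE / (24 * 3600.0);
    }

    public static void main(String[] args) {
        System.out.printf("Memory: %.1f days%n", scaledDays(250_100));    // 2.0
        System.out.printf("SSD:    %.1f days%n", scaledDays(1_100_000));  // ~8.8, i.e. 9 days
        System.out.printf("Disk:   %.0f days%n", scaledDays(30_000_000)); // ~240, i.e. 8 months
    }
}
```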

SLIDE 15

Horizontal Scalability / Elasticity

SLIDE 16

System Architecture

[Diagram: four servers and two locators serving many clients]
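One common way a grid like this spreads data across servers (a toy sketch, not Geode's exact algorithm): hash every key into a fixed number of buckets, and deal the buckets out to whichever servers are currently members, so a server joining or leaving only moves bucket ownership, never the key-to-bucket mapping.

```java
import java.util.ArrayList;
import java.util.List;

// Toy bucket-based partitioning: keys hash to a fixed bucket count,
// buckets are assigned to the current servers. Adding a server
// reassigns buckets, not keys.
public class BucketPartitioning {
    static final int BUCKETS = 113; // fixed for the life of the region

    static int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    // The server that currently owns a key, given the live member list.
    static String ownerOf(String key, List<String> servers) {
        return servers.get(bucketOf(key) % servers.size());
    }

    public static void main(String[] args) {
        List<String> cluster = new ArrayList<>(List.of("server-1", "server-2", "server-3"));
        System.out.println("order-42 lives on " + ownerOf("order-42", cluster));
        cluster.add("server-4"); // elastic scale-out: only bucket ownership moves
        System.out.println("order-42 now on " + ownerOf("order-42", cluster));
    }
}
```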

SLIDE 17

System Architecture

[Diagram: the cluster scaled in to two servers, with two locators and many clients]

SLIDE 18

System Architecture

[Diagram: the cluster scaled to three servers, with two locators and many clients]

SLIDE 19

IMDGs & CAP Theorem

[Diagram: the CAP triangle, with Consistency, Availability, and Partition Tolerance at its corners]

SLIDE 20

WAN Replication

[Diagram: two clusters of servers and locators replicating over the WAN, Data Center (NYC) and Data Center (Tokyo)]

SLIDE 21

Data Aware Routing

SLIDE 22

Latency Comparison

Latency Comparison Numbers

  • L1 cache reference                         0.5 ns
  • Branch mispredict                            5 ns
  • L2 cache reference                           7 ns  (14x L1 cache)
  • Mutex lock/unlock                           25 ns
  • Main memory reference                      100 ns  (20x L2 cache, 200x L1 cache)
  • Compress 1K bytes with Zippy             3,000 ns  (3 us)
  • Send 1K bytes over 1 Gbps network       10,000 ns  (10 us)
  • SSD seek                               100,000 ns  (100 us)
  • Read 4K randomly from SSD*             150,000 ns  (150 us; ~1 GB/sec SSD)
  • Read 1 MB sequentially from memory     250,000 ns  (250 us)
  • Round trip within same datacenter      500,000 ns  (500 us)
  • Read 1 MB sequentially from SSD*     1,000,000 ns  (1 ms; ~1 GB/sec SSD, 4x memory)
  • Disk seek                           10,000,000 ns  (10 ms; 20x datacenter roundtrip)
  • Read 1 MB sequentially from disk    20,000,000 ns  (20 ms; 80x memory, 20x SSD)
  • Send packet CA -> Netherlands -> CA 150,000,000 ns  (150 ms)

¹ Credit: Jeff Dean, Peter Norvig, and Jonas Bonér

SLIDE 23

Single Hop

[Diagram: the client sends each operation in a single hop, directly to the server that owns the key]
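A minimal sketch of the single-hop idea, assuming the client has already fetched a bucket-to-server map from a locator (the class and field names here are hypothetical, not Geode's API):

```java
import java.util.Map;

// Single-hop routing sketch: the client caches bucket -> server metadata
// and sends each operation straight to the owner, skipping the extra
// server-to-server forwarding hop.
public class SingleHopClient {
    static final int BUCKETS = 113;
    private final Map<Integer, String> bucketToServer; // metadata, refreshed periodically

    SingleHopClient(Map<Integer, String> metadata) {
        this.bucketToServer = metadata;
    }

    String routeFor(String key) {
        int bucket = Math.floorMod(key.hashCode(), BUCKETS);
        // A stale map just costs one extra hop; fresh metadata comes back with the reply.
        return bucketToServer.getOrDefault(bucket, "any-server");
    }
}
```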

SLIDE 24

Local Cache

[Diagram: servers, locators, and clients, with a local cache on the client]

SLIDE 25

Local Cache

[Diagram: the client's local cache serving repeat reads without a server round trip]

SLIDE 26

Serialization

1. Only (de)serialize when it is necessary
2. Only (de)serialize what is absolutely necessary
3. Distribute (de)serialization cost as much as possible
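Rules 1 and 2 suggest laying data out so that a single field can be read straight from the serialized bytes; Geode's PDX serialization is built around this idea. A hand-rolled toy version (the layout and names are my own, not Geode's format):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy layout [int id][int nameLength][name bytes]: reading the id touches
// 4 bytes, allocates no String, and never materializes the whole object.
public class PartialDeserialize {
    static byte[] serialize(int id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + nameBytes.length)
                .putInt(id)
                .putInt(nameBytes.length)
                .put(nameBytes)
                .array();
    }

    static int readId(byte[] blob) {
        return ByteBuffer.wrap(blob).getInt(0); // only the first field is read
    }

    static String readName(byte[] blob) { // full decode, only when actually needed
        int len = ByteBuffer.wrap(blob).getInt(4);
        return new String(blob, 8, len, StandardCharsets.UTF_8);
    }
}
```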

SLIDE 27

Basic User Operations

SLIDE 28

What have we created?

  • Put/Get
  • Queries
  • Server-Side Functions
  • Registered Interests
  • Continuous Queries
  • Event Queues
  • Key/Value Object Store
  • Shared-Nothing Architecture
  • Memory Oriented
  • Strongly Consistent
SLIDE 29

Use Cases

SLIDE 30

In-line Caching

[Diagram: clients talk only to the grid (servers and locators); the grid itself reads from and writes to the RDBMS]

SLIDE 31

Look-Aside Caching

[Diagram: clients check the grid first and fall back to the RDBMS themselves on a miss]
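The look-aside pattern in miniature: the application checks the grid first and only pays the RDBMS round trip on a miss, writing the result back so the next reader stays at memory speed. In this sketch a ConcurrentHashMap stands in for the grid region and a caller-supplied loader stands in for the RDBMS:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Look-aside caching: check the cache, load from the database on a miss,
// populate the cache for subsequent readers.
public class LookAsideCache {
    private final Map<String, String> region = new ConcurrentHashMap<>();
    private final Function<String, String> loadFromDb; // stand-in for the RDBMS query

    LookAsideCache(Function<String, String> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    String get(String key) {
        String hit = region.get(key);
        if (hit != null) return hit;          // hit: no database involved
        String value = loadFromDb.apply(key); // miss: one RDBMS round trip
        region.put(key, value);               // populate for the next reader
        return value;
    }
}
```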

SLIDE 32

Look-Aside Caching

[Diagram: on a miss, the client loads the value from the RDBMS and writes it back to the grid]

SLIDE 33

Pub / Sub System

[Diagram: a put (1) arrives at a server and is pushed (2) to the clients subscribed to that key]
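A toy sketch of the register-interest mechanic behind the diagram (the class and method names are illustrative, not Geode's API): a client subscribes to a key, and every put on that key is pushed to its listeners.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Pub/sub sketch: put = step 1 (write lands in the grid),
// delivery = step 2 (push to subscribers).
public class InterestRegistry {
    private final Map<String, String> region = new ConcurrentHashMap<>();
    private final Map<String, List<BiConsumer<String, String>>> listeners = new ConcurrentHashMap<>();

    void registerInterest(String key, BiConsumer<String, String> onUpdate) {
        listeners.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(onUpdate);
    }

    void put(String key, String value) {
        region.put(key, value); // step 1: the write lands in the grid
        for (BiConsumer<String, String> l : listeners.getOrDefault(key, List.of()))
            l.accept(key, value); // step 2: push to subscribers
    }
}
```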

SLIDE 34

Real-Time Analytics with Functions

[Diagram: a client invokes a function that runs on the servers holding the data]

SLIDE 35

Distributed Computation

[Diagram: clients dispatching computation across four servers]
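The scatter-gather shape of server-side functions can be sketched locally, with lists standing in for partitions and threads standing in for servers: a function runs against each partition in parallel and returns a partial result, and the caller combines the partials.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Scatter-gather: one task per partition computes a partial sum in
// parallel; the caller gathers and combines the partials.
public class ScatterGather {
    static long totalAcrossPartitions(List<List<Integer>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> p : partitions) // scatter: one task per partition
                partials.add(pool.submit(() -> p.stream().mapToLong(Integer::longValue).sum()));
            long total = 0;
            for (Future<Long> f : partials) total += f.get(); // gather partial sums
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```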

SLIDE 36

Real-Time Analytics

[Diagram: many clients running real-time queries against servers holding rapidly changing data]

SLIDE 37

O’Reilly Book

SLIDE 38

Questions

@addisonhuddy