Redis for Fast Data Ingest


SLIDE 1

Home of Redis

Redis for Fast Data Ingest

SLIDE 2

Agenda

  • Fast Data Ingest and its challenges
  • Redis for Fast Data Ingest
  • Pub/Sub
  • List
  • Sorted Sets as a Time Series Database
  • The Demo
  • Scaling with Redis^e Flash

SLIDE 3

Fast Data Ingest Scenarios

SLIDE 4

IoT

SLIDE 5

Network Traffic Inspection

SLIDE 6

Social Media Analysis

SLIDE 7

More Scenarios

  • Log Collection
  • User Activity Tracking
  • Multi-player Gaming
  • Fintech
  • And more…

SLIDE 8

Fast Data Ingest Challenges

  • Keeping up with the pace of data arrival
  • Handling data from multiple sources with no standard data format
  • Filtering, analyzing, and transforming data in real time
  • Managing data arriving from geographically distributed sources

SLIDE 9

Requirements for Fast Data Ingest

  • Physical infrastructure: network, computational resources, etc.
  • Software stack to process data in real time with sub-millisecond latency:
      • Filter
      • Aggregate
      • Transform
      • Distribute

SLIDE 10

Fast Data Ingest with Redis

SLIDE 11

About Redis

Redis: Open source. The leading in-memory database platform, supporting any high-performance operational, analytics, or hybrid use case.

Redis Labs: The open source home and commercial provider of Redis Enterprise (Redis^e) technology, platform, products & services.

SLIDE 12

Redis for Fast Data Ingest

SLIDE 13

Redis for Fast Data Ingest

Redis Data Structures: Lists, Sorted Sets, Hashes, HyperLogLog, Geospatial Indexes, Bitmaps, Sets, Strings, Bit fields

Redis Pub/Sub: Publisher → Channel → Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n

SLIDE 14

Common Ingest Techniques in Redis

SLIDE 15

Pub/Sub

Diagram: Publisher → Channel → Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n

Commands

Publisher:

publish <channel name> <message>

Subscriber:

subscribe <channel name>
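A minimal Python sketch of the publish and subscribe commands, assuming a redis-py style client (an assumption; the slides do not name a client library). The helper names `publish_tweet` and `consume` and the JSON payload are illustrative and not taken from the demo code.

```python
import json


def publish_tweet(client, channel, tweet):
    """PUBLISH a tweet (a dict) as JSON; returns how many subscribers received it."""
    return client.publish(channel, json.dumps(tweet))


def consume(pubsub, handler):
    """Call handler(dict) for every message on an already-subscribed PubSub object."""
    for item in pubsub.listen():
        # listen() also yields subscribe/unsubscribe confirmations; skip those.
        if item["type"] == "message":
            handler(json.loads(item["data"]))
```

With redis-py, `client` would typically be `redis.Redis()` and `pubsub` the object returned by `client.pubsub()` after calling its `subscribe()` method with the channel name. A message published while a subscriber is disconnected is not delivered to that subscriber later, which is the connection-loss weakness of this technique.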

SLIDE 16

List

Diagram: Publisher → Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n

Commands

Publisher:

lpush <list name> <message>

Subscriber:

brpop <list name> <timeout>
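The lpush and brpop pair can be sketched in Python as follows, again assuming a redis-py style client. The helper names and the JSON encoding are hypothetical choices for illustration.

```python
import json


def push_tweet(client, list_name, tweet):
    """LPUSH a JSON-encoded tweet onto the head of the list; returns the new length."""
    return client.lpush(list_name, json.dumps(tweet))


def pop_tweet(client, list_name, timeout=5):
    """BRPOP from the tail, blocking up to `timeout` seconds; None on timeout."""
    reply = client.brpop(list_name, timeout=timeout)
    if reply is None:
        return None
    _key, payload = reply  # BRPOP replies with (list name, element)
    return json.loads(payload)
```

Pushing at the head and popping at the tail gives first-in, first-out delivery. An element stays in the list until a consumer pops it, so a consumer that reconnects after an outage still finds the messages that arrived in the meantime.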

SLIDE 17

Sorted Set

Diagram: Publisher → Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n

Commands

Publisher:

zadd <timeseries name> <timestamp> <message>

Subscriber:

zrangebyscore <timeseries name> <last timestamp> <current timestamp> WITHSCORES
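A hedged Python sketch of the time-series pattern, assuming a redis-py style client. The helper names are hypothetical. Unlike the inclusive range in the command shown on the slide, this sketch uses an exclusive lower bound (the `(` prefix) so that the entry at the last seen timestamp is not returned twice.

```python
def add_event(client, series, timestamp, message):
    """ZADD the message with the timestamp as its score; returns the number of new members."""
    return client.zadd(series, {message: timestamp})


def read_new(client, series, last_ts, now_ts):
    """Return (entries, new_last_ts) for entries with last_ts < score <= now_ts.

    entries is a list of (message, timestamp) pairs in ascending timestamp order.
    """
    entries = client.zrangebyscore(series, f"({last_ts}", now_ts, withscores=True)
    new_last = entries[-1][1] if entries else last_ts
    return entries, new_last
```

Each subscriber keeps its own `last_ts`, which is what decouples consumers from the producer and from each other. Sorted set members are unique, so two identical messages would collapse into one entry; including a unique id in the message avoids that.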

SLIDE 18

The Demo

SLIDE 19

Demo: Problem Description

Input: All Tweets

  • English Tweets Filter: "lang":"en"
  • Influencer Tweets Filter: followers_count > 10000
  • Popular hashtags among English tweets: match pattern "#(\\w+)" and increment the count for that pattern
  • Influencer Catalog: map influencer id to profile; Sorted Set: follower count -> id

Sample Tweet Message in the JSON format:

{
  "created_at": "Tue Jul 11 17:06:03 +0000 2017",
  "id": 884821096440004600,
  "text": "USGS reports a M2 #earthquake 31km WSW of Enterprise, Utah on 7/11/17 @ 17:01:53 UTC https://t.co/xXQH2Mfy93 #quake",
  "user": {
    "id": 1414684496,
    "name": "Every Earthquake",
    "screen_name": "everyEarthquake",
    "location": "Earth",
    "followers_count": 18978,
    "friends_count": 17,
    "lang": "en"
  }
}

SLIDE 20

Demo Setup

  • Service provider for messages
  • Programming language for the demo
  • IDE
  • Redis container on Docker
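The filtering and counting rules from the problem description on slide 19 can be sketched in plain Python. This is an illustration only and not the demo's own code; the function names are hypothetical, and `lang` is read from the `user` object because that is where the sample tweet carries it.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # the pattern given on slide 19


def is_english(tweet):
    """English Tweets Filter: "lang" == "en" (inside "user" in the sample tweet)."""
    return tweet.get("user", {}).get("lang") == "en"


def is_influencer(tweet, min_followers=10000):
    """Influencer Tweets Filter: followers_count > 10000."""
    return tweet.get("user", {}).get("followers_count", 0) > min_followers


def count_hashtags(tweets):
    """Count each hashtag seen in the text of English tweets."""
    counts = Counter()
    for tweet in tweets:
        if is_english(tweet):
            counts.update(HASHTAG.findall(tweet.get("text", "")))
    return counts


sample = {
    "text": "USGS reports a M2 #earthquake 31km WSW of Enterprise, Utah "
            "on 7/11/17 @ 17:01:53 UTC https://t.co/xXQH2Mfy93 #quake",
    "user": {"id": 1414684496, "followers_count": 18978, "lang": "en"},
}
print(is_english(sample), is_influencer(sample))
print(count_hashtags([sample]))
```

For the sample tweet both filters pass, and the hashtags `earthquake` and `quake` are each counted once.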

SLIDE 21

The Three Data Ingest Techniques

Pub/Sub

Pros:

  • Easy
  • Decoupled setup
  • Good for geographically distributed setup

Cons:

  • Not resilient to connection loss
  • Requires many connections

Lists

Pros:

  • Easy
  • Resilient to connection loss

Cons:

  • Tightly coupled producers and consumers
  • Data duplication

Sorted Sets

Pros:

  • Resilient to connection loss
  • Least chance of losing data
  • Access to historical data
  • Loosely coupled producers and consumers

Cons:

  • Consumes space
  • Complex logic

SLIDE 22

Technique 1: Fast Data Ingest with Pub/Sub

SLIDE 23

Fast Data Ingest with Pub/Sub

Diagram labels: Ingest PubSub, AllTweets, English Tweets, Influencer Tweets, EnglishTweetsFilter, InfluencerTweetsFilter, HashTagCollector, InfluencerCollector

Advantages

  • Easy
  • Decoupled setup
  • Good for geographically distributed setup

SLIDE 24

Class Diagrams and Sample Code

https://github.com/redislabsdemo/IngestPubSub

SLIDE 25

Technique 2: Fast Data Ingest with Lists

SLIDE 26

Fast Data Ingest with Lists

Diagram labels: Ingest Stream, AllTweets Listener, EnglishTweets Listener, EnglishTweetsFilter, InfluencerFilter, HashTagFilter

Lists: alldata, englishtweets

Advantages

  • Easy
  • Resilient to connection loss

SLIDE 27

Class Diagrams and Sample Code

https://github.com/redislabsdemo/IngestList

SLIDE 28

Technique 3: Fast Data Ingest with Sorted Sets

SLIDE 29

Fast Data Ingest with Sorted Sets

Diagram labels: Ingest Stream, EnglishTweetsFilter, InfluencerFilter, HashTagFilter

Sorted sets: alltweets, englishtweets

Advantages

  • Resilient to connection loss
  • Least chance of losing data
  • Access to historical data
  • Loosely coupled producers and consumers

SLIDE 30

Class Diagrams and Sample Code

https://github.com/redislabsdemo/IngestSortedSet

SLIDE 31

Redis^e for Fast Data Ingest

SLIDE 32

Redis^e Technology

Redis Database Instances

SLIDE 33

Redis^e Technology

  • Open Source Layer
  • Enterprise Layer: Cluster Manager, REST API, Zero latency proxy

SLIDE 34

Redis^e Technology

Redis^e Node

  • Open Source Layer
  • Enterprise Layer: Zero latency proxy, Cluster Manager, REST API

SLIDE 35

Redis^e Technology

Redis^e Cluster

  • Shared nothing cluster architecture
  • Fully compatible with open source commands & data structures

SLIDE 36

Redis^e - Shared Nothing Symmetric Architecture

  • Cluster Management Path: Node Watchdog, Cluster Watchdog
  • Data Path: Proxies
  • Nodes: Node 1, Node 2, … Node N (odd number)
  • Redis Shards: Unique multi-tenant "Docker"-like architecture enables running hundreds of databases over a single, average cloud instance without performance degradation and with maximum security provisions
  • Distributed Proxies: Single or Multiple Endpoints

SLIDE 37

Redis^e Benefits for Data Ingest

  • Effortless Scaling: Simple, Seamless Clustering. Linear scaling
  • ACID Compliance in Cluster Architecture
  • Substantially Lower Costs: Run on Flash as a RAM extension
  • Top notch 24x7 expert support
  • Always On Availability: Instant Failure Recovery, No Data loss
  • Stable and Predictable High Performance

SLIDE 38

Redis^e Flash

  • Near-RAM performance at 70%+ lower costs
  • Technology treats Flash as a RAM replacement (or extension)
  • RAM/Flash ratio can be easily configured
  • Pluggable storage engine
  • Available on SATA-based SSD, NVMe-based SSD, NVDIMM like 3D XPoint/SCM on x86 and P8 platforms

Diagram: 2048 GB of RAM compared with 204 GB RAM (10%, keys & hot values) plus 1844 GB Flash (90%, cold values)

SLIDE 39

Redis^e Flash - 10TB Redis Deployment on EC2

                                             Redis on RAM    Redis^e Flash
Dataset size                                 10 TB           10 TB
Database size with replication               30 TB           20 TB *
AWS instance type                            x1.32xlarge     i3.16xlarge
Actual instance size (RAM, and RAM+Flash)    1.46 TB         3.66 TB
# of instances needed                        21              6 + 1 (for quorum)
Persistent Storage (EBS)                     154 TB          110 TB
1 year cost (reserved instances)             $1,595,643      $298,896
Savings                                      -               81.27%

* Redis Enterprise only needs 1 copy of the data because quorum issues are solved at the node level
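The instance counts and the savings figure in this comparison can be checked with a short Python calculation. The function names are hypothetical, and rounding the instance count up to a whole instance is an assumption; it does reproduce the numbers shown.

```python
import math


def instances_needed(database_tb, instance_tb):
    """Whole instances required to hold the replicated database."""
    return math.ceil(database_tb / instance_tb)


def savings_percent(baseline_cost, new_cost):
    """Cost reduction relative to the baseline, in percent."""
    return (baseline_cost - new_cost) / baseline_cost * 100


ram_instances = instances_needed(30, 1.46)    # Redis on RAM, x1.32xlarge
flash_instances = instances_needed(20, 3.66)  # Redis^e Flash, i3.16xlarge (quorum node not included)
savings = savings_percent(1_595_643, 298_896)
print(ram_instances, flash_instances, f"{savings:.2f}%")
```

This prints `21 6 81.27%`, matching the 21 instances, the 6 data instances (before the extra quorum node), and the 81.27% savings in the comparison.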

SLIDE 40

Questions?

SLIDE 41

One more thing…

redis.conf setting:

client-output-buffer-limit pubsub 32mb 8mb 60

With this setting, Redis forcibly disconnects a Pub/Sub client in either of two situations:

  • If the client's output buffer grows beyond 32 MB (hard limit)
  • If the client's output buffer stays above 8 MB continuously for 60 seconds (soft limit)
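The two conditions can be illustrated with a simplified Python model. This is a sketch of the rule as described here and not Redis's actual implementation; the function name and the sampled `(time, size)` input are hypothetical.

```python
MB = 1024 * 1024


def disconnect_time(samples, hard=32 * MB, soft=8 * MB, soft_seconds=60):
    """Return the time at which a client would be dropped, or None.

    samples: (time_in_seconds, output_buffer_bytes) pairs in time order.
    """
    soft_since = None  # when the buffer first reached the soft limit
    for t, size in samples:
        if size >= hard:
            return t  # hard limit reached: disconnect immediately
        if size >= soft:
            if soft_since is None:
                soft_since = t
            elif t - soft_since > soft_seconds:
                return t  # stayed at or above the soft limit for too long
        else:
            soft_since = None  # buffer drained; the soft-limit timer resets
    return None
```

For example, `disconnect_time([(0, 9 * MB), (30, 9 * MB), (61, 9 * MB)])` returns `61`, while a buffer that drops below 8 MB in between resets the timer and yields `None`.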

SLIDE 42

Thank You

Roshan Kumar
roshan@redislabs.com @roshankumar

Redis Labs
expert@redislabs.com @redislabs