Redis for Fast Data Ingest


  1. Home of Redis: Redis for Fast Data Ingest

  2. Agenda
      • Fast Data Ingest and its challenges
      • Redis for Fast Data Ingest
      • Pub/Sub
      • List
      • Sorted Sets as a Time Series Database
      • The Demo
      • Scaling with Redis e Flash

  3. Fast Data Ingest Scenarios

  4. IoT

  5. Network Traffic Inspection

  6. Social Media Analysis

  7. More Scenarios: User Activity Tracking, Log Collection, Multi-player Gaming, Fintech, and more…

  8. Fast Data Ingest Challenges
      • Keeping up with the pace of data arrival
      • Data from multiple sources with no standard data format
      • Filter, analyze, and transform data in real-time
      • Managing data arriving from sources distributed geographically

  9. Requirements for Fast Data Ingest
      • Physical infrastructure – network, computational resources, etc.
      • Software stack to:
          • Filter
          • Aggregate
          • Transform
          • Distribute data in real-time with sub-millisecond latency

  10. Fast Data Ingest with Redis

  11. About Redis
      • Redis: Open source. The leading in-memory database platform, supporting any high performance operational, analytics or hybrid use case.
      • Redis Labs: The open source home and commercial provider of Redis Enterprise (Redis e) technology, platform, products & services.

  12. Redis for Fast Data Ingest

  13. Redis for Fast Data Ingest
      • Redis Pub/Sub: Publisher, Channel, Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      • Redis Data Structures: Strings, Lists, Sets, Sorted Sets, Hashes, Bitmaps, Bit field, HyperLogLog, Geospatial Indexes

  14. Common Ingest Techniques in Redis

  15. Pub/Sub
      Diagram: Publisher, Channel, Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      Commands
      • Publisher: publish <channel name> <message>
      • Subscriber: subscribe <channel name>
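
The fan-out behaviour behind these two commands can be sketched without a Redis server. The class below is a hypothetical in-memory stand-in (`InMemoryPubSub` and the channel name `alltweets` are illustration-only names, not part of the deck or of any Redis client). It assumes the usual Pub/Sub semantics: a message reaches only the subscribers connected at publish time, and the publish call reports how many received it.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Toy stand-in for Redis Pub/Sub fan-out; not a Redis client."""

    def __init__(self):
        # channel name -> list of subscriber inboxes (plain Python lists)
        self._subscribers = defaultdict(list)

    def subscribe(self, channel):
        """Mimics `subscribe <channel name>`: returns an inbox that fills up."""
        inbox = []
        self._subscribers[channel].append(inbox)
        return inbox

    def publish(self, channel, message):
        """Mimics `publish <channel name> <message>`.

        Delivers to the subscribers registered right now and returns
        how many of them received the message.
        """
        inboxes = self._subscribers[channel]
        for inbox in inboxes:
            inbox.append(message)
        return len(inboxes)


hub = InMemoryPubSub()
early = hub.publish("alltweets", "tweet-0")      # nobody is listening yet
sub1 = hub.subscribe("alltweets")
sub2 = hub.subscribe("alltweets")
delivered = hub.publish("alltweets", "tweet-1")  # both subscribers get it
```

In this sketch `tweet-0` is never stored anywhere, which is the reason a subscriber that is disconnected at publish time misses the message.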

  16. List
      Diagram: Publisher, Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      Commands
      • Publisher: lpush <list name> <message>
      • Subscriber: brpop <list name> <timeout>
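
A list used this way behaves as a first-in, first-out queue: pushing on the left and popping on the right hands out the oldest message first, and each message goes to exactly one consumer. The sketch below models that with a `deque`. `InMemoryList` is a hypothetical name, and `rpop` here is a non-blocking simplification of `brpop` (it returns `None` instead of waiting for the timeout).

```python
from collections import deque


class InMemoryList:
    """Toy model of a Redis list used as a queue; not a Redis client."""

    def __init__(self):
        self._items = deque()

    def lpush(self, message):
        """Mimics `lpush`: add at the head, return the new list length."""
        self._items.appendleft(message)
        return len(self._items)

    def rpop(self):
        """Non-blocking stand-in for `brpop`: take from the tail, or None."""
        return self._items.pop() if self._items else None


queue = InMemoryList()
for message in ("tweet-1", "tweet-2", "tweet-3"):
    queue.lpush(message)

# Two consumers popping from the same list split the messages between them.
consumer_a = [queue.rpop()]
consumer_b = [queue.rpop()]
consumer_a.append(queue.rpop())
leftover = queue.rpop()  # the list is empty again
```

Because a popped message is gone, two consumers that both need every message each need their own list, so the producer has to push every message twice.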

  17. Sorted Set
      Diagram: Publisher, Subscriber 1, Subscriber 2, Subscriber 3, … Subscriber n
      Commands
      • Publisher: zadd <timeseries name> <timestamp> <message>
      • Subscriber: zrangebyscore <timeseries name> <last timestamp> <current timestamp> WITHSCORES
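
The polling pattern implied by these commands can be sketched as follows. `InMemorySortedSet` and `poll` are hypothetical names, and the class only imitates the parts of a sorted set that matter here: members are unique, and a range query returns members ordered by score. One assumption worth flagging: the lower bound is treated as exclusive so that the entry at `<last timestamp>` is not read twice (in Redis this is written by prefixing the bound with `(`).

```python
class InMemorySortedSet:
    """Toy model of a sorted set used as a time series; not a Redis client."""

    def __init__(self):
        # member -> score; re-adding a member only updates its score
        self._scores = {}

    def zadd(self, score, member):
        self._scores[member] = score

    def zrangebyscore(self, low, high, exclusive_low=False):
        """Return (member, score) pairs in [low, high] or (low, high]."""
        hits = [
            (member, score)
            for member, score in self._scores.items()
            if (low < score if exclusive_low else low <= score) and score <= high
        ]
        # Order by score, then by member, as a sorted set does.
        return sorted(hits, key=lambda pair: (pair[1], pair[0]))


def poll(series, last_seen, now):
    """Fetch entries newer than last_seen and advance the cursor."""
    batch = series.zrangebyscore(last_seen, now, exclusive_low=True)
    new_last_seen = batch[-1][1] if batch else last_seen
    return batch, new_last_seen


series = InMemorySortedSet()
series.zadd(100, "tweet-a")
series.zadd(105, "tweet-b")
first_batch, cursor = poll(series, 0, 110)
series.zadd(108, "tweet-c")
second_batch, cursor = poll(series, cursor, 120)
history = series.zrangebyscore(100, 105)  # older data is still readable
```

Since members are unique, two identical messages would collapse into one entry; a real message would normally carry something unique, such as the tweet id.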

  18. The Demo

  19. Demo: Problem Description
      • All Tweets → English Tweets: filter tweets with "lang":"en"
      • English Tweets → Popular hashtags among English tweets: match pattern "#(\\w+)", increment count for that pattern
      • All Tweets → Influencer Tweets: filter followers_count > 10000
      • Influencer Tweets → Influencer Catalog: map influencer id to profile; Sorted Set: follower count -> id
      Sample Tweet Message in the JSON format:
      {
        "created_at":"Tue Jul 11 17:06:03 +0000 2017",
        "id":884821096440004600,
        "text":"USGS reports a M2 #earthquake 31km WSW of Enterprise, Utah on 7/11/17 @ 17:01:53 UTC https://t.co/xXQH2Mfy93 #quake",
        "user":{
          "id":1414684496,
          "name":"Every Earthquake",
          "screen_name":"everyEarthquake",
          "location":"Earth",
          "followers_count":18978,
          "friends_count":17,
          "lang":"en"
        }
      }
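
The two filters in this problem description can be sketched in a few lines, run against the sample tweet from the slide. This is a minimal illustration, not the demo's actual code: the function names are hypothetical, and the language check reads `lang` from the `user` object because that is where the sample message carries it.

```python
import json
import re
from collections import Counter

HASHTAG_PATTERN = re.compile(r"#(\w+)")
INFLUENCER_THRESHOLD = 10000

SAMPLE_TWEET = json.loads("""
{
  "created_at": "Tue Jul 11 17:06:03 +0000 2017",
  "id": 884821096440004600,
  "text": "USGS reports a M2 #earthquake 31km WSW of Enterprise, Utah on 7/11/17 @ 17:01:53 UTC https://t.co/xXQH2Mfy93 #quake",
  "user": {
    "id": 1414684496,
    "name": "Every Earthquake",
    "screen_name": "everyEarthquake",
    "location": "Earth",
    "followers_count": 18978,
    "friends_count": 17,
    "lang": "en"
  }
}
""")


def is_english(tweet):
    # Assumption: "lang" sits inside "user", as in the sample message.
    return tweet.get("user", {}).get("lang") == "en"


def is_influencer(tweet):
    return tweet.get("user", {}).get("followers_count", 0) > INFLUENCER_THRESHOLD


def count_hashtags(tweets):
    """Count hashtags across English tweets only."""
    counts = Counter()
    for tweet in tweets:
        if is_english(tweet):
            counts.update(HASHTAG_PATTERN.findall(tweet.get("text", "")))
    return counts


def influencer_catalog(tweets):
    """Return (profiles by user id, [(followers_count, user id)] ascending)."""
    profiles = {}
    for tweet in tweets:
        if is_influencer(tweet):
            user = tweet["user"]
            profiles[user["id"]] = user
    ranking = sorted((user["followers_count"], uid) for uid, user in profiles.items())
    return profiles, ranking


hashtag_counts = count_hashtags([SAMPLE_TWEET])
profiles, ranking = influencer_catalog([SAMPLE_TWEET])
```

The `ranking` list plays the role of the slide's sorted set (follower count as score, user id as member), and `profiles` plays the role of the id-to-profile map.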

  20. Demo Setup
      • Service provider for messages
      • Programming language for the demo
      • IDE
      • Redis container on Docker

  21. The Three Data Ingest Techniques

      | Fast Data Ingest Technique | Pros | Cons |
      |---|---|---|
      | Pub/Sub | Easy; Decoupled setup; Good for geographically distributed setup | Not resilient to connection loss; Requires many connections |
      | Lists | Easy; Resilient to connection loss | Tightly coupled producers and consumers; Data duplication |
      | Sorted Sets | Resilient to connection loss; Least chance of losing data; Access to historical data; Loosely coupled producers and consumers | Consumes space; Complex logic |

  22. Technique 1: Fast Data Ingest with Pub/Sub

  23. Fast Data Ingest with Pub/Sub
      Diagram components (PubSub): Ingest, AllTweets, EnglishTweetsFilter, English Tweets, HashTagCollector, InfluencerTweetsFilter, Influencer Tweets, InfluencerCollector
      Advantages
      • Easy
      • Decoupled setup
      • Good for geographically distributed setup

  24. Class Diagrams and Sample Code: https://github.com/redislabsdemo/IngestPubSub

  25. Technique 2: Fast Data Ingest with Lists

  26. Fast Data Ingest with Lists
      Diagram components: Ingest Stream, alldata, AllTweets Listener, EnglishTweetsFilter, InfluencerFilter, englishtweets, EnglishTweets Listener, HashTagFilter
      Advantages
      • Easy
      • Resilient to connection loss

  27. Class Diagrams and Sample Code: https://github.com/redislabsdemo/IngestList

  28. Technique 3: Fast Data Ingest with Sorted Sets

  29. Fast Data Ingest with Sorted Sets
      Diagram components: Ingest Stream, alltweets, EnglishTweetsFilter, InfluencerFilter, englishtweets, HashTagFilter
      Advantages
      • Resilient to connection loss
      • Least chance of losing data
      • Access to historical data
      • Loosely coupled producers and consumers

  30. Class Diagrams and Sample Code: https://github.com/redislabsdemo/IngestSortedSet

  31. Redis e for Fast Data Ingest

  32. Redis e Technology: Redis Database Instances

  33. Redis e Technology
      • Enterprise Layer: Cluster Manager, Zero latency proxy, REST API
      • Open Source Layer

  34. Redis e Technology: Redis e Node
      • Enterprise Layer: Cluster Manager, Zero latency proxy, REST API
      • Open Source Layer

  35. Redis e Technology: Redis e Cluster
      • Shared nothing cluster architecture
      • Fully compatible with open source commands & data structures

  36. Redis e - Shared Nothing Symmetric Architecture
      Diagram labels: Distributed Proxies, Single or Multiple Endpoints, Cluster Management Path, Data Path, Proxies, Node Watchdog, Cluster Watchdog, Redis Shards, Node 1, Node 2, … Node N (odd number)
      Unique multi-tenant “Docker”-like architecture enables running hundreds of databases over a single, average cloud instance without performance degradation and with maximum security provisions.

  37. Redis e Benefits for Data Ingest
      • Always On Availability: Instant Failure Recovery, No Data loss
      • Substantially Lower Costs: Run on Flash as a RAM extension
      • Effortless Scaling: Simple, Seamless Clustering. Linear scaling
      • Stable and Predictable High Performance
      • ACID Compliance in Cluster Architecture
      • Top notch 24x7 expert support

  38. Redis e Flash
      • Near-RAM performance at 70%+ lower costs
      • Technology treats Flash as a RAM replacement (or extension)
      • RAM/Flash ratio can be easily configured
      • Pluggable storage engine
      • Available on SATA-based SSD, NVMe-based SSD, NVDIMM like 3D XPoint/SCM on x86 and P8 platforms
      Figure: 2048 GB RAM versus 204 GB RAM (10%, keys & hot values) + 1844 GB Flash (90%, cold values)

  39. Redis e Flash - 10TB Redis Deployment on EC2

      | | Redis on RAM | Redis e Flash |
      |---|---|---|
      | Dataset size | 10 TB | 10 TB |
      | Database size with replication * | 30 TB | 20 TB |
      | AWS instance type | x1.32xlarge | i3.16xlarge |
      | Actual instance size (RAM, and RAM+Flash) | 1.46 TB | 3.66 TB |
      | # of instances needed | 21 | 6 + 1 (for quorum) |
      | Persistent Storage (EBS) | 154 TB | 110 TB |
      | 1 year cost (reserved instances) | $1,595,643 | $298,896 |
      | Savings | - | 81.27% |

      * Redis Enterprise only needs 1 copy of the data because quorum issues are solved at the node level
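
The instance counts and the savings figure on this slide follow from simple arithmetic, reproduced below as a check. The helper names are hypothetical, and the inputs are the slide's own numbers: replicated database size divided by usable instance size, rounded up, and the relative difference of the two yearly costs. The extra quorum node of the i3.16xlarge setup is not part of this calculation.

```python
import math


def instances_needed(database_tb, instance_tb):
    """Smallest whole number of instances that can hold the database."""
    return math.ceil(database_tb / instance_tb)


def savings_percent(baseline_cost, new_cost):
    """Cost reduction relative to the baseline, in percent (2 decimals)."""
    return round((baseline_cost - new_cost) / baseline_cost * 100, 2)


# x1.32xlarge configuration: 30 TB replicated, 1.46 TB usable per instance
x1_instances = instances_needed(30, 1.46)

# i3.16xlarge configuration: 20 TB replicated, 3.66 TB usable per instance
i3_instances = instances_needed(20, 3.66)

savings = savings_percent(1_595_643, 298_896)
```

With these inputs the sketch yields 21 and 6 instances and a saving of about 81.27%, matching the slide.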

  40. Questions?

  41. One more thing…
      redis.conf setting: client-output-buffer-limit pubsub 32mb 8mb 60
      With this setting, Redis will force the clients to disconnect under two situations:
      • If the output buffer grows more than 32mb
      • If the output buffer holds 8mb of data consistently for 60 seconds
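
The two disconnect conditions can be modelled as a hard limit and a soft limit with a time window. The sketch below is a simplified illustration of that rule, not Redis's implementation: `parse_limit` and `should_disconnect` are hypothetical helpers, only the `mb` suffix is handled, and buffer sizes are assumed to be sampled as `(time in seconds, bytes)` pairs.

```python
MB = 1024 * 1024


def to_bytes(size):
    """Convert a size such as '32mb' to bytes (only 'mb' is handled here)."""
    if not size.endswith("mb"):
        raise ValueError(f"unsupported size: {size}")
    return int(size[:-2]) * MB


def parse_limit(directive):
    """Parse 'client-output-buffer-limit <class> <hard> <soft> <seconds>'."""
    name, client_class, hard, soft, seconds = directive.split()
    if name != "client-output-buffer-limit":
        raise ValueError(f"unexpected directive: {name}")
    return client_class, to_bytes(hard), to_bytes(soft), int(seconds)


def should_disconnect(samples, hard, soft, soft_seconds):
    """Decide from time-ordered (time, buffer_bytes) samples."""
    over_soft_since = None
    for timestamp, size in samples:
        if size > hard:
            return True  # hard limit exceeded: disconnect at once
        if size > soft:
            if over_soft_since is None:
                over_soft_since = timestamp
            elif timestamp - over_soft_since >= soft_seconds:
                return True  # stayed above the soft limit for the whole window
        else:
            over_soft_since = None  # dropped below the soft limit: reset
    return False


client_class, hard, soft, seconds = parse_limit(
    "client-output-buffer-limit pubsub 32mb 8mb 60"
)
burst = should_disconnect([(0, 33 * MB)], hard, soft, seconds)
sustained = should_disconnect(
    [(0, 9 * MB), (30, 9 * MB), (60, 9 * MB)], hard, soft, seconds
)
recovered = should_disconnect(
    [(0, 9 * MB), (30, 1 * MB), (60, 9 * MB), (90, 9 * MB)], hard, soft, seconds
)
```

In this model a short burst above 32mb or a minute spent above 8mb both end in a disconnect, while a subscriber that drains its buffer in between stays connected.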

  42. Thank You
      Roshan Kumar, Redis Labs
      roshan@redislabs.com, expert@redislabs.com
      @roshankumar, @redislabs
