Buffering to Redis for Efficient Real-Time Processing Percona - PowerPoint PPT Presentation

Buffering to Redis for   Efficient Real-Time Processing Percona Live, April 24, 2018

Presenting Today Jon Hyman   CTO & Co-Founder Braze (Formerly Appboy) @jon_hyman

Mobile is at the vanguard of a new wave of borderless engagement. […] the roller coaster will be accelerating Digital is the main reason just over faster than ever, only this time it’ll be half of Fortune 500 companies have about actual experiences, with much disappeared since the year 2000 less emphasis on the way those experiences get made PIERRE NANTERME, CEO, ACCENTURE WALT MOSSBERG, AMERICAN JOURNALIST & FORMER RECODE EDITOR AT LARGE SOURCE: DIGITAL DISRUPTION HAS ONLY JUST BEGUN (DAVOS WORLD ECONOMIC FORUM), THE DISAPPEARING COMPUTER (RECODE)

More than   1 Billion   MAU Braze empowers you to humanize your brand-customer relationships at scale. Tens of   Billions of Messages   Sent   Monthly Global   Customer   Presence ON SIX CONTINENTS

Quick Intro to Redis Today Coordinating Customer Journeys with Redis Buffering Analytics to Redis TOC

Quick Intro to Redis

What is Redis? • Redis is an open source (BSD licensed), in-memory data structure store, used as database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis has built- in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster. • Braze uses all the data types from Redis • Today’s talk we’ll look at sorted sets, sets, hashes, and strings

Redis data types • Strings: key value storage. • Redis has atomic operations to set a key if it doesn’t exist and to set expiry • You can use this to create a basic locking mechanism SET key value NX EX 10 • Set “key” to “value” if it does not exist, and expire the key in 10 seconds • Redis returns whether or not the set succeeded

Redis data types • Sets: Lists of string values that do not contain any duplicates.   Sets do not have an ordering. SADD key “a” SADD key “b” SADD key “a” SMEMBERS key [ “a”, “b”]

Redis data types • Hashes: A data structure that can store string keys and string values HSET key foo bar HSET key bar bang HGETALL key {“foo”:”bar”, “bar”:”bang”} • Hashes can also have keys be incremented HINCRBY key baz 1 HINCRBY key baz 3 HGET key baz “4”

Redis data types • Sorted Sets: Like sets, but each element also has a numerical score associated with it. Sorted Sets are ordered by that score. ZADD scores alice 100 ZADD scores bob 80 ZADD scores carol 110 ZRANGEBYSCORE scores 0 -1 [ [bob, 80], [alice, 100], [carol, 110] ] ZREVRANGEBYSCORE scores 0 -1 [ [carol, 110], [alice, 100], [bob, 80] ]

Coordinating Customer Journeys with Redis

Canvas Allows customers to create multi-step, multi-message, multi-day customer journeys

Canvas • Canvas is distributed and event driven • When messages are sent, we fire “received campaign event” • Processes listen for the “received campaign event” and determine if that should schedule new message • If a new message should be scheduled, enqueue a new job process to send the message.

Using Redis as a Job Queue • Jobs are added to Redis sorted set with Unix timestamp as the score and value as job data • One new job added per message • Worker processes on servers poll scheduled set with ZRANGEBYSCORE -INF <now> LIMIT 0 1 , then one worker process ZREM s • ZRANGEBYSCORE -INF <now> LIMIT 0 1 has O(1) runtime due to Redis implementation of sorted sets • ZREM has O(log N) runtime • For canvas, enqueue one job per each branch. • When the job runs, the process determine if the branch path is valid and grab a lock to prevent other branches from processing • Lock takes the form of a SET NX EX operation

Canvas • This architecture worked great in staging, in beta, and for the first few months of the general release and all was good • Processing runtime depends on number of branches a canvas has and the number of users entering the canvas. • January, 2017 one customer created a canvas with 11 branches targeting more than 10 million users to run at 10am the next day. • Canvas architecture design meant we had to process 110 million jobs right at 10am

What happened?

Thundering Herd: Enqueuing Jobs • This particular canvas created 110 million jobs to all run at 10am the next morning at the same timestamp • These jobs are stored in a sorted set, where workers are polling to move jobs from sorted set to queues • ZRANGEBYSCORE -INF <now> LIMIT 0 1 has O(1) runtime due to Redis implementation of sorted sets • ZREM has O(log N) runtime • Every worker server’s ZRANGEBYSCORE would return something, only one process would successfully ZREM the job • Excessive ZREM operations slowed down Redis • It took more than 40 minutes just to enqueue the jobs, meaning that if it was 10:35am, we hadn’t finished enqueuing the 10am jobs yet. This was now a customer facing incident.

One user per job inefficiencies • Each job was one {user, branch} pair • Determining if the user should go down that path involves querying database state and making Redis locks • 110 million roundtrips to each database to determine if processing should continue • It took more than 90 minutes to process the next steps

What did we do?

Fixing Canvas architectural issues • Initial code design was inefficient: one job per {user, branch} pair. Each job needs access to database state, so we made a lot of extra database calls. • Because messages tend to go to multiple users around the same time, we figured we could buffer them and have a single job process multiple users at once.

Use Redis sets as a buffer

Fixing one user per job inefficiencies • When a “received campaign event” is fired, instead of enqueueing a new job to send a message, create a new set with key “buffer:STEP_ID:TIMESTAMP”. Add user to this set. • This lets users buffer up for the same timestamp. • Periodically flush this set in batches of 100 users: • When doing an SADD , also do a SET NX EX to a key to determine if we should enqueue a job to run in 3 seconds which will flush the set. • The job does an SPOP 100 to get 100 elements, and will   re-enqueue other jobs to run to continue flushing the set if it is non-empty

Fixing the thundering herd • Added random microsecond jitter to all jobs in the sorted set to split up one second into a million pieces • Existing code used ZRANGEBYSCORE -INF <now> 0 1 to consume from left side of sorted set • Consume from the right side with ZREVRANGEBYSCORE • Consume from the middle • Keep track of how far backlogged we are in the set • Randomly add jitter or whole seconds to move along the set to start consuming the middle

Results of architectural changes • Saved more than 50 gigabytes of RAM for the original canvas • Instead of 110 million jobs, we enqueued only about 1.4 million jobs • Instead of 40 minutes to enqueue from the sorted set, all jobs enqueued in a few seconds • Next steps of the canvas processed in about 14 minutes, down from 90 minutes.

We adapted buffering in other places, such as our REST API

REST API Buffering • Braze has REST APIs to ingest user attribute data, event data and purchases • Application servers query user state when processing, it is more efficient to make batch roundtrips to databases • We encourage customers to batch data, but some integrations make 1 API call per data point Less Efficient, 2 Round Trips to Query State More Efficient, 1 Round Trip to Query State POST /users/track POST /users/track { { attributes: [{“user_id”: “123”, “first_name”:”Alice”}], attributes: [ } {“user_id”: “123”, “first_name”:”Alice”}, {“user_id”: “456”, “first_name”:”Bob”}, POST /users/track ], } { attributes: [{“user_id”: “456”, “first_name”:”Bob”}], }

REST API Buffering • Braze has REST APIs to ingest user attribute data, event data and purchases • Application servers query user state when processing, it is more efficient to make batch roundtrips to databases • We encourage customers to batch data, but some integrations make 1 API call per data point Less Efficient, 2 Round Trips to Query State More Efficient, 1 Round Trip to Query State POST /users/track POST /users/track { { attributes: [{“user_id”: “123”, “first_name”:”Alice”}], attributes: [ } {“user_id”: “123”, “first_name”:”Alice”}, {“user_id”: “456”, “first_name”:”Bob”}, POST /users/track ], } { attributes: [{“user_id”: “456”, “first_name”:”Bob”}], } • We use the same pattern and SADD data to a Redis set and flush it every second • This lets us buffer multiple API calls and process them together

Improving Writes for   Time Series Analytics

We collect a lot of time series analytics

Time series analytics are stored in MongoDB Non-hashed MongoDB sharding divides data into ranges and puts them on different nodes

Buffering to Redis for Efficient Real-Time Processing Percona - PowerPoint PPT Presentation

Buffering to Redis for Efficient Real-Time Processing Percona Live, April 24, 2018 Presenting Today Jon Hyman CTO & Co-Founder Braze (Formerly Appboy) @jon_hyman Mobile is at the vanguard of a new wave of borderless engagement.

Redis for Fast Data Ingest Agenda Fast Data Ingest and its challenges Redis for Fast

Redis Graph A graph database built on top of redis Whats Redis? Open source in-memory

RedisGears Redis in memory data processing JUNE 2019 | PIETER CAILLIAU About me Produced

Multiple NoSQL Use Cases with Redis Modules Kamran Yousaf kamran@redislabs.com About Redis Open

Redis 2.2 October 27 th 2010 Pieter Noordhuis Who am I? Live in Groningen, NL Redis

Redis Presentation by Atreyee Maiti What is redis? an in-memory key-value store, with

Intro to Redis Streams IMCSUMMIT - NOVEMBER 2019 | DAVE NIELSEN What is a data stream?

redis cluster or: distributed systems are hard Jan-Erik Rediger 28. Mai 2015 Hi, Im Jan-Erik

Real- Real -Time Systems Time Systems Real- -Time Systems Time Systems Real

Real Real- -Time Systems Time Systems Designing a real- Designing a real -time system time

Real- Real -time systems time systems Real- Real -time programming time programming

What is Redis? Open source in-memory data structure store used as What is A database A

Build Highly Resilient Applications with Redis Enterprise Clustering MAY 2019 | MANUEL HURTADO

Real graduates, Real graduates, real transitions, real transitions, real stories: real

Embedding Lua scripts for Redis in C & other lessons learned https://nchan.slact.net talk

Real-Time Response Time Measurement by Integration of Trace Buffering and Aggregation Tools

RE mote DI ctionary S erver Chris Keith James Tavares Overview History Users Logical Data

Promotion Analysis in Multi-Dimensional Space Tianyi Wu (UIUC) Dong Xin (Microsoft Research)

Personal Factors Make a Difference! Research from more than 1,000 published studies on

Multi-Dimensional Arrays Chapter 8 1-Dimentional and 2-Dimentional Arrays In the previous

Organizational Culture Chris May: Vitae Consulting LLC 630-608-7072 cmayconsulting@gmail.com

Creative Audio Programming for the Web with Tone.js Anna Xamb anna.xambo@dmu.ac.uk Music,

LEEN : Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud i # Shadi Ibr ahim,

What does random mean? Random - Something or a group of things that follow no criteria

Buffering to Redis for Efficient Real-Time Processing Percona - PowerPoint PPT Presentation

Buffering to Redis for Efficient Real-Time Processing Percona Live, April 24, 2018 Presenting Today Jon Hyman CTO & Co-Founder Braze (Formerly Appboy) @jon_hyman Mobile is at the vanguard of a new wave of borderless engagement.

Redis for Fast Data Ingest Agenda Fast Data Ingest and its challenges Redis for Fast

Redis Graph A graph database built on top of redis Whats Redis? Open source in-memory

RedisGears Redis in memory data processing JUNE 2019 | PIETER CAILLIAU About me Produced

Multiple NoSQL Use Cases with Redis Modules Kamran Yousaf kamran@redislabs.com About Redis Open

Redis 2.2 October 27 th 2010 Pieter Noordhuis Who am I? Live in Groningen, NL Redis

Redis Presentation by Atreyee Maiti What is redis? an in-memory key-value store, with

Intro to Redis Streams IMCSUMMIT - NOVEMBER 2019 | DAVE NIELSEN What is a data stream?

redis cluster or: distributed systems are hard Jan-Erik Rediger 28. Mai 2015 Hi, Im Jan-Erik

Real- Real -Time Systems Time Systems Real- -Time Systems Time Systems Real

Real Real- -Time Systems Time Systems Designing a real- Designing a real -time system time

Real- Real -time systems time systems Real- Real -time programming time programming

What is Redis? Open source in-memory data structure store used as What is A database A

Build Highly Resilient Applications with Redis Enterprise Clustering MAY 2019 | MANUEL HURTADO

Real graduates, Real graduates, real transitions, real transitions, real stories: real

Embedding Lua scripts for Redis in C &amp; other lessons learned https://nchan.slact.net talk

Real-Time Response Time Measurement by Integration of Trace Buffering and Aggregation Tools

RE mote DI ctionary S erver Chris Keith James Tavares Overview History Users Logical Data

Promotion Analysis in Multi-Dimensional Space Tianyi Wu (UIUC) Dong Xin (Microsoft Research)

Personal Factors Make a Difference! Research from more than 1,000 published studies on

Multi-Dimensional Arrays Chapter 8 1-Dimentional and 2-Dimentional Arrays In the previous

Organizational Culture Chris May: Vitae Consulting LLC 630-608-7072 cmayconsulting@gmail.com

Creative Audio Programming for the Web with Tone.js Anna Xamb anna.xambo@dmu.ac.uk Music,

LEEN : Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud i # Shadi Ibr ahim,

What does random mean? Random - Something or a group of things that follow no criteria

Embedding Lua scripts for Redis in C & other lessons learned https://nchan.slact.net talk