Buffering to Redis for Efficient Real-Time Processing
Percona Live, April 24, 2018
Jon Hyman
CTO & Co-Founder Braze (Formerly Appboy) @jon_hyman
Digital is the main reason just over half of Fortune 500 companies have disappeared since the year 2000
PIERRE NANTERME, CEO, ACCENTURE
[…] the roller coaster will be accelerating faster than ever, only this time it’ll be about actual experiences, with much less emphasis on the way those experiences get made
WALT MOSSBERG, AMERICAN JOURNALIST & FORMER RECODE EDITOR AT LARGE
Mobile is at the vanguard of a new wave of borderless engagement.
SOURCE: DIGITAL DISRUPTION HAS ONLY JUST BEGUN (DAVOS WORLD ECONOMIC FORUM), THE DISAPPEARING COMPUTER (RECODE)
Tens of Billions of Messages Sent Monthly
Global Customer Presence
More than 1 Billion MAU on Six Continents
Quick Intro to Redis
Coordinating Customer Journeys with Redis
Buffering Analytics to Redis
What is Redis?
Redis is an open source, in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
SET key value NX EX 10
Sets the key only if it does not already exist (NX), with a 10-second expiry (EX 10).
Redis data types: Sets
SADD key "a"
SADD key "b"
SADD key "a"
SMEMBERS key
["a", "b"]
Sets do not have an ordering.
Redis data types: Hashes
HSET key foo bar
HSET key bar bang
HGETALL key
{"foo": "bar", "bar": "bang"}
HINCRBY key baz 1
HINCRBY key baz 3
HGET key baz
"4"
Redis data types: Sorted Sets
Every member of a Sorted Set has a numeric score, and Sorted Sets are ordered by that score.
ZADD scores 100 alice
ZADD scores 80 bob
ZADD scores 110 carol
ZRANGE scores 0 -1 WITHSCORES
[[bob, 80], [alice, 100], [carol, 110]]
ZREVRANGE scores 0 -1 WITHSCORES
[[carol, 110], [alice, 100], [bob, 80]]
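The ordering semantics above can be sketched with a toy Ruby model. A plain Hash (member to score) stands in for the Sorted Set; the class and method names here are illustrative, not real Redis client API.

```ruby
# A toy model of a Redis Sorted Set backed by a Hash (member => score).
# It only illustrates the ordering shown above; real Redis keeps members
# sorted in a skiplist so reads and score updates are O(log n).
class ToySortedSet
  def initialize
    @scores = {}
  end

  # ZADD: set (or update) a member's score
  def zadd(member, score)
    @scores[member] = score
  end

  # ZRANGE 0 -1 WITHSCORES: all [member, score] pairs, ascending by score
  def zrange_withscores
    @scores.sort_by { |_member, score| score }
  end

  # ZREVRANGE 0 -1 WITHSCORES: descending by score
  def zrevrange_withscores
    zrange_withscores.reverse
  end
end

scores = ToySortedSet.new
scores.zadd("alice", 100)
scores.zadd("bob", 80)
scores.zadd("carol", 110)
scores.zrange_withscores  # => [["bob", 80], ["alice", 100], ["carol", 110]]
```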
Canvas
Allows customers to create multi-step, multi-message, multi-day customer journeys
Canvas
Using Redis as a Job Queue
Scheduled jobs are stored in a Sorted Set scored by run time. Workers find due jobs, then one worker process ZREMs each job; a successful ZREM claims the job and prevents duplicate branches from processing.
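The claim-by-ZREM pattern can be sketched in Ruby. A Mutex-guarded Hash stands in for the Sorted Set; ToyScheduledQueue and all names here are illustrative sketches of the pattern, not Braze's code.

```ruby
# Toy model of claiming scheduled jobs with ZREM: several workers may all
# see the same due job, but only the worker whose zrem returns true has
# removed it and may process it.
class ToyScheduledQueue
  def initialize
    @jobs = {}          # job_id => run_at timestamp (the "sorted set")
    @lock = Mutex.new
  end

  def schedule(job_id, run_at)
    @lock.synchronize { @jobs[job_id] = run_at }
  end

  # ZRANGEBYSCORE -inf now: jobs that are due
  def due(now)
    @lock.synchronize { @jobs.select { |_id, t| t <= now }.keys }
  end

  # ZREM: returns true only for the single caller that removed the job
  def zrem(job_id)
    @lock.synchronize { !@jobs.delete(job_id).nil? }
  end
end

queue = ToyScheduledQueue.new
queue.schedule("job-1", 100)

processed = 0
threads = 4.times.map do
  Thread.new do
    queue.due(100).each do |job_id|
      processed += 1 if queue.zrem(job_id)  # only one thread wins the claim
    end
  end
end
threads.each(&:join)
processed  # => 1
```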
Canvas
Canvas ran smoothly for the first few months after general release. The number of jobs scaled with the number of steps and the number of users entering the canvas.
Then a customer scheduled a canvas targeting more than 10 million users to run at 10am the next day.
That meant enqueuing more than 10 million jobs right at 10am.
Thundering Herd: Enqueuing Jobs
We had not finished enqueuing the 10am jobs; this was now a customer-facing incident.
One-user-per-job inefficiencies
Fixing Canvas architectural issues
Each one-user job needed access to database state, so we made a lot of extra database calls.
Because jobs were small and numerous, we could buffer them and have a single job process multiple users at once.
Fixing one-user-per-job inefficiencies
When enqueuing a new job to send a message, create a new set with key "buffer:STEP_ID:TIMESTAMP" and add the user to this set.
Then determine if we should enqueue a job to run in 3 seconds which will flush the set.
When flushing, re-enqueue other jobs to run to continue flushing the set if it is non-empty.
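The buffering steps above can be sketched in Ruby, with a Ruby Set standing in for the Redis set "buffer:STEP_ID:TIMESTAMP". The trigger used here, scheduling a flush only when the buffer goes from empty to non-empty, is one plausible reading of the slides, not necessarily Braze's exact check.

```ruby
require "set"

# Toy sketch of the Canvas buffering steps: add each user to a shared
# buffer, schedule a flush when the buffer is first populated, and have
# each flush re-schedule itself while users remain.
class ToyStepBuffer
  BATCH_SIZE = 3  # illustrative; real batches would be far larger

  attr_reader :flushes_scheduled, :batches

  def initialize
    @buffer = Set.new
    @flushes_scheduled = 0
    @batches = []
  end

  # Steps 1 + 2: add the user; schedule a flush if we created the buffer
  def add_user(user_id)
    was_empty = @buffer.empty?
    @buffer << user_id
    @flushes_scheduled += 1 if was_empty  # "enqueue a job to run in 3 seconds"
  end

  # Step 3: flush one batch; re-schedule if the set is still non-empty
  def flush
    batch = @buffer.first(BATCH_SIZE)
    @buffer.subtract(batch)
    @batches << batch
    @flushes_scheduled += 1 unless @buffer.empty?
  end
end

buffer = ToyStepBuffer.new
%w[u1 u2 u3 u4].each { |u| buffer.add_user(u) }
buffer.flush                # processes u1..u3, re-schedules
buffer.flush                # processes u4
buffer.batches.map(&:size)  # => [3, 1]
```

Processing four users now takes two jobs instead of four, which is the whole point of the buffering change.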
Fixing the thundering herd
Results of architectural changes
REST API Buffering
Less Efficient, 2 Round Trips to Query State:
POST /users/track
{ attributes: [{"user_id": "123", "first_name": "Alice"}] }
POST /users/track
{ attributes: [{"user_id": "456", "first_name": "Bob"}] }

More Efficient, 1 Round Trip to Query State:
POST /users/track
{ attributes: [
    {"user_id": "123", "first_name": "Alice"},
    {"user_id": "456", "first_name": "Bob"}
] }
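The batching idea is just payload merging. A minimal sketch, assuming the hypothetical helper merge_track_payloads (not Braze's API):

```ruby
# Merge many single-user /users/track bodies into one batched request body,
# so downstream user state is queried in a single round trip.
def merge_track_payloads(payloads)
  { attributes: payloads.flat_map { |p| p[:attributes] } }
end

a = { attributes: [{ user_id: "123", first_name: "Alice" }] }
b = { attributes: [{ user_id: "456", first_name: "Bob" }] }
merged = merge_track_payloads([a, b])
merged[:attributes].size  # => 2
```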
We collect a lot of time series analytics
Time series analytics are stored in MongoDB
Non-hashed MongoDB sharding divides data into ranges and puts them on different nodes
Time series data is easy to pre-aggregate
Shard on {app_id:1, name:1, date:1}
{
  app_id: "www.braze.com",
  date: 2018-04-23,
  name: "website_visits",
  6: 120,
  7: 541,
  8: 1200,
  9: 800,
  …
}
Use a Redis hash whose key is based on the shard key, where hash fields are hours and values are the amount to increment by
HINCRBY "www.braze.com|2018-04-23|website_visits" 8 1
SADD "buffered" "www.braze.com|2018-04-23|website_visits"
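The two commands above can be sketched as a toy Ruby buffer, with a nested Hash standing in for HINCRBY and a Ruby Set standing in for the "buffered" set. ToyAnalyticsBuffer and its method names are illustrative, not Braze's code.

```ruby
require "set"

# Buffer time-series increments under the serialized MongoDB shard key
# (app_id | date | name), with hours as hash fields, so a later flush can
# turn each hash directly into one $inc per MongoDB document.
class ToyAnalyticsBuffer
  attr_reader :hashes, :buffered

  def initialize
    @hashes = Hash.new { |h, k| h[k] = Hash.new(0) }  # per-key HINCRBY target
    @buffered = Set.new                               # the "buffered" set
  end

  def track(app_id, date, name, hour, count = 1)
    key = [app_id, date, name].join("|")  # serialized shard key
    @hashes[key][hour] += count           # HINCRBY key hour count
    @buffered << key                      # SADD "buffered" key
  end
end

buf = ToyAnalyticsBuffer.new
buf.track("www.braze.com", "2018-04-23", "website_visits", 8)
buf.track("www.braze.com", "2018-04-23", "website_visits", 8)
buf.hashes["www.braze.com|2018-04-23|website_visits"]  # => {8=>2}
```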
Flush buffer from Redis to MongoDB:

keys = SMEMBERS("buffered")
increment_hashes = REDIS MULTI
  keys.each {|key| HGETALL(key) }
  keys.each {|key| SREM("buffered", key) }
  keys.each {|key| DEL(key) }
END MULTI

keys.each_with_index do |key, i|
  app_id, name, date = deserialize(key)
  db.my_timeseries.find(
    {app_id: app_id, name: name, date: date}
  ).update_one($inc: increment_hashes[i])
end
* This example algorithm is vulnerable to data loss; do not use it directly
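To see why the warning matters, here is a small simulation (a sketch under assumed names, not Braze's code): the flush drains the Redis buffer atomically, then the worker dies before the MongoDB write, so the increments exist in neither store.

```ruby
# Simulate the data-loss window in the flush algorithm: the buffer is
# drained (MULTI: HGETALL + DEL) before MongoDB is written, so a crash
# between the two steps loses the increments.
def flush!(buffer, mongo, crash_before_write: false)
  drained = buffer.dup
  buffer.clear                       # the atomic Redis drain
  raise "worker died" if crash_before_write
  drained.each do |key, incs|
    incs.each { |hour, n| mongo[key][hour] += n }  # the $inc writes
  end
end

buffer = { "k" => { 8 => 2 } }
mongo  = Hash.new { |h, k| h[k] = Hash.new(0) }

begin
  flush!(buffer, mongo, crash_before_write: true)
rescue RuntimeError
end
buffer.empty? && mongo.empty?  # => true: the two increments are lost
```

Possible mitigations, not shown in the slides, include deleting the Redis data only after a successful MongoDB write, or making the write idempotent so it can be retried.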
We do this with 12 Redis servers to shard out writes to a single MongoDB document
Can buffer the same hash key to each Redis and flush independently
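The fan-out across Redis servers works because $inc is commutative: each increment can land on any server, every server buffers under the same hash key, and independent flushes still sum to the right total. A sketch (3 servers here instead of the talk's 12, names illustrative):

```ruby
# Spread writes for the same MongoDB document across several Redis servers:
# pick any server per increment, buffer under the same key on each, and
# flush each server independently.
SERVERS = 3  # the talk uses 12; smaller here for illustration

def buffered_incr(buffers, key, hour, n = 1)
  buffers.sample[key][hour] += n  # any server will do; same key everywhere
end

buffers = Array.new(SERVERS) { Hash.new { |h, k| h[k] = Hash.new(0) } }
mongo   = Hash.new { |h, k| h[k] = Hash.new(0) }

100.times { buffered_incr(buffers, "app|2018-04-23|visits", 8) }

# Each server flushes on its own schedule; order does not matter.
buffers.each do |buf|
  buf.each { |key, incs| incs.each { |hour, n| mongo[key][hour] += n } }
end
mongo["app|2018-04-23|visits"][8]  # => 100
```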
Scale
Mongo for analytics
Summary
Buffering writes to Redis makes processing more efficient to improve throughput