timelines at scale, @raffi, qcon sf 2012



slide-1
SLIDE 1

timelines at scale

@raffi qcon sf 2012

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4

          | Pull                           | Push
Targeted  | twitter.com, home_timeline API | User / Site Streams, Mobile Push (SMS, etc.)
Queried   | Search API                     | Track / Follow Streams

slide-5
SLIDE 5

the challenge

⇢> 150M worldwide active users
⇢> 300K QPS for timelines
⇢naïve timeline “materialization” can be slow

slide-6
SLIDE 6

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Blender, Search Cache, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-7
SLIDE 7

[Architecture diagram. Labels: Write API, Fanout, Social Graph Service, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Blender, Search Cache, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-8
SLIDE 8

[Architecture diagram. Labels: Write API, Fanout, Social Graph Service, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Blender, Search Cache, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

insert

⇢keyed off “recipient”
⇢pipelined 4k “destinations” at a time
⇢replicated
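The insert bullets above can be sketched as a batching loop. This is a minimal in-memory illustration, not the production fanout: `batches`, `fan_out`, and the dictionary standing in for the timeline cache are hypothetical names, and only the batch size of 4,000 and the per-recipient keying come from the slide. Replication is not modeled.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 4000  # "4k destinations at a time", from the slide


def batches(recipient_ids: Iterable[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` recipient IDs."""
    it = iter(recipient_ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fan_out(tweet_id: int, follower_ids: Iterable[int],
            timelines: Dict[int, List[int]]) -> int:
    """Append tweet_id to each recipient's timeline, one batch at a time.

    `timelines` is a plain dict keyed by recipient ID (an assumption that
    stands in for the timeline cache). Returns the number of batches sent.
    """
    sent = 0
    for chunk in batches(follower_ids):
        for recipient_id in chunk:
            timelines.setdefault(recipient_id, []).append(tweet_id)
        sent += 1
    return sent
```

With 10,000 followers this produces three batches (4,000, 4,000, and 2,000 recipients).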

slide-9
SLIDE 9

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Blender, Search Cache, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

using redis

⇢native list structure

Entry layout: Tweet ID (8 bytes) | Bits (4 bytes) | User ID (8 bytes)
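The entry layout on this slide (8 + 4 + 8 bytes) can be sketched with fixed-width packing. The slide gives only field names and sizes, so the big-endian byte order, the unsigned types, and the meaning of the "Bits" field are assumptions here, and `pack_entry` / `unpack_entry` are hypothetical helper names.

```python
import struct

# ">" selects big-endian with standard sizes and no padding (assumed byte order):
# Q = 8-byte unsigned tweet ID, I = 4-byte unsigned bits, Q = 8-byte unsigned user ID.
ENTRY = struct.Struct(">QIQ")


def pack_entry(tweet_id: int, bits: int, user_id: int) -> bytes:
    """Encode one timeline entry as a 20-byte string."""
    return ENTRY.pack(tweet_id, bits, user_id)


def unpack_entry(raw: bytes) -> tuple:
    """Decode a 20-byte entry back into (tweet_id, bits, user_id)."""
    return ENTRY.unpack(raw)
```

A fixed 20-byte entry keeps each list element compact and lets a reader decode it without any delimiter parsing.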

slide-10
SLIDE 10

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Blender, Search Cache, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

using redis

⇢native list structure
⇢RPUSHX to only add to cached timelines

[Diagram: a timeline list made of repeated Tweet ID / Bits / User ID entries]
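To make the RPUSHX bullet concrete: in Redis, RPUSHX appends to a list only if the key already exists, and returns the resulting length (0 when nothing was pushed). The sketch below models that behavior in memory rather than calling a Redis client; `TinyListStore` and the key names are hypothetical.

```python
from typing import Dict, List


class TinyListStore:
    """In-memory stand-in for the two Redis list commands discussed here."""

    def __init__(self) -> None:
        self._lists: Dict[str, List[bytes]] = {}

    def rpush(self, key: str, value: bytes) -> int:
        """Append, creating the list if needed (models RPUSH)."""
        lst = self._lists.setdefault(key, [])
        lst.append(value)
        return len(lst)

    def rpushx(self, key: str, value: bytes) -> int:
        """Append only if the list already exists (models RPUSHX)."""
        lst = self._lists.get(key)
        if lst is None:
            return 0  # timeline is not cached, so nothing is written
        lst.append(value)
        return len(lst)

    def entries(self, key: str) -> List[bytes]:
        """Return a copy of the list, or an empty list for a missing key."""
        return list(self._lists.get(key, []))
```

The effect is that fanout spends memory only on timelines that are already in the cache; a timeline that is not cached is left alone.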

slide-11
SLIDE 11

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Blender, Search Cache, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-12
SLIDE 12

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, TweetyPie, Gizmoduck]

slide-13
SLIDE 13

          | Pull                           | Push
Targeted  | twitter.com, home_timeline API | User / Site Streams, Mobile Push (SMS, etc.)
Queried   | Search API                     | Track / Follow Streams

slide-14
SLIDE 14

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Blender, Search Cache, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-15
SLIDE 15

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Search Index, Blender, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

blender

⇢queries one replica of all indexes
⇢merges & ranks results
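The two blender bullets describe a scatter-gather step, which can be sketched as follows. Each dictionary stands in for one replica of one index partition; the `(score, tweet_id)` hit format and the score-based ranking are assumptions, since the slide does not say how results are ranked.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Hit = Tuple[float, int]  # (score, tweet_id); the scoring scheme is hypothetical


def query_partition(index: Dict[str, List[Hit]], term: str) -> List[Hit]:
    """Return one partition's hits for a term, best score first."""
    return sorted(index.get(term, []), reverse=True)


def blend(partitions: List[Dict[str, List[Hit]]], term: str, limit: int = 10) -> List[Hit]:
    """Query every partition, then merge the sorted results and keep the top `limit`."""
    per_partition = [query_partition(p, term) for p in partitions]
    # heapq.merge with reverse=True expects each input sorted in descending order.
    merged = heapq.merge(*per_partition, reverse=True)
    return list(islice(merged, limit))
```

Because every partition must be consulted, read cost grows with the number of partitions, while a write touches only one of them.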

slide-16
SLIDE 16

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Search Index, Blender, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-17
SLIDE 17

[Diagram: write and read paths compared. Labels: Write API, Read API, Redis, Earlybird, API Cache]

⇢Redis (timeline) path: O(n) write, O(1) read
⇢Earlybird (search) path: O(1) write, O(n) read
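The write/read trade-off on this slide can be illustrated with two toy stores. This is a sketch under assumptions: the slide does not say what n counts, so here it is taken to be the follower count on the write-heavy path and the number of accounts followed on the read-heavy path. Class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class WriteFanout:
    """Materialize home timelines at write time: O(n) write, O(1) read."""

    def __init__(self, followers: Dict[str, List[str]]) -> None:
        self.followers = followers  # author -> accounts that follow the author
        self.home: Dict[str, List[int]] = defaultdict(list)

    def write(self, author: str, tweet_id: int) -> None:
        for follower in self.followers.get(author, []):  # one append per follower
            self.home[follower].append(tweet_id)

    def read(self, user: str) -> List[int]:
        return list(self.home[user])  # a single lookup


class ReadFanout:
    """Store each tweet once, assemble at read time: O(1) write, O(n) read."""

    def __init__(self, following: Dict[str, List[str]]) -> None:
        self.following = following  # user -> authors the user follows
        self.by_author: Dict[str, List[int]] = defaultdict(list)

    def write(self, author: str, tweet_id: int) -> None:
        self.by_author[author].append(tweet_id)  # one append in total

    def read(self, user: str) -> List[int]:
        ids: List[int] = []
        for author in self.following.get(user, []):  # one lookup per followed author
            ids.extend(self.by_author[author])
        return sorted(ids)
```

Both stores return the same timeline; they differ only in where the work happens.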

slide-18
SLIDE 18

the challenge (part #2)

⇢fanout can be really slow!
⇢...especially for high follower counts

slide-19
SLIDE 19
slide-20
SLIDE 20

@ladygaga 31 million followers
@katyperry 28 million followers
@justinbieber 28 million followers
@barackobama 23 million followers
@raffi 0.019 million followers

slide-21
SLIDE 21

there are over 400 million tweets a day

slide-22
SLIDE 22

4600 tweets a second ≈ 0.2 ms a tweet
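The per-second and per-tweet figures follow from the daily total on the previous slide. A short check of the arithmetic, treating 400 million as an exact round number:

```python
TWEETS_PER_DAY = 400_000_000      # "over 400 million tweets a day"
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY  # about 4,630
ms_per_tweet = 1000 / tweets_per_second               # about 0.216 ms

print(f"{tweets_per_second:.0f} tweets/sec, {ms_per_tweet:.2f} ms per tweet")
```

The slide rounds these to 4600 tweets a second and 0.2 ms a tweet.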

slide-23
SLIDE 23
slide-24
SLIDE 24

[Architecture diagram. Labels: Write API, Ingester, Fanout, Search Index, Earlybird, Timeline Cache, Redis]

search index ⇢[‘hello’,‘world’]
fanout index ⇢[@danadanger, ...]

slide-25
SLIDE 25

User Intent             | Query Expansion
“Hello, world”          | “Hello” AND “world”
@raffi’s home timeline  | home_timeline:raffi

slide-26
SLIDE 26

User Intent             | Query Expansion
“Hello, world”          | “Hello” AND “world”
@raffi’s home timeline  | user_timeline:nelson OR user_timeline:danadanger

slide-27
SLIDE 27

User Intent             | Query Expansion
“Hello, world”          | “Hello” AND “world”
@raffi’s home timeline  | home_timeline:raffi

slide-28
SLIDE 28

User Intent             | Query Expansion
“Hello, world”          | “Hello” AND “world”
@raffi’s home timeline  | home_timeline:raffi OR user_timeline:taylorswift13
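The last expansion combines a precomputed home timeline with the user timeline of a high-follower account at read time. A minimal sketch of how such a query string could be built: the function name and the idea of passing in a set of accounts excluded from fanout are assumptions, since the slides show only the resulting query strings.

```python
from typing import Iterable, Set


def expand_home_timeline(user: str, following: Iterable[str],
                         high_follower_accounts: Set[str]) -> str:
    """Build the query for a user's home timeline.

    Accounts in `high_follower_accounts` are assumed to be skipped by fanout,
    so their tweets are pulled in at read time via a user_timeline term.
    """
    terms = [f"home_timeline:{user}"]
    for account in following:
        if account in high_follower_accounts:
            terms.append(f"user_timeline:{account}")
    return " OR ".join(terms)
```

For a user who follows nelson and taylorswift13, with only taylorswift13 treated as high-follower, this yields the query on the slide: `home_timeline:raffi OR user_timeline:taylorswift13`.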

slide-29
SLIDE 29

[Architecture diagram. Labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Search Index, Blender, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-30
SLIDE 30

[Architecture diagram with path labels: Synchronous Path, Asynchronous Path, Query Path. Component labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Search Index, Blender, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-31
SLIDE 31

[Architecture diagram with path labels: Synchronous Path, Asynchronous Path, Query Path. Component labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Search Index, Blender, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-32
SLIDE 32

[Architecture diagram with path labels: Synchronous Path, Asynchronous Path, Query Path. Component labels: Write API, Fanout, Timeline Cache, Redis, Timeline Service, Ingester, Earlybird, Search Index, Blender, Push Compute, HTTP Push, Mobile Push, Batch Compute, Hadoop]

slide-33
SLIDE 33

timeline query statistics

⇢>150m active users worldwide
⇢>300k qps poll-based timelines @ 1ms p50 / 4ms p99
⇢>30k qps search-based timelines

slide-34
SLIDE 34

tweet input

⇢~400m tweets per day
⇢~5K/sec daily average
⇢~7K/sec daily peak
⇢>12K/sec during large events

slide-35
SLIDE 35

timeline delivery statistics

⇢30b deliveries / day (~21m / min)
⇢3.5 seconds @ p50 to deliver to 1m
⇢~300k deliveries / sec
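The delivery figures can be cross-checked against the tweet input numbers with a few lines of arithmetic. The average fanout per tweet is a derived value, not a number stated on the slides, and it treats both daily totals as exact round numbers.

```python
DELIVERIES_PER_DAY = 30_000_000_000  # "30b deliveries / day"
TWEETS_PER_DAY = 400_000_000         # "~400m tweets per day"

per_minute = DELIVERIES_PER_DAY / (24 * 60)       # about 20.8 million
per_second = DELIVERIES_PER_DAY / (24 * 60 * 60)  # about 347 thousand
avg_fanout = DELIVERIES_PER_DAY / TWEETS_PER_DAY  # 75 deliveries per tweet

print(f"{per_minute / 1e6:.1f}m/min, {per_second / 1e3:.0f}k/sec, "
      f"{avg_fanout:.0f} deliveries per tweet")
```

The per-minute value matches the slide's ~21m, and the per-second value is in the same range as its ~300k figure.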

slide-36
SLIDE 36
slide-37
SLIDE 37
slide-38
SLIDE 38
slide-39
SLIDE 39
slide-40
SLIDE 40
slide-41
SLIDE 41
slide-42
SLIDE 42

thanks!