SCALING INSTAGRAM INFRA Lisa Guo Nov 7th, 2016 lguo@instagram.com



slide-1
SLIDE 1

SCALING INSTAGRAM INFRA

Lisa Guo, Nov 7th, 2016 lguo@instagram.com

slide-2
SLIDE 2
slide-3
SLIDE 3

INSTAGRAM HISTORY

  • 2010
  • 2011: 14M users
  • 2012/4/3: Android release
  • 2012/4/9: Facebook acquisition
  • 2014/1

slide-4
SLIDE 4

INSTAGRAM EVERYDAY

  • 300 Million users
  • 4.2 Billion likes
  • 95 Million photo/video uploads
  • 100 Million followers

slide-5
SLIDE 5
slide-6
SLIDE 6

SCALING MEANS

Scale out

Scale up

Scale dev team

slide-7
SLIDE 7

SCALE OUT

slide-8
SLIDE 8

SCALE OUT

“To scale horizontally means to add more nodes to a system, such as adding a new computer to a distributed software application. An example might involve scaling out from one Web server system to three.”

  • Wikipedia
slide-9
SLIDE 9

MICROSERVICE

slide-10
SLIDE 10

SCALING OUT

Diagram: stages of scaling out, including vertical partition and horizontal sharding.

slide-11
SLIDE 11

SCALING OUT

slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14

INSTAGRAM STACK

memcache RabbitMQ PostgreSQL Cassandra Celery Other Services

Django

slide-15
SLIDE 15

STORAGE VS. COMPUTING

  • Storage: needs to be consistent across data centers
  • Computing: driven by user traffic, added on an as-needed basis
slide-16
SLIDE 16

SCALE OUT: STORAGE

  • Masterless
  • Async, low latency
  • Multiple data center ready
  • Tunable latency vs consistency trade-off

user feeds, stories, activities, and other logs
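The "tunable latency vs consistency trade-off" bullet can be illustrated with the usual quorum rule for replicated stores: with N replicas, a read that waits for R replicas always overlaps a write acknowledged by W replicas when R + W > N. This is a minimal sketch; the function name and the sample numbers are illustrative assumptions, not taken from the talk.

```python
def quorums_overlap(n_replicas: int, read_acks: int, write_acks: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return read_acks + write_acks > n_replicas


# Three replicas, illustrative settings only.
print(quorums_overlap(3, 1, 1))  # lowest latency, but a read may miss the latest write
print(quorums_overlap(3, 2, 2))  # quorum reads and quorum writes always overlap
```

Lower R and W reduce latency at the cost of possibly stale reads; higher values do the opposite.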

slide-17
SLIDE 17

SCALE OUT: STORAGE

user, media, friendship etc

  • One master, replicas are in each region
  • Reads are done locally
  • Writes go cross-region to the master
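The one-master layout above can be sketched as a small router that serves reads from the local replica and sends writes to the master. All class and function names here are hypothetical, and the `replicate` helper is only a stand-in for asynchronous database replication.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    name: str
    rows: dict = field(default_factory=dict)


class RegionRouter:
    """Send reads to the local replica and writes to the single master."""

    def __init__(self, master: Database, local_replica: Database):
        self.master = master
        self.local_replica = local_replica

    def read(self, key):
        return self.local_replica.rows.get(key)

    def write(self, key, value):
        # A cross-region hop when the master lives in another region.
        self.master.rows[key] = value


def replicate(master: Database, replica: Database) -> None:
    """Stand-in for asynchronous replication: copy master state to a replica."""
    replica.rows = dict(master.rows)


master = Database("dc1-master")
replica = Database("dc2-replica")
router = RegionRouter(master, replica)

router.write("user:1", "lisa")
print(router.read("user:1"))  # None until replication catches up
replicate(master, replica)
print(router.read("user:1"))  # lisa
```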
slide-18
SLIDE 18

COMPUTING

slide-19
SLIDE 19

Diagram: DC1 and DC2 each run Django, RabbitMQ, Celery, PostgreSQL, Cassandra, and memcache.

slide-20
SLIDE 20

MEMCACHE

  • Millions of reads/writes per second
  • Sensitive to network conditions
  • Cross-region operations are prohibitively expensive
slide-21
SLIDE 21

DC1:

  • User C: comment -> Django -> insert into PostgreSQL, set in memcache
  • User R: feed -> Django -> get from memcache

slide-22
SLIDE 22

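DC1: User C: comment -> Django -> insert into PostgreSQL, set in memcache

PostgreSQL replication: DC1 -> DC2

DC2: User R: feed -> Django -> get from memcache

The set only reaches the DC1 memcache, so User R can read a stale entry from the DC2 memcache.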

slide-23
SLIDE 23

DC1: User C: comment -> Django -> insert into PostgreSQL, set in memcache; cache invalidate

PostgreSQL replication: DC1 -> DC2

DC2: cache invalidate; User R: feed -> Django -> get from memcache, then set
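The invalidate-on-replication flow can be sketched with plain dictionaries standing in for PostgreSQL and memcache. This is a simplified model under stated assumptions: the `Region` class and `post_comment` function are hypothetical, and replication is modeled as an immediate copy rather than an asynchronous stream.

```python
class Region:
    """One data center: a PostgreSQL stand-in plus a local memcache stand-in."""

    def __init__(self, name):
        self.name = name
        self.db = {}
        self.cache = {}

    def get_comments(self, media_id):
        # Read path: try the local cache, fall back to the local database.
        if media_id not in self.cache:
            self.cache[media_id] = list(self.db.get(media_id, []))
        return self.cache[media_id]


def post_comment(master, replicas, media_id, text):
    # Write path: insert on the master, then drop the master's cached copy.
    master.db.setdefault(media_id, []).append(text)
    master.cache.pop(media_id, None)
    for replica in replicas:
        # Replication delivers the row, then the replica invalidates its cache.
        replica.db[media_id] = list(master.db[media_id])
        replica.cache.pop(media_id, None)


dc1, dc2 = Region("DC1"), Region("DC2")
post_comment(dc1, [dc2], 12345, "first")
print(dc2.get_comments(12345))  # ['first'], now cached in DC2
post_comment(dc1, [dc2], 12345, "second")
print(dc2.get_comments(12345))  # ['first', 'second'], because the stale entry was dropped
```

Without the `cache.pop` on the replica, the second read in DC2 would keep returning the stale one-comment list.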

slide-24
SLIDE 24

COUNTERS

select count(*) from user_likes_media where media_id=12345;

Latency: 100s of ms

slide-25
SLIDE 25

COUNTERS

slide-26
SLIDE 26

COUNTER

select count from media_likes where media_id=12345;

Latency: 10s of µs
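The two queries differ because the second reads a precomputed row instead of scanning every like. Below is a minimal sketch of such a denormalized counter, using Python's built-in sqlite3 in place of PostgreSQL. The table and column names follow the slides; the column types and the `like` / `like_count` helpers are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_likes_media (
        user_id INTEGER,
        media_id INTEGER,
        PRIMARY KEY (user_id, media_id)
    );
    CREATE TABLE media_likes (
        media_id INTEGER PRIMARY KEY,
        count INTEGER NOT NULL
    );
""")


def like(user_id: int, media_id: int) -> None:
    """Record a like and bump the per-media counter in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO user_likes_media (user_id, media_id) VALUES (?, ?)",
            (user_id, media_id),
        )
        cur = conn.execute(
            "UPDATE media_likes SET count = count + 1 WHERE media_id = ?",
            (media_id,),
        )
        if cur.rowcount == 0:
            # First like for this media: create the counter row.
            conn.execute(
                "INSERT INTO media_likes (media_id, count) VALUES (?, 1)",
                (media_id,),
            )


def like_count(media_id: int) -> int:
    """Read the counter row instead of counting individual likes."""
    row = conn.execute(
        "SELECT count FROM media_likes WHERE media_id = ?", (media_id,)
    ).fetchone()
    return row[0] if row else 0


for uid in (1, 2, 3):
    like(uid, 12345)
print(like_count(12345))  # 3
```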

slide-27
SLIDE 27

Cache invalidated: all Django servers try to access the DB at the same time

slide-28
SLIDE 28

MEMCACHE LEASE

Sequence over time between d1, d2 (Django servers), memcache, and db:

  • d1: lease-get -> fill
  • d2: lease-get -> wait or use stale
  • d1: read from DB
  • d1: lease-set
  • d2: lease-get -> hit
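The lease sequence can be sketched as a toy in-process cache. This is not a real memcache client: the `LeaseCache` class, its method names, and the status strings are assumptions chosen to mirror the lease-get / lease-set labels on the slide.

```python
import itertools


class LeaseCache:
    """Toy cache with leases, so only one caller refills a missing key."""

    def __init__(self):
        self._values = {}
        self._leases = {}
        self._tokens = itertools.count(1)

    def lease_get(self, key):
        """Return ('hit', value), ('fill', token), or ('wait', None)."""
        if key in self._values:
            return "hit", self._values[key]
        if key in self._leases:
            # Another caller is already filling this key: wait or use stale data.
            return "wait", None
        token = next(self._tokens)
        self._leases[key] = token
        return "fill", token

    def lease_set(self, key, token, value):
        """Store the value only if the caller still holds the lease."""
        if self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._values[key] = value
        return True

    def invalidate(self, key):
        self._values.pop(key, None)
        self._leases.pop(key, None)


cache = LeaseCache()
db = {"media:12345:likes": 42}
key = "media:12345:likes"

status_d1, token = cache.lease_get(key)  # d1 gets the lease and must fill
status_d2, _ = cache.lease_get(key)      # d2 is told to wait
cache.lease_set(key, token, db[key])     # d1 reads the DB once and fills
print(status_d1, status_d2, cache.lease_get(key))  # fill wait ('hit', 42)
```

Only the lease holder touches the database, which is what keeps a cache invalidation from turning into a stampede of identical queries.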

slide-29
SLIDE 29

INSTAGRAM STACK - MULTI REGION

DC1: Django, RabbitMQ, PostgreSQL, Cassandra, Celery, memcache

DC2: Django, RabbitMQ, PostgreSQL, Cassandra, Celery, memcache

slide-30
SLIDE 30

SCALING OUT

  • Capacity
  • Reliability
  • Regional failure ready

Requests/second

slide-31
SLIDE 31

LOAD TEST

Chart: CPU instructions on loaded vs. regular servers.

Diagram: a load balancer in front of the Django servers.

slide-32
SLIDE 32

Chart: user growth vs. server growth.

slide-33
SLIDE 33

“Don’t count the servers, make the servers count”

slide-34
SLIDE 34

SCALE UP

slide-35
SLIDE 35

SCALE UP

Use as few CPU instructions as possible

Use as few servers as possible

slide-36
SLIDE 36

SCALE UP

Use as few CPU instructions as possible

Use as few servers as possible

Scale up

slide-37
SLIDE 37

CPU

Monitor Optimize Analyze

slide-38
SLIDE 38

COLLECT

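struct perf_event_attr pe;
memset(&pe, 0, sizeof(pe));            /* zero the fields that are not set below */
pe.type = PERF_TYPE_HARDWARE;
pe.size = sizeof(pe);
pe.config = PERF_COUNT_HW_INSTRUCTIONS;
pe.disabled = 1;                       /* start stopped; enabled explicitly below */
/* glibc provides no perf_event_open() wrapper, so call it through syscall(2) */
fd = syscall(SYS_perf_event_open, &pe, 0, -1, -1, 0);

ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
<code you want to measure>
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

read(fd, &count, sizeof(long long));   /* count is a long long */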

slide-39
SLIDE 39

DYNOSTATS

Chart: time series for Follow, Feed, and Explore.

slide-40
SLIDE 40

REGRESSION


slide-41
SLIDE 41

GRADUAL REGRESSION


slide-42
SLIDE 42

Chart legend: with new feature vs. without new feature.

slide-43
SLIDE 43
slide-44
SLIDE 44

CPU

Monitor Optimize Analyze

slide-45
SLIDE 45

PYTHON CPROFILE

import cProfile, pstats, io

pr = cProfile.Profile()
pr.enable()
# ... do something ...
pr.disable()

s = io.StringIO()  # Python 3: io.StringIO replaces the StringIO module
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

slide-46
SLIDE 46
slide-47
SLIDE 47

CPU - ANALYZE

continuous profiling

generate_profile explore --start <start-time> --duration <minutes>

slide-48
SLIDE 48

CPU - ANALYZE

continuous profiling

Chart: caller and callee breakdown.

slide-49
SLIDE 49

CPU - ANALYZE

decorator

@log_stats
def get_photos(): ……

def feed(): get_photos()

@log_stats
def get_follows(): ……

def follow(): get_follows()
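The slide does not show the body of `log_stats`. Below is a minimal sketch of what such a decorator might look like, assuming it counts calls and accumulates wall-clock time, and assuming it wraps `get_photos` and `get_follows`. The `CALL_STATS` dictionary and the return values are hypothetical.

```python
import functools
import time
from collections import defaultdict

CALL_STATS = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def log_stats(func):
    """Hypothetical stand-in for the slide's log_stats: count calls and time."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            stats = CALL_STATS[func.__name__]
            stats["calls"] += 1
            stats["seconds"] += time.perf_counter() - start

    return wrapper


@log_stats
def get_photos():
    return ["photo-1", "photo-2"]


@log_stats
def get_follows():
    return ["user-2"]


def feed():
    return get_photos()


def follow():
    return get_follows()


feed()
follow()
feed()
print({name: s["calls"] for name, s in CALL_STATS.items()})
# {'get_photos': 2, 'get_follows': 1}
```

Because every decorated function runs through the same `wrapper` code, a profiler that attributes time by function sees one shared node between the callers and the callees, which can make it harder to tell which caller is responsible for which callee.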

slide-50
SLIDE 50

Call graph: feed, follow -> log_stats -> get_photos, get_follows

slide-51
SLIDE 51

Call graph: feed -> get_photos; follow -> get_follows

Keeping Demand in Check

slide-52
SLIDE 52
slide-53
SLIDE 53

CPU

Monitor Optimize Analyze

slide-54
SLIDE 54

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg

slide-55
SLIDE 55

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s150x150/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s400x600/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s200x200/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg

slide-56
SLIDE 56

CPU - OPTIMIZE

slide-57
SLIDE 57

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg

Other sizes: 150x150, 400x600, 200x200

slide-58
SLIDE 58

CPU - OPTIMIZE

C is really faster

  • Candidate functions:
  • Used extensively
  • Stable
  • Cython or C/C++
slide-59
SLIDE 59

CPU - CHALLENGE

  • cProfile is not free
  • False positive alerts
  • Better automation

slide-60
SLIDE 60

Use as few CPU instructions as possible

Use as few servers as possible

Scale up

slide-61
SLIDE 61

SCALE UP: MEMORY

(memory budget / process) × (# of processes) < system memory

Less memory budget/process ===> Dies sooner ===> More processes
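The inequality above can be turned into a small worked calculation. The figures below (a 64 GB host and three per-process budgets) are made up for illustration and are not from the talk.

```python
def max_processes(system_memory_mb: int, budget_per_process_mb: int) -> int:
    """Largest process count whose total budget stays strictly below system memory."""
    return (system_memory_mb - 1) // budget_per_process_mb


SYSTEM_MEMORY_MB = 65536  # hypothetical 64 GB host

for budget_mb in (4096, 2048, 1024):
    count = max_processes(SYSTEM_MEMORY_MB, budget_mb)
    print(f"{budget_mb} MB/process -> up to {count} processes")
```

Halving the per-process budget roughly doubles how many processes fit, but each process reaches its smaller budget sooner and has to be recycled more often.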

slide-62
SLIDE 62

LOAD TEST

Servers

Chart: CPU instructions on loaded vs. regular servers.

Diagram: a load balancer in front of the Django servers.

slide-63
SLIDE 63

SCALE UP: MEMORY

  • Code
  • Large configuration

slide-64
SLIDE 64

SCALE UP: MEMORY

  • Run in optimized mode (-O)
  • Use shared memory
  • NUMA
  • Remove dead code
slide-65
SLIDE 65

SCALE UP: LATENCY

Synchronous processing model

Single service degradation ===> Worker starvation ===> All user experience impacted

Longer latency ===> Fewer CPU instr executed

slide-66
SLIDE 66

Stories Feed

Django

Feed Stories Suggested Users

ASYNC IO
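The async IO idea can be sketched with Python's asyncio: the three backend calls are started together, so a slow one no longer blocks the others. The `fetch` helper, the service names as strings, and the delays are assumptions. `asyncio.sleep` stands in for real network calls.

```python
import asyncio


async def fetch(service: str, delay: float) -> str:
    """Pretend backend call; asyncio.sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"{service} ready"


async def render_home() -> list:
    # All three calls are in flight at once, so total time tracks the slowest
    # call rather than the sum of all three.
    return await asyncio.gather(
        fetch("feed", 0.03),
        fetch("stories", 0.02),
        fetch("suggested_users", 0.01),
    )


results = asyncio.run(render_home())
print(results)  # ['feed ready', 'stories ready', 'suggested_users ready']
```

`asyncio.gather` returns results in the order the awaitables were passed, not the order they finish.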

slide-67
SLIDE 67

Use as few CPU instructions as possible

Use as few servers as possible

Scale up

slide-68
SLIDE 68

SCALE DEV TEAM

slide-69
SLIDE 69

SCALING TEAM

30% of engineers joined in the last 6 months

  • Bootcampers: 1 week
  • Hack-A-Month: 4 weeks
  • Intern: 12 weeks

slide-70
SLIDE 70

  • Comment Filtering
  • Self-harm Prevention
  • Windows App
  • Story Viewer Ranking
  • Video View Notification
  • Save Draft
  • First Story Notification

slide-71
SLIDE 71

Which server? New table or new column?

What index? Should I cache it? Will I lock up DB?

Will I bring down Instagram?

slide-72
SLIDE 72

WHAT WE WANT

  • Automatically handle cache
  • Define relations, not worry about implementations
  • Self service by product engineers
  • Infra focuses on scaling this service
slide-73
SLIDE 73

TAO

Graph: USER1, USER2, and USER3 connect to a media node through paired edges: posted / posted by, likes / liked by, likes / liked by.
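The paired edges in the graph can be sketched as a tiny association store that writes each edge together with its inverse. This is a toy model, not the real TAO API: the `Graph` class and its methods are hypothetical, and which user posted the media is an assumption, since the flattened diagram does not say.

```python
from collections import defaultdict

# Each edge type and the inverse edge written alongside it.
INVERSE = {"posted": "posted_by", "likes": "liked_by"}


class Graph:
    """Toy object/association store with automatic inverse edges."""

    def __init__(self):
        self.edges = defaultdict(list)  # (source, edge_type) -> [targets]

    def add(self, source, edge_type, target):
        # Write the forward edge and its inverse together.
        self.edges[(source, edge_type)].append(target)
        self.edges[(target, INVERSE[edge_type])].append(source)

    def get(self, source, edge_type):
        return list(self.edges.get((source, edge_type), []))


g = Graph()
g.add("USER1", "posted", "media")  # assumption: USER1 is the poster
g.add("USER2", "likes", "media")
g.add("USER3", "likes", "media")

print(g.get("media", "liked_by"))   # ['USER2', 'USER3']
print(g.get("media", "posted_by"))  # ['USER1']
```

Product code only declares the relation; where the edges are stored and how they are cached stays behind the `add` / `get` interface.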

slide-74
SLIDE 74

SCALE DEV - END OF POSTGRES

slide-75
SLIDE 75

SHIPPING LOVE

  • 60-80 daily diffs
  • >120 engineers committed code last month

slide-76
SLIDE 76

RELEASE

  • Master, no branch
  • All features developed on master, gated by configuration

  • Continuous integration
  • No branch integration overhead
  • No surprises
  • Iterate fast, collaborate easily
  • Fast bisect and revert
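"Gated by configuration" can be sketched as a lookup that decides per user whether unfinished code on master is reachable. The config layout, the percentage rollout, and the `is_enabled` function are all assumptions for illustration; the feature names are borrowed from the feature list elsewhere in the deck.

```python
import zlib

# Hypothetical gate configuration; in practice this would live outside the code.
FEATURE_CONFIG = {
    "save_draft": {"enabled": True, "rollout_percent": 100},
    "story_viewer_ranking": {"enabled": True, "rollout_percent": 10},
    "windows_app": {"enabled": False, "rollout_percent": 0},
}


def is_enabled(feature: str, user_id: int, config=FEATURE_CONFIG) -> bool:
    """Return True when the feature is switched on for this user."""
    gate = config.get(feature)
    if gate is None or not gate["enabled"]:
        return False
    # Stable bucket in [0, 100) so a user gets the same answer on every request.
    bucket = zlib.crc32(f"{feature}:{user_id}".encode()) % 100
    return bucket < gate["rollout_percent"]


print(is_enabled("save_draft", 42))   # True: fully rolled out
print(is_enabled("windows_app", 42))  # False: gate is off
```

Turning a feature off then becomes a configuration change rather than a code revert, which fits the no-branch workflow listed above.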
slide-77
SLIDE 77

Once a week? Once a day? Once a diff!!

40-50 rollouts per day

slide-78
SLIDE 78

CHECKS AND BALANCES

  • Code review
  • unittest
  • Code accepted
  • committed
  • Canary
  • To the Wild
  • Dark launch
  • Load test

slide-79
SLIDE 79
slide-80
SLIDE 80

TAKEAWAYS

  • Scaling is a continuous effort
  • Scaling is multi-dimensional
  • Scaling is everybody’s responsibility

slide-81
SLIDE 81

QUESTIONS?

slide-82
SLIDE 82