SCALING INSTAGRAM INFRA Lisa Guo March 7th, 2017 lguo@instagram.com - - PowerPoint PPT Presentation



slide-1
SLIDE 1

SCALING INSTAGRAM INFRA

Lisa Guo, March 7th, 2017 lguo@instagram.com

slide-2
SLIDE 2
slide-3
SLIDE 3

INSTAGRAM HISTORY

  • 2010
  • 2012/4/9: joined Facebook
  • 2014/1
  • 2017: 600M users/month

slide-4
SLIDE 4

INSTAGRAM EVERYDAY

  • 400 Million Users
  • 4+ Billion likes
  • 100 Million photo/video uploads
  • Top account: 110 Million followers

slide-5
SLIDE 5
slide-6
SLIDE 6

SCALING MEANS

Scale out

Scale up

Scale dev team

slide-7
SLIDE 7

SCALE OUT

slide-8
SLIDE 8

SCALE OUT

slide-9
SLIDE 9
slide-10
SLIDE 10

SCALE OUT

slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13

“Let’s all pray that Amazon gets everything sorted out in short order.”

slide-14
SLIDE 14

INSTAGRAM STACK

memcache RabbitMQ PostgreSQL Cassandra Celery Other Services

Django

slide-15
SLIDE 15

STORAGE VS. COMPUTING

  • Storage: needs to be consistent across data centers
  • Computing: driven by user traffic, on an as-needed basis
slide-16
SLIDE 16

SCALE OUT: STORAGE

user, media, friendship etc

slide-17
SLIDE 17

SCALE OUT: STORAGE

user, media, friendship etc

Master Replica Replica Django Write Read

slide-18
SLIDE 18

SCALE OUT: STORAGE

user, media, friendship etc

Master Replica Replica Django Write Read DC1 DC2 DC3
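A minimal sketch of the split shown on this slide, where Django writes to the master and reads from a replica. It is modeled loosely on the method names of a Django database router (`db_for_read`, `db_for_write`), but it is a plain Python class, not Instagram's code. The aliases `master`, `replica_dc2`, `replica_dc3` and the `local_dc` parameter are hypothetical.

```python
import random


class MasterReplicaRouter:
    """Send writes to the single master and reads to a replica in the local data center."""

    def __init__(self, local_dc, replicas_by_dc, master_alias="master"):
        self.local_dc = local_dc              # hypothetical: the DC this web server runs in
        self.replicas_by_dc = replicas_by_dc  # hypothetical: DC name -> list of replica aliases
        self.master_alias = master_alias

    def db_for_read(self, model=None, **hints):
        # Prefer a replica in the same data center; fall back to the master.
        replicas = self.replicas_by_dc.get(self.local_dc)
        if not replicas:
            return self.master_alias
        return random.choice(replicas)

    def db_for_write(self, model=None, **hints):
        # All writes go to the one master, wherever it lives.
        return self.master_alias


router = MasterReplicaRouter(
    local_dc="DC2",
    replicas_by_dc={"DC2": ["replica_dc2"], "DC3": ["replica_dc3"]},
)
```

Reads stay local to the data center while writes cross regions to the master, which matches the Write and Read arrows in the diagram.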

slide-19
SLIDE 19

SCALE OUT: STORAGE

user feeds, activities etc

Replica Replica Replica Write - 2 Read - 1

slide-20
SLIDE 20

SCALE OUT: STORAGE

user feeds, activities etc

Replica Replica Replica Write - 2 Read - 1
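Reading "Write - 2, Read - 1" as the number of replicas that must acknowledge each operation (an assumption; the slide does not spell it out), the usual overlap rule R + W > N can be checked for three replicas. The function name below is made up for illustration.

```python
def read_overlaps_write(n_replicas, write_acks, read_acks):
    """True when every read set must share a replica with every write set (R + W > N)."""
    return read_acks + write_acks > n_replicas


N, W, R = 3, 2, 1                            # three replicas, write to 2, read from 1
overlap = read_overlaps_write(N, W, R)       # 1 + 2 is not greater than 3
stricter = read_overlaps_write(N, W, 2)      # reading from 2 replicas would overlap
```

Under this reading, a read may land on the one replica that has not yet seen the latest write, so the setting trades strict consistency for cheaper reads.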

slide-21
SLIDE 21

COMPUTING

slide-22
SLIDE 22

Django RabbitMQ

PostgreSQL Cassandra

Celery Django RabbitMQ

PostgreSQL Cassandra

Celery

memcache

DC1 DC2

memcache

slide-23
SLIDE 23

MEMCACHE

  • High performance key-value store in memory
  • Millions of reads/writes per second
  • Sensitive to network conditions
  • Cross-region operations are prohibitively expensive

No global consistency

slide-24
SLIDE 24

DC1

User C: comment -> Django -> insert into PostgreSQL, set in memcache

User R: feed -> Django -> get from memcache

slide-25
SLIDE 25

DC1: User C: comment -> Django -> insert into PostgreSQL, set in memcache

DC2: User R: feed -> Django -> get from memcache

PostgreSQL replication between DC1 and DC2

slide-26
SLIDE 26

DC1: User C: comment -> Django -> insert into PostgreSQL, set in memcache; cache invalidate

DC2: User R: feed -> Django -> get from memcache; cache invalidate

PostgreSQL replication between DC1 and DC2
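A toy simulation of the idea on this slide: once a row reaches a region's database, the matching memcache entry in that region is invalidated, so the next `get` refills from the local database. The `DataCenter` class, the key format, and the function name are hypothetical, and in-process dicts stand in for PostgreSQL and memcache.

```python
class DataCenter:
    def __init__(self, name):
        self.name = name
        self.db = {}      # stands in for the local PostgreSQL copy
        self.cache = {}   # stands in for the local memcache

    def read(self, key):
        """Cache-aside read: try the cache, else read the local DB and fill the cache."""
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]


def write_with_invalidation(primary, others, key, value):
    primary.db[key] = value          # insert in the region taking the write
    primary.cache.pop(key, None)     # invalidate the local cache entry
    for dc in others:
        dc.db[key] = value           # replication delivers the row
        dc.cache.pop(key, None)      # invalidate only after the row has arrived


dc1, dc2 = DataCenter("DC1"), DataCenter("DC2")
write_with_invalidation(dc1, [dc2], "comments:42", ["first"])
before = dc2.read("comments:42")     # fills DC2's cache
write_with_invalidation(dc1, [dc2], "comments:42", ["first", "second"])
after = dc2.read("comments:42")      # cache entry was invalidated, so this re-reads the DB
```

Without the invalidation step, the second read in DC2 would keep returning the cached one-comment list.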

slide-27
SLIDE 27

COUNTERS

select count(*) from user_likes_media where media_id=12345;

Hundreds of milliseconds

slide-28
SLIDE 28

COUNTER

select count from media_likes where media_id=12345;

Tens of microseconds
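A sketch of the denormalized counter behind the two queries above, using an in-memory SQLite database in place of PostgreSQL. The table names come from the slides. The column layout, the `like` and `like_count` helpers, and the choice to update the counter in the same transaction as the like are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_likes_media (user_id INTEGER, media_id INTEGER,
                                   PRIMARY KEY (user_id, media_id));
    CREATE TABLE media_likes (media_id INTEGER PRIMARY KEY, count INTEGER NOT NULL);
""")


def like(user_id, media_id):
    """Record the like and bump the per-media counter in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO user_likes_media (user_id, media_id) VALUES (?, ?)",
            (user_id, media_id),
        )
        if cur.rowcount == 1:  # only count likes that were actually inserted
            updated = conn.execute(
                "UPDATE media_likes SET count = count + 1 WHERE media_id = ?",
                (media_id,),
            )
            if updated.rowcount == 0:  # first like for this media: create the counter row
                conn.execute(
                    "INSERT INTO media_likes (media_id, count) VALUES (?, 1)",
                    (media_id,),
                )


def like_count(media_id):
    """Single-row lookup instead of counting every like row."""
    row = conn.execute(
        "SELECT count FROM media_likes WHERE media_id = ?", (media_id,)
    ).fetchone()
    return row[0] if row else 0


like(1, 12345)
like(2, 12345)
like(2, 12345)  # duplicate like is ignored
```

The read path becomes a primary-key lookup, at the cost of one extra write per like.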

slide-29
SLIDE 29

Cache invalidated: all Django servers try to access the DB at once

slide-30
SLIDE 30

MEMCACHE LEASE

Participants: d1, d2 (Django servers), memcache, db. Sequence over time:

  1. d1: lease-get -> fill
  2. d2: lease-get -> wait or use stale
  3. d1: read from DB
  4. d1: lease-set
  5. d2: lease-get -> hit
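The lease sequence on this slide can be sketched with a toy in-process cache. This is not a real memcache client: the `LeaseCache` class, its method names, and the status strings are hypothetical. It also leaves out the "use stale" option and only models "wait".

```python
import itertools


class LeaseCache:
    """Toy cache with memcache-style leases: only one caller fills a missing key."""

    def __init__(self):
        self._data = {}
        self._leases = {}
        self._tokens = itertools.count(1)

    def lease_get(self, key):
        """Return ("hit", value), ("fill", token) or ("wait", None)."""
        if key in self._data:
            return "hit", self._data[key]
        if key in self._leases:
            return "wait", None           # another caller is already filling this key
        token = next(self._tokens)
        self._leases[key] = token
        return "fill", token              # this caller should read the DB and lease_set

    def lease_set(self, key, value, token):
        """Store the value only if the caller still holds the lease for the key."""
        if self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._data[key] = value
        return True

    def invalidate(self, key):
        self._data.pop(key, None)
        self._leases.pop(key, None)       # an outstanding lease becomes void


cache = LeaseCache()
key = "media:12345:likes"
d1_status, d1_token = cache.lease_get(key)   # d1 misses and is told to fill
d2_status, _ = cache.lease_get(key)          # d2 is told to wait
stored = cache.lease_set(key, 42, d1_token)  # d1 has read the DB and fills the cache
d2_retry, d2_value = cache.lease_get(key)    # d2 retries and hits
```

Only the lease holder goes to the database, which is what keeps an invalidation from sending every web server to the DB at once.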

slide-31
SLIDE 31

INSTAGRAM STACK - MULTI REGION

Django RabbitMQ PostgreSQL Cassandra Celery memcache Django RabbitMQ PostgreSQL Cassandra Celery memcache

DC1 DC2

slide-32
SLIDE 32

SCALING OUT

  • Capacity
  • Reliability
  • Regional failure ready
slide-33
SLIDE 33

SCALING OUT - CHALLENGES, OPPORTUNITIES

  • Beyond North America
  • More localized social network
  • Direct messaging
  • Live streaming
slide-34
SLIDE 34

[Chart: user growth vs. server growth]

slide-35
SLIDE 35

“Don’t count the servers, make the servers count”

slide-36
SLIDE 36

SCALE UP

slide-37
SLIDE 37

SCALE UP

Use as few CPU instructions as possible

Use as few servers as possible

slide-38
SLIDE 38

SCALE UP

Use as few CPU instructions as possible

Use as few servers as possible

slide-39
SLIDE 39

CPU

Monitor Optimize Analyze

slide-40
SLIDE 40

COLLECT

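struct perf_event_attr pe;
memset(&pe, 0, sizeof(pe));              /* zero all unused fields */
pe.type = PERF_TYPE_HARDWARE;
pe.size = sizeof(pe);
pe.config = PERF_COUNT_HW_INSTRUCTIONS;  /* count retired instructions */
pe.disabled = 1;                         /* start disabled; enabled explicitly below */
/* glibc has no wrapper: perf_event_open() here stands for syscall(SYS_perf_event_open, ...) */
fd = perf_event_open(&pe, 0, -1, -1, 0); /* this process, any CPU, no group, no flags */

ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
<code you want to measure>
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

read(fd, &count, sizeof(long long));     /* count now holds the instruction total */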

slide-41
SLIDE 41

DYNOSTATS

[Chart: series for Follow, Feed, Explore]

slide-42
SLIDE 42

REGRESSION

slide-43
SLIDE 43

[Chart: with new feature vs. without new feature]

slide-44
SLIDE 44
slide-45
SLIDE 45

CPU

Monitor Optimize Analyze

slide-46
SLIDE 46

PYTHON CPROFILE

import cProfile
import io
import pstats

pr = cProfile.Profile()

pr.enable()
# ... do something ...
pr.disable()

# Collect the stats into a string, sorted by cumulative time per function.
s = io.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

slide-47
SLIDE 47
slide-48
SLIDE 48

CPU - ANALYZE

continuous profiling

generate_profile explore --start <start-time> --duration <minutes>

slide-49
SLIDE 49

CPU - ANALYZE

continuous profiling

[Chart: a caller and its callees]

slide-50
SLIDE 50
slide-51
SLIDE 51

CPU

Monitor Optimize Analyze

slide-52
SLIDE 52

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg

slide-53
SLIDE 53

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s150x150/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s400x600/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s200x200/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg

slide-54
SLIDE 54

CPU - OPTIMIZE

slide-55
SLIDE 55

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg

150x150 400x600 200x200

slide-56
SLIDE 56

CPU - OPTIMIZE

C is really faster

  • Candidate functions:
      • Used extensively
      • Stable
  • Cython or C/C++
slide-57
SLIDE 57

Use as few CPU instructions as possible

Use as few servers as possible

SCALE UP

slide-58
SLIDE 58

ONE WEB SERVER

Process 1 Shared Memory Private Memory Process N

slide-59
SLIDE 59

SCALE UP: MEMORY

  • Run in optimized mode (-O)
  • Remove dead code

Reduce code

slide-60
SLIDE 60

SCALE UP: MEMORY

  • Move configuration into shared memory
  • Disable garbage collection

Share more
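One way the "Disable garbage collection" bullet might look in a CPython web server that pre-forks worker processes. The helper name is made up, and calling it in the parent before forking is an assumption. `gc.disable()` turns off only the cyclic collector; reference counting still frees most objects.

```python
import gc


def prepare_parent_for_fork():
    """Turn off the cyclic garbage collector so workers forked later inherit it disabled."""
    gc.disable()
    return gc.isenabled()


still_enabled = prepare_parent_for_fork()
```

The trade-off is that reference cycles are never reclaimed, so memory growth from cycles has to be watched or avoided.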

slide-61
SLIDE 61

SCALE UP: MEMORY

20+% capacity increase

slide-62
SLIDE 62

SCALE UP: NETWORK LATENCY

Synchronous processing model with long latency ===> worker starvation and fewer CPU instructions executed

slide-63
SLIDE 63

Stories Feed

Django

Feed Stories Suggested Users

ASYNC IO
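A small sketch of the async IO idea on this slide: one request handler waits on feed, stories, and suggested users concurrently, not one after another. The `fetch` coroutine, its latencies, and the returned strings are stand-ins for real backend calls, not anything named in the talk.

```python
import asyncio


async def fetch(name, latency_s):
    """Stand-in for a network call to a backend service."""
    await asyncio.sleep(latency_s)
    return f"{name} data"


async def render_home():
    # The three calls overlap, so total wait is about the slowest call, not the sum.
    feed, stories, suggested = await asyncio.gather(
        fetch("feed", 0.05),
        fetch("stories", 0.05),
        fetch("suggested_users", 0.05),
    )
    return {"feed": feed, "stories": stories, "suggested_users": suggested}


page = asyncio.run(render_home())
```

While one call is waiting on the network, the worker is free to make progress on the others, which addresses the starvation described on the previous slide.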

slide-64
SLIDE 64

Use as few CPU instructions as possible

Use as few servers as possible

Scale up

slide-65
SLIDE 65

SCALE UP: CHALLENGES, OPPORTUNITIES

  • Faster Python runtime
  • Async web framework
  • Better memory analysis
  • etc.
slide-66
SLIDE 66

SCALE DEV TEAM

slide-67
SLIDE 67

SCALING TEAM

30% of engineers joined in the last 6 months

  • Bootcampers - 1 week
  • Hack-A-Month - 4 weeks
  • Intern - 12 weeks

slide-68
SLIDE 68

  • Comment Filtering
  • Self-harm Prevention
  • Windows App
  • Multiple media in one post
  • Video View Notification
  • Saved Posts
  • First Story Notification
  • Instagram Live
  • Instagram Stories

slide-69
SLIDE 69

Which server?
New table or new column?
What index?
Should I cache it?
Will I lock up DB?

Will I bring down Instagram?

slide-70
SLIDE 70

WHAT WE WANT

  • Automatically handle cache
  • Define relations, not worry about implementations
  • Self service by product engineers
  • Infra focuses on scale
slide-71
SLIDE 71

TAO

USER1 --posted--> media --posted by--> USER1
USER2 --likes--> media --liked by--> USER2
USER3 --likes--> media --liked by--> USER3
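A toy sketch of the graph on this slide: objects connected by typed associations, with each association stored together with its inverse (posted / posted by, likes / liked by). The `Graph` class and its method names are hypothetical and are not TAO's actual API.

```python
# Each association type and the inverse type stored alongside it (names are assumptions).
INVERSE = {"posted": "posted_by", "likes": "liked_by"}


class Graph:
    def __init__(self):
        self.edges = {}  # (source object, association type) -> list of target objects

    def add_assoc(self, src, assoc_type, dst):
        """Store the association and its inverse together."""
        self.edges.setdefault((src, assoc_type), []).append(dst)
        self.edges.setdefault((dst, INVERSE[assoc_type]), []).append(src)

    def assoc_get(self, src, assoc_type):
        return list(self.edges.get((src, assoc_type), []))

    def assoc_count(self, src, assoc_type):
        return len(self.edges.get((src, assoc_type), []))


g = Graph()
g.add_assoc("USER1", "posted", "media")
g.add_assoc("USER2", "likes", "media")
g.add_assoc("USER3", "likes", "media")
likers = g.assoc_get("media", "liked_by")
```

A product engineer declares the relation once and queries it in either direction, which fits the "define relations, not worry about implementations" goal from the previous slide.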

slide-72
SLIDE 72

  • Comment Filtering
  • Self-harm Prevention
  • Windows App
  • Multiple media in one post
  • Video View Notification
  • Saved Posts
  • First Story Notification
  • Instagram Live
  • Instagram Stories

slide-73
SLIDE 73

SOURCE CONTROL

Master Live Direct

slide-74
SLIDE 74

SOURCE CONTROL

  • Context switching
  • Code sync/merge overhead
  • Surprises
  • Refactor/major upgrade
  • Performance tracking harder

With branches

slide-75
SLIDE 75

SOURCE CONTROL

Master Live Direct

slide-76
SLIDE 76

SOURCE CONTROL

Master Live Direct

slide-77
SLIDE 77

SOURCE CONTROL

  • Continuous integration
  • Collaborate easily
  • Fast bisect and revert
  • Continuous performance monitoring

No branches

slide-78
SLIDE 78

FEATURE LAUNCH

Engineers Employees Dogfooder Some demographics World
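A sketch of a staged launch gate over the audiences listed on this slide. The stage order below follows the order of the slide text, which is an assumption, as are the identifiers and the rule that each stage includes every earlier one.

```python
# Audience stages in the order they appear on the slide (assumed to be the launch order).
STAGES = ["engineers", "employees", "dogfooders", "some_demographics", "world"]


def is_enabled(user_audience, launched_up_to):
    """A user sees the feature once the launch has reached their audience stage."""
    return STAGES.index(user_audience) <= STAGES.index(launched_up_to)


engineer_sees_it = is_enabled("engineers", "employees")   # launch has passed engineers
public_sees_it = is_enabled("world", "employees")         # launch has not reached the world
```

Here `user_audience` is the earliest stage a user belongs to, so widening the launch is a single change to `launched_up_to`.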

slide-79
SLIDE 79

FEATURE LOAD TEST

slide-80
SLIDE 80

Once a week? Once a day? Once a diff?

40-60 rollouts per day!!

slide-81
SLIDE 81

CHECKS AND BALANCES

Code review -> unittest -> Code accepted -> committed -> Canary -> To the Wild

slide-82
SLIDE 82
slide-83
SLIDE 83

SCALING MEANS

Scale out

Scale up

Scale dev team

slide-84
SLIDE 84
slide-85
SLIDE 85

TAKEAWAYS

  • Scaling is everybody’s responsibility
  • Scaling is a continuous effort
  • Scaling is multi-dimensional

slide-86
SLIDE 86

QUESTIONS?

slide-87
SLIDE 87