SCALING INSTAGRAM INFRA Lisa Guo Nov 7th, 2016 lguo@instagram.com

SCALING INSTAGRAM INFRA Lisa Guo, Nov 7th, 2016 lguo@instagram.com

INSTAGRAM HISTORY • 2010 • 2011: 14M users • 2012/4/3: Android release • 2012/4/9: Facebook acquisition • 2014/1

INSTAGRAM EVERYDAY 300 Million Users 4.2 Billion likes 95 Million photo/video uploads 100 Million followers

SCALING MEANS Scale up Scale out Scale dev team

SCALE OUT

SCALE OUT “To scale horizontally means to add more nodes to a system, such as adding a new computer to a distributed software application. An example might involve scaling out from one Web server system to three.” - Wikipedia

MICROSERVICE

SCALING OUT vertical partition -> horizontal sharding

SCALING OUT

INSTAGRAM STACK Cassandra PostgreSQL Other Django Services memcache RabbitMQ Celery

STORAGE VS. COMPUTING • Storage: needs to be consistent across data centers • Computing: driven by user traffic, on an as-needed basis

SCALE OUT: STORAGE user feeds, stories, activities, and other logs • Masterless • Async, low latency • Multiple data center ready • Tunable latency vs consistency trade-off

SCALE OUT: STORAGE user, media, friendship etc • One master, replicas are in each region • Reads are done locally • Writes are cross region to the master.
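The routing rule on this slide can be sketched in a few lines. This is only an illustration under assumptions: `RegionRouter`, the host name pattern, and the region labels `dc1`/`dc2` are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionRouter:
    """Pick a database host for a query (hypothetical helper, not a real API)."""
    local_region: str
    master_region: str

    def host_for(self, is_write: bool) -> str:
        # Writes always go to the single master, even if it is in another region.
        if is_write:
            return f"postgres-master.{self.master_region}"
        # Reads are served by the replica in the caller's own region.
        return f"postgres-replica.{self.local_region}"


# A Django server running in dc2 while the master lives in dc1.
router = RegionRouter(local_region="dc2", master_region="dc1")
read_host = router.host_for(is_write=False)
write_host = router.host_for(is_write=True)
```

The trade-off this makes visible: reads stay cheap and local, while every write pays a cross-region round trip to the master.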

COMPUTING

[Diagram: DC1 and DC2, each running Django, memcache, PostgreSQL, RabbitMQ, Cassandra, and Celery]

MEMCACHE • Millions of reads/writes per second • Sensitive to network conditions • Cross-region operation is prohibitive

[Diagram: single data center (DC1). User C posts a comment: Django inserts into PostgreSQL and sets memcache. User R loads the feed: Django gets from memcache.]

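[Diagram: two data centers. User C posts a comment in DC1: Django inserts into PostgreSQL and sets the DC1 memcache. PostgreSQL replicates to DC2, but nothing updates the DC2 memcache, which is where User R's feed request does its get.]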

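[Diagram: two data centers with cache invalidation. User C's comment is inserted into PostgreSQL in DC1 and replicated to DC2. In each data center a cache-invalidate step driven from PostgreSQL clears memcache, so User R's feed get in DC2 misses and Django sets the fresh value.]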
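The invalidation flow in the diagram can be sketched as follows. This is a toy model under assumptions: each data center is a pair of Python dicts, replication is a plain function call rather than an asynchronous stream, and the names `Region`, `apply`, and `write` are hypothetical.

```python
class Region:
    """One data center: a local database replica plus a local cache (both dicts here)."""

    def __init__(self, name):
        self.name = name
        self.db = {}
        self.cache = {}

    def get(self, key):
        # Read path: try the local cache first, fall back to the local database.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def apply(self, key, value):
        # Applying a row (locally or via replication) invalidates the local cache entry.
        self.db[key] = value
        self.cache.pop(key, None)


def write(master, replicas, key, value):
    """Insert on the master, then replicate; every region invalidates its own cache."""
    master.apply(key, value)
    for replica in replicas:
        replica.apply(key, value)  # stands in for asynchronous replication


dc1, dc2 = Region("DC1"), Region("DC2")
write(dc1, [dc2], "comments:42", ["nice"])
first_read = dc2.get("comments:42")            # miss, filled from DC2's database
write(dc1, [dc2], "comments:42", ["nice", "wow"])
second_read = dc2.get("comments:42")           # cached entry was invalidated, so this is fresh
```

Tying invalidation to the database change, rather than to the Django write path, is what lets a region that never saw the original request still drop its stale entry.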

COUNTERS select count(*) from user_likes_media where media_id=12345; (hundreds of ms)

COUNTERS

COUNTER select count from media_likes where media_id=12345; (tens of µs)
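A minimal sketch of the denormalized counter behind the faster query. The table names come from the slides; everything else is an assumption, including the column layout, the `like` helper, and the use of an in-memory SQLite database in place of PostgreSQL so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_likes_media (user_id INTEGER, media_id INTEGER,
                                   PRIMARY KEY (user_id, media_id));
    CREATE TABLE media_likes (media_id INTEGER PRIMARY KEY, count INTEGER NOT NULL);
""")


def like(user_id, media_id):
    """Record one like and bump the denormalized counter in the same transaction."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO user_likes_media VALUES (?, ?)", (user_id, media_id))
        if cur.rowcount == 0:
            return  # duplicate like: leave the counter untouched
        updated = conn.execute(
            "UPDATE media_likes SET count = count + 1 WHERE media_id = ?", (media_id,))
        if updated.rowcount == 0:
            conn.execute("INSERT INTO media_likes VALUES (?, 1)", (media_id,))


for user in (1, 2, 3, 3):  # user 3 likes twice; the second like is ignored
    like(user, 12345)

# Slow path: scan every like row for the media item.
slow = conn.execute(
    "SELECT count(*) FROM user_likes_media WHERE media_id = 12345").fetchone()[0]
# Fast path: a single-row primary-key lookup.
fast = conn.execute(
    "SELECT count FROM media_likes WHERE media_id = 12345").fetchone()[0]
```

The counter trades extra work on every write for a constant-time read, which pays off when a popular media item is read far more often than it is liked.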

Cache invalidated: all Django servers try to access the DB at the same time

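MEMCACHE LEASE (sequence over time; d1 and d2 are Django servers)
• d1 -> memcache: lease-get, miss, d1 is asked to fill
• d2 -> memcache: lease-get, told to wait or use stale
• d1 -> db: read from DB
• d1 -> memcache: lease-set
• d2 -> memcache: lease-get, hit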
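The lease sequence can be sketched with a toy in-process cache. This is illustrative only: `LeaseCache` and its method names mirror the slide's labels (`lease-get`, `lease-set`) and are not the API of any real memcached client, and how stale values and lease expiry are handled is an assumption.

```python
import itertools


class LeaseCache:
    """Toy in-process cache with leases (illustrative; not a memcached client API)."""

    def __init__(self):
        self._data = {}      # key -> fresh value
        self._stale = {}     # key -> last invalidated value
        self._leases = {}    # key -> token of the client allowed to fill
        self._tokens = itertools.count(1)

    def lease_get(self, key):
        """Return (status, payload): 'hit', 'fill' with a token, or 'wait' with a stale value."""
        if key in self._data:
            return "hit", self._data[key]
        if key not in self._leases:
            token = next(self._tokens)
            self._leases[key] = token
            return "fill", token
        return "wait", self._stale.get(key)

    def lease_set(self, key, value, token):
        """Store the value only if the caller still holds the lease."""
        if self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._data[key] = value
        self._stale.pop(key, None)
        return True

    def invalidate(self, key):
        # Keep the old value around so waiting clients may serve it as stale.
        if key in self._data:
            self._stale[key] = self._data.pop(key)
        self._leases.pop(key, None)


cache = LeaseCache()
db_reads = 0

status_d1, token = cache.lease_get("likes:12345")    # d1 misses and gets the lease
status_d2, stale = cache.lease_get("likes:12345")    # d2 is told to wait or use stale
db_reads += 1                                         # only d1 reads from the DB
cache.lease_set("likes:12345", 99, token)
status_d2_retry, value = cache.lease_get("likes:12345")
```

However many servers miss at once, only the lease holder goes to the database; the others wait briefly or serve the stale value.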

INSTAGRAM STACK - MULTI REGION [Diagram: DC1 and DC2, each running Django, memcache, PostgreSQL, RabbitMQ, Cassandra, and Celery]

SCALING OUT • Capacity • Reliability • Regional failure ready [Chart: requests/second]

LOAD TEST [Diagram: a load balancer in front of Django servers, split into regular servers and loaded servers. Chart: CPU instructions over time]

[Chart: user growth vs. server growth]

“Don’t count the servers, make the servers count”

SCALE UP

SCALE UP Use as few CPU instructions as possible Use as few servers as possible

CPU • Monitor • Analyze • Optimize

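COLLECT
    /* Linux only. Needs <linux/perf_event.h>, <sys/syscall.h>, <sys/ioctl.h>, <unistd.h>, <string.h>. */
    struct perf_event_attr pe;
    long long count;
    memset(&pe, 0, sizeof(pe));            /* unused fields must be zero */
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;                       /* start stopped; enabled explicitly below */
    /* glibc has no wrapper, so call through syscall(2): this process, any CPU */
    fd = syscall(SYS_perf_event_open, &pe, 0, -1, -1, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    <code you want to measure>
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    read(fd, &count, sizeof(count));       /* instructions retired in the measured region */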

DYNOSTATS [Chart: Explore, Feed, and Follow over time]

REGRESSION [Chart]

GRADUAL REGRESSION [Chart: with new feature vs. without new feature]

PYTHON CPROFILE
    import cProfile, pstats, io

    pr = cProfile.Profile()
    pr.enable()
    # ... do something ...
    pr.disable()
    s = io.StringIO()
    sortby = 'cumulative'
    ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
    ps.print_stats()
    print(s.getvalue())

CPU - ANALYZE continuous profiling generate_profile explore --start <start-time> --duration <minutes>

CPU - ANALYZE continuous profiling [Chart: caller and callee over time]

CPU - ANALYZE decorator
    @log_stats
    def get_follows():
        ……

    def follow():
        get_follows()

    @log_stats
    def get_photos():
        ……

    def feed():
        get_photos()
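The slides name the `log_stats` decorator but do not show its body. One possible shape is sketched below, assuming it records call counts and wall time into an in-memory dictionary; the `STATS` sink and the function bodies are hypothetical.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory sink; a real system would ship these to a metrics service.
STATS = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def log_stats(func):
    """Record call count and cumulative wall time for the decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = STATS[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@log_stats
def get_follows():
    return ["user2", "user3"]   # placeholder body


@log_stats
def get_photos():
    return ["photo1"]           # placeholder body


def follow():
    return get_follows()


def feed():
    return get_photos()


follow()
feed()
feed()
```

After these calls, `STATS` holds one entry per decorated function, so the cost of `get_photos` can be attributed to `feed` without profiling the whole process.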

[Diagram: feed and follow call get_photos and get_follows through log_stats]

Keeping Demand in Check [Diagram: feed, follow, get_photos, get_follows]

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s150x150/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s400x600/12345678_1234567890_987654321_a.jpg
igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s200x200/12345678_1234567890_987654321_a.jpg

CPU - OPTIMIZE

igcdn-photos-d-a.akamaihd.net/hphotos-ak-xpl1/t51.2885-19/s300x300/12345678_1234567890_987654321_a.jpg 150x150 400x600 200x200

CPU - OPTIMIZE C is really faster • Cython or C/C++ • Candidate functions: used extensively, stable

CPU - CHALLENGE • cProfile is not free • False positive alerts • Better automation

Scale up Use as few CPU instructions as possible Use as few servers as possible

SCALE UP: MEMORY (memory budget per process) x (# of processes) < system memory • Less memory budget per process ===> More processes ===> Dies sooner
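A short worked example of the inequality above. The 64 GB host and the per-process budgets are made-up numbers chosen only to show the arithmetic, and `max_processes` is a hypothetical helper.

```python
def max_processes(system_memory_mb: int, budget_per_process_mb: int) -> int:
    """Largest process count with budget * processes strictly below system memory."""
    count = system_memory_mb // budget_per_process_mb
    if count * budget_per_process_mb == system_memory_mb:
        count -= 1  # the slide's inequality is strict
    return count


# Hypothetical host with 64 GB of RAM.
SYSTEM_MB = 64 * 1024
large_budget = max_processes(SYSTEM_MB, 2048)  # 2 GB per process
small_budget = max_processes(SYSTEM_MB, 1024)  # 1 GB per process
```

Halving the budget roughly doubles the number of worker processes that fit on the host, but each process then reaches its memory limit earlier.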

LOAD TEST [Diagram: a load balancer in front of Django servers, split into regular servers and loaded servers. Chart: CPU instructions over time]

SCALE UP: MEMORY • Code • Large configuration

SCALE UP: MEMORY • Run in optimized mode (-O) • Use shared memory • NUMA • Remove dead code

SCALE UP: LATENCY • Synchronous processing model ===> Worker starvation • Single service degradation ===> All user experience impacted • Longer latency ===> Fewer CPU instructions executed

ASYNC IO [Diagram: Django calling the Stories, Feed, and Suggested Users services]
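A minimal sketch of what asynchronous IO buys here, using Python's standard `asyncio`. The three backend calls are simulated with `asyncio.sleep`, and `fetch` and `render_home` are hypothetical names; the talk does not show which framework was used.

```python
import asyncio


async def fetch(service: str, delay: float) -> str:
    """Stand-in for a network call to a backend service (hypothetical)."""
    await asyncio.sleep(delay)
    return f"{service}: ok"


async def render_home() -> list:
    # Issue all three calls at once; total latency tracks the slowest call,
    # not the sum, and the worker can serve other requests while waiting.
    return await asyncio.gather(
        fetch("stories", 0.03),
        fetch("feed", 0.02),
        fetch("suggested_users", 0.01),
    )


results = asyncio.run(render_home())
```

`asyncio.gather` returns results in the order the awaitables were passed, so the caller does not need to track which call finished first.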

Scale up Use as few CPU instructions as possible Use as few servers as possible

SCALE DEV TEAM

SCALING TEAM 30% of engineers joined in the last 6 months • Bootcampers - 1 week • Hack-A-Month - 4 weeks • Intern - 12 weeks

• Save Draft • Story Viewer Ranking • Comment Filtering • First Story Notification • Windows App • Video View Notification • Self-harm Prevention

Which server? Will I lock up DB? New Table or New Column? Will I bring down Instagram? Should I cache it? What Index?

WHAT WE WANT • Automatically handle cache • Define relations, not worry about implementations • Self service by product engineers • Infra focuses on scaling this service

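TAO [Diagram: graph of USER1, USER2, USER3, and a media item. USER1 and USER2 each have a "likes" edge to the media, with "liked by" as the inverse. USER3 has a "posted" edge to the media, with "posted by" as the inverse.]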
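The "define relations, not implementations" idea behind the graph can be sketched with a toy association store. This is not TAO's API: the `Graph` class, its methods, and the `INVERSE` table are hypothetical, and only the edge names come from the diagram.

```python
from collections import defaultdict

# Each relation is declared once together with its inverse (names taken from the slide).
INVERSE = {"likes": "liked by", "posted": "posted by"}


class Graph:
    """Toy association store; a sketch of the idea, not TAO's actual API."""

    def __init__(self):
        self._edges = defaultdict(list)  # (source, relation) -> [targets]

    def add(self, source, relation, target):
        # Writing one edge also writes its inverse, so callers never maintain both.
        self._edges[(source, relation)].append(target)
        self._edges[(target, INVERSE[relation])].append(source)

    def get(self, source, relation):
        return list(self._edges[(source, relation)])


g = Graph()
g.add("USER3", "posted", "media")
g.add("USER1", "likes", "media")
g.add("USER2", "likes", "media")
liked_by = g.get("media", "liked by")
posted_by = g.get("media", "posted by")
```

A product engineer using such a layer only declares the relation; tables, indexes, and caching stay behind the store's interface.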

SCALE DEV - END OF POSTGRES

SHIPPING LOVE • >120 engineers committed code last month • 60-80 daily diffs

RELEASE • Master, no branch • All features developed on master gated by configuration • No branch integration overhead • Continuous integration • No surprises • Iterate fast, collaborate easily • Fast bisect and revert
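A minimal sketch of gating a feature by configuration so that unfinished code can live on master. The gate names, the percentage-based rollout, and the CRC32 bucketing are assumptions made for illustration; the talk does not describe the gating system's internals.

```python
import zlib

# Hypothetical gate configuration: feature name -> percentage of users enabled.
GATES = {"story_viewer_ranking": 0, "comment_filtering": 100, "save_draft": 10}


def is_enabled(feature: str, user_id: int) -> bool:
    """Deterministically bucket a user into 0..99 and compare with the rollout percent."""
    percent = GATES.get(feature, 0)  # unknown features stay off
    bucket = zlib.crc32(f"{feature}:{user_id}".encode()) % 100
    return bucket < percent


def render_comments(user_id: int) -> str:
    # Both code paths live on master; configuration decides which one runs.
    if is_enabled("comment_filtering", user_id):
        return "filtered comments"
    return "all comments"
```

Changing a percentage in `GATES` turns a feature on or off without a code change or a branch merge.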

Once a week? Once a day? Once a diff!! 40-50 rollouts per day

CHECKS AND BALANCES Code review, unittest -> Code accepted, committed -> Dark launch, Load test -> Canary -> To the Wild

TAKEAWAYS • Scaling is a continuous effort • Scaling is multi-dimensional • Scaling is everybody’s responsibility

QUESTIONS?
