Shopify’s Architecture to Handle 80K RPS Celebrity Sales (presentation slides by Simon Eskildsen)

SLIDE 1

Shopify’s Architecture to Handle 80K RPS Celebrity Sales

Simon Eskildsen (@Sirupsen), Production Engineering Lead, Shopify

SLIDE 2

Shopify handles some of the largest sales in the world, from Kylie Jenner, Kanye, the Super Bowl, and others

SLIDE 3

“We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.”

Tobi Lütke, CEO, in an internal essay on why we optimize for flash sales

SLIDE 4

  • 500K merchants powered
  • $5.8B processed in Q2 2017
  • 80K peak RPS
  • 40+ daily deploys
  • Ruby on Rails since 2006
  • 2000+ employees

SLIDE 5

[Diagram: Traffic flows to Region A and Region B, each with its own Application and Data tiers]

SLIDE 6

[Diagram: Traffic flows to Region A and Region B, each with its own Application and Data tiers]

SLIDE 7

Traffic

  • Global Routing
  • OpenResty
  • Bots
  • Cache hits
  • Checkout Throttling

SLIDE 8

[Diagram: many ISPs connect to both Region A and Region B]

Region A: BGP ANNOUNCE 23.227.38.0/24
Region B: BGP ANNOUNCE 23.227.38.0/24

walrusser.myshopify.com: 23.227.38.64
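
The slide shows both regions announcing the same prefix, which suggests anycast-style routing where a shop's address does not depend on the region that ends up serving it. As a small sanity check, the following minimal Python sketch uses the standard `ipaddress` module to confirm that the address shown for walrusser.myshopify.com falls inside the announced /24. The constant and function names are illustrative; only the prefix and the address come from the slide.

```python
import ipaddress

# Prefix announced over BGP from both regions (taken from the slide).
ANNOUNCED_PREFIX = ipaddress.ip_network("23.227.38.0/24")

# Address shown next to walrusser.myshopify.com (taken from the slide).
SHOP_ADDRESS = ipaddress.ip_address("23.227.38.64")


def covered_by_announcement(address, prefix=ANNOUNCED_PREFIX):
    """Return True if the address lies inside the announced prefix."""
    return address in prefix


if __name__ == "__main__":
    print(covered_by_announcement(SHOP_ADDRESS))
```

Because either region can attract traffic for any address in the prefix, something behind the load balancers still has to decide which region actually serves a given shop.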

SLIDE 9

OpenResty allows Lua scripting of your load balancers. It’s been one of the most impactful additions to our stack in recent memory.

https://github.com/openresty/openresty

[Diagram: Nginx with OpenResty running four modules: Rule Banner, Kafka Logging, Edgecache, and Checkout Throttle]

SLIDE 10

    worker_processes 1;
    error_log logs/error.log;
    events {
        worker_connections 1024;
    }
    http {
        server {
            listen 8080;
            location / {
                default_type text/html;
                content_by_lua '
                    ngx.say("<p>hello, world</p>")
                ';
            }
        }
    }

SLIDE 11

Bot Squasher analyzes the Kafka stream of incoming requests and bans bots through the Rule Banner module

[Diagram: Nginx with OpenResty (Rule Banner, Kafka Logger), Kafka, and Bot Squasher]

Request: POST /checkout
Rule: BAN 23.227.38.178
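
The slides do not say how Bot Squasher decides that a client is a bot. As a rough illustration of the general idea (consume a stream of logged requests, emit ban rules), here is a minimal Python sketch that bans an IP once it sends too many checkout POSTs within a sliding window. The thresholds, the event shape, and the class name are all assumptions made for this example.

```python
from collections import defaultdict, deque

# Hypothetical thresholds; the slides do not describe the real heuristics.
WINDOW_SECONDS = 10
MAX_CHECKOUT_POSTS = 5


class BotSquasherSketch:
    """Flags IPs that POST to /checkout too often within a sliding window."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_CHECKOUT_POSTS):
        self.window = window
        self.limit = limit
        self.recent = defaultdict(deque)  # ip -> timestamps of checkout POSTs
        self.banned = set()

    def observe(self, event):
        """Consume one logged request; return a ban rule string or None."""
        if event["method"] != "POST" or event["path"] != "/checkout":
            return None
        ip, now = event["ip"], event["ts"]
        stamps = self.recent[ip]
        stamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        if len(stamps) > self.limit and ip not in self.banned:
            self.banned.add(ip)
            return f"BAN {ip}"
        return None


def replay(events):
    """Feed a list of events through a fresh sketch and collect the rules."""
    squasher = BotSquasherSketch()
    return [rule for rule in map(squasher.observe, events) if rule]
```

In a real deployment the events would come from a Kafka consumer and the rule would be pushed to the load balancers; both are left out here to keep the sketch self-contained.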

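SLIDE 12

Edgecache can serve full-page cache hits out of the load balancers in microseconds

[Diagram: Nginx with OpenResty (Edgecache) backed by Memcached, in front of the Web Process]

GET /collections/walruses
  • HIT: the page is served from Memcached
  • MISS: the request goes to the Web Process, and the response is used to FILL the cache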
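
The HIT / MISS / FILL flow on this slide is the usual read-through cache pattern. The following minimal Python sketch shows that pattern with a plain dictionary standing in for Memcached and a hypothetical `render_page` function standing in for the web process. It ignores expiry, invalidation, and per-shop cache keys, none of which the slide covers.

```python
class EdgeCacheSketch:
    """Full-page read-through cache in front of a slower origin."""

    def __init__(self, origin):
        self.origin = origin  # callable: path -> page body
        self.store = {}       # stands in for Memcached
        self.log = []         # records HIT / MISS / FILL for inspection

    def get(self, path):
        if path in self.store:
            self.log.append(("HIT", path))
            return self.store[path]
        # Cache miss: ask the origin, then fill the cache with its response.
        self.log.append(("MISS", path))
        body = self.origin(path)
        self.store[path] = body
        self.log.append(("FILL", path))
        return body


def render_page(path):
    """Hypothetical origin standing in for the web process."""
    return f"<html>page for {path}</html>"
```

The first request for a path pays the cost of the origin; later requests for the same path never leave the load balancer tier.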

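SLIDE 13

Checkout Throttle limits the number of customers in the processing-heavy checkout path

[Diagram: Nginx with OpenResty (Checkout Throttle) with a Throttle and a Queue]

GET /checkout
  • admitted by the Throttle: the customer continues to /checkout
  • not admitted: the customer joins the Queue and is sent to /wait_area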
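
One way to picture the throttle is as a fixed number of checkout slots with a first-in, first-out waiting line. The sketch below is a minimal Python model under that assumption. The slot count, the customer identifiers, and the method names are hypothetical, and the real module runs inside the load balancer rather than in application code.

```python
from collections import deque


class CheckoutThrottleSketch:
    """Admits at most `capacity` customers into checkout; queues the rest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_checkout = set()
        self.waiting = deque()

    def request_checkout(self, customer):
        """Return the path the customer should be sent to."""
        if customer in self.in_checkout:
            return "/checkout"
        # Admit directly only if there is a free slot and nobody is waiting.
        if len(self.in_checkout) < self.capacity and not self.waiting:
            self.in_checkout.add(customer)
            return "/checkout"
        if customer not in self.waiting:
            self.waiting.append(customer)
        return "/wait_area"

    def finish_checkout(self, customer):
        """Free a slot and admit the longest-waiting customer, if any."""
        self.in_checkout.discard(customer)
        if self.waiting and len(self.in_checkout) < self.capacity:
            self.in_checkout.add(self.waiting.popleft())
```

A waiting customer who polls `request_checkout` again after being admitted is sent to /checkout, which mirrors a wait page that retries in the background.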

SLIDE 14

[Diagram: Traffic flows to Region A and Region B, each with its own Application and Data tiers]

SLIDE 15

A pod is an isolated unit of one or more shops

SLIDE 16

[Diagram: Data in Region A. Individual shops, among them shop 1, shop 4, shop 9, shop 17, shop 72, shop 3, shop 92, shop 18, shop 64, shop 22, shop 88, shop 52, and shop 23, are grouped into Pod 14, Pod 2, and Pod 7]

SLIDE 17

[Diagram: each pod in Region A (Pod 14, Pod 2, Pod 7) has its own MySQL, Redis, Memcache, and Cron]

SLIDE 18

[Diagram: Pod 14, Pod 2, and Pod 7 each have their own MySQL, Redis, Memcache, and Cron; Workers are shared across pods]

SLIDE 19

[Diagram: Pod 14, Pod 2, and Pod 7 each have their own MySQL, Redis, Memcache, and Cron; Load Balancing is shared across pods]

SLIDE 20

Genghis is our load-testing tool to test scale

SLIDE 21

Pod Balancer balances shops between pods with minimal downtime to keep load and size even
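
The slides do not describe how Pod Balancer picks which shops to move. Purely as an illustration of what "keeping load even" can mean, here is a minimal greedy sketch in Python: it repeatedly moves one shop from the busiest pod to the least busy pod, as long as the move narrows the gap between the two. The single numeric "load" per shop and the function name are assumptions made for this example.

```python
def plan_moves(pods, max_moves=10):
    """Greedy rebalancing sketch.

    `pods` maps pod name -> {shop name: load} and is modified in place.
    Returns a list of (shop, source_pod, target_pod) moves.
    """
    moves = []
    for _ in range(max_moves):
        load = {pod: sum(shops.values()) for pod, shops in pods.items()}
        source = max(load, key=load.get)
        target = min(load, key=load.get)
        gap = load[source] - load[target]
        # Only a shop lighter than the gap makes the two pods more even.
        candidates = {
            shop: cost for shop, cost in pods[source].items() if 0 < cost < gap
        }
        if not candidates:
            break
        # Prefer the shop whose load is closest to half the gap.
        shop = min(candidates, key=lambda s: abs(candidates[s] - gap / 2))
        pods[target][shop] = pods[source].pop(shop)
        moves.append((shop, source, target))
    return moves
```

A real balancer would also have to weigh data size and the cost of each move, which this sketch leaves out.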

SLIDE 22

[Diagram: Pod Balancer alongside the shops grouped into Pod 14, Pod 2, and Pod 7]

SLIDE 23

[Diagram: Pod Balancer alongside the shops grouped into Pod 14, Pod 2, and Pod 7]

SLIDE 24

[Diagram: Pod Balancer alongside the shops grouped into Pod 14, Pod 2, and Pod 7; a new shop, shop 98, appears]

SLIDE 25

[Diagram: Pod Balancer alongside the shops grouped into Pod 14, Pod 2, and Pod 7; more new shops appear, including shop 98 and shop 99]

SLIDE 26

[Diagram: Pod Balancer alongside the shops grouped into Pod 14, Pod 2, and Pod 7; the new shops, including shop 98 and shop 99, are shown with a new pod, Pod 74]

SLIDE 27

[Diagram: Pod Balancer alongside the shops grouped into Pod 14, Pod 2, and Pod 7; the new shops, including shop 98 and shop 99, are shown with a new pod, Pod 74]

SLIDE 28

[Diagram: Source Pod 9 (MySQL, Redis) and Target Pod 23 (MySQL, Redis)]

COPY SHOP
    SELECT * FROM products WHERE shop_id = 38493
    SELECT * FROM orders WHERE shop_id = 38493

SLIDE 29

[Diagram: Source Pod 9 (MySQL, Redis) and Target Pod 23 (MySQL, Redis)]

COPY SHOP
    SELECT * FROM products WHERE shop_id = 38493
    SELECT * FROM orders WHERE shop_id = 38493

NEW CHECKOUT
    INSERT INTO CHECKOUTS …

SLIDE 30

[Diagram: Source Pod 9 (MySQL, Redis) and Target Pod 23 (MySQL, Redis), connected through the Bin Log]

COPY SHOP_ID 238
    SELECT * FROM products WHERE shop_id = 238
    SELECT * FROM orders WHERE shop_id = 238

REPLICATE SHOP_ID 238
    CHECKOUT id: 383293

SLIDE 31

[Diagram: Source Pod 9 (MySQL, Redis), Target Pod 23 (MySQL, Redis), and Routing]

LOCK SHOP_ID 238

Routing
    UPDATE SHOP_ID 238 pod_id=23
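
Slides 28 to 31 walk through a shop move as four steps: bulk copy, replication of writes that arrive during the copy, a lock, and a routing update. The following Python sketch models that sequence with in-memory dictionaries. It assumes that every table carries a `shop_id` column and that the binlog can be filtered per shop. The function name, the data shapes, and the `late_writes` parameter are hypothetical and exist only to simulate writes landing mid-copy.

```python
def move_shop(shop_id, source, target, routing, target_pod, late_writes=()):
    """Model the COPY / REPLICATE / LOCK / UPDATE ROUTING sequence.

    `source` and `target` map table name -> list of row dicts.
    `routing` maps shop_id -> pod id. Returns the ordered list of steps.
    """
    steps = []
    binlog = []

    # 1. COPY: bulk-copy the shop's existing rows, table by table.
    for table, rows in source.items():
        target.setdefault(table, []).extend(
            dict(row) for row in rows if row["shop_id"] == shop_id
        )
    steps.append("COPY")

    # Writes that reach the source during the copy also land in the binlog.
    for table, row in late_writes:
        source.setdefault(table, []).append(row)
        binlog.append((table, row))

    # 2. REPLICATE: replay this shop's binlog entries onto the target.
    for table, row in binlog:
        if row["shop_id"] == shop_id:
            target.setdefault(table, []).append(dict(row))
    steps.append("REPLICATE")

    # 3. LOCK: in a real system, writes for the shop are blocked here
    #    so that the target can fully catch up.
    steps.append("LOCK")

    # 4. UPDATE ROUTING: point the shop at the target pod.
    routing[shop_id] = target_pod
    steps.append("UPDATE ROUTING")
    return steps
```

Rows belonging to other shops stay on the source pod, which is what lets a single shop move without touching its neighbours.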

SLIDE 32

[Diagram: Traffic flows to Region A and Region B, each with its own Application and Data tiers]

SLIDE 33

Sorting Hat routes requests for a shop to the region the pod is active in

SLIDE 34

[Diagram: Sorting Hat sits in the Traffic tier in front of Region A and Region B; Pod 2, Pod 7, and Pod 14 exist in both regions, each marked Active or Inactive per region]

Request
    GET /products
    Host: sneakershop.com

Routing
    ROUTE sneakershop.com shop238 pod2:B
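
The routing line on this slide reads as a chain of lookups: host to shop, shop to pod, pod to the region where that pod is active. Here is a minimal Python sketch of that chain. Only sneakershop.com, shop 238, and pod2:B come from the slide; the table names and the regions listed for Pod 7 and Pod 14 are placeholders.

```python
# Hypothetical routing tables. Only the sneakershop.com -> shop 238 -> pod2:B
# chain is taken from the slide; the other entries are placeholders.
HOST_TO_SHOP = {"sneakershop.com": 238}
SHOP_TO_POD = {238: 2}
ACTIVE_REGION = {2: "B", 7: "A", 14: "A"}


def route(host):
    """Return a 'pod<N>:<region>' target for the host, or None if unknown."""
    shop_id = HOST_TO_SHOP.get(host)
    if shop_id is None:
        return None
    pod = SHOP_TO_POD[shop_id]
    return f"pod{pod}:{ACTIVE_REGION[pod]}"


if __name__ == "__main__":
    print(route("sneakershop.com"))
```

Keeping the pod-to-region mapping separate from the shop-to-pod mapping means a whole pod can change region by updating a single entry.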

SLIDE 35

[Diagram: Traffic flows to Region A and Region B, each with its own Application and Data tiers]

SLIDE 36

Pod Mover moves pods between regions with minimal downtime

SLIDE 37

[Diagram: Sorting Hat sits in the Traffic tier in front of Region A and Region B; Pod 2, Pod 7, and Pod 14 exist in both regions, each marked Active or Inactive per region, with a separate label reading "Inactive Pod 2"]

SLIDE 38

[Diagram: Sorting Hat sits in the Traffic tier in front of Region A and Region B; Pod 2, Pod 7, and Pod 14 exist in both regions, each marked Active or Inactive per region, with a separate label reading "Inactive Pod 2"]

SLIDE 39

  • Update Routing for pod to target region (pod2:b -> pod2:a)
  • Sorting Hat routes requests to target region
  • Disable cron in both regions
  • Fail over MySQL to target region
  • Enable cron in both regions
  • Transfer jobs to target region

SLIDE 40

What about errors while the database fails over?

SLIDE 41

Pauser pauses requests in the middle of failovers to avoid serving errors

[Diagram: Nginx with OpenResty (Pauser) with a Queue and a Throttle]

POST /checkout (during failover)
HTTP 200 (seconds later)
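
The behaviour described here, holding a request until the failover completes instead of failing it, can be modelled with a simple gate. The Python sketch below uses `threading.Event` for that gate. The class name, the timeout, and the 503 fallback when the pause lasts too long are assumptions. The slide only shows the successful case, where the request returns HTTP 200 a few seconds later.

```python
import threading


class PauserSketch:
    """Holds incoming requests while paused, then forwards them."""

    def __init__(self):
        self._open = threading.Event()
        self._open.set()  # requests flow freely until pause() is called

    def pause(self):
        self._open.clear()

    def resume(self):
        self._open.set()

    def handle(self, request, backend, timeout=5.0):
        """Wait for the gate to open, then pass the request to the backend.

        Returns 503 if the pause outlasts `timeout` (an assumed fallback).
        """
        if not self._open.wait(timeout):
            return 503
        return backend(request)
```

From the client's point of view the request is just slow, which is usually preferable to an error page in the middle of a checkout.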

SLIDE 42

  • Update Routing for pod to target region (pod2:b -> pod2:a)
  • Sorting Hat routes requests to target region and pauses requests
  • Disable cron in both regions
  • Fail over MySQL to target region
  • Enable cron in both regions
  • Resume requests
  • Transfer jobs to target region
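
To make the ordering of these steps concrete, here is a minimal Python sketch that applies them, in the order listed, to a plain dictionary of state. The state keys and the function name are hypothetical, and each real step involves far more than flipping a value. The one property the sketch checks is that requests are already paused when MySQL fails over.

```python
def move_pod(state, pod, target_region):
    """Apply the slide's steps in order to `state`; return a log of steps."""
    log = []

    state["routing"][pod] = target_region
    log.append("update routing")

    state["paused"].add(pod)
    log.append("pause requests")

    state["cron_enabled"][pod] = False
    log.append("disable cron")

    # The pause is what keeps clients from seeing errors during this step.
    assert pod in state["paused"]
    state["mysql_primary"][pod] = target_region
    log.append("fail over MySQL")

    state["cron_enabled"][pod] = True
    log.append("enable cron")

    state["paused"].discard(pod)
    log.append("resume requests")

    state["jobs_region"][pod] = target_region
    log.append("transfer jobs")
    return log
```

For example, moving Pod 2 from region B to region A leaves routing, the MySQL primary, and jobs all pointing at A, with cron enabled and no requests paused.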

SLIDE 43
SLIDE 44

Cloud Migration with the Pods Architecture

SLIDE 45

[Diagram: shops, among them shop 1, shop 4, shop 9, shop 17, shop 72, shop 3, shop 92, shop 18, shop 64, shop 22, shop 88, shop 52, and shop 23, shown with Region A and Cloud Region C]

SLIDE 46
SLIDE 47

Thanks!

@Sirupsen

SLIDE 48
SLIDE 49
SLIDE 50