Globalizing Player Accounts with MySQL at Riot Games (Tyler Turk)

SLIDE 1

Globalizing Player Accounts with MySQL at Riot Games

Tyler Turk, Riot Games, Inc.

SLIDE 2

About Me

  • Senior Infrastructure Engineer
  • Player Platform at Riot Games
  • Father of three
  • Similar talk at re:Invent last year

SLIDE 3

Accounts Team

  • Responsible for account data
  • Provides account management
  • Ensures players can log in
  • Aims to mitigate account compromises

SLIDE 4

Overview

The old and the new

SLIDE 5

League’s growth and shard deployment

  • Launched in 2009
  • Experienced rapid growth
  • Deployed multiple game shards
  • Each shard used its own MySQL DBs

SLIDE 6

Some context

  • Hundreds of millions of players worldwide
  • Localized primary / secondary replication
  • Data federated with each shard
  • Account transfers were difficult

SLIDE 7

Why MySQL?

  • Widely used & adopted at Riot
  • Used extensively by Tencent
  • Ensures ACID compliance

SLIDE 8

Catalysts for globalization

  • General Data Protection Regulation
  • Decoupling from game platform
  • Single source of truth for accounts

SLIDE 9

Globalization of Player Accounts

Migrating from 10 isolated databases to a single globally replicated database

SLIDE 10

Data deployment considerations

  • Globally replicated, multi-master
  • Globally replicated, single master
  • Federated or sharded data
  • To cache or not to cache

SLIDE 11

Global database expectations

  • Highly available
  • Geographically distributed
  • < 1 sec replication latency
  • < 20 ms read latency
  • Enables a better player experience

SLIDE 12

Continuent Tungsten

  • Third-party vendor
  • Provides cluster orchestration
  • Manages data replication
  • MySQL connector proxy

SLIDE 13

Why Continuent Tungsten?

  • Prior issues with Aurora
  • RDS was not multi-region
  • Preferred asynchronous replication
  • Automated cluster management

SLIDE 14

Explanation & tolerating failure

SLIDE 15

Deployment

  • Terraform & Ansible (Docker initially)
  • 4 AWS regions
  • r4.8xlarge (10 Gbps network)
  • 5 TB gp2 EBS for data
  • 15 TB for logs / backups

SLIDE 16

Migrating the data

  • Multi-step migration of data
  • Consolidated data into 1 DB
  • Multiple rows for a single account

SLIDE 17

Load testing

SLIDE 18

Chaos testing

SLIDE 19

Monitoring

SLIDE 20

Performing backups

  • Leverage standalone replicator
  • Back up with xtrabackup
  • Compress and upload to S3
  • Optional delay on replicator
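
The backup steps above could be sketched as a streaming pipeline. This is a minimal sketch, not the tooling used in the talk: the function names, the gzip compression stage, the target directory, and the S3 URI are all assumptions for illustration. The slide itself only names xtrabackup, compression, and S3.

```python
import subprocess
from typing import List


def backup_commands(target_dir: str, s3_uri: str) -> List[List[str]]:
    """Build the three pipeline stages: stream a backup, compress it, upload it.

    Assumption: gzip as the compressor and the AWS CLI as the uploader;
    the slide does not say which tools were used for these two stages.
    """
    return [
        # Stream the backup to stdout in xbstream format.
        ["xtrabackup", "--backup", "--stream=xbstream", f"--target-dir={target_dir}"],
        # Compress the stream from stdin to stdout.
        ["gzip", "-c"],
        # Upload stdin ("-") to the given S3 object.
        ["aws", "s3", "cp", "-", s3_uri],
    ]


def run_pipeline(commands: List[List[str]]) -> List[int]:
    """Run the commands as a pipe (cmd1 | cmd2 | ...) and return their exit codes."""
    procs = []
    upstream = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Close our copy so the upstream process sees a broken pipe
            # if the downstream process exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    procs[-1].communicate()  # drain the last stage and wait for it
    return [p.wait() for p in procs]
```

Running this on a standalone replica (rather than a cluster member) keeps the backup load off nodes that serve player traffic. A caller would treat any non-zero exit code in the returned list as a failed backup.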

SLIDE 21

Performing maintenance

  • Cluster policies
  • Offline and shun nodes
  • Perform cluster switch

SLIDE 22

Performing schema changes

Schema MUST be backwards compatible

Order of operations for schema change:

  • 1. Replicas in non-primary region
  • 2. Cluster switch on relay
  • 3. Perform change on former relay
  • 4. Repeat steps 1-3 on all non-primary regions
  • 5. Replicas in primary region
  • 6. Cluster switch on write primary
  • 7. Perform change on former write primary

The Process

  • Offline node
  • Wait for connections to drain
  • Stop replicator
  • Perform schema change
  • Start replicator
  • Wait for replication
  • Online node
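
The ordering above can be sketched as a function that expands a cluster topology into a flat runbook. This is an illustrative sketch only: the `Region` type, the node names, and the assumption that the seven-step process is applied to every node touched are not from the slides, and no Tungsten commands are modeled.

```python
from dataclasses import dataclass
from typing import List

# Per-node process, taken from the slide.
NODE_STEPS = [
    "offline node",
    "wait for connections to drain",
    "stop replicator",
    "perform schema change",
    "start replicator",
    "wait for replication",
    "online node",
]


@dataclass
class Region:
    """Hypothetical topology record for one regional cluster."""
    name: str
    lead: str            # relay (non-primary region) or write primary (primary region)
    replicas: List[str]
    is_primary: bool = False


def rollout_plan(regions: List[Region]) -> List[str]:
    """Return the ordered steps: non-primary regions first, primary region last."""
    ordered = [r for r in regions if not r.is_primary]
    ordered += [r for r in regions if r.is_primary]
    plan: List[str] = []
    for region in ordered:
        # Replicas first, one node at a time.
        for node in region.replicas:
            plan.extend(f"{region.name}/{node}: {step}" for step in NODE_STEPS)
        # Move the lead role away, then change the former lead.
        plan.append(f"{region.name}: cluster switch away from {region.lead}")
        plan.extend(f"{region.name}/{region.lead}: {step}" for step in NODE_STEPS)
    return plan
```

Because every node serves traffic with either the old or the new schema at some point during the rollout, the plan only works if the change is backwards compatible, which is why the slide insists on it.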

SLIDE 23

De-dockering

  • Fully automated the process
  • One server at a time
  • Performed live
  • Near zero downtime

SLIDE 24

Current state

  • Database deployed on host
  • No Docker for database / sidecars
  • Accounts are distilled to a single row
  • Servicing all game shards

SLIDE 25

Lessons Learned

Avoiding the same mistakes we made

SLIDE 26

Databases in Docker

  • Partially immutable infrastructure
  • Configuration divergence possible
  • Upgrades required container restarts
  • Pain in automating deploys

SLIDE 27

Large data imports

  • Consider removing indexes
  • Perform daily delta syncs
  • Migrate in chunks if possible
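
One way to migrate in chunks is to walk the primary-key range in fixed-size windows. The sketch below is an assumption about how that might look, not the migration code from the talk: the helper name and the idea of chunking by a numeric id are hypothetical.

```python
from typing import Iterator, Tuple


def id_chunks(min_id: int, max_id: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) key ranges that cover [min_id, max_id]."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield start, end
        start = end + 1


# Each range could drive one bounded copy, for example a query of the form
#   SELECT ... WHERE id BETWEEN %s AND %s
# against a hypothetical source table, committed per chunk.
for start, end in id_chunks(1, 10, 4):
    print(start, end)
```

Bounded chunks keep each transaction small, so a failure only requires re-running one range rather than the whole import.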

SLIDE 28

Think about data needs

  • Synchronous vs asynchronous
  • Read heavy vs write heavy

SLIDE 29

Impacts of replication latency

  • Replication can take >1 second
  • Impacts strongly consistent expectations
  • Immediate read-backs can fail
  • Think about “eventual” consistency
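
A client that writes to the primary and immediately reads from a replica may not find its own row yet. One possible mitigation, sketched here under assumptions, is to retry the read with exponential backoff. The function name, the default attempt count, and the delays are illustrative and not from the talk.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_with_retry(
    read: Callable[[], Optional[T]],
    attempts: int = 5,
    delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `read` until it returns a non-None value or attempts run out.

    `read` stands in for a replica lookup that returns None while the
    row has not replicated yet. Delays double after each miss.
    """
    for attempt in range(attempts):
        value = read()
        if value is not None:
            return value
        if attempt < attempts - 1:
            sleep(delay_s * (2 ** attempt))
    return None
```

Retrying only masks short lag. If replication falls behind for longer than the retry budget, the caller still gets `None`, so the application has to decide whether to fall back to the primary or surface the delay.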

SLIDE 30

WAN replication is fragile

  • Not completely infallible
  • Think through your needs
  • Architect and design accordingly
  • Even with RiotDirect, it’s not perfect

SLIDE 31

Backup with caution (aka backups v1)

SLIDE 32

Demo Time!

SLIDE 33

Thank You!

Tyler Turk tturk@riotgames.com

SLIDE 34

Rate My Session