A MySQL Perspective John Scott Mailchimp What is Mailchimps - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Mailchimp Scale: A MySQL Perspective

John Scott Mailchimp

slide-2
SLIDE 2

What is Mailchimp’s secret sauce?

Hint: It’s not much of a secret.


slide-3
SLIDE 3


Focus on the small business

“Empowering the Underdog”

slide-4
SLIDE 4

“We give marketers production-ready software designed to help them grow…”

Mailchimp Engineering Mission Statement https://mailchimp.com/culture/how-our-engineering-team-found-its-mission-statement/

slide-5
SLIDE 5


slide-6
SLIDE 6


Another way to say it

“We SCALE through togetherness, momentum, and pragmatism.”

slide-7
SLIDE 7

Old Mentality: The 3 Disciplines of Data Administration

  • OPS / KTLO
  • Support
  • Performance
slide-8
SLIDE 8

Old Mentality: The 3 Disciplines of Data Administration

  • OPS / KTLO
  • Support
  • Performance

“I’m a DevOps DBA”

slide-10
SLIDE 10

Old Mentality: The 3 Disciplines of Data Administration

  • OPS / KTLO
  • Support
  • Performance

“I help other departments work with databases”

slide-12
SLIDE 12

Old Mentality: The 3 Disciplines of Data Administration

  • OPS / KTLO
  • Support
  • Performance

“over the fence”

slide-13
SLIDE 13

New Mentality:

“Ops is product”

slide-14
SLIDE 14

Ops is Product

“If you improve database performance resulting in 10% reduction in churn, you would create an additional <big revenue number>.”

slide-15
SLIDE 15

Ops is Product

“Developer Enablement”

New paradigm “looking at ops through the lens of product” --Tyler Treat

  • https://bravenewgeek.com/operations-in-the-world-of-developer-enablement/
  • https://www.youtube.com/watch?v=JUy3GYkPfto

OR in the case of Mailchimp, ops actually developing software, too.

slide-16
SLIDE 16

Developer Enablement Product Enablement

In most organizations, “product enablement” is a sales term with the “four Ps”

  • Positioning
  • Pitch
  • Play
  • Program
slide-17
SLIDE 17

Developer Enablement Product Enablement

1000 350+

employees engineers salespeople

slide-18
SLIDE 18

Mailchimp “Board Room”

slide-19
SLIDE 19
slide-20
SLIDE 20

Sounds great. But what does that mean for a database engineer?

slide-21
SLIDE 21

#togetherness in action

MySQL log analysis based on pt-query-digest and Elasticsearch / Kibana resulted in a Top 20 table activity graph
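
As a rough illustration of the idea, once the logs have been reduced to query fingerprints with counts, per-table activity can be tallied in a few lines. This is only a sketch: the `(fingerprint, count)` input shape, the simplistic regex, and the name `top_tables` are assumptions made for the example, not the pipeline described in the talk.

```python
import re
from collections import Counter

# Simplistic table extraction: the identifier after FROM, JOIN, INTO or UPDATE.
# Real SQL needs a proper parser; this is only good enough for a sketch.
TABLE_RE = re.compile(
    r"\b(?:from|join|into|update)\s+`?([A-Za-z_][\w.]*)`?", re.IGNORECASE
)

def top_tables(digest_rows, n=20):
    """digest_rows: iterable of (fingerprint, query_count) pairs (assumed format)."""
    activity = Counter()
    for fingerprint, count in digest_rows:
        # Count each table once per fingerprint, weighted by how often it ran.
        for table in set(TABLE_RE.findall(fingerprint)):
            activity[table.lower()] += count
    return activity.most_common(n)

# Hypothetical sample data.
rows = [
    ("select * from campaigns where id = ?", 900),
    ("select * from users join campaigns on ...", 100),
    ("update users set last_login = ? where id = ?", 50),
]
print(top_tables(rows, n=2))
```

The resulting list is what a "Top 20" graph would plot, with `n=20`.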

slide-22
SLIDE 22

End of story?

“Toss it over the wall.” “Not my problem.” “I don’t have commit rights.”

slide-23
SLIDE 23

This is Mailchimp Engineering

“We succeed through togetherness, momentum, and pragmatism”

slide-24
SLIDE 24

We identified an N+1 pattern and fixed it, together.
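
For readers unfamiliar with the term, an N+1 pattern issues one query to fetch a set of rows and then one more query per row. The sketch below shows the shape of the problem and of a typical fix. It uses an in-memory SQLite database as a stand-in for MySQL, and the `lists` and `subscribers` tables are hypothetical, not Mailchimp's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subscribers (id INTEGER PRIMARY KEY, list_id INTEGER, email TEXT);
    INSERT INTO lists VALUES (1, 'news'), (2, 'promo');
    INSERT INTO subscribers VALUES
        (1, 1, 'a@example.com'), (2, 1, 'b@example.com'), (3, 2, 'c@example.com');
""")

def counts_n_plus_one(conn):
    """One query for the lists, then one more per list: N+1 round trips."""
    queries = 1
    result = {}
    for list_id, name in conn.execute("SELECT id, name FROM lists").fetchall():
        row = conn.execute(
            "SELECT COUNT(*) FROM subscribers WHERE list_id = ?", (list_id,)
        ).fetchone()
        queries += 1
        result[name] = row[0]
    return result, queries

def counts_single_query(conn):
    """Same answer from one joined, grouped query."""
    sql = """
        SELECT l.name, COUNT(s.id)
        FROM lists l LEFT JOIN subscribers s ON s.list_id = l.id
        GROUP BY l.id, l.name
    """
    return dict(conn.execute(sql).fetchall()), 1

print(counts_n_plus_one(conn))
print(counts_single_query(conn))
```

Both functions return the same counts; the difference is the number of round trips, which grows with the number of lists in the first version and stays at one in the second.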

slide-25
SLIDE 25

But wait....

slide-26
SLIDE 26

What was the impact to the user experience?

slide-27
SLIDE 27

  • 265 thousand unique query fingerprints
  • 2200 instances of MySQL
  • 247 billion queries per week

slide-28
SLIDE 28

Old Mentality: Effective Slow Query Log Analysis Across The Infrastructure FTW!

“Query Macroeconomics”

https://johnscott.net/2018/08/03/query-macroeconomics/

  • Prioritize query fixes by how much DB capacity you get back
    ○ MySQL not stressed with contention equals what?
        ■ A pretty innodb status?
        ■ Nice looking graphs?
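
The prioritization bullet can be sketched as a ranking by total time consumed, which is one plausible proxy for "capacity you get back". The input shape (`query`, `calls`, `avg_seconds`) and the function name are assumptions for illustration; the linked post may weigh things differently.

```python
def rank_by_capacity(fingerprints):
    """Rank fingerprints by their share of total query time.

    fingerprints: list of dicts with 'query', 'calls', 'avg_seconds' (assumed shape).
    Returns (query, share_of_total_time) pairs, largest share first.
    """
    def cost(f):
        return f["calls"] * f["avg_seconds"]

    total = sum(cost(f) for f in fingerprints)
    if total == 0:
        return []
    ranked = sorted(fingerprints, key=cost, reverse=True)
    return [(f["query"], cost(f) / total) for f in ranked]

# Hypothetical numbers: a fast but very frequent query can outweigh a slow, rare one.
sample = [
    {"query": "A: point lookup", "calls": 1_000_000, "avg_seconds": 0.002},
    {"query": "B: nightly report", "calls": 10, "avg_seconds": 30.0},
    {"query": "C: list page", "calls": 50_000, "avg_seconds": 0.01},
]
for query, share in rank_by_capacity(sample):
    print(f"{share:6.1%}  {query}")
```

In this made-up sample the frequent point lookup ranks first even though each call is fast, which is the kind of result a per-query slow log review tends to miss.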

slide-32
SLIDE 32

“Ops is Product”

Can a DBE team improve performance and capacity in a silo?

slide-34
SLIDE 34

“Ops is Product”

Can a DBE team reduce churn by 10% in a silo?

slide-36
SLIDE 36

We identified an N+1 pattern and fixed it, together.

slide-37
SLIDE 37

We enriched the sessions with context about the user, how the session was accessed, and other pertinent information. This context was sent to the slow query logs and included in the session data.
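
The slides do not say how the context reached the slow query logs. One common technique, shown here purely as an assumption, is to prefix each statement with a SQL comment carrying key=value pairs, so the context travels with the query text wherever it is logged (provided the client or driver sends comments through unchanged). The function name and the keys are hypothetical.

```python
import re

def _clean(value):
    # Keep only safe characters so a value can never close the comment early.
    return re.sub(r"[^\w.@:-]", "_", str(value))

def tag_query(sql, context):
    """Prefix sql with a /* key=value ... */ comment built from context."""
    parts = [f"{_clean(key)}={_clean(context[key])}" for key in sorted(context)]
    return f"/* {' '.join(parts)} */ {sql}"

# Hypothetical context fields.
print(tag_query("SELECT 1", {"user_id": 42, "via": "api"}))
# prints: /* user_id=42 via=api */ SELECT 1
```

Sorting the keys keeps the comment stable for a given context, which makes the tagged queries easier to group during analysis.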

slide-38
SLIDE 38

This new session analysis led to more improvements, more togetherness, and a better experience for our customers.

slide-39
SLIDE 39

How Mailchimp Avoids Silos #togetherness

  • All engineers have code repository access
  • Transparent, pragmatic standards
  • Empowering each other to suggest and make changes outside of core role
  • Everyone is on Slack
  • Multi-Disciplinary approach
    ○ We don’t make infrastructure decisions alone as DBEs
    ○ DBEs are not on-call alone
    ○ DBEs contribute code

slide-44
SLIDE 44

DBE code contributions (current)

  • Fixing bad queries
  • Code / process improvement
  • Data residence change
  • Participation in green field projects
  • Compliance
  • Wherever we find we are needed / useful
slide-50
SLIDE 50

“The Boring Part”

A few technical details about Mailchimp and the simplistic way we run MySQL

slide-51
SLIDE 51

MySQL Instances at Mailchimp

slide-53
SLIDE 53

Infrastructure Evolution

Instances used to be standalone, each on its own server on spinny disk, but not anymore.

slide-54
SLIDE 54

Infrastructure Evolution

Average density: 2200 (instances) / 725 (hosts) (3 instances per host and climbing)

slide-55
SLIDE 55

How we got to 2200 instances easily

Automated user moves: Add instances, adjust configs, users get rebalanced across new instances
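
The rebalancing step can be pictured with a small planning function. This is a minimal sketch under stated assumptions: it balances by user count only, the instance and user names are made up, and `plan_moves` is a hypothetical helper rather than the tooling used at Mailchimp, which presumably also weighs data size and load.

```python
def plan_moves(placement, new_instances):
    """Plan user moves so that user counts differ by at most one per instance.

    placement: {instance: [user, ...]}; new_instances: names of empty instances.
    Returns a list of (user, source_instance, destination_instance) tuples.
    """
    state = {inst: list(users) for inst, users in placement.items()}
    for inst in new_instances:
        state.setdefault(inst, [])
    moves = []
    if not state:
        return moves
    while True:
        names = sorted(state)  # sorted for deterministic tie-breaking
        src = max(names, key=lambda i: len(state[i]))
        dst = min(names, key=lambda i: len(state[i]))
        if len(state[src]) - len(state[dst]) <= 1:
            return moves
        user = state[src].pop()  # move one user from fullest to emptiest
        state[dst].append(user)
        moves.append((user, src, dst))

# Hypothetical placement: db3 has just been added.
current = {"db1": ["u1", "u2", "u3", "u4"], "db2": ["u5", "u6"]}
print(plan_moves(current, ["db3"]))
# prints: [('u4', 'db1', 'db3'), ('u3', 'db1', 'db3')]
```

Separating the plan from its execution lets the moves be reviewed or throttled before any data is copied.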

slide-56
SLIDE 56

Infrastructure Evolution

  • Old way (instance per server)
    ○ Ex: HP Gen 8, 32 core, 48GB RAM, 512G RAID 10 (spinner)
    ○ Instance split case: “bufferpool calculated by disk usage”
  • New(er) way: multi-instance servers
    ○ Ex: HP Gen 10, 56 core, 256GB RAM, 6T (NVMe)
    ○ Up to 8 instances
    ○ Split case: “divide bufferpool evenly”
  • Both single tenant and multi-tenant schemata (hundreds of thousands of schemata, millions of innodb containers)
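
The two split rules named above can be read as simple arithmetic. The sketch below is one interpretation, not the actual sizing logic: the buffer pool budget figures, the instance names, and both function names are assumptions, and the slides do not say how much RAM is reserved for anything other than buffer pools.

```python
def split_evenly(pool_budget_gb, instance_count):
    """New(er) rule: every instance on the host gets the same buffer pool."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return pool_budget_gb / instance_count

def split_by_disk_usage(pool_budget_gb, disk_usage_gb):
    """Older rule, as interpreted here: buffer pool proportional to data on disk."""
    total = sum(disk_usage_gb.values())
    if total <= 0:
        raise ValueError("disk usage must be positive")
    return {name: pool_budget_gb * used / total
            for name, used in disk_usage_gb.items()}

# Hypothetical budget: 192 GB of a 256 GB host set aside for buffer pools.
print(split_evenly(192, 8))                             # 24.0
print(split_by_disk_usage(40, {"a": 300, "b": 100}))    # {'a': 30.0, 'b': 10.0}
```

The even split needs no per-instance measurements, which fits a host where instances are added and users are rebalanced automatically.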

slide-57
SLIDE 57

“Standing on the shoulders of giants”

slide-58
SLIDE 58

Tooling (3rd party)

  • Infrastructure automation (puppet)
  • Decent monitoring, alerting and trending
    ○ Zabbix
    ○ OpsGenie
    ○ Prometheus
    ○ Grafana
    ○ ELK

Administered in collaboration with other specialized teams
Using open source templating in some cases (PMM dashboards)

slide-59
SLIDE 59

Tooling (home grown)

slide-60
SLIDE 60

DCM or “Data Center Manager”

  • Add/drop instances without logging into servers
  • Use the list function to return lists of servers within other scripts
  • Automatic configuration (interoperation with puppet)
    ○ Backups
    ○ Replication
    ○ Virtual IP

slide-61
SLIDE 61

Great Support

slide-62
SLIDE 62

Pragmatism

MySQL Orchestration Technology:

  • Past: MMM
  • Present: home grown
  • Future: Orchestrator?
slide-65
SLIDE 65

Pragmatism

MySQL Orchestration Technology:

  • Past: MMM
  • Present: home grown
  • Future: Orchestrator?
  • MHA
slide-66
SLIDE 66

Pragmatism: Why MHA?

  • Orchestrator requires its own infrastructure.
    ○ its own database pair
    ○ its own web server
  • We already have a kubernetes cluster.
  • MHA docker containers can be managed through DCM, github and existing PR/merge process.
  • Easy to deploy, easy to monitor with existing infrastructure.
slide-71
SLIDE 71

Old virtual IP management

  • Puppet pushes instance configs to centralized daemon.
  • The “mysql-vip” homegrown daemon checks DB availability & replication lag.
  • The daemon SSHs to db servers to move VIP.
    ○ on demand or
    ○ in the event of issue
  • Downstream replicas are not managed.
  • The read_only flag is not set on off-master.
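
The decision such a daemon has to make (availability plus replication lag) can be sketched as a pure function. Everything here is hypothetical: the `Health` record, the lag threshold, and `choose_vip_holder` are illustrative names, and the real “mysql-vip” daemon's rules are not described in the slides.

```python
from dataclasses import dataclass

@dataclass
class Health:
    alive: bool           # did the last availability check succeed?
    lag_seconds: float    # replication lag; 0 for the current writer

def choose_vip_holder(current, health, max_lag_seconds=5.0):
    """Return the server that should hold the VIP, or None if no candidate is safe."""
    if health[current].alive:
        return current  # a healthy holder keeps the VIP
    candidates = [
        name for name, h in health.items()
        if name != current and h.alive and h.lag_seconds <= max_lag_seconds
    ]
    if not candidates:
        return None  # better to page a human than to promote a lagging replica
    # Prefer the replica that is closest to caught up.
    return min(candidates, key=lambda name: health[name].lag_seconds)

# Hypothetical pair: db-a holds the VIP and has just failed its check.
status = {"db-a": Health(False, 0.0), "db-b": Health(True, 1.0)}
print(choose_vip_holder("db-a", status))  # db-b
```

Keeping the decision separate from the SSH step that actually moves the address makes the rule easy to test without touching a server.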

slide-72
SLIDE 72

MHA Deployment

  • The MHA repository in git contains:
    ○ Docker entrypoint
    ○ MHA itself
    ○ Supporting scripts to avoid split brain
    ○ Container definition per instance generated via script against configuration files
  • Changes peer reviewed in github
  • Downstream replicas managed
  • The read_only flag is set.
    ○ supports ProxySQL in the future

slide-73
SLIDE 73

What’s Next

  • ProxySQL
  • Cloud
  • Many other team-enabled optimizations
    ○ Data tenancy
    ○ Legacy replacement
    ○ New features

slide-77
SLIDE 77

DBE Empowerment & Product Enablement

  • Don’t be afraid to seek #togetherness in your own company.
  • How can you make OPS=PRODUCT in your org?
  • Pragmatism vs newest tech.
  • Feel empowered to fix what is within your power to change.
  • Inspire others.
  • Each one teach one, each one reach one.

slide-82
SLIDE 82

Thank you.