

  1. Percona Live Europe 2016: Launching Vitess
     Anthony Yeh, Dan Rogart
     Amsterdam, Netherlands | October 3–5, 2016

  2. Overview http://vitess.io

  3. Why Vitess?
     [Diagram: three stacks, each sitting on MySQL. "Their App" carries its own in-house sharding magic; "YouTube" and "Your App" both sit on Vitess, which supplies the sharding magic instead.]

  4. Why not Vitess?
     Vitess is...
     ● an opinionated cluster
       ○ Many ways to scale; this is one.
       ○ More on those opinions next.
     ● a powerful tool
       ○ Huge problems get easier.
       ○ Simple things get more complex.
     Vitess is not...
     ● a proxy
       ○ Understands the query.
       ○ Generates queries of its own.
     ● plug-and-play
       ○ ... yet.
       ○ This talk is about the gaps.

  5. Launching Vitess http://vitess.io/user-guide/launching.html

  6. Scalability Philosophy

  7. Horizontal Scaling
     Small Instances
     ● Many instances per host
     ● Faster replication, backup/restore
     ● Less contention, outages isolated
     ● Improves HW utilization
     Cluster Orchestration
     ● Containers isolate ports, files, compute
     ● Scheduling for resilience
     Self-Healing, Automation
     ● Health checks
     ● Ops work should be O(1)

  8. Durability and Consistency
     Durability through replication
     ● Disk is not durable
       ○ sync_binlog off
     ● Data must be on multiple machines
       ○ semisync
       ○ lossless failover
       ○ routine reparent
     Sharded consistency model
     ● Single-shard transactions
       ○ Same guarantees as MySQL
     ● Cross-shard transactions
       ○ May fail partially across shards
       ○ Work in progress on 2PC
     ● Cross-shard reads
       ○ Even with 2PC, may read from shards in different states
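In other words: with semisync replication the master does not acknowledge a commit to the client until at least one replica has received the transaction, so even a total loss of the master's disk cannot lose an acknowledged write. That is what makes it safe to turn sync_binlog off.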

  9. Globally Distributed
     Multi-Cell Deployment
     ● Cell = Zone | Availability Zone
       ○ Possible shared fate within cell
       ○ But failures shouldn't propagate
     ● Multi-Region
       ○ Survive fiber cuts, regional outages
       ○ Lower regional read latency
     ● Single-Master
       ○ Writes redirected at frontend
       ○ Only one inter-cell roundtrip
       ○ DB writes intra-cell
     Cluster Metadata ("Topology")
     ● Distributed, consistent, highly available key-value store
       ○ e.g. etcd, ZooKeeper
     ● Global Topology Store
       ○ Quorum across multiple cells
       ○ Survives any given cell death
     ● Local Topology Store
       ○ Quorum within a single cell
       ○ Independent of any other cell

  10. Production Planning

  11. Testing
     Integration Tests
     ● Run app tests against Vitess
       ○ Use real schema
       ○ Test sharding
     ● py/vttest
       ○ Small footprint to run on 1 machine
       ○ Emulate a full cluster for tests
       ○ Loads schema from .sql files
       ○ 1 vtcombo = all Vitess servers
       ○ 1 mysqld = all shards
     Query Compatibility
     ● Bind Variables (see the sketch after this slide)
       ○ Client-side prepared statements
       ○ Vitess query plan cache
     ● Tablet Types
       ○ master: writes, read-after-write
       ○ replica: live site read traffic
       ○ rdonly: batch jobs, backups
     ● Query Support
       ○ Vitess SQL parser is incomplete
       ○ Report important use cases
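A minimal sketch of the bind-variable point, using Go's database/sql; the driver, DSN, and table here are illustrative assumptions, not from the talk. Parameters go in as placeholders, so every execution normalizes to the same statement text and can reuse a cached query plan:

    package main

    import (
        "database/sql"
        "fmt"

        _ "github.com/go-sql-driver/mysql" // illustrative driver choice
    )

    func main() {
        // Hypothetical DSN; in a Vitess setup this would point at vtgate.
        db, err := sql.Open("mysql", "app@tcp(vtgate-host:3306)/keyspace")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // Good: the "?" stays a bind variable, so the server sees one
        // normalized statement and can reuse its cached query plan.
        var name string
        err = db.QueryRow("SELECT name FROM users WHERE id = ?", 42).Scan(&name)
        if err != nil {
            panic(err)
        }
        fmt.Println(name)

        // Bad: fmt.Sprintf("... WHERE id = %d", 42) bakes the value into
        // the SQL text, producing a new statement per value and defeating
        // the plan cache.
    }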

  12. Replication
     Binary Logging
     ● Enabled everywhere (slaves too)
     ● Statement-based
       ○ Rewrite to PK lookups
     ● GTID required
     ● Used for master management, resharding, update stream, schema swap, etc.
     Side Effects
     ● Triggers
     ● Stored procedures
     ● Foreign key constraints
     ● These can break resharding
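Concretely, "rewrite to PK lookups" means vttablet turns a non-deterministic DML such as DELETE FROM users WHERE last_seen < NOW() into a statement keyed on the matching primary keys (e.g. DELETE FROM users WHERE id IN (...)), so the statement-based binlog replays identically on every replica and can be split correctly during resharding.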

  13. Monitoring
     Status URLs (vtgate, vttablet, etc.)
     ● /debug/status
     ● /debug/vars (see the example after this slide)
       ○ Prometheus, InfluxDB
     ● /healthz
     ● /queryz
     ● /schemaz
     Coming soon...
     ○ Realtime fleet-wide health map
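For example, /debug/vars serves plain JSON (Go's expvar format), so scraping it needs nothing Vitess-specific. A hedged sketch, with the host, port, and variable name as assumptions:

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    func main() {
        // Hypothetical vttablet debug address; vtgate exposes the same URL.
        resp, err := http.Get("http://vttablet-01:15002/debug/vars")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // /debug/vars is one flat JSON object of counters, gauges, and maps;
        // this is what the Prometheus/InfluxDB integrations consume.
        var vars map[string]interface{}
        if err := json.NewDecoder(resp.Body).Decode(&vars); err != nil {
            panic(err)
        }
        fmt.Println(vars["Queries"]) // e.g. query timing counters, if exported
    }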

  14. Backups
     Built-in Backups
     ● Part of cloning, schema swap
       ○ Restores every day
     ● Storage Plugins
       ○ Filesystem (NFS, etc.)
       ○ Google Cloud Storage
       ○ Amazon S3
       ○ Ceph
     ● Needs to be triggered periodically (see the note below)
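Vitess does not schedule backups itself. One typical setup (our example, not from the talk) is a daily cron job running vtctlclient -server vtctld-host:15999 Backup cell1-0000000101 against one rdonly tablet per shard, where the server address and tablet alias are illustrative.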

  15. Migration Strategies Tribute

  16. Migration
     New Workloads
     ● Getting Started + Launch Guide
     Offline Migration
     ● Import data to Vitess
     Online Migration
     ● Run Vitess above existing MySQL
     ● Previously Unsharded
     ● Already Sharded
       ○ Custom Vindex

  17. YouTube Production
     Dan Rogart, YouTube SRE

  18. Run Vitess the SRE Way!
     • Cattle, not pets
     • Systemic failure is more important than individual failure
     • Failure is constant
     • Automate responses to failure when appropriate
     • Or detect and alert a human if required
     • The atomic unit is a MySQL instance - for durability, availability, replacement

  19. "If I have seen further than others, it is by standing upon the shoulders of giants" -- Isaac Newton • s/seen/scaled/ • Vitess runs on MySQL... • MySQL runs on Borg (Google's container cloud)... • Borg runs on Google datacenters and networks... • Each level is supported by amazing teams and we rely heavily upon their work 19

  20. Vitess runs on MySQL on Borg
     • YouTube/Vitess did not fully migrate into Borg until 2013
     • So it's actually a pretty good example of how a Vitess integration with an existing MySQL stack went (pretty well, so far)
     • MoB (MySQL on Borg) had a lot of mature tools that Vitess leveraged:
       • Backups
       • Failover
       • Schema Management

  21. Decider
     [Architecture diagram: a "decider" process and vtctld oversee one shard; vtgate routes queries to vttablet/mysqld pairs - one master plus pools of replicas and batch replicas.]

  22. Decider... (vastly simplified; see the sketch after this list):
     • Polls all mysql instances every n seconds
     • If the old master is unhealthy, it elects a new master from the replica pool
     • It re-masters all the other replicas to properly replicate from the new master
     • Is the reason TabletExternallyReparented exists in Vitess
     • Total failover times for YouTube Vitess are around 5 seconds
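A conceptual sketch of that loop in Go. Decider itself is internal to YouTube, so every type and helper below is hypothetical; the one real touchpoint is the final step of telling Vitess about the external reparent (vtctl TabletExternallyReparented):

    package main

    import (
        "fmt"
        "time"
    )

    // Instance is a hypothetical view of one mysqld in the shard.
    type Instance struct {
        Alias    string
        IsMaster bool
        Healthy  bool
    }

    // pollHealth would ping each mysqld and check replication status.
    func pollHealth(shard []*Instance) { /* elided */ }

    func findMaster(shard []*Instance) *Instance {
        for _, inst := range shard {
            if inst.IsMaster {
                return inst
            }
        }
        return nil
    }

    // electNewMaster picks a healthy, caught-up replica (policy elided).
    func electNewMaster(shard []*Instance) *Instance {
        for _, inst := range shard {
            if !inst.IsMaster && inst.Healthy {
                return inst
            }
        }
        return nil
    }

    func main() {
        var shard []*Instance // one master plus replicas; discovery elided
        for range time.Tick(5 * time.Second) { // "every n seconds"
            pollHealth(shard)
            if m := findMaster(shard); m != nil && m.Healthy {
                continue // old master is fine; nothing to do
            }
            newMaster := electNewMaster(shard)
            if newMaster == nil {
                continue // no candidate: alert a human instead
            }
            newMaster.IsMaster = true
            // Re-master every other replica (CHANGE MASTER TO ...), then
            // tell Vitess what happened out from under it:
            //   vtctl TabletExternallyReparented <newMaster.Alias>
            fmt.Println("promoted", newMaster.Alias)
        }
    }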

  23. Schema Management (small changes)
     • Autoschema
     • A "small" change is basically an ALTER against a table with < 2M rows
       • When executed on a replica it won't block the replication stream
     • Defined paths in source control are monitored
     • When a peer-reviewed file containing SQL is submitted...
     • ...autoschema will validate the change and apply it to all masters in a cluster
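The open-source analog of autoschema's apply step is vtctlclient ApplySchema, which validates a change and applies it across a keyspace. A hedged example, with the server address, keyspace, and column names invented for illustration: vtctlclient -server vtctld-host:15999 ApplySchema -sql "ALTER TABLE users ADD nickname VARCHAR(64)" user_keyspace.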

  24. Schema Management (big changes)
     • Pivot
     • A "big" change is basically an ALTER that will block traffic for too long on the master, or block replication for too long when executed on a slave
     • Defined paths in source control are monitored
     • When a peer-reviewed file containing SQL is submitted...
     • ...an SRE will start a pivot
     • The ALTER is applied to a single replica and a seed backup is taken
     • All other replicas are restarted such that they restore from the backup that contains the change
     • Finally, the master is done last: a replica with the change is promoted

  25. Schema Management
     • Autoschema changes take minutes
     • Pivots take days
     • At YouTube all schema changes must be forwards and backwards compatible with code. Enforced with extensive automated tests.
     • Sometimes dangerous: a common example is removing a column using a pivot. This can break replication, so we have to block access.
     • Sometimes confusing for our developers: they shouldn't really have to care about how a change happens
     • An open-source pivot is coming.

  26. Resharding Automation
     • Online copy of data, performed n times
     • Final offline copy of data to sync to a GTID
     • Filtered replication
     • Traffic redirect
     • ???
     • Profit!

  27. Resharding Automation (online copy)
     [Diagram: a vtworker sits between the unsharded source keyspace and the new shard 0 and shard 1, each shown as a vttablet/mysqld master with replica pools.]
     • Replication running
     • Read chunks from source
     • Read chunks from target
     • Reconcile and write diff to target
     • Adaptive throttle

  28. Resharding Automation (offline copy)
     [Diagram: same layout as the previous slide, with the vtworker between the unsharded source and shards 0 and 1.]
     • Replication stopped
     • Read chunks from source
     • Read chunks from target
     • Reconcile and write diff to target
     • Adaptive throttle

  29. Resharding Automation (filtered repl)
     [Diagram: the shard 0 and shard 1 master tablets each pull binlogs from a replica of the unsharded source.]
     • Target master tablets connect to a source replica
     • Parse binlogs and apply statements that belong in that shard
     • GTID is stored and replicated on target to survive restarts

  30. Resharding Automation (redirection)
     • Finally, application traffic is redirected:
       - vtctl-prod MigrateServedTypes keyspace_name/0 replica
         (sends replica traffic from unsharded to sharded)
       - vtctl-prod MigrateServedTypes keyspace_name/0 master
         (master cutover, point of no return)
     • < 5s of downtime during master cutover (faster than a normal decider failover, since only the Vitess layer is touched)
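For comparison, the open-source resharding flow of this era has the same shape (the command names are real Vitess tools; exact arguments omitted): vtworker SplitClone performs the online and offline copies, vtworker SplitDiff verifies source against target, and vtctlclient MigrateServedTypes then moves rdonly, replica, and finally master traffic.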

  31. Regression Testing
     • We use the Yahoo Cloud Serving Benchmark (YCSB)
     • Allows for comparison of Vitess to other storage solutions using the same workloads
     • A daily Vitess/YCSB sandbox is run to measure QPS per core and latency
     • Deviations from previous results (positive or negative) are noted and investigated
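As an illustration of what such a run might look like (the JDBC binding properties and vtgate endpoint here are assumptions, not from the talk): ./bin/ycsb run jdbc -P workloads/workloada -p db.url=jdbc:vitess://vtgate-host:15991 drives a standard YCSB workload through a JDBC connection to vtgate.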

  32. Rate My Session!
