Replicating MySQL Data to TiDB
For Near Real-Time Analytics
Who Are We
Jervin Real, Architect
Francisco Bordenave, Architect

TiDB at a Glance
TiDB cluster: server processes that handle SQL and interact with clients (MySQL compatibility)
TiKV cluster: distributed storage layer, built on RocksDB
PD cluster: stores metadata, balances load, and monitors storage
MySQL clients and drivers work as-is
DML and DDL compatibility, with some limitations
Easy to start querying the database (see the sketch below)
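As a quick sketch (host, port, and user here are taken from the loader config shown later; any MySQL client or driver works):

# TiDB speaks the MySQL protocol, here on its default port 4000
mysql -h 127.0.0.1 -P 4000 -u root
mysql> SELECT VERSION();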
Consistency: resume from a previous checkpoint; verifying consistency
Performance: can we avoid replication lag?
Observability: is the process broken? Are we querying stale data?
Data Migration (full suite): dm-master, dm-worker
mydumper
loader
syncer
tidb-lightning
DIY
mydumper: a fork of https://github.com/maxbube/mydumper with additional customizations
Produces uncompressed dumps only, for loader (see the sketch below)
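A minimal invocation sketch; the host and user are borrowed from the syncer config later in this deck, the password and database name are placeholders, and no compression flag is passed since loader expects uncompressed files:

# parallel, uncompressed dump for loader (credentials are placeholders)
mydumper --host 10.3.4.4 --user tidb --password '***' \
         --database percona --threads 8 --outputdir ./dump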
loader: loads mydumper data in parallel, ~1 MB chunk per load; resumable
Uncompressed dumps only
myloader is possible too: be aware of transaction size limits (i.e. use -q 1)*, and it is not resumable
Database/table filters/routing

* https://github.com/tikv/tikv/issues/2986
Database/table filters/routing.
{ "log‑level": "info", "log‑file": "", "status‑addr": ":8272", "pool‑size": 3, "dir": "./", "db": { "host": "127.0.0.1", "user": "root", "port": 4000, "sql‑mode": "@DownstreamDefault" }, "checkpoint‑schema": "tidb_loader", "config‑file": "", "route‑rules": null, "do‑table": null, "do‑db": null, "ignore‑table": null, "ignore‑db": null, "rm‑checkpoint": false }
13 / 27
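A sketch of running loader against that dump; the flags mirror the configuration above (pool size 3, TiDB at 127.0.0.1:4000), while the binary and dump paths are assumptions:

# load the mydumper output with 3 concurrent workers; progress is
# checkpointed in tidb_loader.checkpoint, so a re-run resumes the load
./bin/loader -d ./dump -h 127.0.0.1 -P 4000 -u root -t 3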
Checkpoint state is available from tidb_loader.checkpoint by default
mysql> select * from tidb_loader.checkpoint \G
*************************** 1. row ***************************
         id: 13917f
   filename: percona.history.sql
  cp_schema: percona
   cp_table: history
    end_pos: 2685020785
create_time: 2019-05-29 02:56:14
update_time: 2019-05-29 03:08:53
*************************** 2. row ***************************
         id: 13917f
   filename: percona.search.sql
  cp_schema: percona
   cp_table: search
    end_pos: 812604151
create_time: 2019-05-29 02:56:14
update_time: 2019-05-29 03:08:52
...
syncer: resumes from loader checkpoints (or any replication coordinates)
Runs per table, per event (not at transaction level)
Allows routing/filtering per table
Allows DML filtering
Multiple worker threads, but still applied in commit order; eventually consistent
# cat config.toml
...
# [[skip-dmls]]
# db-name = "foo"
# tbl-name = "bar"
# type = "delete"
...
syncer -config config.toml
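syncer tracks its position in the meta file named by its configuration (./syncer.meta in the config that follows); a sketch of seeding it by hand, assuming the file's binlog-name/binlog-pos format:

# cat syncer.meta
# starting coordinates -- e.g. copied from mydumper's metadata file or
# SHOW MASTER STATUS; the values here reuse the example log line below
binlog-name = "mysql-bin.000006"
binlog-pos = 306307142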
{ "log‑level": "info", "log‑file": "syncer.log", "log‑rotate": "day", "status‑addr": ":8271", "server‑id": 1234567890, "meta": "./syncer.meta", "persistent‑dir": "", "flavor": "mysql", "worker‑count": 16, "batch": 100, "max‑retry": 100, "replicate‑do‑table": null, "replicate‑do‑db": null, "replicate‑ignore‑table": null, "replicate‑ignore‑db": null, "skip‑ddls": [], "skip‑dmls": null, "route‑rules": null, ... ... "from": { "host": "10.3.4.4", "user": "tidb", "port": 3306 }, "to": { "host": "127.0.0.1", "user": "root", "port": 4000 }, "enable‑gtid": false, "auto‑fix‑gtid": false, "disable‑detect": false, "safe‑mode": false, "config‑file": "config.toml", "stop‑on‑ddl": false, "execute‑ddl‑timeout": "3h", "execute‑dml‑timeout": "1m", "execute‑queue‑length": 5000, "enable‑ansi‑quotes": false, "timezone": "" }
16 / 27
HTTP Interface
me@tidb:~/git/dm/syncer# curl localhost:8271/metrics \
    | egrep '(syncer_binlog_file|syncer_binlog_pos)\{node'
syncer_binlog_file{node="master"} 3
syncer_binlog_file{node="syncer"} 3
syncer_binlog_pos{node="master"} 7.31884075e+08
syncer_binlog_pos{node="syncer"} 7.32175424e+08
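Those two gauges make a crude lag probe; a sketch, assuming both sides are on the same binlog file (compare the syncer_binlog_file values first):

# print how many binlog bytes syncer still has to apply; only
# meaningful while master and syncer point at the same binlog file
curl -s localhost:8271/metrics | awk '
    $1 == "syncer_binlog_pos{node=\"master\"}" { m = $2 }
    $1 == "syncer_binlog_pos{node=\"syncer\"}" { s = $2 }
    END { print "bytes behind master:", m - s }'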
Log file
2019/05/24 16:09:35 syncer.go:951: [info] [syncer]total events = 407220, tps = 719.666667, master-binlog = (mysql-bin.000009, 220613462), master-binlog-gtid=, syncer-binlog = (mysql-bin.000006, 306307142), syncer-binlog-gtid =
TiDB checkpoint table when used with dm-worker
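With dm-worker, the checkpoint lives in the downstream TiDB itself rather than in a local meta file; a sketch, assuming DM's standard dm_meta schema and a task named mytask (the task name is a placeholder):

mysql> -- task name "mytask" is a placeholder
mysql> select * from dm_meta.mytask_syncer_checkpoint \G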
dm-master, dm-worker
One dm-worker per replication source
Sharded tables to a single table
dm-worker = (mydumper + loader + syncer)
binlog-[ignore|replicate|do]-[db|table]
skip-dmls (db|table|type) (see the configuration sketch below)
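Putting those options together, a hedged config.toml fragment (the option names appear in the configs shown earlier; the schema and table names are placeholders):

# replicate everything except the test schema...
replicate-ignore-db = ["test"]

# ...and drop DELETE events for one table on the way through
[[skip-dmls]]
db-name = "foo"
tbl-name = "bar"
type = "delete"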
tikv-importer + tidb-lightning
Parses mydumper or CSV files and converts them into key-value (KV) files
Loads data directly into the tikv-server
Disables compaction and sets the cluster to 'import' mode (makes writes way faster!)
If the import fails, the cluster needs to be set back to 'normal' mode
During the load, cluster service is limited (a run-sequence sketch follows)
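A sketch of the run sequence under those constraints (binary paths and the importer's config flag are assumptions; check each tool's --help):

# 1. start the importer that writes KV data straight into TiKV
./bin/tikv-importer -C tikv-importer.toml &

# 2. run lightning against the mydumper/CSV directory
./bin/tidb-lightning -config tidb-lightning.toml

# 3. if the import fails, switch the cluster back to 'normal' mode
./bin/tidb-lightning-ctl -switch-mode=normal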
Apache Spark
Striim: https://www.striim.com/integrations/mysql-cassandra/
Tungsten Replicator
All solutions require some type of ETL process; none of them is easy to implement, as each requires model de-normalization
Altinity: https://github.com/altinity/clickhouse-mysql-data-reader
Tungsten Replicator (of course)
CSV to clickhouse-client (batch)
DIY for tricky situations: see how at Jervin's next talk, 5:40 PM in Big Bend CD!
Straightforward: it is distributed with MariaDB out of the box (not fully true)
Some nice benchmarks: https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/