POLARDB for MyRocks: Extending shared storage to MyRocks
Zhang, Yuan
Alibaba Cloud
Apr 2018

About me
• Yuan Zhang
• Database engineer
• Has worked at Alibaba for 5 years
• Focus on MySQL & MyRocks
• Email: zhangyuan.zy@alibaba-inc.com
MORE THAN JUST CLOUD

Agenda
• Background
• Basic Architecture
• Implementation details
• Performance Improvement
• Future plan

Background: Why POLARDB for MyRocks
MyRocks + PolarStore
Benefits from MyRocks
• Great space efficiency, better compression
• Great write efficiency, lower write amplification
• Fast data loading
• Compatible with MySQL
Benefits from shared storage (PolarStore)
• Promises data consistency
• Read nodes can be added immediately, without a full copy of the data

Basic Architecture
Primary
• Accepts read/write workload
Replica
• Accepts read-only workload
• Shares SST/WAL files with the Primary

Let's Begin: Prepare for RocksDB WAL replication
• Based on AliSQL 5.7
• Port MyRocks from Facebook
• Only the RocksDB and MyISAM engines are supported
• Convert system tables to RocksDB

Convert system tables to RocksDB: Prepare for RocksDB WAL replication
• Convert system tables to RocksDB
• Exceptions: mysql.slow_log and mysql.general_log are stored on local disk, so the Primary and each Replica have their own copies of these tables

RocksDB WAL/Manifest replication
[Figure: architecture]

RocksDB WAL/Manifest replication: Asynchronous replication
WAL replication
• Replay PUT/DELETE/MERGE
Manifest replication
• Replay flush & compaction
WAL and Manifest coordination
• Apply a VEdit (version edit) only when the applied LSN > the VEdit's LSN
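
The coordination rule above can be sketched as a small queue check. This is a minimal illustration, not the POLARDB code: the PendingVEdit struct, its fields, and drain_applicable are hypothetical names, and it assumes each manifest record carries the WAL position (LSN) at which it was logged.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a manifest record (a RocksDB version edit).
struct PendingVEdit {
  std::uint64_t lsn;  // assumed: WAL position at which the edit was logged
  int id;             // illustrative identifier only
};

// Pops and returns, in log order, every queued edit whose LSN is strictly
// below the LSN the replica has already applied from the WAL.
std::vector<PendingVEdit> drain_applicable(std::deque<PendingVEdit>& queue,
                                           std::uint64_t applied_lsn) {
  std::vector<PendingVEdit> ready;
  while (!queue.empty() && applied_lsn > queue.front().lsn) {
    ready.push_back(queue.front());
    queue.pop_front();
  }
  return ready;
}
```

Holding an edit back until WAL replay has passed it means a flush or compaction result only becomes visible on the Replica after the writes it covers have been replayed.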

RocksDB WAL/Manifest replication: Control Primary WAL and SST file deletion
WAL deletion: the original WAL deletion logic would cause Replicas to lose WALs they still need
• Lm: min_log_number on the Primary
• Ln: min_log_number across all Replicas
• new_min_log_number = min(Lm, Ln)
• A WAL can be deleted when its number < new_min_log_number
SST deletion: the original SST deletion logic would leave a Replica unable to find an SST, causing it to crash
• min_version_number: the minimal version number a Replica is using
• An SST can be deleted only when it won't be used by the Primary or any Replica
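
The WAL deletion rule above reduces to a minimum over the log numbers still needed by any node. The sketch below is illustrative only: the function names are hypothetical, and it assumes each Replica reports its own min_log_number to the Primary.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// lm: min_log_number on the Primary.
// replica_min_logs: min_log_number reported by each Replica (assumed input).
std::uint64_t new_min_log_number(std::uint64_t lm,
                                 const std::vector<std::uint64_t>& replica_min_logs) {
  std::uint64_t result = lm;
  for (std::uint64_t n : replica_min_logs) {
    result = std::min(result, n);  // Ln is the minimum over all Replicas
  }
  return result;
}

// A WAL file may be deleted only if no node still needs it.
bool wal_can_be_deleted(std::uint64_t wal_number, std::uint64_t lm,
                        const std::vector<std::uint64_t>& replica_min_logs) {
  return wal_number < new_min_log_number(lm, replica_min_logs);
}
```

With Lm = 12 and Replicas at 9 and 11, the new minimum is 9, so WAL 8 can be deleted while WAL 9 must be kept.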

DDL & Cache replication
[Figure: architecture]

DDL Replication: Remove frm, par files
Frm, par files
• Table metadata information
• If the Primary and Replica share frm, par files, DDL replication must be synchronous
Remove frm, par files
• Store these contents in RocksDB
• Replica can read multiple versions of the table schema
• DDL replication is asynchronous

DDL Replication: Remove frm, par files
DDL replication is asynchronous
• Multiple table schema versions in RocksDB
• Row data also has different versions

DDL Replication
An MDL lock protects DDL operations on the Primary. The same lock is also needed when a Replica replays DDL.
Primary
• Log MDL lock start and end
Replica
• Replay MDL lock start
  A. Lock MDL
• Replay MDL lock end
  A. Update table cache in MyRocks
  B. Unlock MDL

Cache Replication: ACL, Procedure, Query cache replication
Primary
• Log cache changes (ACL, Procedure) in the RocksDB WAL
Replica
• Replay the change from the WAL and invalidate the corresponding cache

Index Statistics Replication
Persistent
• Partial index statistics are persisted in each SST
• Total index statistics are stored in INDEX_STATISTICS
Memory
• Rdb_key_def::m_stats
Update
• Analyze table
• Flush memtable
• Compact
The Replica listens for PUT operations on INDEX_STATISTICS and reloads the statistics into memory.

New Log Format: Log changes for replication
Log types
• DDL (START, END)
• Cache change, ACL/Proc
Log format
• PUT/DELETE
Log storage location
• __system__ column family

New Log Format: New type in data dictionary

    // Data dictionary types
    enum DATA_DICT_TYPE {
      DDL_ENTRY_INDEX_START_NUMBER = 1,
      INDEX_INFO = 2,
      CF_DEFINITION = 3,
      BINLOG_INFO_INDEX_NUMBER = 4,
      DDL_DROP_INDEX_ONGOING = 5,
      INDEX_STATISTICS = 6,
      MAX_INDEX_ID = 7,
      DDL_CREATE_INDEX_ONGOING = 8,
      POLAR_LOG = 100, // for polar replication
      END_DICT_INDEX_ID = 255
    };

    enum POLAR_LOG_TYPE {
      TABLE_DDL = 1,
      CACHE_CHANGE = 2,
      ……
      END_POLAR_ROCK_TYPE = 255
    };

New Log Format: New type in data dictionary
DDL_START
• type: PUT
• key: POLAR_LOG+TABLE_DDL+dbname.tablename
• value: NULL
DDL_END
• type: DELETE
• key: POLAR_LOG+TABLE_DDL+dbname.tablename
• value: NULL
CACHE_CHANGE
• type: PUT
• key: POLAR_LOG+CACHE_CHANGE+ACL/Proc
• value: NULL

New Log Format: Problems
DDL_START and DDL_END must come as a pair.
Problem 1: Primary crash
• If the Primary crashes after DDL_START, it resends DDL_START on restart, and the DDL_END for the previous DDL_START is lost
• The Replica replays the first DDL_START and holds the MDL lock, which is never released by a matching DDL_END
Solution
• On restart, the Primary scans RocksDB for TABLE_DDL records. If one is found, the Primary resends DDL_END, and the Replica releases the old lock
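
The restart scan on the Primary can be sketched as a prefix scan over the dictionary. This is a minimal model under stated assumptions: a std::map stands in for the __system__ column family, the textual key layout in kDdlPrefix is hypothetical (the slides do not give the real encoding), and all function names are illustrative.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for the __system__ column family: an ordered key-value map.
using Dictionary = std::map<std::string, std::string>;

// Hypothetical key layout for POLAR_LOG + TABLE_DDL + dbname.tablename.
const std::string kDdlPrefix = "POLAR_LOG/TABLE_DDL/";

// DDL_START is a PUT with an empty value; DDL_END is a DELETE of the same key.
void log_ddl_start(Dictionary& dict, const std::string& table) {
  dict[kDdlPrefix + table] = "";
}
void log_ddl_end(Dictionary& dict, const std::string& table) {
  dict.erase(kDdlPrefix + table);
}

// Prefix scan: tables that have a DDL_START record but no DDL_END yet.
std::vector<std::string> unfinished_ddl(const Dictionary& dict) {
  std::vector<std::string> tables;
  for (auto it = dict.lower_bound(kDdlPrefix);
       it != dict.end() &&
       it->first.compare(0, kDdlPrefix.size(), kDdlPrefix) == 0;
       ++it) {
    tables.push_back(it->first.substr(kDdlPrefix.size()));
  }
  return tables;
}

// On Primary restart: resend DDL_END for every leftover record so that
// Replicas release the lock taken by the orphaned DDL_START.
std::vector<std::string> recover_primary(Dictionary& dict) {
  std::vector<std::string> resent = unfinished_ddl(dict);
  for (const std::string& table : resent) {
    log_ddl_end(dict, table);
  }
  return resent;
}
```

Because DDL_END deletes the key that DDL_START wrote, any TABLE_DDL key still present after a crash identifies a DDL that never finished.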

New Log Format: Problems
DDL_START and DDL_END must come as a pair.
Problem 2: Replica crash
• If the Replica crashes after DDL_START, it continues by replaying DDL_END on restart
• The lock taken for DDL_START no longer exists after the restart, so replaying DDL_END tries to release an MDL lock that does not exist
Solution
• On restart, the Replica scans RocksDB for TABLE_DDL records. If one is found, the Replica replays DDL_START to take the lock again
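
The Replica side of the fix can be modelled with a set of held locks. This is only a sketch: the Replica struct and its members are hypothetical, a std::set stands in for real MDL locks, and the set passed to restart represents the TABLE_DDL records found by the scan.

```cpp
#include <set>
#include <string>

// Hypothetical model of a Replica's DDL replay state.
struct Replica {
  std::set<std::string> mdl_held;  // tables whose MDL lock is currently held

  // Replaying DDL_START takes the lock.
  void replay_ddl_start(const std::string& table) { mdl_held.insert(table); }

  // Replaying DDL_END releases the lock. Returns false when there was no
  // lock to release, which is the Problem 2 situation.
  bool replay_ddl_end(const std::string& table) {
    return mdl_held.erase(table) > 0;
  }

  // In-memory locks do not survive a crash, so on restart the Replica
  // re-acquires one lock per TABLE_DDL record found by the scan.
  void restart(const std::set<std::string>& table_ddl_records) {
    mdl_held.clear();
    for (const std::string& table : table_ddl_records) {
      replay_ddl_start(table);
    }
  }
};
```

After restart, a later DDL_END for the same table finds a lock to release, so START and END stay paired across the crash.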

MVCC: MVCC based on RocksDB snapshot
Keep a consistent snapshot in Replica
• A Replica can't read a record after the Primary has compacted it away
Control compaction in Primary
• Compaction on the Primary must take the Replica's snapshots into account
• Only delete a record when sequence >= Sn, where Sn is the latest sequence in the Replica
• The Primary's snapshot list is merged with the Replica's snapshot list
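
The last bullet, merging the two snapshot lists, can be sketched as a sorted union. This is an illustration under assumptions: snapshots are represented by their sequence numbers, and merge_snapshot_lists is a hypothetical helper, not a RocksDB API.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Returns the sorted, de-duplicated union of the snapshot sequence numbers
// held on the Primary and on the Replica.
std::vector<std::uint64_t> merge_snapshot_lists(std::vector<std::uint64_t> primary,
                                                std::vector<std::uint64_t> replica) {
  std::sort(primary.begin(), primary.end());
  std::sort(replica.begin(), replica.end());
  std::vector<std::uint64_t> merged;
  std::merge(primary.begin(), primary.end(), replica.begin(), replica.end(),
             std::back_inserter(merged));
  merged.erase(std::unique(merged.begin(), merged.end()), merged.end());
  return merged;
}
```

Compaction on the Primary would then consult the merged list, so a version still visible to a Replica snapshot is treated the same as one visible to a local snapshot.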

MVCC: MVCC based on RocksDB snapshot
[Figure: keeping a consistent snapshot in Replica]

Performance Improvement: Optimize write performance
• Async-commit
• Optimize auto_increment

Performance Improvement: Async-commit
[Figure: original pipeline write]

Performance Improvement: Async-commit
[Figure: async-commit]

Performance Improvement: Optimize write performance
Optimize auto_increment
• A write needs a uniqueness check
• Do a Get first, then write
• Get is expensive
In practice, most uniqueness checks on auto_increment columns are unnecessary, especially when all auto_increment values are generated automatically.

Performance Improvement: Optimize write performance
Optimize auto_increment
• max_specify_pk: the maximum auto_increment value specified by a user
• If pk > max_specify_pk, skip the uniqueness check
• If pk <= max_specify_pk, the uniqueness check is needed
max_specify_pk is updated whenever a user specifies an auto_increment value.
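
The rule above fits in a few lines. The sketch below is an assumption-laden illustration rather than the MyRocks code: AutoIncUniqueCheck and its methods are hypothetical names, and only max_specify_pk comes from the slide.

```cpp
#include <algorithm>
#include <cstdint>

// Tracks the largest auto_increment value a user has supplied explicitly.
struct AutoIncUniqueCheck {
  std::uint64_t max_specify_pk = 0;

  // Call when an INSERT carries an explicit value for the column.
  void on_user_specified(std::uint64_t pk) {
    max_specify_pk = std::max(max_specify_pk, pk);
  }

  // Keys above every user-specified value skip the Get; others still need it.
  bool needs_unique_check(std::uint64_t pk) const {
    return pk <= max_specify_pk;
  }
};
```

If on_user_specified is called before needs_unique_check for an explicit value, that value is always checked, since it can never exceed the updated maximum. Only values above max_specify_pk avoid the Get.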

Future
Feature
• Online DDL
• Multiple-Master
Performance
• Compaction optimization

Q&A
