MySQL and ZFS

A presentation by Yves Trudeau, Percona


  1. MySQL and ZFS
     Yves Trudeau, Percona

  2. Who am I?
     • Principal architect at Percona since 2009 (10 years already…)
     • With Sun Microsystems and MySQL before Percona
     • Physicist by training
     • I like to understand how things work

  3. Why a talk on MySQL and ZFS?
     • I like both and I couldn’t decide…
     • They go well together
     • They have a lot in common

  4. Plan
     • A quick tour of ZFS
     • Configuration guidelines for MySQL/ZFS
     • A real world example

  5. A tour of ZFS

  6. ZFS Highlights
     ● Developed by Sun for Solaris
     ● Now on many platforms
     ● B-tree file storage, not just the directories
     ● 128-bit pointers!!!
     ● Files are split into records (b-tree leaves)
     ● Records can be compressed
     ● Copy-On-Write
     ● Native encryption
     ● Checksums and self-healing

  7. ZPOOL
     ● Base unit of storage
     ● Made of block devices or even just files
     ● Disks, files, LVs, mirrors of disks, striping, raidz, raidz2, raidz3…
     ● Filesystems are created from a zpool
     ● A server → many zpools
     ● SLOG: separate log device
     ● Cache devices, L2ARC

  8. ZFS Filesystems
     ● A filesystem is:
       1. a profile of settings
       2. a mount point
       3. a snapshotable entity
     ● Settings adapted → expected workload
     ● Can be nested
     ● Can be based on a snapshot (clone)

  9. ZVols
     ● A block device from ZFS
     ● Uber cool for virtual images
     ● Steps for a 3-node cluster:
       1. Create a base image on a ZVol
       2. Snapshot the ZVol
       3. Clone the snapshot 3 times (yields 3 new ZVols)
       4. Start 3 VMs using the new ZVols

     <disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none' io='native'/>
       <source dev='/dev/zvol/data/vms/kvm_PXC2'/>
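
The four steps above map onto standard `zfs` subcommands. Below is a minimal sketch that only builds and prints the commands rather than running them. The dataset name `data/vms/base`, the snapshot name `gold`, the 20G size and the clone names `kvm_PXC1` and `kvm_PXC3` are assumptions made for illustration; only `data/vms/kvm_PXC2` appears on the slide.

```python
def zvol_clone_commands(base, snapshot, clones, size="20G"):
    """Return the zfs commands for the snapshot-and-clone workflow.

    Nothing is executed: the function only builds command strings.
    """
    snap = f"{base}@{snapshot}"
    cmds = [
        f"zfs create -V {size} {base}",  # step 1: ZVol that holds the base image
        f"zfs snapshot {snap}",          # step 2: snapshot the ZVol
    ]
    # step 3: one clone per VM, each clone is a new ZVol
    cmds += [f"zfs clone {snap} {clone}" for clone in clones]
    return cmds


if __name__ == "__main__":
    # Hypothetical names, following the data/vms/kvm_PXC2 pattern from the slide.
    nodes = [f"data/vms/kvm_PXC{i}" for i in (1, 2, 3)]
    for cmd in zvol_clone_commands("data/vms/base", "gold", nodes):
        print(cmd)
```

Each clone would then show up under `/dev/zvol/` and could be referenced from a libvirt disk definition like the excerpt above (step 4).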

  10. The COW Magic
     ● ZFS never overwrites directly
     ● How does ZFS overwrite a record?
       1. Writes it somewhere else
       2. Moves the reference from the old record to the new record
       3. GC frees up the old record
     ● Easy snapshots (think InnoDB MVCC)
     ● Easy cloning
     ● Wonderful for backups
     ● Transactional!
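
To make the three overwrite steps concrete, here is a toy in-memory model of copy-on-write. It is a sketch of the idea only, not of how ZFS is implemented: the class `CowStore` and all of its methods are invented for this illustration.

```python
class CowStore:
    """Toy copy-on-write store: written blocks are never modified in place."""

    def __init__(self):
        self.blocks = {}     # block id -> data
        self.live = {}       # record name -> block id (the current view)
        self.snapshots = {}  # snapshot name -> frozen copy of a view
        self._next_id = 0

    def write(self, name, data):
        # 1. Write the new version somewhere else (a fresh block).
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        # 2. Move the reference to the new block; the old block is untouched.
        self.live[name] = block_id

    def snapshot(self, snap_name):
        # A snapshot copies references only, never the data itself.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        view = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[view[name]]

    def gc(self):
        # 3. Free every block that no view references any more.
        referenced = set(self.live.values())
        for view in self.snapshots.values():
            referenced.update(view.values())
        freed = [b for b in self.blocks if b not in referenced]
        for b in freed:
            del self.blocks[b]
        return len(freed)


if __name__ == "__main__":
    store = CowStore()
    store.write("page1", "v1")
    store.snapshot("backup")
    store.write("page1", "v2")
    print(store.read("page1"), store.read("page1", "backup"))
```

Because the snapshot still references the old block, the garbage collector cannot free it, which is why snapshots are cheap to take and keep a consistent view for backups.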

  11. ARC for Adaptive Replacement Cache
     ● Sophisticated file cache
     ● Configurable
     ● Can store compressed data
     ● Can be layered to disk (SSD/Flash) → L2ARC

  12. Kernel Modules
     ● Many configuration parameters (ls /sys/module/zfs/parameters/)
     ● Version 0.7.5 has 169…
     ● Examples:
       ➔ zfs_arc_max: maximum size of the ARC
       ➔ zfs_arc_meta_limit: caps the amount of metadata in the ARC
       ➔ zfs_free_max_blocks: how fast the GC frees blocks (think InnoDB purge batch)
       ➔ l2arc_write_max: how fast you allow writes to the L2ARC
       ➔ zfs_txg_timeout: maximum time span of a transaction group (think async writes)
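
On Linux, the parameters of a loaded kernel module are exposed as one file per parameter under `/sys/module/<module>/parameters/`. A minimal sketch for reading them follows; the helper name `read_zfs_parameters` is an assumption, and the function returns an empty dictionary when the ZFS module is not loaded.

```python
from pathlib import Path


def read_zfs_parameters(param_dir="/sys/module/zfs/parameters"):
    """Return {name: value} for every readable parameter file in param_dir."""
    params = {}
    root = Path(param_dir)
    if not root.is_dir():
        return params  # module not loaded, or not a Linux system
    for entry in sorted(root.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            continue  # skip entries that cannot be read
    return params


if __name__ == "__main__":
    params = read_zfs_parameters()
    print(f"{len(params)} parameters found")
    for name in ("zfs_arc_max", "zfs_txg_timeout", "l2arc_write_max"):
        print(name, "=", params.get(name, "n/a"))
```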

  13. Configuration Guidelines for MySQL/ZFS

  14. When Should You Use MySQL/ZFS?
     ● For large compressible datasets
     ● When backups are challenging (mix of storage engines)
     ● Spare CPU capacity (compression)
     ● Not IO bound
     ● Active dataset fits in the L2ARC (compressed)
     ● To save your flash devices...

  15. ZFS Configuration
     ● 2 file systems for easy snapshots
       ➔ /var/lib/mysql → The parent, configured for sequential ops
         ✔ recordsize = 128KB
         ✔ compression can be more aggressive (gzip)
       ➔ /var/lib/mysql/data → The dataset
         ✔ recordsize = InnoDB page size (likely 16KB)
         ✔ fast compressor like lz4
     ● Cache devices (L2ARC) are great
     ● SLOG devices help with high durability requirements
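
The two-filesystem layout above can be written down as a pair of `zfs create` commands. The sketch below builds the command strings without executing them. The pool name `tank` is a placeholder, and the 16k record size assumes the default InnoDB page size.

```python
# Hypothetical pool name "tank"; the settings follow the slide above.
DATASETS = [
    {
        "name": "tank/mysql",
        "mountpoint": "/var/lib/mysql",
        "recordsize": "128k",   # sequential files: redo logs, binary logs
        "compression": "gzip",  # more aggressive compressor
    },
    {
        "name": "tank/mysql/data",
        "mountpoint": "/var/lib/mysql/data",
        "recordsize": "16k",    # match the InnoDB page size
        "compression": "lz4",   # fast compressor
    },
]


def zfs_create_command(dataset):
    """Build (but do not run) the zfs create command for one dataset."""
    keys = ("recordsize", "compression", "mountpoint")
    opts = " ".join(f"-o {key}={dataset[key]}" for key in keys)
    return f"zfs create {opts} {dataset['name']}"


if __name__ == "__main__":
    for ds in DATASETS:
        print(zfs_create_command(ds))
```

Keeping the data files in their own nested filesystem lets each one carry its own record size and compressor while both remain snapshotable.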

  16. MySQL Configuration
     ● innodb_doublewrite = 0
     ● O_DIRECT?
     ● InnoDB buffer pool? Leave some RAM for the ARC
       ➔ no L2ARC → target an ARC of 0.5% of the data set
       ➔ 1TB of data ~ 5GB ARC
       ➔ Not a hard rule
     ● datadir = /var/lib/mysql/data
     ● innodb_log_group_home_dir, log-bin, slow-log, relay-log to /var/lib/mysql
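
The 0.5% rule of thumb is simple arithmetic. The sketch below works it out and derives a possible buffer pool size; the 2GB operating system reserve and the 16GB server are assumptions for illustration and do not come from the talk.

```python
def arc_target_gb(dataset_gb, fraction=0.005):
    """ARC target when there is no L2ARC: about 0.5% of the data set."""
    return dataset_gb * fraction


def zfs_arc_max_bytes(arc_gb):
    """Convert an ARC size in GB into a byte value for zfs_arc_max."""
    return int(round(arc_gb * 1024 ** 3))


def buffer_pool_gb(ram_gb, dataset_gb, os_reserve_gb=2.0):
    """RAM left for the InnoDB buffer pool after the ARC and an OS reserve.

    os_reserve_gb is an assumed safety margin, not a figure from the talk.
    """
    return max(ram_gb - arc_target_gb(dataset_gb) - os_reserve_gb, 0.0)


if __name__ == "__main__":
    arc = arc_target_gb(1000)  # 1TB of data
    print("ARC target (GB):", arc)
    print("zfs_arc_max (bytes):", zfs_arc_max_bytes(arc))
    print("Buffer pool on a 16GB server (GB):", buffer_pool_gb(16, 1000))
```

As the slide says, this is not a hard rule: treat the result as a starting point and adjust it after observing the ARC hit ratio.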

  17. Real World Examples

  18. A DR MySQL Replica in Google Cloud
     Dataset 700GB (2.5x compressible), fair replication traffic, all dataset is active (random primary keys)
     XFS
       ● n1-standard-2 (~$68/month)
       ● 1TB SSD (~$175/month)
       Total: $243/month
     ZFS
       ● n1-standard-2 (~$68/month)
       ● local 375GB NVMe ($30/month)
       ● 500GB standard disk ($20/month)
       Total: $118/month
     ZFS saves $125/month

  19. A PXC Cluster in AWS
     Dataset 2TB (2.5x compressible), needs more than 20k IOPS
     XFS/i3
       ● 3x i3.4xlarge: $2700/month
     ZFS/i3
       ● 3x i3.2xlarge: $1350/month
       ● 2TB SC1: $50/month
     XFS/EBS/io1
       ● 3x r5.2xlarge: $1080/month
       ● 3x 3TB 20k PIOPS: $3900/month
     ZFS saves $1300/month
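
The savings quoted on the two cost slides can be checked with a few lines of arithmetic. The sketch below uses only the monthly prices shown on the slides; the helper names are invented for this illustration.

```python
def monthly_total(items):
    """Sum a {component: dollars per month} mapping."""
    return sum(items.values())


def compressed_gb(dataset_gb, ratio):
    """Approximate on-disk size for a given compression ratio."""
    return dataset_gb / ratio


# Slide 18: DR replica in Google Cloud
gcp_xfs = {"n1-standard-2": 68, "1TB SSD": 175}
gcp_zfs = {"n1-standard-2": 68, "local 375GB NVMe": 30, "500GB standard disk": 20}

# Slide 19: PXC cluster in AWS
aws_xfs_i3 = {"3x i3.4xlarge": 2700}
aws_zfs_i3 = {"3x i3.2xlarge": 1350, "2TB SC1": 50}
aws_xfs_io1 = {"3x r5.2xlarge": 1080, "3x 3TB 20k PIOPS": 3900}

if __name__ == "__main__":
    print("GCP: ZFS saves", monthly_total(gcp_xfs) - monthly_total(gcp_zfs))
    print("AWS: ZFS/i3 saves vs XFS/i3",
          monthly_total(aws_xfs_i3) - monthly_total(aws_zfs_i3))
    print("AWS: ZFS/i3 saves vs XFS/EBS/io1",
          monthly_total(aws_xfs_io1) - monthly_total(aws_zfs_i3))
    print("700GB at 2.5x compresses to about", compressed_gb(700, 2.5), "GB")
```

The savings come from compression: at 2.5x, the 700GB dataset needs roughly 280GB on disk, which fits on the smaller and cheaper devices listed for the ZFS configuration.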

  20. Will ZFS Really Perform Well?
     Sysbench TPC-C workload emulation, GCE n1-standard-2 with local 375GB, Scale 300, 2 threads
     ZFS/Gzip
       ● 59 Trx/s
       ● 1551 Qps
       ● 85 GB on disk
       ● 26% used
     ZFS/Lz4
       ● 69 Trx/s
       ● 1954 Qps
       ● 102 GB on disk
       ● 39% used
     XFS
       ● 110 Trx/s
       ● 3100 Qps
       ● 284 GB on disk
       ● 76% used

  21. Will ZFS Really Perform Well With L2ARC?
     Sysbench TPC-C workload emulation, GCE n1-standard-2 with 500GB normal disk, 375GB local disk, Scale 300, 2 threads
     XFS
       ● 3 Trx/s
       ● 87 Qps
       ● 284 GB on disk
       ● 70% used
     ZFS/Lz4/L2ARC
       ● 29 Trx/s (L2ARC warm)
       ● 830 Qps
       ● 102 GB on disk
       ● 21% used
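
The two benchmark slides are easier to compare as ratios against XFS. The sketch below uses only the numbers reported on the slides; the function and key names are invented for this illustration.

```python
SLIDE_20 = {  # everything on the local 375GB device
    "ZFS/Gzip": {"trx_s": 59, "qps": 1551, "disk_gb": 85},
    "ZFS/Lz4": {"trx_s": 69, "qps": 1954, "disk_gb": 102},
    "XFS": {"trx_s": 110, "qps": 3100, "disk_gb": 284},
}

SLIDE_21 = {  # 500GB normal disk, with the 375GB local disk as L2ARC for ZFS
    "XFS": {"trx_s": 3, "qps": 87, "disk_gb": 284},
    "ZFS/Lz4/L2ARC": {"trx_s": 29, "qps": 830, "disk_gb": 102},
}


def compare(results, baseline="XFS"):
    """Return throughput and disk-space ratios of each setup versus baseline."""
    base = results[baseline]
    ratios = {}
    for name, run in results.items():
        if name == baseline:
            continue
        ratios[name] = {
            "throughput_vs_base": run["trx_s"] / base["trx_s"],
            "space_saving": base["disk_gb"] / run["disk_gb"],
        }
    return ratios


if __name__ == "__main__":
    for label, results in (("slide 20", SLIDE_20), ("slide 21", SLIDE_21)):
        for name, r in compare(results).items():
            print(f"{label} {name}: {r['throughput_vs_base']:.2f}x throughput, "
                  f"{r['space_saving']:.2f}x less disk space")
```

On fast local storage, ZFS with lz4 reaches about 63% of the XFS transaction rate while using about 2.8 times less disk space. On the slower disk, with a warm L2ARC, it delivers roughly 10 times the XFS transaction rate.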

  22. Conclusion
     ● MySQL and ZFS are great together
     ● Try it, it is pretty easy
     ● Careful, you’ll get addicted

  23. Thank You to Our Sponsors
