Accelerating MySQL with JIT Compilers, David Yeager, Percona Live



SLIDE 1

Accelerating MySQL with JIT Compilers

David Yeager

Percona Live Santa Clara April 2018

SLIDE 2

What is a Just-In-Time Compiler?

[Diagram: two compilation pipelines]

Java: Java source code -> Java Compiler -> Bytecode -> Java JIT Compiler -> Machine code
C/C++: C/C++ source code -> C Compiler -> Machine code -> Dynimizer JIT Compiler -> Machine code

Profiling Information is an input to the JIT stage.

SLIDE 3

How MySQL benefits from JITs

[Charts omitted: three panels, each with a time axis]

OVH public cloud, 2 vCores x 2.3 GHz (Broadwell Xeon), template B2-7, BHS1 datacenter

SLIDE 4

How MySQL benefits from JITs

MySQL 5.7 tpcc-mysql / WordPress

*OVH public cloud, 2 vCores x 2.3 GHz (Broadwell Xeon), template EG-7-SSD, BHS1 datacenter
*tpcc-mysql is not validated or certified by the TPC corporation, so this is not an official TPC-C result

SLIDE 5

Dynimizer Usage In a Nutshell

Installation

$ sudo bash -c 'bash <(wget -O - https://dynimize.com/install) -default'

Usage

$ sudo dyni -start
Dynimizer started
$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimizing
$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimized

SLIDE 6

Dynimizer Usage

1 Start

$ sudo dyni -start
Dynimizer started

2 Monitoring

$ sudo dyni -status
Dynimizer is running

3 Profiling

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, profiling

4 Dynimizing

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimizing

5 Dynimized

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimized

Does pid 20722 drastically change phase?

  • Y: Reoptimize (can be disabled)
  • N: No reoptimization
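The phases above can be watched from a script. The following is a minimal sketch that polls the status command until a target reports the dynimized phase. It assumes the status lines look exactly like the sample output on the slides ("mysqld, pid: 20722, dynimized"). The function names, the STATUS_CMD override, and the default interval and poll limit are all hypothetical choices made for illustration, not part of Dynimizer.

```shell
# phase_of STATUS_TEXT EXE
# Prints the phase reported for EXE, assuming lines of the form
# "mysqld, pid: 20722, dynimized" as shown on the slides.
phase_of() {
    printf '%s\n' "$1" | awk -F', ' -v exe="$2" '$1 == exe { print $3; exit }'
}

# wait_until_dynimized EXE [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 once EXE reports "dynimized", 1 if MAX_POLLS is exhausted.
# STATUS_CMD is a hypothetical override used for testing; by default the
# status command from the slides is run.
wait_until_dynimized() {
    wud_exe=$1
    wud_interval=${2:-10}
    wud_max=${3:-60}
    wud_i=0
    while [ "$wud_i" -lt "$wud_max" ]; do
        wud_status=$(${STATUS_CMD:-sudo dyni -status})
        if [ "$(phase_of "$wud_status" "$wud_exe")" = "dynimized" ]; then
            return 0
        fi
        sleep "$wud_interval"
        wud_i=$((wud_i + 1))
    done
    return 1
}
```

A call such as wait_until_dynimized mysqld 10 60 would poll every 10 seconds for up to 10 minutes. Parsing human-readable status text is fragile, so the pattern should be rechecked against the output of the installed version.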

SLIDE 7

Hardening Dynimizer For Production

$ dyni -optimizeOnce:y

Default is to reoptimize after large changes in workload

  • This setting disables it
  • Prevents the temporary performance overhead of re-optimizing in the middle of a workload
  • No changes to machine code == more stable
  • More conservative
  • If workload changes drastically, Dynimizer improvement will be reduced
SLIDE 8

Hardening Dynimizer For Production

$ dyni -secureCodeCache:y

Default code cache is executable, readable and writable at the same time

  • This setting makes code cache executable and read-only
  • Enabled automatically on SELinux for extra security
  • You may want this enabled regardless

$ dyni -pid <number>

You may want to limit Dynimizer to a specific mysqld process

SLIDE 9

Configuring with /etc/dyni.conf

[options]
log:/var/log/dyni.log
maxLogSize:1MB
optimizeOnce: n
fastCompile: n
initdService: n
secureCodeCache: n
[exeList]
mysqld
#sysbench
#tpcc_start
#[users]
#mysql

  • This is dyni.conf after default installation
  • Overridden by command-line options
    For example: $ dyni -optimizeOnce:y will override dyni.conf
  • Can target other programs by adding exe names under [exeList]
    Non-mysqld targets not supported yet so test thoroughly!
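Before starting Dynimizer it can help to confirm which programs the configuration actually targets. The sketch below lists the uncommented names under [exeList]. It assumes the file layout shown on this slide (bracketed section headers, one name per line, "#" for comments); the function name list_exe_targets is hypothetical and the parser is not part of Dynimizer.

```shell
# list_exe_targets CONF_FILE
# Prints every uncommented entry under the [exeList] section.
# Assumes the dyni.conf layout shown on the slide.
list_exe_targets() {
    awk '
        /^[[:space:]]*#/ { next }                                # commented lines, including "#[users]"
        /^\[/            { in_list = ($0 ~ /^\[exeList\]/); next } # a section header starts or ends the list
        in_list && NF    { print $1 }                             # active target name
    ' "$1"
}
```

With the default file from the slide, list_exe_targets /etc/dyni.conf would print only mysqld, since the other entries are commented out.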

SLIDE 10

Sources of performance gain

OLTP workloads are mostly front-end CPU stalls

  • Instruction cache misses, branch mispredictions, ITLB misses
  • Use profiling information to better lay out the machine code, reduce branching

Other profile guided optimizations

  • Hot call-site inlining, sparse conditional constant propagation
  • Dead code elimination, copy propagation
  • Loop unrolling, branch target alignment
  • Other optimizations

SLIDE 11

When can Dynimizer help?

Most Beneficial

  • High CPU usage
  • Long running workloads
  • Well indexed queries
  • Have fully optimized MySQL, want even more performance
  • Read heavy workload
  • SELECT: lots of front-end CPU stalls
  • Working set fits into the buffer pool

Least Beneficial

  • Low CPU usage scenarios
  • Lots of writes to slow disks
    IO bottleneck
  • Working set doesn't fit in buffer pool
  • Full table scans
  • Short mysqld process lifetime
  • > 5k threads
    Current ptrace scales poorly

SLIDE 12

When can Dynimizer help?

$ perf stat -e r0280:u,r0380 -p 30041 sleep 30

Performance counter stats for process id '30041':

   3,224,918,396 r0280:u [100.00%]
  39,530,772,359 r0380

  • I-cache misses are a good indicator
  • r0280 means I-cache misses for the last several generations of Intel CPUs
  • u: is user mode, r0380 is instruction fetches
  • 3,224,918,396/39,530,772,359 = 8%
  • > 5% indicates instruction bandwidth is a serious bottleneck
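The ratio on this slide is simple enough to script. The sketch below takes the two counter values as printed by perf stat, strips the thousands separators, and compares the result with the 5% guideline from the slide. The function names icache_miss_pct and above_threshold are hypothetical helpers, not part of perf or Dynimizer.

```shell
# icache_miss_pct MISSES FETCHES
# Prints MISSES as a percentage of FETCHES with one decimal place.
# Commas in the perf stat counts are removed before dividing.
icache_miss_pct() {
    LC_ALL=C awk -v m="$1" -v f="$2" 'BEGIN {
        gsub(/,/, "", m); gsub(/,/, "", f)
        if (f + 0 == 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * m / f
    }'
}

# above_threshold PCT [LIMIT]
# Succeeds when PCT exceeds LIMIT (default 5, the guideline on the slide).
above_threshold() {
    LC_ALL=C awk -v p="$1" -v l="${2:-5}" 'BEGIN { exit !(p + 0 > l + 0) }'
}

# Counter values taken from the perf stat output on the slide.
pct=$(icache_miss_pct 3,224,918,396 39,530,772,359)
if above_threshold "$pct"; then
    echo "I-cache miss ratio ${pct}%: above the 5% guideline"
else
    echo "I-cache miss ratio ${pct}%: below the 5% guideline"
fi
```

For the numbers on the slide this reports 8.2%, which the slide rounds to 8%.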

SLIDE 13

MySQL + Dynimizer Architecture

[Diagram: the Dynimizer process alongside the target process being optimized]

Target process being optimized: original program machine code + code cache

Dynimizer process:

  • Collect sample based profiling data (Linux perf_events subsystem)
  • Read process state: machine code, data (Linux ptrace)
  • High-level optimizations (on IR)
  • Microarchitecture-specific optimizations (on IR)
  • Convert IR to machine code
  • Commit optimized machine code (Linux ptrace)

SLIDE 14

Dynimizer is the Everyman's PGO

Profile Guided Optimization

  • Available in GCC
    Compile with instrumentation
    Training run with profiling
    Recompile
  • Difficult to find a representative workload that will stand up over time
  • Labour intensive
  • For large scale MySQL deployments that can amortize the labour

Dynimizer JIT

  • Orders of magnitude easier
    Trivial usage: $ dyni -start
    Not required to build from source
    1-5 minutes to optimize
  • Zero downtime
  • Includes shared libraries
  • Way more flexible
    Can optimize code for each run

SLIDE 15

Supported Targets

  • Linux x86-64
  • That means mysqld

Optimization Target    MySQL Server    MariaDB Server    Percona Server
Version                5.5 – 5.7       5.5 – 10.2        5.5 – 5.7

SLIDE 16

Sysbench: MySQL 5.7 OLTP-RO

[Chart: Transactions/Second (axis ticks 1000 to 10000) vs. Threads (1 to 128), WITH Dynimizer and WITHOUT Dynimizer]

CPU: Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz, 4 cores, 8 threads (Kaby Lake)
RAM: 32 GB of 2400 MHz DDR4
*This is a dedicated server rented from OVH, model: SP-32 Server, data center BHS 5
*Relative speedups are similar across various table sizes and numbers of tables, so long as the data fits into memory

SLIDE 17

Sysbench: MySQL 5.7 OLTP Simple

[Charts: Transactions/Second vs. Threads, WITH Dynimizer and WITHOUT Dynimizer]

SLIDE 18

Sysbench: TPS Increase

[Charts: percentage increase in TPS vs. Threads; series: oltp read-only, oltp-simple, select, select-random-ranges]

SLIDE 19

Reduction in Branch Mispredictions

[Charts: percentage reduction vs. Threads; series: oltp read-only, oltp-simple, select, select-random-ranges]

SLIDE 20

Reduction in ITLB Misses

[Charts: percentage reduction vs. Threads; series: oltp read-only, oltp-simple, select, select-random-ranges]

SLIDE 21

Reduction in I-Cache Misses

[Charts: percentage reduction vs. Threads; series: oltp read-only, oltp-simple, select, select-random-ranges]

SLIDE 22

Increase in Instructions Per Cycle

[Charts: percentage increase vs. Threads; series: oltp read-only, oltp-simple, select, select-random-ranges]

SLIDE 23

Caveats: Steep warmup curve

Will be reduced in next major release

SLIDE 24

Caveats: Memory Usage

  • 4 GB per process during the dynimizing phase only

Freed once optimized

Extra RAM not necessary. Just increase swap by 4 GB

May not be appropriate for some micro cloud instances

  • Will be reduced in next major release.
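A rough pre-flight check for the memory caveat can be scripted against /proc/meminfo. The sketch below adds MemAvailable and SwapFree and compares the sum with 4 GB per target process, the figure quoted on this slide. Treating available memory plus free swap as the headroom is a simplifying assumption, and both function names are hypothetical.

```shell
# free_plus_swap_kb [MEMINFO_FILE]
# Prints MemAvailable + SwapFree in kB, read from /proc/meminfo by default.
free_plus_swap_kb() {
    awk '/^(MemAvailable|SwapFree):/ { sum += $2 } END { printf "%.0f\n", sum }' \
        "${1:-/proc/meminfo}"
}

# has_dynimize_headroom [PROCESS_COUNT] [MEMINFO_FILE]
# Succeeds when the headroom covers 4 GB per target process
# (the per-process figure from the slide).
has_dynimize_headroom() {
    hdh_needed_kb=$(( ${1:-1} * 4 * 1024 * 1024 ))
    [ "$(free_plus_swap_kb "${2:-/proc/meminfo}")" -ge "$hdh_needed_kb" ]
}
```

For a single mysqld, has_dynimize_headroom 1 succeeding suggests the dynimizing phase should fit; if it fails, the slide's advice is to increase swap by 4 GB.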
SLIDE 25

Noteworthy attributes

  • Exploiting Run-Time Information
  • Zero downtime
  • Optimize in minutes
  • Target app source code not required
  • Optimize across shared libraries
  • Simple usage
  • Little to no configuration necessary
SLIDE 26

Coming soon...

  • Cache compilation for instant optimized restart of target processes (mysqld)
  • Lower profiling and memory overheads
  • Improved phase change detection
  • More optimizations
  • Toggle between code cache versions depending on program phase
  • Many more target programs to optimize.

Have observed similar improvements with MongoDB

  • Many new optimizations and speedups along the way
SLIDE 27

Questions?

To learn more visit dynimize.com
