Artificial Intelligence Database Performance Tuning Roel Van de - - PowerPoint PPT Presentation

artificial intelligence database performance tuning
SMART_READER_LITE
LIVE PREVIEW

Artificial Intelligence Database Performance Tuning Roel Van de - - PowerPoint PPT Presentation

Artificial Intelligence Database Performance Tuning Roel Van de Paar Percona Agenda GA: How it works, terminology, variables, example Database Tuning & Surrounding thoughts gaai POC Results 2 Define: GA A Genetic


slide-1
SLIDE 1

Artificial Intelligence Database Performance Tuning

Roel Van de Paar Percona

slide-2
SLIDE 2

2

Agenda

  • GA: How it works, terminology, variables, example
  • Database Tuning & Surrounding thoughts
  • gaai
  • POC
  • Results
slide-3
SLIDE 3

3

Define: GA

A Genetic Algorithm (GA) is an lightweight Artificial Intelligence (AI) evolutionary algorithm which mimics Darwin’s theory of natural evolution.

slide-4
SLIDE 4

4

GA Terminology

Population

(inc. any offspring)

v Chromosomes (“Individuals”) v Genes (“Chromosome Length”)

slide-5
SLIDE 5

5

GA: it’s all about the genes

2 Parents v Children Children may get;

  • Genes mixed from parents (“crossover”)
  • Modified (“mutated”) genes
slide-6
SLIDE 6

6

GA Loop

Population is created ‘randomly’ (can be pre-populated) v [loop]> Population is evaluated (i.e. each individual receive a fitness value) v A new population is created: Population can be sorted / kept or discarded in part (“selection”) Crossover, gene mutations etc. v possible intermediary re-eval <[loop]

slide-7
SLIDE 7

7

GA Fitness

A fitness value is the result of a chosen fitness function As a rather simple/limited example: FITNESS = RAND(A) + RAND(B) + RAND(C) where A,B,C are 0-1000: Highest fitness value=3000, lowest=0 Optimize towards a negative (lowest value=best) calculated fitness; FITNESS=-FITNESS i.e. -3000 becomes 3000 so the lowest value becomes best Optimize towards %: 1/FITNESS or 1-(1/FITNESS) etc. Basically; anything that can be optimized towards a best result can be GA’ed

slide-8
SLIDE 8

8

GA Variables

GA Variables are often binary (or represented in binary) where a single bit is a single gene But they do not need to be! One step further is variables as genes, where all variables are alike a=0-100 with step 1, b=0-100 with step 1, c=0-100 with step 1 The most advanced is variables that are in disparate ranges a=-1 to 1 with step 0.01, b=0-100 with step 0.5, etc.

slide-9
SLIDE 9

9

GA Dev TIP: GA Value-store Genes

Everyone working with GA can benefit from this hack/approach Example: significant genes (used in fitness): a,b,c non-significant genes (not used in fitness): d i.e. sub-eval (think sub-total) data can be stored in another gene where such gene is never set/updated/mutated, but only used for tracking certain calculations, results, statuses, etc. This optimizes (though not in all cases) the number of calculations

slide-10
SLIDE 10

10

So why do we need GA’s?

To optimize... everything

slide-11
SLIDE 11

11

How many persons can we fit...

slide-12
SLIDE 12

12

GA Application Domains #1

Bayesian inference links to particle methods in Bayesian statistics and hidden Markov chain models Artificial creativity Chemical kinetics (gas and solid phases) Calculation of bound states and local-density approximations Code-breaking, using the GA to search large solution spaces of ciphers for the one correct decryption. Computer architecture: using GA to find out weak links in approximate computing such as lookahead. Configuration applications, particularly physics applications of optimal molecule configurations for particular systems like C60 (buckyballs) Construction of facial composites of suspects by eyewitnesses in forensic science. Data Center/Server Farm. Distributed computer network topologies Electronic circuit design, known as evolvable hardware Feature selection for Machine Learning Feynman-Kac models File allocation for a distributed system Filtering and signal processing Finding hardware bugs. Game theory equilibrium resolution Genetic Algorithm for Rule Set Production Scheduling applications, including job-shop scheduling and scheduling in printed circuit board assembly. Learning robot behavior using genetic algorithms Image processing: Dense pixel matching Learning fuzzy rule base using genetic algorithms Molecular structure optimization (chemistry) Optimisation of data compression systems, for example using wavelets. Power electronics design. SOURCE: https://en.wikipedia.org/wiki/List_of_genetic_algorithm_applications Traveling salesman problem and its applications

slide-13
SLIDE 13

13

GA Application Domains #2

Climatology: Estimation of heat flux between the atmosphere and sea ice Climatology: Modelling global temperature changes Design of water resource systems Groundwater monitoring networks Design of anti-terrorism systems Linguistic analysis, including grammar induction and other aspects of Natural language processing (NLP) such as word sense disambiguation. Automated design of sophisticated trading systems in the financial sector Representing rational agents in economic models such as the cobweb model Real options valuation Audio watermark insertion/detection Airlines revenue management Automated design of mechatronic systems using bond graphs and genetic programming (NSF) Automated design = computer-automated design Automated design of industrial equipment using catalogs of exemplar lever patterns Automated design, including research on composite material design and multi-objective design of automotive components for crashworthiness, weight savings, and

  • ther characteristics

Container loading optimization Control engineering, Marketing mix analysis Mechanical engineering Mobile communications infrastructure optimization.Plant floor layout Pop music record production Quality control Timetabling problems, such as designing a non-conflicting class timetable for a large university Vehicle routing problem Optimal bearing placement

slide-14
SLIDE 14

14

GA Application Domains #3

Computer-automated design Bioinformatics Multiple Sequence Alignment Bioinformatics: RNA structure prediction Bioinformatics: Motif Discovery Biology and computational chemistry Building phylogenetic trees. Gene expression profiling analysis. Medicine: Clinical decision support in ophthalmology Computational Neuroscience: finding values for the maximal conductances of ion channels in biophysically detailed neuron models Protein folding and protein/ligand docking Selection of optimal mathematical model to describe biological systems Operon prediction. Neural Networks; particularly recurrent neural networks Training artificial neural networks when pre-classified training examples are not readily obtainable (neuroevolution) Clustering, using genetic algorithms to optimize a wide range of different fit-functions. Multidimensional systems Multimodal Optimization Multiple criteria production scheduling Multiple population topologies and interchange methodologies Mutation testing Parallelization of GAs/GPs including use of hierarchical decomposition of problem domains and design spaces nesting of irregular shapes using feature matching and GAs. Rare event analysis Solving the machine-component grouping problem required for cellular manufacturing systems Stochastic optimization Tactical asset allocation and international equity strategies Wireless sensor/ad-hoc networks.

slide-15
SLIDE 15

15

Simple GA Example

@ https://github.com/Percona-QA/gaai/blob/master/ga_example/ga_example.lua

git clone https://github.com/Percona-QA/gaai.git cd ga_example lua ga_example.lua Polation: 100, Genes: 10, Generations: 100 This GA simply takes sum(rand(0,9999999/10),rand(idem),...rand(n)) i.e. a random number between 0 and 9999999 divided by the number of genes * the number of genes (max fitness=9999999)

slide-16
SLIDE 16

16

A bit of (MySQL) database tuning history

  • Past: very poor defaults/templates, settings tuning a must
  • Current: more optimized/increased defaults, settings tuning

may still be recommended for high-use production systems

  • Future: automatically adjusting settings (GA or logic based)

Past: MANUAL > Future: AUTOMATED

Automated systems are less error prone and can be optimized over time!

slide-17
SLIDE 17

17

GA Database Tuning: a new concept / mindset

  • It does not really matter which workload GA optimizes

○ i.e. there is no “right”, “wrong”, “common” or “specific” one ○ GA will be able to optimize any of them

  • This is dissimilar to past performance benchmarking, which is usually

tuned towards/optimized for a specific load (or set of loads)

  • It matters less here how much effective % is gained using

a specific set of options for a specific semi-synthetic workload

  • It matters much more here how much overall improvement is seen
  • ver time as the workload changes (real production workloads)
slide-18
SLIDE 18

18

Thoughts on Database Tuning #1

  • R/O variable optimization require restart: not suitable for production

systems

  • Sysbench load is uniform/synthetic (easier to optimize), though

I expect that actual user loads will achieve similar (i.e. 80%) similar ROI’s, unless the data being processed is highly random

  • Tuning various memory buffers can be complex and requires

surrounding “safety” code calculations (or value ranges) to avoid OOM

  • Things may change over time, for ex. the number of client connections
slide-19
SLIDE 19

19

Thoughts on Database Tuning #2

  • It would be good to cover for special events like checkpoints

(Longer sample runtimes may be sufficient to cover this)

  • Not all mysqld variables automatically lend themselves to “pure

performance tuning” as some variables are features - setting them changes the performance, but only because the server functionality was modified also - i.e. the performance offset may be expected (credit: Laurynas Biveinis)

  • Some vars require longer runtime to sample (e.g. InnoDB buffer pool)
slide-20
SLIDE 20

20

Thoughts on Database Tuning #3

  • Optimizations are system dependent!

○ For example, using tmpfs vs ssd vs slow hdd’s, fast vs slow I/O controllors, number of cpu threads, OS configuration etc. - the speed will vary differently for different systems ○ This is one of the great strengths of using GA for database tuning: ■ Optimizes per-load, i.e. load-specific var adjustments/tuning ■ Optimizes per-system, i.e. hardware/OS optimized var tuning ■ Optimizes per-moment in time, i.e. changes in any area over time ■ Optimizes across all factors combined ○ For humans, this is possible only in a (very) limited fashion, and requires a good understanding of each optimization plane.

slide-21
SLIDE 21

21

Thoughts on Database Tuning #4

  • A human may miss non-obvious areas of optimization

○ For example, if many buffers were automatically made smaller then there would be more room for other workload-specific performance-affecting buffers.

  • Performance drops (usually light & short) may be seen

○ A possible fix gradual/staged/stepped changes ■ Example; stepped changes, i.e. +100/-100 instead of random

Note; the actual change would still be random (e.g. from -100 to +100 with step 1)

■ This may also help with variables that need a larger sample duration window - needs further evaluation

slide-22
SLIDE 22

22

Thoughts on Database Tuning #5

  • Optimization towards other fitness values is possible

○ For example, with what set of options do I see the least amount of client rejects (locking etc.), network disconnects, etc. ○ These can take a smaller/secondary importance value in the fitness (though “combining discongruous fitness values” is a complex topic) ○ Another example is taking errors as a guide for what value areas to avoid for given parameters, though human smarts is better (OOM etc.)

  • Each variable will be more or less optimizable by GA. For example,

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysv ar_innodb_thread_concurrency would seem highly optimizable

slide-23
SLIDE 23

23

gaai

  • An advanced proof-of-concept/small framework which could easily be

expanded to become a full-fledged GA performance optimizer, or could easily be adapted to use more complex/different GA algo’s etc.

  • Code is GPL v2 licensed, GA code is MIT licensed
  • Not connected in any way with Ottertune. They’ve done some

interesting work also https://db.cs.cmu.edu/papers/2017/p1009-van-aken.pdf

  • As a POC, starts with a very poorly optimized server and tunes 13

InnoDB parameters automatically to improve performance

slide-24
SLIDE 24

24

gaai continued

  • There are likely many other variables which can be tuned by GA
  • The POC is (made) hyper-fast in applying changes, but for actual

production machines the pace of change can be; 1) slower 2) further controlled with sanity checks etc. (avoids major drops)

  • Further GA algo optimization is possible

○ Limit or eliminate the number of re-evals ○ Use a faster/more advanced GA algorithm

slide-25
SLIDE 25

25

POC: Start with poorly optimized server

  • MYSQLD_PRECONFIG="--innodb-buffer-pool-size=5242880
  • -table-open-cache=1 --innodb-io-capacity=100
  • -innodb-io-capacity-max=100000 --innodb-thread-concurrency=1
  • -innodb-concurrency-tickets=1 --innodb-flush-neighbors=2
  • -innodb-log-write-ahead-size=512 --innodb-lru-scan-depth=100
  • -innodb-random-read-ahead=1 --innodb-read-ahead-threshold=0
  • -innodb-commit-concurrency=1 --innodb-change-buffer-max-size=0
  • -innodb-change-buffering=none"
slide-26
SLIDE 26

26

POC: Genes: Tune 13 InnoDB Variables

= Approx 8.134713296270707e+36 possible combinations

slide-27
SLIDE 27

27

Sysbench Prepare/Run

  • Prepare

sysbench /usr/share/sysbench/oltp_insert.lua

  • -mysql-storage-engine=innodb --table-size=${TABLESIZE}
  • -tables=${NROFTABLES} --mysql-db=test --mysql-user=root
  • -db-driver=mysql --mysql-socket=${BASEDIR}/socket.sock prepare

TABLESIZE=1000000, NROFTABLES=4

  • Run

sysbench /usr/share/sysbench/oltp_read_write.lua --report-interval=${1}

  • -time=0 --events=0 --index_updates=10 --non_index_updates=10
  • -distinct_ranges=15 --order_ranges=15 --threads=${2} --table-size=${3}
  • -tables=${4} --percentile=95 --verbosity=3 --mysql-db=test
  • -mysql-user=root --db-driver=mysql
  • -mysql-socket=${BASEDIR}/socket.sock run

$1=1 (1 SEC SAMPLING), $2=5 (5 THREADS), $3=1000000, $4=4

slide-28
SLIDE 28

28

(X: QPS, Y: TIME, 5 Threads)

slide-29
SLIDE 29

29

(X: QPS, Y: TIME, 5 Threads)

slide-30
SLIDE 30

30

The future

  • Query indexing GA optimization (with thanks @Peter Zaitsev)
  • Overall GA, per-option GA, or rule based “smarts” can all be explored
  • Variables which require larger sample duration windows; stepped?
  • Tuned options expansion, more surrounding logic, solid ranges
  • Far future; learning accross systems, more advanced AI
  • More immediate; real workload testing
slide-31
SLIDE 31

31

Real Workload GA Optimization

If you are interested in trying GA optimized performance for your production load, we are happy to work with you!

slide-32
SLIDE 32

32

Contacts

  • Percona

○ https://www.percona.com/about-percona/contact ○ https://twitter.com/Percona ○ https://www.linkedin.com/company/percona

  • Roel Van de Paar

○ roel.vandepaar@percona.com ○ https://twitter.com/RoelVandePaar ○ https://au.linkedin.com/in/roelvandepaar

slide-33
SLIDE 33

33

slide-34
SLIDE 34

34

Join the Percona Product Managers for Lunch!

  • With Tyler Duzan, Michael Coburn, and Alexander Rubin
  • Share your feedback
  • Get to see the product roadmaps

Wednesday @ the reserved area in back of Gaia Restaurant

slide-35
SLIDE 35

35

Thank You Sponsors!!

slide-36
SLIDE 36

36

Rate My Session: Example

slide-37
SLIDE 37

37