Billion Goods in Few Categories: how Histograms Save a Life? - PowerPoint PPT Presentation

Example • Number of pages in the table: 20,000 • STATS SAMPLE PAGES : 20 (default) • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 20 = 10,000 24

Example 2 • Number of pages in the table: 20,000 • STATS SAMPLE PAGES : 5,000 • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 5,000 = 40 25

Use Larger STATS SAMPLE PAGES? • Time consuming mysql> select count(*) from goods; +----------+ | count(*) | +----------+ | 80303000 | +----------+ 1 row in set (35.95 sec) 26

Use Larger STATS SAMPLE PAGES? • Time consuming • With bigger number • 27.13/0.32 = 85 times slower! 26

Does ANALYZE TABLE Block Reads? User Manual claims it does not During the analysis, the table is locked with a read lock for InnoDB and MyISAM. 27

Does ANALYZE TABLE Block Reads? User Manual claims it does not • But! 27

Simply Increasing STATS SAMPLE PAGES Is not a solution 28

Solutions in Percona Server 5.7

Blocking ANALYZE TABLE Considered as a bug • jira.percona.com/browse/PS-2503 • lp:1704195 • bugs.mysql.com/87065 30

Blocking ANALYZE TABLE Considered as a bug • Fixed in Percona Server 5.6.38-83.0/5.7.20-18 30

Non-Blocking ANALYZE TABLE • Before the fix • Opens table statistics Concurrent DML allowed • Updates table statistics Concurrent DML allowed • Update finished • Invalidates entry in table definition cache Concurrent DML forbidden • Invalidates query cache Concurrent DML forbidden 31

Non-Blocking ANALYZE TABLE • After the fix • Opens table statistics Concurrent DML allowed • Updates table statistics Concurrent DML allowed • Update finished • Invalidates entry in table definition cache Concurrent DML forbidden • Invalidates query cache Concurrent DML forbidden 31

Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats 32

Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable 32

Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable • Updating it with following FLUSH TABLE allows to fake any statistics 32

Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable • Updating it with following FLUSH TABLE allows to fake any statistics • Hack • Not documented • Not recommended • Can stop working any time 32

5.7: Resume • With Percona fix for blocking ANALYZE TABLE we can use large value for STATS SAMPLE PAGES • Does not help when • Index cannot be used • Data distribution in the index vary a lot 33

5.7: Resume • With Percona fix for blocking ANALYZE TABLE we can use large value for STATS SAMPLE PAGES • Does not help when • Index cannot be used • Data distribution in the index vary a lot • Manual update allows to fix statistics • Not recommended • Can stop working any time 33

Histograms

What are the Histograms? • Optimizer Column Statistics • Engine-independent • No fancy calculations • Knows about data distribution 35

Number of Values in Each Bucket 800 600 400 200 0 36 1 2 3 4 5 6 7 8 9 10

Data in the Histogram 1 0 . 8 0 . 6 0 . 4 0 . 2 0 37 1 2 3 4 5 6 7 8 9 10

How Histograms are Helpful? • Accurate statistics • Truly persistent • No extra calculations on access • Optimizer knows about data distribution • Without touching the table! 38

Filtered Rows • Example data mysql> create table example(f1 int) engine=innodb; mysql> insert into example values(1),(1),(1),(2),(3); mysql> select f1, count(f1) from example group by f1; +------+-----------+ | f1 | count(f1) | +------+-----------+ | 1 | 3 | | 2 | 1 | | 3 | 1 | +------+-----------+ 3 rows in set (0.00 sec) 39

Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

Filtered Rows • With the histogram mysql> select * from information_schema.column_statistics -> where table_name=’example’\G *************************** 1. row *************************** SCHEMA_NAME: hist_ex TABLE_NAME: example COLUMN_NAME: f1 HISTOGRAM: "buckets": [[1, 0.6], [2, 0.8], [3, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-11-07 09:07:19.791470", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 3 1 row in set (0.00 sec) 39

Filtered Rows • With the histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 100.00 -- all rows 39 Extra: Using where

Filtered Rows • With the histogram mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 40.00 -- 2 rows 39 Extra: Using where

Filtered Rows • With the histogram mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 -- one row 39 Extra: Using where

Filtered Rows • With the histogram mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 - one row 39 Extra: Using where

Example • EXPLAIN without histograms mysql> explain select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -- Large range -> order by goods.cat_id -> limit 10\G -- We ask for 10 rows only! 40

Example • EXPLAIN without histograms *************************** 1. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table first partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 20 filtered: 70.00 Extra: Using where; Using index; 40 Using temporary; Using filesort

Example • EXPLAIN without histograms *************************** 2. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table partitions: NULL type: ref possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: orig.categories.id rows: 51827 filtered: 11.11 -- Default value Extra: Using where 40 2 rows in set, 1 warning (0.01 sec)

Example • Execution time without histograms mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; ab9f9bb7bc4f357712ec34f067eda364 - 10 rows in set (56.47 sec) 40

Example • Engine statistics without histograms mysql> show status like ’Handler%’; +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ ... | Handler_read_next | 964718 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_next | 951671 | ... | Handler_write | 951670 | +----------------------------+--------+ 18 rows in set (0.01 sec) 40

Example • EXPLAIN with the histogram mysql> explain select goods.* from goods -> join categories -> on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10\G 40

Example • EXPLAIN with the histogram *************************** 1. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table first partitions: NULL type: index possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: NULL rows: 10 -- Same as we asked filtered: 98.70 -- True numbers Extra: Using where 40

Example • EXPLAIN with the histogram *************************** 2. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: orig.goods.cat_id rows: 1 filtered: 100.00 Extra: Using index 40 2 rows in set, 1 warning (0.01 sec)

Billion Goods in Few Categories: how Histograms Save a Life? - PowerPoint PPT Presentation

Billion Goods in Few Categories: how Histograms Save a Life? November, 7, 2018 Sveta Smirnova Table of Contents The Case The Cardinality: Two Levels ANALYZE TABLE Limitations Solutions in Percona Server 5.7 Histograms

Matching Histograms to Data 1. Consider the following four histograms. Listed below are five

Chapter 18: The Normal Approximation for Probability Histograms The histograms we saw in Chapter

XL1G: Create Histograms using Excel 2013 Functions V0H 3/31/2017 XL1G: 0H Create Histograms

Digital Image Processing (CS/ECE 545) Lecture 2: Histograms and Point Operations (Part 1) Prof

Basic Concept of GST 12-06-2019 GOODS & SERVICE TAX (GST) GOODS & SERVICE TAX GOODS

Combinatory Categorial Grammar (CCG) Categories Categories = types Primitive categories

CMU 15-896 Fair division 4: Indivisible goods Teacher: Ariel Procaccia Indivisible goods

Why save them? Why save them? Karl Schaerf (Central Acclimatisation Society) & Luke Pearce

Prometheus Histograms Past, Present, and Future Bjrn Beorn Rabenstein PromCon EU,

Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight

Effective management of high volume numeric data with histograms Fred Moyer @Circonus

Adaptive Histograms from a Randomized Queue that is Prioritized for Statistically Equivalent

Smooth Local Histograms Filters Micheal Kass and Justin Solomon Yeara Kozlov Saarland University

TN Save a Life TN Save a Life Program Program Op Opio ioid id Cr Cris isis is in in Te

Methods Updating Variables Console Programs int life = 42; life life = 42 life; 21 life =

Tangent categories are locally Cartesian differential categories J.R.B. Cockett Department of

Introduction to Bottom-Up Parsing Shift-reduce parsing The LR parsing algorithm

RaisingtheCurtainonCulturalDiversity:

CSE 3341: Principles of Programming Languages Recursive Descent Parsing Jeremy Morris 1

COSC 1020 The StringTokenizer Class Yves Lesp erance Allows you to easily extract

P ermutation patterns and the Mbius funtion Vt Jelnek, Eva Jelnk ov and

GATE APIs Track II, Module 6 Second GATE Training Course May 2010 GATE APIs 1 / 62 Using Java

Temporal probability models Chapter 15, Sections 15 Chapter 15, Sections 15 1 Outline

Netw ork Services I nterface Nordic I nfrastructure for Research & Education Nordic

Billion Goods in Few Categories: how Histograms Save a Life? - PowerPoint PPT Presentation

Billion Goods in Few Categories: how Histograms Save a Life? November, 7, 2018 Sveta Smirnova Table of Contents The Case The Cardinality: Two Levels ANALYZE TABLE Limitations Solutions in Percona Server 5.7 Histograms

Matching Histograms to Data 1. Consider the following four histograms. Listed below are five

Chapter 18: The Normal Approximation for Probability Histograms The histograms we saw in Chapter

XL1G: Create Histograms using Excel 2013 Functions V0H 3/31/2017 XL1G: 0H Create Histograms

Digital Image Processing (CS/ECE 545) Lecture 2: Histograms and Point Operations (Part 1) Prof

Basic Concept of GST 12-06-2019 GOODS &amp; SERVICE TAX (GST) GOODS &amp; SERVICE TAX GOODS

Combinatory Categorial Grammar (CCG) Categories Categories = types Primitive categories

CMU 15-896 Fair division 4: Indivisible goods Teacher: Ariel Procaccia Indivisible goods

Why save them? Why save them? Karl Schaerf (Central Acclimatisation Society) &amp; Luke Pearce

Prometheus Histograms Past, Present, and Future Bjrn Beorn Rabenstein PromCon EU,

Jeffreys centroids: A closed-form expression for positive histograms and a guaranteed tight

Effective management of high volume numeric data with histograms Fred Moyer @Circonus

Adaptive Histograms from a Randomized Queue that is Prioritized for Statistically Equivalent

Smooth Local Histograms Filters Micheal Kass and Justin Solomon Yeara Kozlov Saarland University

TN Save a Life TN Save a Life Program Program Op Opio ioid id Cr Cris isis is in in Te

Methods Updating Variables Console Programs int life = 42; life life = 42 life; 21 life =

Tangent categories are locally Cartesian differential categories J.R.B. Cockett Department of

Introduction to Bottom-Up Parsing Shift-reduce parsing The LR parsing algorithm

RaisingtheCurtainonCulturalDiversity:

CSE 3341: Principles of Programming Languages Recursive Descent Parsing Jeremy Morris 1

COSC 1020 The StringTokenizer Class Yves Lesp erance Allows you to easily extract

P ermutation patterns and the Mbius funtion Vt Jelnek, Eva Jelnk ov and

GATE APIs Track II, Module 6 Second GATE Training Course May 2010 GATE APIs 1 / 62 Using Java

Temporal probability models Chapter 15, Sections 15 Chapter 15, Sections 15 1 Outline

Netw ork Services I nterface Nordic I nfrastructure for Research &amp; Education Nordic

Basic Concept of GST 12-06-2019 GOODS & SERVICE TAX (GST) GOODS & SERVICE TAX GOODS

Why save them? Why save them? Karl Schaerf (Central Acclimatisation Society) & Luke Pearce

Netw ork Services I nterface Nordic I nfrastructure for Research & Education Nordic