billion goods in few categories how histograms save a life
play

Billion Goods in Few Categories: how Histograms Save a Life? - PowerPoint PPT Presentation

Billion Goods in Few Categories: how Histograms Save a Life? November, 7, 2018 Sveta Smirnova Table of Contents The Case The Cardinality: Two Levels ANALYZE TABLE Limitations Solutions in Percona Server 5.7 Histograms


  1. Example • Number of pages in the table: 20,000 • STATS SAMPLE PAGES : 20 (default) • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 20 = 10,000 24

  2. Example 2 • Number of pages in the table: 20,000 • STATS SAMPLE PAGES : 5,000 • Unique values in the secondary index: • In sample pages: 10 • In the table: 11 • Cardinality: 20,000 * 10 / 5,000 = 40 25

  3. Use Larger STATS SAMPLE PAGES? • Time consuming mysql> select count(*) from goods; +----------+ | count(*) | +----------+ | 80303000 | +----------+ 1 row in set (35.95 sec) 26

  4. Use Larger STATS SAMPLE PAGES? • Time consuming • With default STATS SAMPLE PAGES mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.32 sec) 26

  5. Use Larger STATS SAMPLE PAGES? • Time consuming • With bigger number mysql> alter table goods STATS_SAMPLE_PAGES=5000; Query OK, 0 rows affected (0.04 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (27.13 sec) 26

  6. Use Larger STATS SAMPLE PAGES? • Time consuming • With bigger number • 27.13/0.32 = 85 times slower! 26

  7. Does ANALYZE TABLE Block Reads? User Manual claims it does not During the analysis, the table is locked with a read lock for InnoDB and MyISAM. 27

  8. Does ANALYZE TABLE Block Reads? User Manual claims it does not • But! 27

  9. Does ANALYZE TABLE Block Reads? User Manual claims it does not Sometimes it blocks all subsequent queries +------+-------------------------+---------------------------------+ | Time | State | Info | +------+-------------------------+---------------------------------+ | 32 | Writing to net | select * from t where c > ’%0%’ | | 12 | Waiting for table flush | select * from test.t where i=1 | | 12 | Waiting for table flush | select * from test.t where i=2 | | 12 | Waiting for table flush | select * from test.t where i=3 | | 11 | Waiting for table flush | select * from test.t where i=7 | | 10 | Waiting for table flush | select * from test.t where i=11 | ... 27

  10. Simply Increasing STATS SAMPLE PAGES Is not a solution 28

  11. Solutions in Percona Server 5.7

  12. Blocking ANALYZE TABLE Considered as a bug • jira.percona.com/browse/PS-2503 • lp:1704195 • bugs.mysql.com/87065 30

  13. Blocking ANALYZE TABLE Considered as a bug • Fixed in Percona Server 5.6.38-83.0/5.7.20-18 30

  14. Non-Blocking ANALYZE TABLE • Before the fix • Opens table statistics Concurrent DML allowed • Updates table statistics Concurrent DML allowed • Update finished • Invalidates entry in table definition cache Concurrent DML forbidden • Invalidates query cache Concurrent DML forbidden 31

  15. Non-Blocking ANALYZE TABLE • After the fix • Opens table statistics Concurrent DML allowed • Updates table statistics Concurrent DML allowed • Update finished • Invalidates entry in table definition cache Concurrent DML forbidden • Invalidates query cache Concurrent DML forbidden 31

  16. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats 32

  17. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable 32

  18. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable • Updating it with following FLUSH TABLE allows to fake any statistics 32

  19. Without the Fix: Manual Update • InnoDB stores its statistics mysql.innodb index stats • This table is writable • Updating it with following FLUSH TABLE allows to fake any statistics • Hack • Not documented • Not recommended • Can stop working any time 32

  20. 5.7: Resume • With Percona fix for blocking ANALYZE TABLE we can use large value for STATS SAMPLE PAGES • Does not help when • Index cannot be used • Data distribution in the index vary a lot 33

  21. 5.7: Resume • With Percona fix for blocking ANALYZE TABLE we can use large value for STATS SAMPLE PAGES • Does not help when • Index cannot be used • Data distribution in the index vary a lot • Manual update allows to fix statistics • Not recommended • Can stop working any time 33

  22. Histograms

  23. What are the Histograms? • Optimizer Column Statistics • Engine-independent • No fancy calculations • Knows about data distribution 35

  24. Number of Values in Each Bucket 800 600 400 200 0 36 1 2 3 4 5 6 7 8 9 10

  25. Data in the Histogram 1 0 . 8 0 . 6 0 . 4 0 . 2 0 37 1 2 3 4 5 6 7 8 9 10

  26. How Histograms are Helpful? • Accurate statistics • Truly persistent • No extra calculations on access • Optimizer knows about data distribution • Without touching the table! 38

  27. Filtered Rows • Example data mysql> create table example(f1 int) engine=innodb; mysql> insert into example values(1),(1),(1),(2),(3); mysql> select f1, count(f1) from example group by f1; +------+-----------+ | f1 | count(f1) | +------+-----------+ | 1 | 3 | | 2 | 1 | | 3 | 1 | +------+-----------+ 3 rows in set (0.00 sec) 39

  28. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  29. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  30. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  31. Filtered Rows • Without a histogram mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 39 Extra: Using where

  32. Filtered Rows • With the histogram mysql> analyze table example update histogram on f1 with 3 buckets; +-----------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------+-----------+----------+------------------------------+ | hist_ex.example | histogram | status | Histogram statistics created for column ’f1’. | +-----------------+-----------+----------+------------------------------+ 1 row in set (0.03 sec) 39

  33. Filtered Rows • With the histogram mysql> select * from information_schema.column_statistics -> where table_name=’example’\G *************************** 1. row *************************** SCHEMA_NAME: hist_ex TABLE_NAME: example COLUMN_NAME: f1 HISTOGRAM: "buckets": [[1, 0.6], [2, 0.8], [3, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-11-07 09:07:19.791470", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 3 1 row in set (0.00 sec) 39

  34. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 100.00 -- all rows 39 Extra: Using where

  35. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 40.00 -- 2 rows 39 Extra: Using where

  36. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 -- one row 39 Extra: Using where

  37. Filtered Rows • With the histogram mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 - one row 39 Extra: Using where

  38. Example • EXPLAIN without histograms mysql> explain select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -- Large range -> order by goods.cat_id -> limit 10\G -- We ask for 10 rows only! 40

  39. Example • EXPLAIN without histograms *************************** 1. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table first partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 20 filtered: 70.00 Extra: Using where; Using index; 40 Using temporary; Using filesort

  40. Example • EXPLAIN without histograms *************************** 2. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table partitions: NULL type: ref possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: orig.categories.id rows: 51827 filtered: 11.11 -- Default value Extra: Using where 40 2 rows in set, 1 warning (0.01 sec)

  41. Example • Execution time without histograms mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods -> join categories on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10; ab9f9bb7bc4f357712ec34f067eda364 - 10 rows in set (56.47 sec) 40

  42. Example • Engine statistics without histograms mysql> show status like ’Handler%’; +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ ... | Handler_read_next | 964718 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_next | 951671 | ... | Handler_write | 951670 | +----------------------------+--------+ 18 rows in set (0.01 sec) 40

  43. Example • Now lets add the histogram mysql> analyze table goods update histogram on date_added; +------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +------------+-----------+----------+------------------------------+ | orig.goods | histogram | status | Histogram statistics created for column ’date_added’. | +------------+-----------+----------+------------------------------+ 1 row in set (2.01 sec) 40

  44. Example • EXPLAIN with the histogram mysql> explain select goods.* from goods -> join categories -> on (categories.id=goods.cat_id) -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17) -> and -> date_added between ’2000-01-01’ and ’2001-01-01’ -> order by goods.cat_id -> limit 10\G 40

  45. Example • EXPLAIN with the histogram *************************** 1. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table first partitions: NULL type: index possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: NULL rows: 10 -- Same as we asked filtered: 98.70 -- True numbers Extra: Using where 40

  46. Example • EXPLAIN with the histogram *************************** 2. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: orig.goods.cat_id rows: 1 filtered: 100.00 Extra: Using index 40 2 rows in set, 1 warning (0.01 sec)

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend