Billion Goods in Few Categories: how Histograms Save a Life?
November, 7, 2018
Sveta Smirnova
Billion Goods in Few Categories: how Histograms Save a Life? - - PowerPoint PPT Presentation
Billion Goods in Few Categories: how Histograms Save a Life? November, 7, 2018 Sveta Smirnova Table of Contents The Case The Cardinality: Two Levels ANALYZE TABLE Limitations Solutions in Percona Server 5.7 Histograms
November, 7, 2018
Sveta Smirnova
2
3
4
5
5
6
7
Table names Column names Data
7
7
7
9
Price Date: added, last updated, etc. Characteristics Store ...
9
select * from goods join categories
(categories.id=goods.cat_id) where date_added between ’2018-07-01’ and ’2018-08-01’ and cat_id in (16,11) and price >= 1000 and <=10000 [ and ... ] [ GROUP BY ... [ORDER BY ... [ LIMIT ...]]] ;
10
11
11
11
11
12
12
12
12
13
13
14
14
16
17
17
17
17
17
mysql> show index from test\G ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 64 ...
18
mysql> show index from test\G ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 2 ...
18
mysql> show index from test\G ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 16 ...
18
mysql> show index from test\G ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Seq_in_index: 1 Column_name: f1 Collation: A Cardinality: 2 ...
18
19
19
19
19
20
20
20
21
21
21
23
23
23
23
24
24
25
mysql> select count(*) from goods; +----------+ | count(*) | +----------+ | 80303000 | +----------+ 1 row in set (35.95 sec)
26
mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.32 sec)
26
mysql> alter table goods STATS_SAMPLE_PAGES=5000; Query OK, 0 rows affected (0.04 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (27.13 sec)
26
26
27
27
+------+-------------------------+---------------------------------+ | Time | State | Info | +------+-------------------------+---------------------------------+ | 32 | Writing to net | select * from t where c > ’%0%’ | | 12 | Waiting for table flush | select * from test.t where i=1 | | 12 | Waiting for table flush | select * from test.t where i=2 | | 12 | Waiting for table flush | select * from test.t where i=3 | | 11 | Waiting for table flush | select * from test.t where i=7 | | 10 | Waiting for table flush | select * from test.t where i=11 | ...
27
28
30
30
Concurrent DML allowed
Concurrent DML allowed
Concurrent DML forbidden
Concurrent DML forbidden
31
Concurrent DML allowed
Concurrent DML allowed
Concurrent DML forbidden
Concurrent DML forbidden
31
32
32
32
32
33
33
35
1 2 3 4 5 6 7 8 9 10 200 400 600 800
36
1 2 3 4 5 6 7 8 9 10 0.2 0.4 0.6 0.8 1
37
38
mysql> create table example(f1 int) engine=innodb; mysql> insert into example values(1),(1),(1),(2),(3); mysql> select f1, count(f1) from example group by f1; +------+-----------+ | f1 | count(f1) | +------+-----------+ | 1 | 3 | | 2 | 1 | | 3 | 1 | +------+-----------+ 3 rows in set (0.00 sec)
39
mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where
39
mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where
39
mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where
39
mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 33.33 Extra: Using where
39
mysql> analyze table example update histogram on f1 with 3 buckets; +-----------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------+-----------+----------+------------------------------+ | hist_ex.example | histogram | status | Histogram statistics created for column ’f1’. | +-----------------+-----------+----------+------------------------------+ 1 row in set (0.03 sec)
39
mysql> select * from information_schema.column_statistics
*************************** 1. row *************************** SCHEMA_NAME: hist_ex TABLE_NAME: example COLUMN_NAME: f1 HISTOGRAM: "buckets": [[1, 0.6], [2, 0.8], [3, 1.0]], "data-type": "int", "null-values": 0.0, "collation-id": 8, "last-updated": "2018-11-07 09:07:19.791470", "sampling-rate": 1.0, "histogram-type": "singleton", "number-of-buckets-specified": 3 1 row in set (0.00 sec)
39
mysql> explain select * from example where f1 > 0\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 100.00 -- all rows Extra: Using where
39
mysql> explain select * from example where f1 > 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 40.00 -- 2 rows Extra: Using where
39
mysql> explain select * from example where f1 > 2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 -- one row Extra: Using where
39
mysql> explain select * from example where f1 > 3\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: example partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5 filtered: 20.00 - one row Extra: Using where
39
mysql> explain select goods.* from goods
40
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table first partitions: NULL type: index possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 20 filtered: 70.00 Extra: Using where; Using index; Using temporary; Using filesort
40
*************************** 2. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table partitions: NULL type: ref possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: orig.categories.id rows: 51827 filtered: 11.11 -- Default value Extra: Using where 2 rows in set, 1 warning (0.01 sec)
40
mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods
ab9f9bb7bc4f357712ec34f067eda364
40
mysql> show status like ’Handler%’; +----------------------------+--------+ | Variable_name | Value | +----------------------------+--------+ ... | Handler_read_next | 964718 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_next | 951671 | ... | Handler_write | 951670 | +----------------------------+--------+ 18 rows in set (0.01 sec)
40
mysql> analyze table goods update histogram on date_added; +------------+-----------+----------+------------------------------+ | Table | Op | Msg_type | Msg_text | +------------+-----------+----------+------------------------------+ | orig.goods | histogram | status | Histogram statistics created for column ’date_added’. | +------------+-----------+----------+------------------------------+ 1 row in set (2.01 sec)
40
mysql> explain select goods.* from goods
40
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: goods -- Large table first partitions: NULL type: index possible_keys: cat_id_2 key: cat_id_2 key_len: 5 ref: NULL rows: 10 -- Same as we asked filtered: 98.70 -- True numbers Extra: Using where
40
*************************** 2. row *************************** id: 1 select_type: SIMPLE table: categories -- Small table partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: orig.goods.cat_id rows: 1 filtered: 100.00 Extra: Using index 2 rows in set, 1 warning (0.01 sec)
40
mysql> flush status; Query OK, 0 rows affected (0.00 sec) mysql> select goods.* from goods
eeb005fae0dd3441c5c380e1d87fee84
40
mysql> show status like ’Handler%’; +----------------------------+-------++----------------------------+-------+ | Variable_name | Value || Variable_name | Value | +----------------------------+-------++----------------------------+-------+ | Handler_commit | 1 || Handler_read_prev | 0 | | Handler_delete | 0 || Handler_read_rnd | 0 | | Handler_discover | 0 || Handler_read_rnd_next | 0 | | Handler_external_lock | 4 || Handler_rollback | 0 | | Handler_mrr_init | 0 || Handler_savepoint | 0 | | Handler_prepare | 0 || Handler_savepoint_rollback | 0 | | Handler_read_first | 1 || Handler_update | 0 | | Handler_read_key | 3 || Handler_write | 0 | | Handler_read_last | 0 |+----------------------------+-------+ | Handler_read_next | 9 |18 rows in set (0.00 sec)
40
41
42
44
45
46
http://www.slideshare.net/SvetaSmirnova https://twitter.com/svetsmirnova https://github.com/svetasmirnova
47
49
49
49
49
50
50
50
51
51
51
for (uint32_t j = 0; j < key->actual_key_parts; j++) { ... assert_always(next_key_part < _rec_per_keys); ulong val = _rec_per_key[next_key_part++]; val = (val * tokudb::sysvars::cardinality_scale_percent) / 100;
51
52
mysql> create table test(
Query OK, 0 rows affected (0.10 sec) mysql> insert into test (f1, ts) values(1, NOW()), (2, NOW()); Query OK, 2 rows affected (0.03 sec) Records: 2 Duplicates: 0 Warnings: 0 ... mysql> insert into test (f1, ts) select f1, NOW() from test; Query OK, 32 rows affected (0.01 sec) Records: 32 Duplicates: 0 Warnings: 0
53
mysql> select count(distinct id), count(distinct f1) from test; +--------------------+--------------------+ | count(distinct id) | count(distinct f1) | +--------------------+--------------------+ | 64 | 2 | +--------------------+--------------------+ 1 row in set (0.01 sec)
53
mysql> show index from test\G *************************** 1. row *************************** Table: test Non_unique: 0 Key_name: PRIMARY Column_name: id Cardinality: 64 ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Column_name: f1 Cardinality: 64 ...
54
$ ../bin/tokuftdump --header --nodata var/mysqld.1/data/test/test_key_f1_145_1_1d_B_1.tokudb ft: layout_version=29 layout_version_original=29 layout_version_read_from_disk=29 build_id=0 build_id_original=0 time_of_creation= 1537709029 Sun Sep 23 16:23:49 2018 time_of_last_modification=1537709100 Sun Sep 23 16:25:00 2018 ... estimated numrows=64 estimated numbytes=640 logical row count=64
54
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_table (this=0x7fd86da54020, table=0x7fd86d90b020) at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400 400 if (val == 0 || _rows == 0 || (gdb) p key->name $21 = 0x7fd86d879999 "f1" (gdb) p val $22 = 0
54
54
mysql> analyze table test; +-----------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +-----------+---------+----------+----------+ | test.test | analyze | status | OK | +-----------+---------+----------+----------+ 1 row in set (0.01 sec)
55
mysql> show index from test\G *************************** 1. row *************************** Table: test Non_unique: 0 Key_name: PRIMARY Column_name: id Cardinality: 64 ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Column_name: f1 Cardinality: 2 ...
56
$ ../bin/tokuftdump --header --nodata var/mysqld.1/data/test/test_key_f1_145_1_1d_B_1.tokudb ft: layout_version=29 layout_version_original=29 layout_version_read_from_disk=29 build_id=0 build_id_original=0 time_of_creation= 1537709029 Sun Sep 23 16:23:49 2018 time_of_last_modification=1537709100 Sun Sep 23 16:25:00 2018 ... estimated numrows=64 estimated numbytes=640 logical row count=64
56
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_table (this=0x7fd86da54020, table=0x7fd86d90b020) at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400 400 if (val == 0 || _rows == 0 || (gdb) p key->name $26 = 0x7fd86d879999 "f1" (gdb) p val $27 = 32
56
56
mysql> insert into test (f1, ts) select f1, NOW() from test; Query OK, 64 rows affected (0.01 sec) Records: 64 Duplicates: 0 Warnings: 0 mysql> insert into test (f1, ts) select f1, NOW() from test; Query OK, 128 rows affected (0.01 sec) Records: 128 Duplicates: 0 Warnings: 0 mysql> insert into test (f1, ts) select f1, NOW() from test; Query OK, 256 rows affected (0.02 sec) Records: 256 Duplicates: 0 Warnings: 0 mysql> select count(distinct id), count(distinct f1) from test; +--------------------+--------------------+ | count(distinct id) | count(distinct f1) | +--------------------+--------------------+ | 512 | 2 | +--------------------+--------------------+ 1 row in set (0.01 sec)
57
mysql> show index from test\G *************************** 1. row *************************** Table: test Non_unique: 0 Key_name: PRIMARY Column_name: id Cardinality: 512 ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Column_name: f1 Cardinality: 16 ...
58
$ ../bin/tokuftdump --header --nodata var/mysqld.1/data/test/test_key_f1_145_1_1d_B_1.tokudb ft: layout_version=29 layout_version_original=29 layout_version_read_from_disk=29 build_id=0 build_id_original=0 time_of_creation= 1537709029 Sun Sep 23 16:23:49 2018 time_of_last_modification=1537709880 Sun Sep 23 16:38:00 2018 ... estimated numrows=512 estimated numbytes=5120 logical row count=512
58
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_table (this=0x7fd86da54020, table=0x7fd86d90b020) at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400 400 if (val == 0 || _rows == 0 || (gdb) p key->name $30 = 0x7fd86d879999 "f1" (gdb) p val $31 = 32
58
58
mysql> show index from test\G *************************** 1. row *************************** Table: test Non_unique: 0 Key_name: PRIMARY Column_name: id Cardinality: 512 ... *************************** 2. row *************************** Table: test Non_unique: 1 Key_name: f1 Column_name: f1 Cardinality: 16 ...
59
Thread 44 "mysqld" hit Breakpoint 1, TOKUDB_SHARE::set_cardinality_counts_in_table (this=0x7fd4e67ea020, table=0x7fd4e6765c20) at /home/sveta/src/percona-server/storage/tokudb/ha_tokudb.cc:400 400 if (val == 0 || _rows == 0 || (gdb) p key->name $3 = 0x7fd4e66d7599 "f1" (gdb) p val $4 = 32
59
59
59
60
60
60
60