with Dictionaries an alternative to InnoDB table compression Yura - PowerPoint PPT Presentation

Percona XtraDB: Compressed Columns with Dictionaries – an alternative to InnoDB table compression Yura Sorokin, Senior Software Engineer at Percona

Existing compression methods Overview

Existing compression methods for MySQL • InnoDB Table Compression (starting from 5.1) https://dev.mysql.com/doc/refman/5.7/en/innodb-table-compression.html • InnoDB Page Compression (starting from 5.7) https://dev.mysql.com/doc/refman/5.7/en/innodb-page-compression.html • Filesystems with transparent compression (BTRFS, ZFS, NTFS, etc.) will not be covered in this talk • COMPRESS() / UNCOMPRESS() MySQL functions https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_compress https://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html#function_uncompress • Something intriguing from Percona Server :) something you might have already noticed in the title of this talk 3

InnoDB Table Compression ROW_FORMAT=COMPRESSED

InnoDB Table Compression SET GLOBAL innodb_file_per_table=1; SET GLOBAL innodb_file_format=Barracuda; CREATE TABLE tbl ( id SERIAL PRIMARY KEY, value BLOB NOT NULL ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=N; Based on zlib compression library (LZ77 compression algorithm). 5

InnoDB Table Compression • Uncompressed “modification log” records changes made to the page (to avoid unnecessary uncompression and Compressed data recompression) • When “modification log” is full, recompression is needed. Padding • Recompression may fail as compressed data no longer fit and in this case B-tree Modification log nodes are split. • To avoid frequent compression failures (splitting), there is reserved empty space (padding) 6

InnoDB Table Compression • BLOB, VARCHAR, and TEXT columns that are not part of the primary key may be stored on separately allocated overflow Compressed data pages. • All data written to overflow pages is compressed “as is” (MySQL applies the Padding zlib compression algorithm to the entire data item) Modification log 7

InnoDB Table Compression Tuning: • KEY_BLOCK_SIZE=(1|2|4|8|16) combined with innodb_page_size (KEY_BLOCK_SIZE must be <= innodb_page_size) (Table compression does not support 32k and 64k innodb_page_size) • innodb_compression_level = (0..9) https://dev.mysql.com/doc/refman/5.7/en/innodb- parameters.html#sysvar_innodb_compression_level • innodb_compression_pad_pct_max https://dev.mysql.com/doc/refman/5.7/en/innodb- parameters.html#sysvar_innodb_compression_pad_pct_max • innodb_compression_failure_threshold_pct https://dev.mysql.com/doc/refman/5.7/en/innodb- parameters.html#sysvar_innodb_compression_failure_threshold_pct 8

InnoDB Table Compression “…In general, compression works best on tables that include a reasonable number of character string columns and where the data is read far more often than it is written …” https://dev.mysql.com/doc/refman/5.7/en/innodb-compression-tuning.html 9

Real world data JSON documents

Real world data JSON data sets for JSON Studio tutorials http://jsonstudio.com/resources/ • JSON data set of projects funded by the World Bank (436K compressed). • JSON data set for US zip (postal) codes (656K compressed). • JSON data set of listed stocks (1.6M compressed). • JSON data set for Enron emails (3.9M compressed). • JSON data set of startup company information (14.8M compressed). ‘ companies.json ’ repeated 8 times 11

Real world data { "_id": { "$oid ": "52cdef7c4bab8bd675297d8b“ }, "name": "AdventNet", "permalink": "abc3", "crunchbase_url": "http://www.crunchbase.com/company/adventnet", "homepage_url": "http://adventnet.com", "blog_url": "", "blog_feed_url": "", "twitter_username": "manageengine", "category_code": "enterprise", "number_of_employees": 600, "founded_year": 1996, "deadpooled_year": 2, "tag_list": "", "alias_list": "Zoho ManageEngine ", "email_address": "pr@adventnet.com", "phone_number": "925-924-9500", "description": "Server Management Software", "created_at": { "$date": 1180121062000 }, "updated_at": "Wed Oct 31 18:26:09 UTC 2012", "overview": "AdventNet is now <a href=\"/company/zoho-manageengine\" title=\"Zoho ManageEngine\" rel=\"nofollow\">Zoho ManageEngine</a>.\n\nFounded in 1996, AdventNet has served a diverse range of enterprise IT, networking and telecom customers.\n\nAdventNet supplies server and network management software.", … 12

Real world data Loading data from the file SET max_heap_table_size = 1024 * 1048576; CREATE TABLE companies_src ( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, value LONGTEXT NOT NULL, PRIMARY KEY(id) ) ENGINE=Memory; LOAD DATA INFILE ‘ companies.json' INTO TABLE companies_src FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '' (value); 13

Real world data Parsing JSON documents CREATE TABLE companies_src_parsed( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, value JSON NOT NULL, PRIMARY KEY(id) ) ENGINE=Memory AS SELECT * FROM companies_src; 14

Real world data Statistics: SELECT COUNT(*) FROM companies_src_parsed; 150408 SELECT COUNT(*) FROM companies_src_parsed WHERE JSON_VALID(value) <> 1; 0 SELECT MIN(LENGTH(value)), ROUND(AVG(LENGTH(value))), MAX(LENGTH(value)) FROM companies_src_parsed; 1027 4160 278016 15

Real world data More statistics: SELECT JSON_KEYS(value) AS keys, JSON_LENGTH(keys) FROM companies_src_parsed ORDER BY id LIMIT 1; ["_id", "name", "image", "offices", "blog_url", "overview", "partners", "products", "tag_list", "permalink", "alias_list", "created_at", "milestones", "updated_at", "acquisition", "description", "founded_day", "investments", "screenshots", "acquisitions", "competitions", "founded_year", "homepage_url", "phone_number", "video_embeds", "blog_feed_url", "category_code", "email_address", "founded_month", "providerships", "relationships", "crunchbase_url", "external_links", "funding_rounds", "deadpooled_year", "twitter_username", "total_money_raised", "number_of_employees"] 38 SELECT MIN(JSON_DEPTH(value)), MAX(JSON_DEPTH(value)) FROM companies_src_parsed; 3 7 16

InnoDB Table Compression Experiments Compression ratio and insertion time

InnoDB Table Compression Experiments Copying data to a new table: SET GLOBAL innodb_compression_level = LEVEL; CREATE TABLE companies_compressed( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, value LONGTEXT NOT NULL, PRIMARY KEY(id) ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=KBS; INSERT INTO companies_compressed SELECT * FROM companies_src; LEVEL = [0, 1, 6, 9] KBS = [1, 2, 4, 8, 16] 18

InnoDB Table Compression Experiments Calculating compressed data size: SELECT data_length FROM information_schema.tables WHERE table_schema = DATABASE() AND table_name = 'companies_compressed ’; 19

InnoDB Table Compression Experiments Identity: Data size, MB and query time, sec data_length, MB query_time, sec 1 000 40 898 36 880 900 35 33 800 30 700 25 600 500 20 400 15 300 10 200 5 100 0 0 LONGTEXT JSON LONGTEXT JSON data_length, MB query_time, sec 20

InnoDB Table Compression Experiments Identity: Data size, % and query time, % data_length_relative, % query_time_relative, % 120,00% 120,00% 109,09% 100,00% 100,00% 98,00% 100,00% 100,00% 80,00% 80,00% 60,00% 60,00% 40,00% 40,00% 20,00% 20,00% 0,00% 0,00% LONGTEXT JSON LONGTEXT JSON data_length_relative, % query_time_relative, % 21

InnoDB Table Compression Experiments Data size, MB 1 000 938 917 898 898 898 898 880 880 880 880 873 900 854 792 791 790 800 749 700 600 500 401 400 399 379 400 364 363 359 341 341 338 317 316 300 200 100 0 LONGTEXT JSON KBS1 KBS2 KBS4 KBS8 KBS16 data_length(LVL0), MB data_length(LVL1), MB data_length(LVL6), MB data_length(LVL9), MB 22

InnoDB Table Compression Experiments Query time, sec 70 64 60 52 48 50 46 45 45 43 43 43 38 40 37 37 36 36 36 36 36 34 33 33 33 33 28 30 27 24 23 21 20 20 10 0 LONGTEXT JSON KBS1 KBS2 KBS4 KBS8 KBS16 query_time(LVL0), sec query_time(LVL1), sec query_time(LVL6), sec query_time(LVL9), sec 23

InnoDB Table Compression Experiments Data size, % 120,00% 104,43% 102,06% 100,00% 100,00% 100,00% 100,00% 98,00% 98,00% 98,00% 98,00% 97,20% 95,10% 100,00% 88,20% 88,08% 87,97% 83,41% 80,00% 60,00% 44,65% 44,49% 44,43% 42,23% 40,48% 40,45% 39,97% 37,99% 37,95% 37,61% 35,27% 35,19% 40,00% 20,00% 0,00% LONGTEXT JSON KBS1 KBS2 KBS4 KBS8 KBS16 data_length_relative(LVL0), % data_length_relative(LVL1), % data_length_relative(LVL6), % data_length_relative(LVL9), % 24

with Dictionaries an alternative to InnoDB table compression Yura - PowerPoint PPT Presentation

Percona XtraDB: Compressed Columns with Dictionaries an alternative to InnoDB table compression Yura Sorokin, Senior Software Engineer at Percona Existing compression methods Overview Existing compression methods for MySQL InnoDB Table

61A Lecture 13 {'Dem': 0} Wednesday, September 28 2 Limitations on Dictionaries Implementing

Computational Dictionaries Computational Dictionaries & Terminology & Terminology

Py Python Dictionaries Python dictionaries are the only built-in mapping type: unordered

Dictionaries A Key-Value Relationship C-START Python PD Workshop C-START Python PD Workshop

HTTP/2 Compression Dictionaries Vlad Krasnov In a nutshell Allow cross-stream compression in

Dictionaries Dictionaries and and the the Organization Organization of of Knowledge

Lecture 22: Applications of Dictionaries; Plotting with Matplotlib Practice with Dictionaries

STATS 507 Data Analysis in Python Lecture 4: Dictionaries and Tuples Two more fundamental

{} Introduction to Computer Programming Data Structures CSCI-UA 2 Dictionaries {key: value}

DLMF Content Dictionaries Special Function Catalog The Next Iteration DLMF Content

Ordered Dictionaries Ordered Dictionaries Keys are ordered Perform usual dictionary

Langdale Dock Langdale Dock Langdale Dock Langdale Dock Alternative Approval Alternative

ALF-CEMI ND Supporting the use of alternative fuels Alternative Fuels and Alternative Raw

Coding dictionaries information: How to manage? How to use? Eric Sorel Herve Guimard PhUSE 2009

The Federal Circuit month at Month at a Glance COURT CONSTRUES ROLE OF DICTIONARIES,

Dictionaries and Hash Tables 0 1 025-612-0001 2 981-101-0002 3 4 451-229-0004

Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos,

Animation Sequence Compression Yang Liu Department of Computer Science March 2009 . . . . .

Ho How to to Co Compr mpress ss Hid Hidden en Markov Source ces Preetum Nakkiran Harvard

Evaluation of a High Performance Code Compression Method Charles Lefurgy, Eva Piccininni, and

Most of the slides are borrowed from the authors original presentation. original

Compressing Coldbox Data Ivan K. Furic, Remington Gerras University of Florida ProtoDUNE-SP TDR:

Efficient Lightweight Compression Alongside Fast Scans Orestis Polychroniou Kenneth A. Ross

Lecture 9: Compression 1 / 52 Compression Recap Bu ff er Management Recap 2 / 52 Compression

with Dictionaries an alternative to InnoDB table compression Yura - PowerPoint PPT Presentation

Percona XtraDB: Compressed Columns with Dictionaries an alternative to InnoDB table compression Yura Sorokin, Senior Software Engineer at Percona Existing compression methods Overview Existing compression methods for MySQL InnoDB Table

61A Lecture 13 {'Dem': 0} Wednesday, September 28 2 Limitations on Dictionaries Implementing

Computational Dictionaries Computational Dictionaries &amp; Terminology &amp; Terminology

Py Python Dictionaries Python dictionaries are the only built-in mapping type: unordered

Dictionaries A Key-Value Relationship C-START Python PD Workshop C-START Python PD Workshop

HTTP/2 Compression Dictionaries Vlad Krasnov In a nutshell Allow cross-stream compression in

Dictionaries Dictionaries and and the the Organization Organization of of Knowledge

Lecture 22: Applications of Dictionaries; Plotting with Matplotlib Practice with Dictionaries

STATS 507 Data Analysis in Python Lecture 4: Dictionaries and Tuples Two more fundamental

{} Introduction to Computer Programming Data Structures CSCI-UA 2 Dictionaries {key: value}

DLMF Content Dictionaries Special Function Catalog The Next Iteration DLMF Content

Ordered Dictionaries Ordered Dictionaries Keys are ordered Perform usual dictionary

Langdale Dock Langdale Dock Langdale Dock Langdale Dock Alternative Approval Alternative

ALF-CEMI ND Supporting the use of alternative fuels Alternative Fuels and Alternative Raw

Coding dictionaries information: How to manage? How to use? Eric Sorel Herve Guimard PhUSE 2009

The Federal Circuit month at Month at a Glance COURT CONSTRUES ROLE OF DICTIONARIES,

Dictionaries and Hash Tables 0 1 025-612-0001 2 981-101-0002 3 4 451-229-0004

Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos,

Animation Sequence Compression Yang Liu Department of Computer Science March 2009 . . . . .

Ho How to to Co Compr mpress ss Hid Hidden en Markov Source ces Preetum Nakkiran Harvard

Evaluation of a High Performance Code Compression Method Charles Lefurgy, Eva Piccininni, and

Most of the slides are borrowed from the authors original presentation. original

Compressing Coldbox Data Ivan K. Furic, Remington Gerras University of Florida ProtoDUNE-SP TDR:

Efficient Lightweight Compression Alongside Fast Scans Orestis Polychroniou Kenneth A. Ross

Lecture 9: Compression 1 / 52 Compression Recap Bu ff er Management Recap 2 / 52 Compression

Computational Dictionaries Computational Dictionaries & Terminology & Terminology