SLIDE 1 Fast Text Compression with Neural Networks
Matthew Mahoney Florida Institute of Technology http://cs.fit.edu/~mmahoney/compression/
- How text compression works
- Neural implementations have been too slow
- How to make them faster
SLIDE 2 How Text Compression Works
Common character sequences can have shorter codes
Morse code: e = .    z = --..

    Shorter code       Longer code
    e                  z
    dog                dgo
    the                of
    roses are red      roses are green

Text compression is an AI problem
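(Aside, not from the slides: an optimal code gives a symbol of probability p about log2(1/p) bits, as slide 4 states. With rough English letter frequencies, assumed here purely for illustration:)

    import math
    # Rough English letter frequencies (assumed values, illustration only)
    for ch, p in [('e', 0.127), ('z', 0.0007)]:
        print(ch, round(math.log2(1 / p), 1), 'bits')  # e: 3.0 bits, z: 10.5 bits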
SLIDE 3 Types of Compression
From fast but poor... to slow but good
Lempel-Ziv (compress, zip, gzip, gif)
  the cat in the hat  --->  the cat in (ref)h(ref)
  (repeated strings such as "the " and "at" become pointers to earlier occurrences)
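(A minimal sketch of the Lempel-Ziv idea, not gzip's actual algorithm: greedily replace repeats with (offset, length) pointers.)

    def lz77_tokens(s, min_match=3):
        """Greedy LZ77-style parse: literals plus (offset, length) back references."""
        i, out = 0, []
        while i < len(s):
            best_len, best_off = 0, 0
            for j in range(i):                        # search earlier text for a match
                k = 0
                while i + k < len(s) and s[j + k] == s[i + k]:
                    k += 1
                if k > best_len:
                    best_len, best_off = k, i - j
            if best_len >= min_match:
                out.append((best_off, best_len))      # pointer to earlier occurrence
                i += best_len
            else:
                out.append(s[i])                      # literal character
                i += 1
        return out

    print(lz77_tokens("the cat in the hat"))
    # [..., 'i', 'n', ' ', (11, 4), 'h', 'a', 't'] -- the second "the " became a pointer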
Context Sorting (Burrows-Wheeler (szip))
    Context | Next char
    the ca  | t
    the ha  | t
    the c   | a
    in the  | _
    at the  | _
    in th   | e
    hat th  | e
  ---> t t a _ _ e e ---> 2t 1a 2_ 2e (run-length code)
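(A toy version of the context-sorting idea, a simplified Burrows-Wheeler transform rather than szip itself: sort all rotations of the text and keep the last column; characters that occur in similar contexts land next to each other, so the column run-length codes well.)

    def bwt(s):
        """Burrows-Wheeler transform: last column of the sorted rotations of s."""
        s += '$'                                      # sentinel, assumed unique
        rows = sorted(s[i:] + s[:i] for i in range(len(s)))
        return ''.join(row[-1] for row in rows)

    print(bwt('the cat in the hat'))                  # repeated characters cluster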
Predictive Arithmetic (PPMZ (boa, rkive) and neural network)
[Diagram: a Predictor estimates P(a)...P(z) for each next character; an Arithmetic Encoder turns the string x = "the cat" into the cumulative probability P(x ≤ "the cat")]
SLIDE 4 Arithmetic Encoding
[Diagram: the interval [0, 1) is divided among A...Z in proportion to probability; the T interval (.78, .83) is subdivided among TA...TY; the TH interval (.795, .81) is subdivided among THA...THU; THE gets (.798, .803)]

P("THE") = 0.005    Compress("THE") = .8 (a number inside THE's interval)
Binary code for x is within 1 bit of log2(1/P(x)) (theoretical limit, Shannon, 1949)
Compression depends entirely on accuracy of P.
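(A minimal sketch of the encoder side, using exact fractions instead of the streaming integer arithmetic a real coder needs: each character narrows [low, high) in proportion to its probability, and the final interval width equals P(x).)

    from fractions import Fraction

    def arith_encode(text, model):
        """Narrow [low, high) by each character's probability interval."""
        low, high = Fraction(0), Fraction(1)
        for ch in text:
            width, cum = high - low, Fraction(0)
            for sym, p in model:                      # (symbol, probability) pairs
                if sym == ch:
                    high = low + width * (cum + p)
                    low = low + width * cum
                    break
                cum += p
        return low, high                              # any x in [low, high) identifies text

    # Hypothetical toy model, not the probabilities from the slide
    model = [('a', Fraction(1, 2)), ('b', Fraction(1, 4)), ('c', Fraction(1, 4))]
    low, high = arith_encode('abc', model)
    print(low, high, high - low)                      # width = P('abc') = 1/32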
SLIDE 5 Schmidhuber and Heil (1994) Neural Network Predictor
[Diagram: five groups of 80 input units (A...Z, one group per each of the last 5 characters) feed a hidden layer, then 80 output units for the next character]
- 80 character alphabet
- 3 layer network
- 400 input units (last 5 characters)
- 430 hidden units
- 80 output units
- Trained offline in 25 passes by backpropagation
- Training time: 3 days on 600KB of text (HP-700)
- 18% better compression than gzip -9
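(A shape-only sketch of that network in numpy; the dimensions come from the slide, but the random weights and softmax output are assumptions for illustration. The point is the cost: every prediction multiplies through all 430 hidden units, and 25 offline passes make it slower still.)

    import numpy as np

    A = 80                                            # alphabet size
    x = np.zeros(5 * A)                               # one-of-80 coding of last 5 chars
    x[[3, 80 + 17, 160 + 4, 240 + 0, 320 + 9]] = 1    # an arbitrary example context

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(430, 5 * A))                # hidden layer, 430 units
    W2 = rng.normal(size=(A, 430))                    # output layer, 80 units

    h = 1 / (1 + np.exp(-W1 @ x))                     # hidden activations
    z = np.exp(W2 @ h)
    p = z / z.sum()                                   # P(next character), 80 values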
SLIDE 6 Fast Neural Network Predictor
[Diagram: input text E L E P H A N plus the bits (01) of the partially coded next character; the contexts N01, AN01, HAN01, PHAN01, EPHAN01 are mapped by a 22-bit hash function to input units xi, each with weight wi and bit counts Ni(0), Ni(1); the single output y is P(1)]
- Predicts one bit at a time
- 2 layer network
- 2^22 (about 4 million) input units
- One output unit
- Hash function sets 5 or 6 inputs to 1, all others to 0
- Trained online using a variable learning rate
- Compresses 600KB in 15 seconds (475 MHz P6-II)
- 42-47% better compression than gzip -9
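(A sketch of the input selection, shown with the 5 character-level contexts from the diagram; the slides say only that a 22-bit hash picks one of about 4 million units, so the hash function below is hypothetical.)

    def active_units(text, partial_bits):
        """Hash each 1-5 character context (plus partial-bit history) to a unit index."""
        units = []
        for n in range(1, 6):                         # context lengths 1 to 5
            ctx = text[-n:] + partial_bits            # e.g. 'N01', 'AN01', ..., 'EPHAN01'
            h = 0
            for ch in ctx:                            # hypothetical hash, illustration only
                h = (h * 773 + ord(ch)) & 0x3FFFFF    # keep 22 bits -> ~4M units
            units.append(h)
        return units                                  # these inputs are 1, all others 0

    print(active_units('ELEPHAN', '01'))              # 5 active units out of 2^22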
SLIDE 7
Prediction
P(1) = g(Σi wi xi)          weighted sum of inputs
g(x) = 1/(1 + e^−x)         squashing function
Training
Ni(y) ← Ni(y) + xi                                      count 0 or 1 in context i
E = y − P(1)                                            output error
wi ← wi + (ηS + ηL/σi²) xi E                            adjust weight to reduce error
σi² = (Ni(0) + Ni(1) + 2d) / ((Ni(0) + d)(Ni(1) + d))   variance of data in context i

d = 0.5             initial count
ηS = 0 to 0.2       short term learning rate
ηL = 0.2 to 0.5     long term learning rate
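(Slide 7's equations transcribed directly into code; the hash table, bit I/O, and arithmetic coder of the real program are omitted, and etaS, etaL are picked from the stated ranges.)

    import math

    d, etaS, etaL = 0.5, 0.1, 0.35                    # constants within slide 7's ranges

    def predict(active, w):
        """P(1) = g(sum of active weights); xi = 1 only for the active units."""
        return 1 / (1 + math.exp(-sum(w[i] for i in active)))

    def train(active, w, n0, n1, y):
        """Update counts and weights after observing bit y (0 or 1)."""
        e = y - predict(active, w)                    # output error E = y - P(1)
        for i in active:
            var = (n0[i] + n1[i] + 2 * d) / ((n0[i] + d) * (n1[i] + d))
            w[i] += (etaS + etaL / var) * e           # xi = 1, so the xi factor drops out
            (n1 if y else n0)[i] += 1                 # Ni(y) <- Ni(y) + 1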
SLIDE 8 Compression Results
[Bar chart: compression in bits per character on Book1 and Alice for p12, p6, p5, rkive -mt3, boa -m15, szip -b41 -o0, gzip -9, zip, compress; scale 0.5 to 3.5 bpc]
- ηS and ηL tuned on Alice in Wonderland
- Tested on book1 (Far from the Madding Crowd)
- P5 - 256K neurons, contexts of 1-4 characters
- P6 - 4M neurons, contexts of 1-5 characters
- P12 - 4M neurons, contexts of 1-4 characters and 1-2 words (unpublished)
SLIDE 9 Compression Time
[Bar chart: compress and decompress times for p12, p6, p5, rkive -mt3, boa -m15, szip -b41 -o0, gzip -9, zip, compress; scale 20 to 140 seconds]
Seconds to compress and decompress Alice (152KB file on 100 MHz 486)
SLIDE 10 Summary
Compression within 2% of best known, at similar speeds
50% better (but 4x-50x slower) than compress, zip, gzip
Fast because:
- Fixed representation - only output layer is trained (5x faster)
- One pass training by variable learning rate (25x faster)
- Bit-level prediction (16x faster)
- Sparse input activation (5-6 of 4 million, 80x faster)
Implementation available at http://cs.fit.edu/~mmahoney/compression/