SLIDE 1
Compression and Estimation Over Large Alphabets
Alon Orlitsky, Narayana P. Santhanam, Krishnamurthy Viswanathan, Junan Zhang
UCSD
SLIDE 3
Universal Compression [Sh 48] [Fi 66, Da 73]
Setup: A — alphabet; P — collection of p.d.'s over A^n; random sequence ∼ p ∈ P (unknown)
L_q def= expected # bits of encoder q
Redundancy: R_q def= max_p (L_q − H(p))
Question: R def= min_q R_q = ?
If R/n → 0: universally compressible (UC)
Answer: i.i.d., Markov, context-tree, stationary ergodic — all UC
i.i.d.: R ≈ (1/2)(|A| − 1) log n
Problem: |A| ≈ or > n (text, images)
[Kief. 78]: as |A| → ∞, R/n → ∞
Solution: several
SLIDE 4
Solutions
Theoretical: constrain distributions
  Monotone: [Els 75], [GPM 94], [FSW 02]
  Bounded moments: [UK 02, 03]
  Others: [YJ 00], [HY 03]
  Concern: may not apply
Practical: convert to bits
  Lempel–Ziv, context-tree weighting
  Concern: may lose context
Change the question
SLIDE 5
Why ∞?
Alphabet: A def= N
Collection: P def= {p_k : k ∈ N}, where p_k is the constant-k distribution:
  p_k(x) def= 1 if x = k k . . . k, 0 otherwise
If k is known: H(p_k) = 0, so 0 bits suffice
Universally: must describe k — ∞ bits (for the worst k), so R = ∞
Conclusion: describe elements and pattern separately
SLIDE 6
Patterns
Replace each symbol by its order of appearance
Sequence: a b r a c a d a b r a
Pattern:  1 2 3 1 4 1 5 1 2 3 1
Convey pattern: 12314151231
Dictionary: 1 2 3 4 5 → a b r c d
Compress pattern and dictionary separately
Related application (PPM): [ÅSS 97]
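The order-of-appearance transformation above is easy to state in code; a minimal sketch (the helper name `pattern` is ours, not from the slides):

```python
def pattern(seq):
    """Map each symbol of seq to its order of first appearance."""
    index = {}
    out = []
    for s in seq:
        if s not in index:
            index[s] = len(index) + 1  # next unused positive integer
        out.append(index[s])
    return out

# The slide's example: "abracadabra" -> 1 2 3 1 4 1 5 1 2 3 1
assert pattern("abracadabra") == [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

The `index` dictionary built along the way is exactly the slide's dictionary (1 → a, 2 → b, 3 → r, 4 → c, 5 → d).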
SLIDE 7
Main result
Patterns of i.i.d. distributions over any alphabet (large, infinite, uncountably infinite, unknown) can be universally compressed (sequentially and efficiently).
Details
  Block: R ≤ π √(2/3) (log e) √n
  Sequential (super-polynomial time): R ≤ (4π/(√3 (2 − √2))) √n
  Sequential (linear time): R ≤ 10 n^(2/3)
In all cases: R/n → 0
SLIDE 8
Additional results
R_m: redundancy for m-symbol patterns (identical technique)
For m ≤ o(n^(1/3)):  R_m ≤ log [ (n−1 choose m−1) · (1/m!) ]
A similar average-case problem, with the alphabet assumed to contain no unseen symbols, was also considered by [Sh 03]
SLIDE 9
Proof technique
Compression = probability estimation
Estimate distributions over large alphabets
Considered by I. J. Good and A. Turing
The Good–Turing estimator is good, but not optimal
View as set partitioning; construct optimal estimators
Use results of Hardy and Ramanujan
SLIDE 10
Probability estimation
SLIDE 11
Safari preparation
Observe a sample of animals: 3 giraffes, 1 hippopotamus, 2 elephants
Probability estimation?
  Species    Prob
  giraffe    3/6
  hippo      1/6
  elephant   2/6
Problem? Lions!
SLIDE 12
Laplace estimator
Add one to every count, including "new":
3+1 giraffes, 1+1 hippopotamus, 2+1 elephants, 0+1 new
  Species    Prob
  giraffe    4/10
  hippo      2/10
  elephant   3/10
  new        1/10
Many add-constant variations
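The add-one rule can be sketched in a few lines (the function name `add_constant` and its `beta` parameter are ours; `beta = 1` is Laplace):

```python
from fractions import Fraction

def add_constant(counts, beta=1):
    """Add-constant estimate: every observed species, and 'new',
    gets beta added to its count (beta = 1 is the Laplace rule)."""
    total = sum(counts.values()) + beta * (len(counts) + 1)
    est = {s: Fraction(c + beta, total) for s, c in counts.items()}
    est["new"] = Fraction(beta, total)
    return est

# The slide's safari sample: 3 giraffes, 1 hippo, 2 elephants
est = add_constant({"giraffe": 3, "hippo": 1, "elephant": 2})
assert est["giraffe"] == Fraction(4, 10) and est["new"] == Fraction(1, 10)
```

Other add-constant variants only change `beta`; passing `beta=Fraction(1, 2)` gives the add-half rule of the next slide.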
SLIDE 13
Krichevsky–Trofimov estimator
Add half to every count
Achieves Jeffreys' prior
Best for fixed alphabet as length → ∞
Are add-constant estimators good?
SLIDE 14
DNA
n samples (n large), all different
Probability estimation? For each observed sample: 1 + 1 = 2; for new: 0 + 1 = 1
  Sample     Probability
  observed   2/(2n + 1)
  new        1/(2n + 1)
Problem? P(new) = 1/(2n + 1) ≈ 0, P(observed) = 2n/(2n + 1) ≈ 1
The opposite would be more accurate
SLIDE 15
Good-Turing problem
Enigma cipher: captured German book of keys, had previous decryptions
Looked for the distribution of key pages
Similar setting: the number of pages is large compared to the data
SLIDE 16
Good-Turing estimator
Surprising and complicated
Works well for infrequent elements; suboptimal for frequent ones
Used in a variety of applications
Modifications: empirical estimates for frequent elements
Several explanations, some evaluations
SLIDE 17
Evaluation
Observe a sequence x_1, x_2, x_3, . . .
Successively estimate the probability of each symbol given the past: q(x_i | x_1^{i−1})
Assign probability to the whole sequence: q(x_1^n) = ∏_{i=1}^n q(x_i | x_1^{i−1})
Compare to the highest possible p(x_1^n)
Cf. compression, online algorithms/learning
Precise definitions require patterns
SLIDE 18
Pattern of a sequence
Replace each symbol by its order of appearance
g, h, g, e, e, g with giraffe — 1, hippo — 2, elephant — 3 gives 1, 2, 1, 3, 3, 1
Can enumerate patterns and assign probabilities
SLIDE 19
Sequence = pattern
Example: q+1 (add one)
Sequence: g h g e → N N g N (N = new)
q+1(ghge) = q+1(N) · q+1(N|g) · q+1(g|gh) · q+1(N|ghg) = (1/1) · (1/3) · (2/5) · (1/6) = 1/45
Pattern: 1213
q+1(1213) = q+1(1) · q+1(2|1) · q+1(1|12) · q+1(3|121) = (1/1) · (1/3) · (2/5) · (1/6) = 1/45
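The add-one computation above can be reproduced mechanically with exact fractions; a sketch (the function name `add_one_pattern_prob` is ours):

```python
from fractions import Fraction

def add_one_pattern_prob(pattern):
    """Sequential add-one (Laplace) probability of a pattern:
    every seen symbol gets count + 1, and 'new' gets 0 + 1 = 1."""
    prob = Fraction(1)
    counts = {}
    for s in pattern:
        total = sum(counts.values()) + len(counts) + 1  # +1 per seen symbol, +1 for 'new'
        if s in counts:
            prob *= Fraction(counts[s] + 1, total)
        else:
            prob *= Fraction(1, total)                  # a new symbol
        counts[s] = counts.get(s, 0) + 1
    return prob

# The slide's example: pattern 1213 has add-one probability 1/45
assert add_one_pattern_prob([1, 2, 1, 3]) == Fraction(1, 45)
```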
SLIDE 20
Patterns
Strings of positive integers
First appearance of each i ≥ 2 follows that of i − 1
Patterns: 1, 11, 12, 121, 122, 123
Not patterns: 2, 21, 132
Ψ^n — length-n patterns
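The membership rule above can be checked directly; a sketch (the name `is_pattern` is ours):

```python
def is_pattern(seq):
    """A pattern is a string of positive integers in which the first
    appearance of each i >= 2 comes after the first appearance of i - 1."""
    seen_max = 0
    for x in seq:
        if x == seen_max + 1:
            seen_max += 1              # the next new symbol, in order
        elif not (1 <= x <= seen_max):
            return False               # skipped a symbol, or not a valid index
    return True

# The slide's examples
assert all(is_pattern(p) for p in ([1], [1, 1], [1, 2], [1, 2, 1], [1, 2, 2], [1, 2, 3]))
assert not any(is_pattern(p) for p in ([2], [2, 1], [1, 3, 2]))
```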
SLIDE 21
Pattern probability
A — alphabet; p — distribution over A; ψ — pattern in Ψ^n
p_Ψ(ψ) def= p{x ∈ A^n with pattern ψ}
Example: A = {a, b}, p(a) = α, p(b) = ᾱ = 1 − α
p_Ψ(11) = p{aa, bb} = α² + ᾱ²
p_Ψ(12) = p{ab, ba} = 2αᾱ
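The definition of p_Ψ can be evaluated by brute-force enumeration for tiny alphabets and lengths; an illustrative sketch (helper names are ours):

```python
from itertools import product

def pattern_of(seq):
    """Order-of-appearance pattern of a sequence, as a tuple."""
    index, out = {}, []
    for s in seq:
        index.setdefault(s, len(index) + 1)
        out.append(index[s])
    return tuple(out)

def pattern_prob(psi, probs):
    """p_Psi(psi): probability that an i.i.d. draw from `probs` has
    pattern psi (enumeration; feasible only for small cases)."""
    total = 0.0
    for seq in product(range(len(probs)), repeat=len(psi)):
        p = 1.0
        for s in seq:
            p *= probs[s]
        if pattern_of(seq) == tuple(psi):
            total += p
    return total

# The slide's two-letter example with alpha = 0.3
alpha = 0.3
assert abs(pattern_prob((1, 1), [alpha, 1 - alpha]) - (alpha**2 + (1 - alpha)**2)) < 1e-12
assert abs(pattern_prob((1, 2), [alpha, 1 - alpha]) - 2 * alpha * (1 - alpha)) < 1e-12
```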
SLIDE 22
Maximum pattern probability
Highest probability of a pattern: p̂_Ψ(ψ) def= max_p p_Ψ(ψ)
Examples
  p̂_Ψ(11) = 1 [constant distributions]
  p̂_Ψ(12) = 1 [continuous distributions]
In general, difficult:
  p̂_Ψ(112) = 1/4 [p(a) = p(b) = 1/2]
  p̂_Ψ(1123) = 12/125 [p(a) = . . . = p(e) = 1/5]
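The value p̂_Ψ(112) = 1/4 can be checked numerically over two-atom distributions (which, per the slide, is where the maximizer lives):

```python
# For a two-atom distribution (a, 1 - a), the sequences with pattern 112
# are aab and bba, so p_Psi(112) = a^2 (1-a) + (1-a)^2 a = a (1-a),
# maximized at a = 1/2, matching the slide's value of 1/4.
best = max(a * (1 - a) for a in (i / 1000 for i in range(1001)))
assert abs(best - 0.25) < 1e-9
```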
SLIDE 23
General results
Obtained several results
m: # symbols appearing; µ_i: # times i appears; µ_min, µ_max: smallest, largest µ_i
Example: 111223 — µ_1 = 3, µ_min = 1, µ_max = 3
k̂: # symbols in the maximizing distribution
Upper bound: k̂ ≤ m + (m − 1)/(2µ_min − 2)
Lower bound: k̂ ≥ m − 1 + (2^{−µ_min} − 2^{−µ_max})/(2µ_max − 2)
SLIDE 24
Attenuation
Attenuation of q for ψ_1^n:
  R(q, ψ_1^n) def= p̂_Ψ(ψ_1^n) / q(ψ_1^n)
Worst-case sequence attenuation of q (n symbols):
  R_n(q) def= max_{ψ_1^n} R(q, ψ_1^n)
Worst-case attenuation of q:
  R*(q) def= lim sup_{n→∞} (R_n(q))^{1/n}
SLIDE 25
Laplace estimator
Pattern: 123 . . . n
p̂_Ψ(123 . . . n) = 1
q+1(123 . . . n) = 1/(1 · 3 · · · (2n − 1))
R_n(q+1) ≥ p̂_Ψ(123 . . . n)/q+1(123 . . . n) = 1 · 3 · · · (2n − 1) ≈ (2n/e)^n
R*(q+1) = lim sup_{n→∞} 2n/e = ∞
SLIDE 26
Good-Turing estimator
Multiplicity of ψ ∈ Z+ in ψ_1^n:  µ_ψ def= |{1 ≤ i ≤ n : ψ_i = ψ}|
Prevalence of multiplicity µ in ψ_1^n:  ϕ_µ def= |{ψ : µ_ψ = µ}|
Increased multiplicity:  r def= µ_{ψ_{n+1}}
Good-Turing estimator:
  q(ψ_{n+1} | ψ_1^n) = ϕ′_1 / n                          if r = 0
  q(ψ_{n+1} | ψ_1^n) = ((r + 1)/n) · ϕ′_{r+1} / ϕ′_r     if r ≥ 1
ϕ′_µ — smoothed version of ϕ_µ
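The estimator above translates directly into code; a sketch (the function name `good_turing` and the pluggable `smooth` parameter are ours — without smoothing, ϕ_{r+1} = 0 yields a zero estimate, which is why the slide uses ϕ′):

```python
from collections import Counter

def good_turing(pattern, next_symbol, smooth=None):
    """Good-Turing estimate of q(next_symbol | pattern).
    phi[r] counts symbols appearing exactly r times; `smooth` is an
    optional smoothing function r -> phi'[r] (raw phi by default)."""
    n = len(pattern)
    mult = Counter(pattern)            # multiplicity of each symbol
    phi = Counter(mult.values())       # prevalence of each multiplicity
    f = smooth if smooth is not None else (lambda r: phi[r])
    r = mult[next_symbol]              # increased multiplicity
    if r == 0:
        return f(1) / n                # mass reserved for unseen symbols
    return (r + 1) / n * f(r + 1) / f(r)

psi = [1, 2, 1, 3, 3, 1]               # multiplicities: 1 -> 3, 2 -> 1, 3 -> 2
assert abs(good_turing(psi, 4) - 1 / 6) < 1e-12   # new symbol: phi[1]/n
```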
SLIDE 27
Performance of Good-Turing
Analyzed three versions
Simple: 1.39 ≤ R*(q_sgt) ≤ 2
Church–Gale: experimentally > 1
Common-sense: same
SLIDE 28
Diminishing attenuation
c[n] = ⌈n^{1/3}⌉
f_{c[n]}(ϕ) def= max(ϕ, c[n])
q_{1/3}(ψ_{n+1} | ψ_1^n) = (1/S_{c[n]}(ψ_1^n)) · f_{c[n]}(ϕ_1 + 1)                                  if r = 0
q_{1/3}(ψ_{n+1} | ψ_1^n) = (1/S_{c[n]}(ψ_1^n)) · (r + 1) f_{c[n]}(ϕ_{r+1} + 1)/f_{c[n]}(ϕ_r)       if r > 0
S_{c[n]}(ψ_1^n) is a normalization factor
R_n(q_{1/3}) ≤ 2^{O(n^{2/3})}, with constant ≤ 10
R*(q_{1/3}) ≤ 2^{O(n^{−1/3})} → 1
Proof: potential functions
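The estimator can be sketched as follows, under two assumptions that the slide leaves implicit: c[n] is the ceiling of n^{1/3}, and S normalizes over the one "new" outcome plus each seen symbol (names are ours):

```python
from collections import Counter
from math import ceil

def q_one_third(pattern):
    """Sketch of the diminishing-attenuation estimator: probabilities of
    each seen symbol, and of 'new', as the next pattern symbol.
    Assumptions: c[n] = ceil(n^(1/3)); S normalizes over 'new' + seen."""
    n = len(pattern)
    c = ceil(n ** (1 / 3))
    mult = Counter(pattern)
    phi = Counter(mult.values())
    f = lambda x: max(x, c)            # f_c(phi) = max(phi, c)
    scores = {"new": f(phi[1] + 1)}    # r = 0 case
    for sym, r in mult.items():        # r > 0 case
        scores[sym] = (r + 1) * f(phi[r + 1] + 1) / f(phi[r])
    s = sum(scores.values())
    return {sym: v / s for sym, v in scores.items()}

probs = q_one_third([1, 2, 1, 3, 3, 1])
assert abs(sum(probs.values()) - 1.0) < 1e-12
```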
SLIDE 29
Low-attenuation estimator
t_n — largest power of 2 that is ≤ n
Ψ^{2t_n}(ψ_1^n) def= {y_1^{2t_n} ∈ Ψ^{2t_n} : y_1^n = ψ_1^n}
p̃(ψ_1^n) def= [∏_{µ=1}^n (µ!)^{ϕ_µ} ϕ_µ!] / n!
q_{1/2}(ψ_{n+1} | ψ_1^n) = Σ_{y ∈ Ψ^{2t_n}(ψ_1^{n+1})} p̃(y) / Σ_{y ∈ Ψ^{2t_n}(ψ_1^n)} p̃(y)
R_n(q_{1/2}) ≤ exp( (4π/(√3 (2 − √2))) √n )
R*(q_{1/2}) ≤ exp( 4π/(√3 (2 − √2) √n) ) → 1
Proof: integer partitions, Hardy–Ramanujan
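The formula for p̃ can be sanity-checked by brute force: summed over all length-n patterns that share a multiplicity profile, it should total 1 (helper names are ours; the enumeration is feasible only for tiny n):

```python
from collections import Counter
from math import factorial

def patterns(n):
    """All length-n patterns, built by appending either a seen symbol
    or the next unused integer (restricted-growth strings)."""
    if n == 0:
        return [()]
    out = []
    for p in patterns(n - 1):
        for x in range(1, max(p, default=0) + 2):
            out.append(p + (x,))
    return out

def p_tilde(psi):
    """p~(psi) = [prod over mu of (mu!)^phi_mu * phi_mu!] / n!"""
    phi = Counter(Counter(psi).values())
    num = 1
    for mu, count in phi.items():
        num *= factorial(mu) ** count * factorial(count)
    return num / factorial(len(psi))

# p~ sums to 1 over the patterns sharing each multiplicity profile
totals = Counter()
for psi in patterns(4):
    totals[tuple(sorted(Counter(psi).values()))] += p_tilde(psi)
assert all(abs(t - 1) < 1e-12 for t in totals.values())
```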
SLIDE 30
Lower bound
R_n(q_{1/3}) ≤ 2^{O(n^{2/3})}
R_n(q_{1/2}) ≤ 2^{O(n^{1/2})}
For any q:  R_n(q) ≥ 2^{Ω(n^{1/3})}
Proof: generating functions and Hayman's theorem