Information Theory in an Industrial Research Lab
Marcelo J. Weinberger
Information Theory Research Group, Hewlett-Packard Laboratories Advanced Studies, Palo Alto, California, USA
with contributions from the ITR group
Purdue University
Information Theory (Shannon, 1948)
it's all about models, bounds, and algorithms
The mathematical theory:
- measures of information
- fundamentals of data representation (codes) for:
  - compactness
  - secure, reliable communication/storage over a possibly noisy channel
A formal framework for areas of engineering and science for which the notion of "information" is relevant
Components:
- data models
- fundamental bounds
- codes, efficient encoding/decoding algorithms
Engineering problems addressed:
- data compression
- error control coding
- cryptography
enabling technologies with many practical applications in: computing, imaging, storage, multimedia, communications...
Information Theory research in the industry
Mission
Research the mathematical foundations and practical applications of information theory, generating intellectual property and technology for "XXX Company" through the advancement of scientific knowledge in these areas
Applying the theory and working on the applications makes obvious sense for "XXX Company" research labs; but why invest in advancing the theory?
some simple answers, which apply to any basic research area: long-term investment, prestige, visibility, giving back to society...
this talk will be about a different type of answer: differentiating technology vs. enabling technology
Main claim:
working on the theory helps develop the analytical tools that are needed to envision innovative, technology-differentiating ideas
Case studies
- JPEG-LS: from universal context modeling to a lossless image compression standard
- DUDE (Discrete Universal DEnoiser): from a formal setting for universal denoising to actual image denoising algorithms
- Error-correcting codes in nanotechnology: the advantages of interdisciplinary research
- 2-D information theory: looking into the future of storage devices
[Diagram: Input (010010...) → compress → store, transmit → decompress → Output (010010...)]
Work paradigm
[Diagram: start by identifying a (fairly abstract) practical problem — e.g., image compression, 2D channel coding, ECC for nano, denoising — motivated by scientific interest and a vision of benefit to XXX; work on the theory and on practical solutions in parallel; exchange with academia and the scientific community (ideas, papers, participation, visibility, talent) and with industry (patents, technology, consulting, standards, visibility, new challenges)]
Universal Modeling and Coding
Traditional Shannon theory assumes that a (probabilistic) model of the data is available, and aims at compressing the data optimally w.r.t. the model
Kraft's inequality: every uniquely decipherable (UD) code with length function L(s), over strings s of length n over a finite alphabet A, satisfies
  ∑_{s ∈ A^n} 2^{−L(s)} ≤ 1
⇒ a code defines a probability distribution P(s) = 2^{−L(s)} over A^n
Conversely, given a distribution P(·) (a model), there exists a UD code that assigns ⌈−log P(s)⌉ bits to s (Shannon code)
Hence, P(·) serves as a model to encode s, and every code has an associated model
a model is a probabilistic tool to "understand" and predict the behavior of the data
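To make both directions of the code/model correspondence concrete, here is a minimal Python sketch (illustrative only; the example model is made up):

```python
import math

# Kraft's inequality: a uniquely decipherable code with lengths L(s)
# must satisfy sum_s 2^(-L(s)) <= 1.
def kraft_sum(lengths):
    return sum(2.0 ** -L for L in lengths)

# Shannon code: given a model P, assign ceil(-log2 P(s)) bits to s.
def shannon_lengths(probs):
    return [math.ceil(-math.log2(p)) for p in probs]

P = [0.5, 0.25, 0.125, 0.125]   # a model over four strings
L = shannon_lengths(P)          # -> [1, 2, 3, 3]
assert kraft_sum(L) <= 1.0      # the lengths are realizable
Q = [2.0 ** -l for l in L]      # the model associated with the code
print(L, Q)
```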
Universal Modeling and Coding (cont.)
Given a model P(·) on n-tuples, arithmetic coding provides an effective means to sequentially assign a code word of length close to −log P(s) to s
if s = x1 x2 … xn, the "ideal code length" for symbol xt is −log p(xt | x1 x2 … xt−1); the model can vary arbitrarily and "adapt" to the data
CODING SYSTEM = MODEL + CODING UNIT
two separate problems: design a model and use it to encode
We will view data compression as a problem of assigning probabilities to data
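A minimal sketch of this sequential view (illustrative; the KT estimator below is a standard adaptive model, chosen here for concreteness, not specified in the slides): the model updates as it scans the string, and the ideal code length accumulates −log₂ of each conditional probability.

```python
import math

# Sequential probability assignment over a binary alphabet: the total
# ideal code length sum_t -log2 p(x_t | x_1 ... x_{t-1}) is what
# arithmetic coding approaches, up to a small constant.
def ideal_code_length(bits):
    counts = [0.5, 0.5]               # KT (Krichevsky-Trofimov) estimator
    total = 0.0
    for x in bits:
        p = counts[x] / (counts[0] + counts[1])
        total += -math.log2(p)        # ideal code length for symbol x_t
        counts[x] += 1                # the model adapts to the data
    return total

s = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
print(f"{ideal_code_length(s):.2f} bits for {len(s)} symbols")
```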
Coding with Model Classes
Universal data compression deals with the optimal description of data in the absence of a given model
in most practical applications, the model is not given to us
How do we make the concept of “optimality” meaningful?
there is always a code that assigns just 1 bit to the given data!
The answer: model classes
We want a "universal" code to perform as well as the best model in a given class 𝒞 for any string s, where the best competing model changes from string to string
universality makes sense only w.r.t. a model class
A code with length function L(x^n) is pointwise universal w.r.t. a class 𝒞 if, as n → ∞,
  R(L, x^n) = (1/n) [ L(x^n) − min_{C ∈ 𝒞} L_C(x^n) ] → 0
where L_C denotes the code length with model C
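A hedged illustration of pointwise redundancy against the i.i.d. (Bernoulli) class, assuming a KT-based sequential code as the universal code (an assumption; any universal code for the class would do): the best competing model is the Bernoulli parameter at the empirical frequency, whose code length is n times the empirical entropy.

```python
import math

def kt_code_length(bits):
    c = [0.5, 0.5]                    # KT estimator, as in the previous sketch
    total = 0.0
    for x in bits:
        total += -math.log2(c[x] / (c[0] + c[1]))
        c[x] += 1
    return total

def best_iid_code_length(bits):
    # best model in the class: Bernoulli(p) at the empirical frequency
    n, k = len(bits), sum(bits)
    if k in (0, n):
        return 0.0
    p = k / n
    return n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))

s = [1, 0, 0, 1, 0, 0, 0, 1] * 16     # n = 128
R = (kt_code_length(s) - best_iid_code_length(s)) / len(s)
print(f"redundancy = {R:.4f} bits/symbol")   # shrinks roughly like (log n)/(2n)
```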
How to Choose a Model Class?
Universal coding tells us how to encode optimally w.r.t. a class; it doesn't tell us how to choose a class! Some possible criteria:
- complexity: existence of efficient algorithms
- prior knowledge on the data
We will see that the bigger the class, the slower the best possible convergence rate of the redundancy to 0
in this sense, prior knowledge is of paramount importance: don’t learn what you already know!
Ultimately, the choice of model class is an art
Parametric Model Classes
A useful limitation to the model class is to assume
  𝒞 = { Pθ , θ ∈ Θ^d }, a parameter space of dimension d
Examples:
- Bernoulli: d = 1; general i.i.d. model: d = α − 1 (α = |A|)
- FSM model with k states: d = k(α − 1)
- memoryless geometric distribution on the integers i ≥ 0: P(i) = θ^i (1 − θ), d = 1
A straightforward method: two-part code [Rissanen '84]
  encode the best θ̂ first, then the data: ⌈−log p(x^n | θ̂)⌉ + (bits for θ̂)
  "model cost" (the bits for θ̂): grows with d; probability of the data under θ̂: grows with d
Trade-off: the dimension of the parameter space plays a fundamental role in modeling problems
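A sketch of the two-part idea for the Bernoulli class (d = 1). The √n-level quantization grid for θ̂, costing about (1/2) log n bits, is an assumption chosen to balance the two parts; it is not a detail from the talk.

```python
import math

def two_part_code_length(bits):
    n, k = len(bits), sum(bits)
    levels = max(2, int(round(math.sqrt(n))))    # grid of candidate thetas
    theta = (k + 0.5) / (n + 1)                  # smoothed ML estimate
    q = round(theta * levels) / levels           # quantized parameter
    q = min(max(q, 1 / (2 * levels)), 1 - 1 / (2 * levels))
    model_cost = math.ceil(math.log2(levels))    # bits to describe theta-hat
    data_cost = -(k * math.log2(q) + (n - k) * math.log2(1 - q))
    return model_cost + math.ceil(data_cost)     # two parts: parameter + data

s = [1, 0, 0, 0, 1, 0, 0, 0] * 32
print(two_part_code_length(s), "bits")
```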
Fundamental Lower Bound
A criterion for measuring the optimality of a universal model is provided by Rissanen's lower bound [Rissanen '84]:
  (1/n) Eθ[ −log P(x^n) ] ≥ Hθ + (d log n / 2n) (1 − ε)
for every P(·), any ε > 0, and sufficiently large n, for all parameter values θ except for a set whose volume → 0 as n → ∞, provided a "good" estimator of θ exists
Conclusion: the number of parameters affects the achievable convergence rate of a universal code length to the entropy
Contexts and Tree Models
More efficient parametrization of a Markov process [Weinberger/Lempel/Ziv '92, Weinberger/Rissanen/Feder '95]
Any suffix of a sequence x^t is called a context in which the next symbol x_{t+1} occurs
For a finite-memory source P, the conditioning states s(x^t) are contexts that satisfy
  P(a | x^t) = P(a | s(x^t)), i.e., P(a | u s(x^t)) = P(a | s(x^t)) ∀ a ∈ A, u ∈ A*
# of parameters: α − 1 per leaf of the tree
There exist efficient universal schemes in the class of tree models of any size [Weinberger/Rissanen/Feder '95, Willems/Shtarkov/Tjalkens '95, Martín/Seroussi/Weinberger '04]
[Diagram: a binary sequence … 1 1 1 … 0 0 1 0, with the context of the next input bit highlighted on a context tree]
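A toy version of context modeling (a fixed-depth, complete tree rather than the adaptively grown trees of the cited schemes; the depth sweep is illustrative): gather counts per context and compute the adaptive code length.

```python
import math

# Fixed-depth context model over a binary alphabet: each depth-k context
# carries alpha - 1 = 1 free parameter, estimated adaptively (KT).
def context_code_length(bits, k):
    counts = {}                        # context (k-tuple) -> [n0, n1]
    total = 0.0
    for t in range(k, len(bits)):
        ctx = tuple(bits[t - k:t])     # suffix of the past = context
        c = counts.setdefault(ctx, [0.5, 0.5])
        x = bits[t]
        total += -math.log2(c[x] / (c[0] + c[1]))
        c[x] += 1
    return total

s = [1, 1, 0, 1] * 64
for k in (0, 1, 2, 3):
    print(k, f"{context_code_length(s, k):.1f} bits")
# deeper contexts fit better, but each added level multiplies the number
# of parameters (and hence the model cost) by alpha
```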
Lossless Image Compression (the real thing…)
Some applications of lossless image compression:
- images meant for further analysis and processing (as opposed to just human perception)
- images where loss might have legal implications
- images obtained at great cost
- applications with intensive editing and repeated compression/decompression cycles
- applications where the desired quality of the rendered image is unknown at time of acquisition
International standard: JPEG-LS (1998)
Universality vs. Prior Knowledge
Application of universal algorithms for tree models directly to real images yields poor results
- some structural symmetries typical of images are not captured by the model
- a universal model has an associated "learning cost": why learn something we already know?
Modeling approach: limit model class by use of “prior knowledge”
- for example, images tend to be a combination of smooth regions and edges
- predictive coding was successfully used for years: it encodes the difference between a pixel and a predicted value of it
- prediction errors tend to follow a Laplacian distribution ⇒ AR model + Laplacian, where both the center and the decay are context-dependent
- Prediction = fixed prediction + adaptive correction
Models for Images
In practice, contexts are formed out of a finite subset of the past sequence
[Diagram: causal template — neighbors c, a, b, d around the current sample x]
Conditional probability model for prediction errors: two-sided geometric distribution (TSGD)
  P(e) = c θ^{|e+s|},  θ ∈ (0,1),  s ∈ [0,1)
a "discrete Laplacian"; the shift s is constrained to [0,1) by integer-valued adaptive correction (bias cancellation) on the fixed predictor
[Plot: TSGD P(e), two-sided geometric in e, centered at −s]
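The fixed part of "Prediction = fixed prediction + adaptive correction" in LOCO-I/JPEG-LS is the well-known median edge detector (MED); a short sketch (the sample values in the examples are made up):

```python
# MED predictor: with a = left, b = above, c = above-left neighbors of the
# current sample x, it picks min(a, b) or max(a, b) when the template
# suggests an edge at x, and the planar interpolation a + b - c otherwise.
def med_predict(a, b, c):
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c                     # planar (no-edge) case

print(med_predict(a=50, b=200, c=210))   # edge configuration: predicts 50
print(med_predict(a=100, b=110, c=105))  # smooth region: predicts 105
```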
Complexity Constraints
Are sophisticated models worth the price in complexity?
- Algorithm Context and CTW are linear-time algorithms for tree sources of limited depth, but quite expensive in practice
- even arithmetic coding is not something that a practitioner will easily buy in many applications!
Is high complexity required to approach the best possible compression?
The idea in JPEG-LS: apply judicious modeling to reduce complexity, rather than to improve compression
- the modeling/coding separation paradigm is less neat without complex models or arithmetic coding
- optimal prefix codes for TSGDs: [Merhav/Seroussi/Weinberger '00]
The LOCO-I algorithm
JPEG-LS is based on the LOCO-I algorithm: LOw COmplexity LOssless COmpression of Images
Basic components:
- fixed + adaptive prediction
- conditioning contexts based on quantized gradients
- two-parameter conditional probability model (TSGD)
- low-complexity adaptive coding matched to the model (variants of Golomb codes)
- run-length coding in flat areas to address the drawback of symbol-by-symbol coding
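A sketch of the Golomb-Rice variant (power-of-2 Golomb codes) with a static parameter k; JPEG-LS adapts k per context, which is not shown here:

```python
# Prediction errors are first interleaved onto the nonnegative integers,
# then coded in unary (quotient) plus k raw bits (remainder).
def rice_map(e):
    # ..., -2, -1, 0, 1, 2, ...  ->  3, 1, 0, 2, 4, ...
    return 2 * e if e >= 0 else -2 * e - 1

def rice_encode(v, k):
    q, r = v >> k, v & ((1 << k) - 1)
    bits = "1" * q + "0"                 # unary quotient + terminator
    return bits + format(r, f"0{k}b") if k > 0 else bits

for e in (-2, -1, 0, 1, 2):
    print(e, "->", rice_encode(rice_map(e), k=1))
```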
The goal: upon observing the noisy sequence z1 z2 … zn, choose x̂1 x̂2 … x̂n to optimize some fidelity criterion (e.g.: minimize number of symbol errors, squared distance, etc.)
A natural extension of work on prediction/compression
Applications: image and video denoising, text correction, financial data denoising, DNA sequence analysis, wireless communications…
[Diagram: discrete source (x1 x2 … xn) → discrete memoryless channel (noise) → (z1 z2 … zn) → denoiser → (x̂1 x̂2 … x̂n)]
Discrete Universal DEnoising (DUDE)
A denoising scheme that is
- universal (no prior knowledge or assumption on X)
- asymptotically optimal (provably approaches the performance of the best scheme that has full knowledge of the statistics of X)
- practical (low complexity)
Theoretical impact: universal denoising is possible and can be accomplished in linear time
(2006 Communications Society/Information Theory Society Best Paper Award)
Practical impact: over 20 patents filed/granted
Contributors: Ordentlich, Seroussi, Weinberger (ITR – HP Labs), Weissman (Stanford/ITR), Verdú (Princeton)
The DUDE algorithm
DUDE: how it’s done
pass 1:
- gather statistics on symbol occurrences per context pattern
- estimate the noiseless symbol distribution given context pattern and noisy sample (posterior distribution)
pass 2: denoise each symbol, based on the estimated posterior
- who do you believe? what you see, or what the global stats tell you?
- precise decision formula proven asymptotically optimal
- context template size must be carefully chosen
[Diagram: data sequence x1 x2 … xn → noisy channel → z1 z2 … zn; the sample zi being denoised together with its context samples]
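A minimal sketch of the two-pass procedure for a binary symmetric channel (the crossover probability delta and the context radius r are assumed known/chosen; the flip threshold below results from inverting the channel on the empirical counts under Hamming loss):

```python
import random
from collections import Counter

# Flip z_i iff  m[1 - z] / m[z]  >  ((1-d)^2 + d^2) / (2 d (1-d)),
# where m counts center symbols per two-sided context of radius r.
def dude_bsc(z, delta, r=2):
    thresh = ((1 - delta) ** 2 + delta ** 2) / (2 * delta * (1 - delta))
    m = Counter()
    for i in range(r, len(z) - r):                 # pass 1: gather statistics
        ctx = (tuple(z[i - r:i]), tuple(z[i + 1:i + r + 1]))
        m[ctx, z[i]] += 1
    out = list(z)                                  # boundary samples kept as received
    for i in range(r, len(z) - r):                 # pass 2: denoise
        ctx = (tuple(z[i - r:i]), tuple(z[i + 1:i + r + 1]))
        if m[ctx, 1 - z[i]] > thresh * m[ctx, z[i]]:
            out[i] = 1 - z[i]
    return out

random.seed(1)
x = [0] * 200 + [1] * 200                          # piecewise-constant source
z = [b ^ (random.random() < 0.1) for b in x]       # BSC with delta = 0.1
print(sum(a != b for a, b in zip(x, dude_bsc(z, 0.1))), "errors after DUDE")
```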
Application 1: Image denoising
Best previous result in the literature: PSNR = 35.6 dB @ error rate = 30% (Chan, Ho & Nikolova, IEEE IP Oct '05)
[Images: original corrupted by "salt and pepper" noise at error rate = 30% (PSNR = 10.7 dB) vs. DUDE-denoised (PSNR = 38.3 dB)]
The main challenge in image denoising
Key component of DUDE: model the conditional distribution P(Zi | context of Zi) and infer P(Xi | Zi and context of Zi) from it
Main issue: large alphabet ⇒ large number of model parameters ⇒ high learning cost
Leveraged the "semi-universal" approach from image compression: rely on prior knowledge (except that here the data is noisy and the models are non-causal). Main tools:
- prediction
- contexts based on quantized data
- parameterized distributions
Again, the "holy grail" is to incorporate a "safe amount" of prior knowledge to reduce the richness of the class
State-of-the-art for "salt-and-pepper" noise removal
Competitive for Gaussian and "real world" noise removal, but still room for improvement
Application 2: Denoiser-enhanced ECC
Suitable for wireless communications
Leaves the overall system "as-is", but enhances the receiver by denoising the signal prior to error-correction (ECC) decoding
Allows designing a "better receiver" that will recover signals other receivers would reject as undecodable
[Diagram: denoising (which exploits source redundancy, natural) moves the received noisy codeword back into the decodable region of the regular ECC (which handles code redundancy, structured); the non-enhanced receiver gets no reception where the DUDE-enhanced one decodes]
DUDE Enhanced Decoding
[Block diagram: unknown data source → x1 x2 … xk → channel encoder (transmitter) appends parity p1 p2 … pm → discrete memoryless channel → noisy data z1 z2 … zk and noisy parity r1 r2 … rm → universal denoiser (applied to z1 z2 … zk) → channel decoder (receiver) → received data x̂1 x̂2 … x̂k]
What is “2D Information Theory”?
Analysis and design of communication systems involving 2D signals and channels
[Diagram: W → encoder → n×n array (x_{i,j}) → 2D channel → n×n array (y_{i,j}) → decoder → W′]
Emphasis on nontrivial 2D channels that do not decompose into 1D "tracks". Examples:
- inter-symbol interference (ISI): y_{i,j} = x_{i,j} + x_{i−1,j} + x_{i,j−1} + x_{i−1,j−1} + n_{i,j}
- constrained channel: x_{i,j} ∈ {0,1}, no two adjacent 1's in any row or column (inherently 2D)
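A tiny sketch of the ISI example as a 2D convolution (the 2×2 all-ones interference mask matches the formula above as reconstructed here; the noise level is illustrative):

```python
import numpy as np

# y[i,j] = x[i,j] + x[i-1,j] + x[i,j-1] + x[i-1,j-1] + n[i,j]
rng = np.random.default_rng(0)
x = rng.integers(0, 2, (8, 8)).astype(float)
xp = np.pad(x, ((1, 0), (1, 0)))             # zero boundary for i-1, j-1
y = xp[1:, 1:] + xp[:-1, 1:] + xp[1:, :-1] + xp[:-1, :-1]
y += 0.1 * rng.standard_normal((8, 8))       # additive noise n[i,j]
# each output mixes four inputs across rows AND columns, so the channel
# does not decompose into independent 1D tracks
print(y.round(2))
```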
Motivation
Existing storage media (while 2D) are "converted" to 1D tracks separated by buffer space
Example: DVD+RW [from DVD+RW alliance white paper]
Next-generation systems will seek to make use of the buffer space for storage
- will have to deal with 2D interference and constraints
- channels will be inherently 2D
Our research activity
- 2D ISI channels: computational complexity of "optimal" detection [Ordentlich and Roth, 2006]
- 2D constrained channels: new coding techniques and lower bounds on capacity [Ordentlich and Roth, 2000 and 2007]
2D constrained channels
The 2D channel only permits arrays {x_{i,j}} satisfying certain constraints. Examples (binary arrays):
- DC-free: # of 0's = # of 1's in every row and column
- run-length limited (RLL): at least d and no more than k 0's between any two 1's in any row and column
- no isolated bits (NIB): a bit that differs from all of its neighbors is not allowed anywhere in the array
Not yet clear which constraints will be most relevant in practice
- research focused on developing general tools and analysis
Research problems and contributions
Determining capacity C: the asymptotic number of bits per channel symbol (rate) that can be encoded into the constraint,
  C = lim_{n→∞} (1/n²) log₂ (# of legal n×n arrays)
Encoding/decoding algorithms:
- low complexity, high rates
- imply lower bounds on capacity
A new coding approach: 2D approximate enumerative coding
- fixed rate
- achieves lowest redundancy for the 2D DC-free constraint
- achieves highest provable rate for at least one 2D RLL constraint (likely others), and has the highest rate among (tractable) fixed-rate encoders in other cases
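The limit can be approximated by exact counting for small n; a sketch for the "no two adjacent 1's" constraint, using a row-by-row dynamic program over the valid row patterns:

```python
import math
from itertools import product

def count_legal(n):
    # rows with no two horizontally adjacent 1's
    rows = [r for r in product((0, 1), repeat=n)
            if not any(a and b for a, b in zip(r, r[1:]))]
    counts = {r: 1 for r in rows}          # ways to end with each top row
    for _ in range(n - 1):
        new = {}
        for r, c in counts.items():
            for s in rows:
                # no two vertically adjacent 1's between consecutive rows
                if not any(a and b for a, b in zip(r, s)):
                    new[s] = new.get(s, 0) + c
        counts = new
    return sum(counts.values())

for n in (2, 4, 6, 8):
    c = count_legal(n)
    print(n, f"{math.log2(c) / n ** 2:.4f}")   # approaches C ≈ 0.588 as n grows
```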
Error-correcting codes in nanotechnology
Manufacturing perfect electronic circuits is expensive
as feature size decreases, cost of perfection may become prohibitive well before quantum phenomena dominate feature properties
In the future, circuits will need to function perfectly even with a significant percentage of imperfect components
In nano-electronics, a major roadblock is the interconnect between the outside world and nano-scale resources
We will discuss the application of error-correcting codes to the design of defect-tolerant micro/nano demultiplexers (demuxes)
[Diagram: hybrid circuits — a micro-scale encoder block driving a demux that addresses the nanowires of a nano crossbar memory; mixed nanowires and conventional wires]
Two examples
Collaboration between the Information Theory Research group (Seroussi, Roth, Vontobel) and Quantum Science Research
Diode-based demuxes:
- a peculiar error probability formula
- a notion of coding gain for manufacturing cost
Resistor-based demuxes:
- a new combinatorial constraint on codes
- construction of optimal constant-weight codes for the constraint
[Scientific American, Nov. '05]
Addressing nano-wires with diode logic
[Diagram: conventional address lines (driven low/high) crossing the nano-wires; for each address, a unique nano-wire is set to low (selected)]
Effect of open-connection defect
With an open connection, a given address may select more than one nanowire
Example: 11 and 01 are both selected with address 01
[Diagram: demux with an open connection; address 01 pulls both nanowires 01 and 11 low]
Adding redundant address lines: Error-correcting codes
A redundant address line is added: addresses are over-specified
  00 → 000, 01 → 011, 10 → 101, 11 → 110
Overall parity check: d ≥ 2
[Diagram: demux with three address lines and nanowires 110, 101, 011, 000]
Over-specified addressing withstands open connection
Wire 110 is not selected, since the extra line "pulls it up" ⇒ 011 is uniquely selected
In general, we will use an (n, M, d) code C, where M is the number of nanowires and n the number of encoded address lines
We use an encoder for C, but no decoder is needed
Given M, circuit area is linear in n
The requirements are similar to usual ECC: maximize d with minimal n
With current processes, parameters are fairly small, e.g., M ≈ 100–1000, n ≈ 10–20
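A sketch of the over-specified addressing map; the parity encoder reproduces the 2-bit example from the slide, and the distance check is generic:

```python
from itertools import combinations, product

# Append an overall parity bit so every pair of encoded addresses differs
# in at least d = 2 positions, making a single open connection survivable.
def encode(addr):
    return addr + (sum(addr) % 2,)

addresses = list(product((0, 1), repeat=2))
codewords = [encode(a) for a in addresses]   # 00->000, 01->011, 10->101, 11->110
dmin = min(sum(a != b for a, b in zip(u, v))
           for u, v in combinations(codewords, 2))
print(codewords, "d =", dmin)                # d = 2
```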
Defect model and failure modes
Assume a diode is not successfully formed with probability p (open connection; the process can be biased so that this is the dominant defect mode)
Defects are statistically independent
Two nanowire failure modes:
- a nanowire is destructive if it has enough open connections that it is selected when another nanowire is addressed
- a nanowire is a victim if addressing it causes a destructive nanowire to be selected
Both nanowires are disabled
Addressable memory per unit of area
[Plot: addressable memory per unit area (normalized, 0.0–1.0) vs. fraction of open connections (0.00–0.40), comparing uncoded addressing with [8,7,2], [11,7,3], and [12,7,4] codes]
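A Monte Carlo sketch of how such a curve could be estimated. The spurious-selection rule below — wire c is mis-selected under the address of wire c' iff all of c's diodes at positions where the two codewords differ are open — is a simplified reading of the diode-logic slides, not the talk's exact circuit analysis; the code used here is a small parity-extended example, not one of the codes in the plot.

```python
import random
from itertools import combinations, product

def surviving_fraction(codewords, p, trials=1000):
    n = len(codewords[0])
    total_alive = 0
    for _ in range(trials):
        # diode status, one diode per (wire, address line) pair: True = open
        open_ = [[random.random() < p for _ in range(n)] for _ in codewords]
        disabled = set()
        for j, k in combinations(range(len(codewords)), 2):
            diff = [i for i in range(n) if codewords[j][i] != codewords[k][i]]
            if all(open_[j][i] for i in diff):   # j destructive, k victim
                disabled |= {j, k}
            if all(open_[k][i] for i in diff):   # k destructive, j victim
                disabled |= {j, k}
        total_alive += len(codewords) - len(disabled)
    return total_alive / (trials * len(codewords))

# 3-bit addresses extended with an overall parity bit: n = 4, M = 8, d = 2
code = [c + (sum(c) % 2,) for c in product((0, 1), repeat=3)]
for p in (0.02, 0.05, 0.10):
    print(f"p = {p:.2f}: {surviving_fraction(code, p):.3f} of memory addressable")
```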