Minimum Description Length Principle in Model Selection
Bono Nonchev
Faculty of Mathematics and Informatics
Contents

1. Information Theory
2. The MDL Principle
3. Model Selection
4. Model Complexity
Knowledge = compression
Regularities in data lead to compression. Examples:

  0101010101010101010101010101010101010...
  1101100111111101111110110011111111111...
  1010101000111010001110100011101011111...

Denote by x^n an n-tuple of real numbers: data generated by some process. We make inference about the generating process by finding a way to encode the data using the patterns it exhibits.

Kolmogorov complexity formalizes this idea, but has two problems:

- uncomputability
- arbitrariness (the complexity depends on the choice of universal machine, up to a constant)
Equivalence between code and distribution
Let x^n be a realization of the random vector X^n on (Ω, F, P). We encode x^n as a uniquely decodable string of 0s and 1s of length L(x^n). The shortest code length (in the expected sense) is achieved by a code that, for a given observation x^n, has length (Shannon-Fano coding)

  L(x^n) = −log P(x^n)

A probability distribution defines a code and vice versa. The requirement for integer code lengths is not essential.
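The code/distribution correspondence is easy to see numerically. A minimal sketch, assuming a hypothetical four-symbol distribution and ideal (non-integer) code lengths:

```python
import math

def code_length(p: float) -> float:
    """Ideal Shannon-Fano code length, in bits, for an outcome of probability p."""
    return -math.log2(p)

# A distribution over four symbols defines a code: likely symbols get short words.
dist = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {s: code_length(p) for s, p in dist.items()}       # a:1, b:2, c:3, d:3 bits
expected_len = sum(p * lengths[s] for s, p in dist.items())  # = entropy = 1.75 bits
```

Conversely, a set of code lengths {l_s} defines the distribution p_s = 2^(−l_s).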
The MDL principle I
Restrict the class of models and codes to probability distributions. Define a set of candidate models H, e.g. N(µ, σ). Encode the data "optimally" with a code of length L(x^n | H) for each point hypothesis H ∈ H, e.g. H = {X^n ~ N(1.42, 0.443)}. Encode the point hypothesis itself "optimally" with code length L(H). The optimal point hypothesis H ∈ H is the one for which

  L(x^n | H) + L(H)

is minimal. It is not yet clear how to find the code lengths for H and for x^n | H.
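A toy sketch of the two-part criterion, assuming a hypothetical class of three candidate coin biases with the hypothesis itself coded uniformly (log2 |H| bits):

```python
import math

data = "0010010001000010"            # 16 observed coin tosses, 4 ones

hypotheses = [0.25, 0.5, 0.75]       # candidate values of P(heads)
L_H = math.log2(len(hypotheses))     # uniform code length for the hypothesis itself

def L_data_given(theta: float, x: str) -> float:
    """Code length of x, in bits, under the code induced by Bernoulli(theta)."""
    k, n = x.count("1"), len(x)
    return -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))

# MDL picks the hypothesis minimizing the total two-part code length
best = min(hypotheses, key=lambda th: L_data_given(th, data) + L_H)
```

With 4 ones in 16 tosses, the bias 0.25 yields the shortest total description.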
The MDL principle II
L is called a universal code with respect to a family of codes 𝓛 if

  (1/n) [ L(x^n) − min_{L* ∈ 𝓛} L*(x^n) ] → 0  as n → ∞
Examples:

- Two-step coding: code H ∈ H "uniformly", then code x^n using the code corresponding to P(x^n | H).
- Bayesian approach (Minimum Message Length): define a prior probability P_H(H), then code using L(x^n) = −log P(x^n | H) − log P_H(H).

Instead of a set of codes 𝓛, we can examine a set of distributions M. An M ∈ M that corresponds to a universal code is called a universal model.
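Universality can be checked numerically. A hypothetical example with a finite grid of Bernoulli models: the Bayesian mixture code's per-symbol regret is bounded by log2 |H| / n, so it vanishes as n grows:

```python
import math

thetas = [i / 10 for i in range(1, 10)]   # a finite grid of Bernoulli models
prior = 1.0 / len(thetas)                 # uniform prior over the grid

def nll_bits(theta: float, k: int, n: int) -> float:
    """-log2 P(x^n | theta) for a binary sequence with k ones out of n."""
    return -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))

def regret_per_symbol(k: int, n: int) -> float:
    # Bayes mixture code length: -log2 of sum over H of prior(H) * P(x^n | H)
    mix = -math.log2(sum(prior * 2.0 ** (-nll_bits(t, k, n)) for t in thetas))
    best = min(nll_bits(t, k, n) for t in thetas)   # best code in hindsight
    return (mix - best) / n

# per-symbol regret shrinks as n grows (here for sequences with 30% ones)
r_small = regret_per_symbol(3, 10)
r_large = regret_per_symbol(300, 1000)
```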
Measures of Goodness
For a model M and a distribution P̃, the regret measures how much we lose if we encode the data using P̃ instead of the best distribution in M:

  R_M(P̃, x^n) = −log P̃(x^n) − min_{P ∈ M} {−log P(x^n)}

To remove the dependence on x^n we take the maximum:

  R^max_M(P̃) = max_{x^n ∈ X^n} R_M(P̃, x^n)

For a parametric family M_θ we use the maximum likelihood estimate θ̂(x^n) and define the model complexity as

  COMP_n(M_θ) = Σ_{x^n ∈ X^n} P(x^n | θ̂(x^n))

Also called stochastic complexity.
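For a concrete feel for COMP_n, assume as an example the Bernoulli model class on binary strings of length n; the sum over all 2^n sequences collapses to a sum over counts:

```python
from math import comb

def bernoulli_comp(n: int) -> float:
    """COMP_n = sum over all binary x^n of P(x^n | theta_hat(x^n))."""
    total = 0.0
    for k in range(n + 1):                # group the 2^n sequences by their count k
        th = k / n                        # MLE for a sequence with k ones
        p = th ** k * (1 - th) ** (n - k) if 0 < k < n else 1.0
        total += comb(n, k) * p           # comb(n, k) sequences share this value
    return total
```

For n = 1 both sequences attain maximized likelihood 1, so COMP_1 = 2; for n = 2, COMP_2 = 2.5.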
Normalized Maximum Likelihood (NML) Distribution
Find a distribution for x^n that is universal w.r.t. M_θ. Idea: use P(x^n | θ̂(x^n)). This is not a probability distribution, so do the next best thing and normalize it:

  P̃_NML(x^n) = P(x^n | θ̂(x^n)) / Σ_{y^n ∈ X^n} P(y^n | θ̂(y^n))

The NML distribution achieves constant regret:

  R_{M_θ}(P̃_NML, x^n) = log COMP_n(M_θ)

NML is a universal model with code length

  L(x^n) = −log P(x^n | θ̂(x^n)) + log COMP_n(M_θ)
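A small numerical check (assumed example: Bernoulli model, n = 5) that the regret of the NML distribution is the same constant, log COMP_n, for every sequence:

```python
import math
from itertools import product

n = 5
seqs = list(product([0, 1], repeat=n))

def max_lik(x):
    """Maximized likelihood P(x | theta_hat(x)) under the Bernoulli family."""
    k, m = sum(x), len(x)
    th = k / m
    return th ** k * (1 - th) ** (m - k) if 0 < k < m else 1.0

comp = sum(max_lik(x) for x in seqs)             # COMP_n(M_theta)
p_nml = {x: max_lik(x) / comp for x in seqs}     # the NML distribution

# regret at x: -log2 p_nml(x) - (-log2 max_lik(x)); collect its distinct values
regrets = {round(math.log2(max_lik(x) / p_nml[x]), 10) for x in seqs}
```

The set `regrets` contains a single value, log2 COMP_n, confirming the constant-regret property.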
Properties of Model Complexity
- COMP_n(M_θ) is a measure of the complexity, or flexibility, of the family of distributions M_θ.
- When M_θ is discrete, COMP_n(M_θ) can be interpreted as the number of "essentially different" models in the family.
- COMP_n(M_θ) is invariant under a change of parametrization.
- Under suitable regularity conditions,

    log COMP_n(M_θ) = (k/2) log(n / 2π) + log ∫_{θ ∈ Θ} √|I(θ)| dθ + o(1),  as n → ∞

- It is possible that COMP_n(M_θ) = ∞.
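As a sanity check of the asymptotic formula, assume again the Bernoulli family (k = 1), where ∫√|I(θ)| dθ = ∫₀¹ dθ/√(θ(1−θ)) = π:

```python
import math
from math import comb, log

def log_comp_bernoulli(n: int) -> float:
    """Exact log COMP_n for the Bernoulli family on binary strings of length n."""
    total = 0.0
    for k in range(n + 1):
        th = k / n
        p = th ** k * (1 - th) ** (n - k) if 0 < k < n else 1.0
        total += comb(n, k) * p
    return log(total)

n = 1000
exact = log_comp_bernoulli(n)
approx = 0.5 * log(n / (2 * math.pi)) + log(math.pi)  # (k/2) log(n/2pi) + log ∫√I
# exact and approx agree up to the o(1) term
```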
Model Selection Problem
We observe a sample X_1, ..., X_n of a random variable and must decide from which of a myriad of distributions the sample originates.

Example:

- y = a x^b + Z (Stevens' model)
- y = a ln(x + b) + Z (Fechner's model)

More data patterns can be explained by Stevens' model than by Fechner's (see [Grünwald, Rissanen, 2007]). Our goal is to select between:

(N) the sample is from a distribution in N(µ, σ²I);
(T) the sample is from a distribution in T_ν(µ, σ²I).
Model Selection Solution
Ordinary information criteria (AIC, BIC, GIC, DIC) do not account for complexity beyond the number of free parameters. Using the MDL principle:

1. Calculate COMP_n(M_θ) for both models.
2. Calculate the MLEs of µ and σ and the log-likelihood of the data.
3. Select the model with the smallest total description length

     L(x^n) = −log f(x^n | µ̂, σ̂) + log COMP_n(M_θ)
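The same recipe can be sketched on a toy discrete pair (a fair coin vs. the full Bernoulli family, not the Gaussian/Student-t pair of the talk); note how the complexity penalty can overturn a raw likelihood advantage:

```python
import math
from itertools import product

n = 8
x = (1, 1, 0, 1, 1, 1, 0, 1)        # observed binary sample: 6 ones out of 8

def max_lik(seq):
    """Maximized likelihood P(seq | theta_hat) under the Bernoulli family."""
    k, m = sum(seq), len(seq)
    th = k / m
    return th ** k * (1 - th) ** (m - k) if 0 < k < m else 1.0

# Model A: a fair coin -- no free parameters, COMP = 1, so no penalty
L_A = n * 1.0                        # -log2 (1/2)^n = n bits

# Model B: the full Bernoulli family, penalized by its parametric complexity
comp_B = sum(max_lik(s) for s in product([0, 1], repeat=n))
L_B = -math.log2(max_lik(x)) + math.log2(comp_B)

chosen = "fair coin" if L_A <= L_B else "Bernoulli family"
```

Even though the Bernoulli family fits 6/8 ones better, its complexity term makes its total description length longer here.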
Solution for Gaussian Model with Known Variance
[Barron et al., 1998] show that when a (jointly) sufficient statistic T for θ exists,

  COMP_n(M_θ) = ∫_{X^n} P(x^n | θ̂(x^n)) dx^n = ∫_T P(t | θ̂(t)) dt

For the normal model with known variance, COMP_n(M_θ) = ∞. They propose using the conditional complexity instead, restricting the sample mean to A = { x^n : |(1/n) Σ x_i| ≤ R }:

  COMP_n(M_θ | x^n ∈ A) = ∫_A P(x^n | θ̂(x^n)) dx^n

In that case

  log COMP_n(M_θ | x^n ∈ A) = −(1/2) log π − log σ + (1/2) log n + log 2R
Solution for Gaussian Model with Unknown Variance
Use the sufficient statistics x̄ and s². Compute the conditional complexity for A = { |x̄| ≤ R, D ≤ s² }:

  COMP_n(M_θ | x^n ∈ A) = [ 2 n^{n/2} e^{−n/2} / ( √π Γ((n−1)/2) ) ] × 2R D^{−1}

The unconditional complexity is again infinite. An idea emerges: extract the last factor, 2R D^{−1}, and ignore it when comparing models.
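The "extract and ignore" idea works because the 2R D^{−1} factor is common to the models being compared, so it cancels in the difference of description lengths. A minimal sketch (the intrinsic values below are purely hypothetical):

```python
import math

def log_comp_conditional(intrinsic: float, R: float, D: float) -> float:
    """Hypothetical split: model-specific term plus the common log(2R/D) factor."""
    return intrinsic + math.log(2 * R / D)

# the difference between two models does not depend on the cutoffs R and D
diff_a = log_comp_conditional(3.1, R=10.0, D=0.1) - log_comp_conditional(2.4, R=10.0, D=0.1)
diff_b = log_comp_conditional(3.1, R=99.0, D=0.5) - log_comp_conditional(2.4, R=99.0, D=0.5)
```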
Complexity of Absolutely Continuous Location-Scale Family I
Define the p.d.f. of a multivariate location-scale family as

  f(x^n | µ, σ) = σ^{−n} g( (x^n − µ) / σ )

The conditional complexity, conditional on x^n ∈ A with A = { |µ̂(x^n)| ≤ R, D ≤ σ̂²(x^n) }, is the integral

  COMP_n(M_θ | x^n ∈ A) = ∫_A ( σ̂(x^n) )^{−n} g( (x^n − µ̂(x^n)) / σ̂(x^n) ) dx^n
Complexity of Absolutely Continuous Location-Scale Family II
Theorem. If δ is the Dirac delta function, then

  COMP_n(M_θ | x^n ∈ A) = 2R D^{−1} × ∫ δ(µ̂(y^n)) δ(1 − σ̂(y^n)) g(y^n) dy^n
                        = 2R D^{−1} × E_{Y^n}[ δ(µ̂(Y^n)) δ(1 − σ̂(Y^n)) ]

Corollary. The unconditional parametric complexity of an absolutely continuous location-scale family is either zero or infinity.

Note: any dependence structure within the sample is treated inside the integral.
Future work
- Calculate the parametric complexity of the multivariate Student-t using the theorem and the fact that x̄ and s² are the ML estimators of the parameters.
- Calculate the stochastic complexity of linear regression with Student-t distributed residuals.
- Extend the theorem and corollary to settings where the sample statistics are not ML estimators:
  - an i.i.d. sample with Student-t marginals;
  - stable and CTS models for time dependence.
Bibliography
- Barron, A., Rissanen, J. & Yu, B., "The Minimum Description Length Principle in Coding and Modeling", IEEE Transactions on Information Theory, vol. 44, no. 6, 1998.
- Stine, R. A. & Foster, D. P., "The Competitive Complexity Ratio", Conference on Information Sciences and Systems, Princeton University, March 15-17, 2000.
- Lanterman, A. D., "Schwarz, Wallace, and Rissanen: Intertwining Themes in Theories of Model Selection", International Statistical Review, vol. 69, no. 2, 2001, pp. 185-212.
- Grünwald, P. D. & Rissanen, J., "The Minimum Description Length Principle", The MIT Press, 2007.