The Minimum Description Length Principle (Peter Grünwald, CWI)


The Minimum Description Length Principle. Peter Grünwald, CWI Amsterdam, www.grunwald.nl (slides edited by Tim van Erven). Machine Learning Course, Vrije Universiteit Amsterdam, December 5th 2007.


SLIDE 1

The Minimum Description Length Principle

Peter Grünwald CWI Amsterdam www.grunwald.nl

(slides edited by Tim van Erven)

Machine Learning Course, Vrije Universiteit Amsterdam December 5th 2007

SLIDE 2

Minimum Description Length Principle

  • ‘MDL’ is a method for inductive inference…

    – machine learning – pattern recognition – statistics

  • …based on ideas from data compression (information theory)

  • In contrast to most other methods, MDL automatically deals with overfitting, arguably the central problem in machine learning and statistics

Rissanen 1978, 1987, 1996; Barron, Rissanen and Yu 1998

SLIDE 3

Minimum Description Length Principle

  • MDL is based on the correspondence between ‘regularity’ and ‘compression’:

    – The more you are able to compress a sequence of data, the more regularity you have detected in the data

    – Example:

      001001001001001001001001001…001

      010110111001001110100010101…010
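
The link between regularity and compression can be tried out with an off-the-shelf compressor. The following is a minimal sketch, not part of the original slides: it uses zlib as a stand-in for "a compressor", takes the repeating pattern 001 as the regular sequence, and assumes the irregular sequence behaves like independent fair coin flips. The length n and the seed are arbitrary choices.

```python
import random
import zlib


def compressed_size(bits: str) -> int:
    """Return the zlib-compressed size, in bytes, of a string of '0'/'1' characters."""
    return len(zlib.compress(bits.encode("ascii"), 9))


n = 9000  # sequence length (arbitrary choice for this sketch)

# Regular sequence: the pattern 001 repeated.
regular = "001" * (n // 3)

# Irregular sequence: simulated fair coin flips (an assumption of this sketch),
# seeded so that repeated runs produce the same sequence.
rng = random.Random(0)
irregular = "".join(rng.choice("01") for _ in range(n))

print("regular  :", compressed_size(regular), "bytes")
print("irregular:", compressed_size(irregular), "bytes")
```

The regular sequence should shrink to a small fraction of the size of the irregular one: the compressor has found, and exploited, the repeating pattern.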

SLIDE 4

Minimum Description Length Principle

  • MDL is based on the correspondence between ‘regularity’ and ‘compression’:

    – The more you are able to compress a sequence of data, the more regularity you have detected in the data…

    – …and thus the more you have learned from the data:

      • ‘inductive inference’ as trying to find regularities in data (and using those to make predictions of future data)

SLIDE 5

Model Selection/Overfitting

Given data D and hypothesis spaces/models M1, M2, M3, …, which model best explains the data?

– Need to take into account

  • Complexity of models

  • Error (minus Goodness-of-fit)

– Example:

  • Selecting the degree of a polynomial in regression

  • Sum of squared errors
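
The tension between complexity and error in the polynomial example can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions, not the method used in the slides. The data are synthetic (a quadratic plus Gaussian noise, with made-up coefficients and noise level), and the penalized score (n/2)·log(SSE/n) + (k/2)·log(n), with k the number of coefficients, is only a crude stand-in for a complexity penalty. The helper names `fit_polynomial` and `sum_squared_errors` are hypothetical.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first) via the normal equations."""
    k = degree + 1
    # Augmented system [A^T A | A^T y] for the monomial basis 1, x, x^2, ...
    rows = [[sum(x ** (i + j) for x in xs) for j in range(k)]
            + [sum(y * x ** i for x, y in zip(xs, ys))]
            for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, k):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, k + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (rows[i][k] - tail) / rows[i][i]
    return coeffs


def sum_squared_errors(xs, ys, coeffs):
    """Sum of squared residuals of the polynomial with the given coefficients."""
    return sum((y - sum(c * x ** p for p, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


# Synthetic data (assumption of this sketch): noisy samples of a quadratic.
n = 20
rng = random.Random(1)
xs = [-1 + 2 * i / (n - 1) for i in range(n)]
ys = [1.0 - 2.0 * x + 3.0 * x * x + rng.gauss(0, 0.3) for x in xs]

sse_by_degree = []
score_by_degree = []
for degree in range(7):
    coeffs = fit_polynomial(xs, ys, degree)
    sse = sum_squared_errors(xs, ys, coeffs)
    k = degree + 1  # number of fitted coefficients
    # Illustrative penalized score: fit term plus a complexity term.
    score = 0.5 * n * math.log(sse / n) + 0.5 * k * math.log(n)
    sse_by_degree.append(sse)
    score_by_degree.append(score)
    print(f"degree {degree}: SSE = {sse:.4f}, penalized score = {score:.2f}")
```

Because the models are nested, the sum of squared errors on the fitted data can only go down as the degree grows, so error alone always favours the most complex model. A term that charges for the number of coefficients is what lets a lower degree win.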

SLIDE 6

Example: Regression

SLIDE 7

Example: Regression

SLIDE 8

Example: Regression

SLIDE 9

Example: Regression

SLIDE 10

Example: Regression