

SLIDE 1

Information, Learning and Falsification

David Balduzzi

December 17, 2011

Max Planck Institute for Intelligent Systems, Tübingen, Germany

SLIDES 2-6

Three main theories of information:

  • Algorithmic information. "Description". The information embedded in a single string depends on its shortest description.
  • Shannon information. "Transmission". The information transmitted by a symbol depends on the transmission probabilities of the other symbols in the ensemble.
  • Statistical learning theory. "Prediction". The information about the world embedded in a classifier (its expected error) depends on the complexity of the learning algorithm.

Can these be related?

  • Effective information. "Discrimination". The information a physical process generates when it produces an output depends on how sharply it discriminates between inputs.

SLIDE 7

Effective information

SLIDE 8

Nature decomposes into specific, bounded physical systems, which we model as deterministic functions f : X → Y, or more generally as Markov matrices p_m(y|x), where X and Y are finite sets.

SLIDE 9

Physical processes discriminate between inputs

[Figure: a thermometer, an example of a physical process that discriminates between inputs.]

SLIDE 10

Definition

The discrimination given by Markov matrix m outputting y is

$$\hat{p}_m(x \mid y) := \frac{p_m(y \mid do(x)) \cdot p_{unif}(x)}{p_m(y)},
\qquad \text{where} \quad p_m(y) := \sum_x p_m(y \mid do(x)) \cdot p_{unif}(x)$$

is the effective distribution.

Definition

Effective information is the Kullback-Leibler divergence

$$ei(m, y) := H\Big[\hat{p}_m(X \mid y) \,\Big\|\, p_{unif}(X)\Big].$$

(Balduzzi and Tononi, PLoS Computational Biology, 2008)
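To make the two definitions concrete, here is a minimal numerical sketch (my own illustration; the helper name and the toy matrix are not from the talk). Rows of the matrix are interventions do(x) on a uniformly weighted input alphabet, and the KL divergence is computed in bits.

```python
import numpy as np

def discrimination_and_ei(markov, y):
    """Discrimination p̂_m(x|y) and effective information ei(m, y) in bits.

    markov[x, j] = p_m(y=j | do(x)); inputs x are weighted uniformly,
    following the definition of the effective distribution above.
    """
    n_inputs = markov.shape[0]
    p_unif = np.full(n_inputs, 1.0 / n_inputs)            # p_unif(x)
    p_y = markov[:, y] @ p_unif                           # effective distribution p_m(y)
    p_hat = markov[:, y] * p_unif / p_y                   # Bayes: p̂_m(x | y)
    nonzero = p_hat > 0
    ei = np.sum(p_hat[nonzero] * np.log2(p_hat[nonzero] / p_unif[nonzero]))  # KL divergence
    return p_hat, ei

# Toy example: 4 inputs mapped noisily onto 2 outputs.
m = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.2, 0.8]])
p_hat, ei = discrimination_and_ei(m, y=1)
print(p_hat, ei)   # sharper discrimination between inputs -> larger ei
```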
SLIDE 11

Special case: deterministic f : X → Y

Definition

The discrimination given by f outputting y assigns equal probability to all elements of the pre-image f⁻¹(y).

Definition

Effective information is ei(f, y) := −log ( |f⁻¹(y)| / |X| ).
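In the deterministic case the computation reduces to counting pre-images. A tiny sketch (my own toy "thermometer", not from the slides): a map that coarse-grains 16 micro-states into 4 readings generates ei = −log₂(4/16) = 2 bits whenever it outputs a reading.

```python
from math import log2

def ei_deterministic(f, domain, y):
    """Effective information of a deterministic f: X -> Y at output y, in bits."""
    preimage_size = sum(1 for x in domain if f(x) == y)
    return -log2(preimage_size / len(domain))

# Toy thermometer: 16 micro-temperatures coarse-grained into 4 readings.
domain = range(16)
thermometer = lambda x: x // 4
print(ei_deterministic(thermometer, domain, y=2))   # -log2(4/16) = 2.0
```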

SLIDE 12

[Figure: when the thermometer produces an output, its discrimination is the set of inputs consistent with that output, and ei = −log ( size of discrimination / size of input space ).]

SLIDE 13

Algorithmic information

SLIDE 14

Definition

Given a universal prefix Turing machine T, the Kolmogorov complexity of string s is

$$K_T(s) := \min_{\{i \,:\, T(i) = s*\}} \mathrm{len}(i),$$

the length of the shortest program that generates s. For any other universal Turing machine U, there exists a constant c such that K_U(s) − c ≤ K_T(s) ≤ K_U(s) + c for all s.

SLIDE 15

Definition

Given T, the (unnormalized) Solomonoff prior probability of string s is

$$p_T(s) := \sum_{\{i \,:\, T(i) = s*\}} 2^{-\mathrm{len}(i)},$$

where the sum is over strings i that cause T to output s as a prefix, and no proper prefix of i outputs s. The Turing machine discriminates between programs according to which strings they output; the Solomonoff prior counts the programs in each class, weighted by their length.
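A minimal sketch of how this prior can be tabulated for a toy machine (my own stand-in: a finite, prefix-free dictionary from program strings to outputs, not a real universal prefix Turing machine). Each output string collects weight 2^(−len) from every program producing it.

```python
from collections import defaultdict
from math import log2

# Toy prefix-free "machine": programs (bit strings) mapped to the string they output.
toy_machine = {"0": "ab", "10": "ab", "110": "ba", "111": "abab"}

def solomonoff_prior(machine):
    """Unnormalized p_T(s): sum of 2^(-len(i)) over programs i that output s."""
    prior = defaultdict(float)
    for program, output in machine.items():
        prior[output] += 2.0 ** (-len(program))
    return dict(prior)

prior = solomonoff_prior(toy_machine)
print(prior)                 # {'ab': 0.75, 'ba': 0.125, 'abab': 0.125}
print(-log2(prior["ab"]))    # by Levin's theorem (next slide), ≈ K_T('ab') up to a constant
```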

SLIDE 16

Kolmogorov complexity = Algorithmic probability

Theorem (Levin)

For all s, −log p_T(s) = K_T(s) up to an additive constant c. Upshot: for my purposes, Solomonoff's formulation of Kolmogorov complexity is the right one: K_T(s) := −log p_T(s).

SLIDE 17

Recall that the effective distribution was the denominator when computing discriminations using Bayes' rule:

$$\hat{p}_m(x \mid y) := \frac{p_m(y \mid do(x)) \cdot p_{unif}(x)}{p_m(y)}.$$

SLIDE 18

Solomonoff prior → Effective distribution

Proposition

The effective distribution on Y induced by f is

$$p_f(y) = \sum_{\{x \,:\, f(x) = y\}} 2^{-\mathrm{len}(x)}.$$

Compare with the Solomonoff distribution:

$$p_T(s) := \sum_{\{i \,:\, T(i) = s*\}} 2^{-\mathrm{len}(i)}.$$

Compute the effective distribution by replacing the universal Turing machine T with f : X → Y, and giving inputs len(x) = log |X|, their length in the optimal code for the uniform distribution on X.

SLIDE 19

Kolmogorov Complexity → Effective information

Proposition

For a function f : X → Y, effective information equals

$$ei(f, y) = -\log p_f(y) = -\log \Big( \sum_{\{x \,:\, f(x) = y\}} 2^{-\mathrm{len}(x)} \Big).$$

Compare with Kolmogorov complexity:

$$K_T(s) = -\log p_T(s) = -\log \Big( \sum_{\{i \,:\, T(i) = s*\}} 2^{-\mathrm{len}(i)} \Big).$$

SLIDE 20

Statistical learning theory

SLIDE 21

Hypothesis space

Given unlabeled data D = (x₁, . . . , x_l) ⊂ X^l, let the hypothesis space

$$\Sigma_D = \{\sigma : D \to \{\pm 1\}\}$$

be the set of all possible labelings.

[Figure: HYPOTHESIS SPACE, sample labelings assigning +1 or −1 to each data point.]
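For a tiny dataset this space can be written out explicitly (a sketch with my own placeholder points): Σ_D is just the 2^l sign patterns on the l points.

```python
from itertools import product

D = ["x1", "x2", "x3"]                                   # l = 3 unlabeled points
sigma_D = list(product([-1, +1], repeat=len(D)))         # Sigma_D: all possible labelings
print(len(sigma_D))                                      # 2^3 = 8
print(sigma_D[:3])                                       # (-1,-1,-1), (-1,-1,+1), (-1,+1,-1)
```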

SLIDE 22

Setup

Suppose data D = (x₁, . . . , x_l) is drawn from an unknown probability distribution P_X and labeled y_i = σ(x_i) by an unknown supervisor σ ∈ Σ_X. The learning problem: find a classifier f̂ that is guaranteed to perform well on future (unseen) data sampled from P_X and labeled by σ.

SLIDES 23-25

Empirical risk minimization

Suppose we are given a class F of functions to work with. A simple algorithm for tackling the learning problem:

Algorithm: Given data labeled by σ ∈ Σ_D, find the classifier f̂ ∈ F ⊂ Σ_D that minimizes the empirical risk:

$$\hat{f} := \arg\min_{f \in F} \; \frac{1}{l} \sum_{i=1}^{l} \mathbf{1}_{f(x_i) \neq \sigma(x_i)}.$$

Key step. Reformulate the algorithm as a function between finite sets. Empirical risk minimization is the map

$$R_{F,D} : \text{HYPOTHESIS SPACE} \to \text{EMPIRICAL RISK}, \qquad
\Sigma_D \to R, \qquad
\sigma \mapsto \min_{f \in F} \frac{1}{l} \sum_{i=1}^{l} \mathbf{1}_{f(x_i) \neq \sigma(x_i)}.$$
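A sketch of R_{F,D} as a literal finite function (the threshold class F and the three data points are my own toy choices, not from the talk): for every labeling σ, record the smallest fraction of points that any f ∈ F gets wrong. The same toy setup is reused in the checks further below.

```python
from itertools import product

D = [0.1, 0.4, 0.7]                                           # l = 3 data points
# Toy function class F: threshold classifiers x -> sign(x - t).
F = [lambda x, t=t: +1 if x > t else -1 for t in (0.0, 0.25, 0.55, 1.0)]

def empirical_risk_map(F, D):
    """R_{F,D}: map each labeling sigma in Sigma_D to its minimal empirical risk."""
    l = len(D)
    return {
        sigma: min(sum(f(x) != s for x, s in zip(D, sigma)) / l for f in F)
        for sigma in product([-1, +1], repeat=l)
    }

for sigma, risk in empirical_risk_map(F, D).items():
    print(sigma, risk)          # labelings F can realize get risk 0, the rest get risk > 0
```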

SLIDE 26

[Figure: TRAINING ERROR as a function of the HYPOTHESIS SPACE. A high-capacity class F₁ fits many hypotheses, a low-capacity class F₂ fits few; the empirical risk minimizers R₁, R₂ assign errors ε₁, ε₂, ε₃ to hypotheses.]

SLIDE 27

Theorem (standard template for error bounds in SLT)

With probability 1 − δ,

  expected error of learner  ≤  historical error  +  capacity of algorithm  +  confidence term,

where a dominant historical-error term corresponds to UNDERFITTING and a dominant capacity term corresponds to OVERFITTING.
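One standard instantiation of this template is a Rademacher-complexity bound of the Bartlett-Mendelson type (shown here for orientation; the exact constants vary between references and are not taken from the talk):

```latex
% With probability at least 1 - \delta over the draw of the l training points,
% simultaneously for every f in the class F (constants vary across references):
\Pr_{(x,y)\sim P}\!\big[f(x)\neq y\big]
  \;\le\;
  \underbrace{\frac{1}{l}\sum_{i=1}^{l}\mathbf{1}_{f(x_i)\neq y_i}}_{\text{historical error}}
  \;+\;
  \underbrace{\mathrm{Rademacher}(F,D)}_{\text{capacity of algorithm}}
  \;+\;
  \underbrace{3\sqrt{\frac{\ln(2/\delta)}{2l}}}_{\text{confidence term}}
```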

SLIDE 28

Minimizing empirical risk, R_{F,D} : Σ_X → R, is a physical process. Questions:

  • Q1. What is the effective distribution ("Solomonoff prior") of the ERM?
  • Q2. What is the effective information ("Kolmogorov complexity") of its outputs?

SLIDE 29

Effective distribution → Rademacher complexity

Proposition ("Solomonoff → Rademacher")

The expectation of the ERM over the effective distribution "is" the empirical Rademacher complexity:

$$\sum_{\epsilon \in R} \epsilon \cdot p_{R_{F,D}}(\epsilon) \;=\; \frac{1}{2}\Big(1 - \mathrm{Rademacher}(F, D)\Big).$$
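A brute-force check of this identity on the toy class from the ERM sketch above (my own illustration; the empirical Rademacher complexity is computed by exact enumeration over sign patterns rather than by sampling):

```python
from itertools import product

D = [0.1, 0.4, 0.7]
F = [lambda x, t=t: +1 if x > t else -1 for t in (0.0, 0.25, 0.55, 1.0)]
l = len(D)
sigmas = list(product([-1, +1], repeat=l))        # Sigma_D; the effective prior is uniform

erm = lambda sigma: min(sum(f(x) != s for x, s in zip(D, sigma)) / l for f in F)

# Left-hand side: expectation of the ERM's output over the effective distribution.
lhs = sum(erm(sigma) for sigma in sigmas) / len(sigmas)

# Right-hand side: (1/2) * (1 - empirical Rademacher complexity of F on D).
rademacher = sum(
    max(sum(s * f(x) for x, s in zip(D, sigma)) / l for f in F) for sigma in sigmas
) / len(sigmas)
rhs = 0.5 * (1.0 - rademacher)

print(lhs, rhs)                                   # the two numbers agree
```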
slide-30
SLIDE 30

Effective information Algorithmic information Learning theory Falsification Conclusion

Effective information → VC-entropy

Proposition ("Kolmogorov → Vapnik")

The effective information generated by the ERM when it outputs 0 "is" the empirical VC-entropy:

$$ei(R_{F,D}, 0) = -\log p_{R_{F,D}}(0) = l - \text{VC-entropy}(F, D),$$

where l is the amount of training data.
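Continuing the same toy setup (my own check; logarithms are in bits and the empirical VC-entropy is taken as log₂ of the number of labelings of D that F realizes):

```python
from itertools import product
from math import log2

D = [0.1, 0.4, 0.7]
F = [lambda x, t=t: +1 if x > t else -1 for t in (0.0, 0.25, 0.55, 1.0)]
l = len(D)
sigmas = list(product([-1, +1], repeat=l))

erm = lambda sigma: min(sum(f(x) != s for x, s in zip(D, sigma)) / l for f in F)

# Effective information of the output "zero training error".
p_zero = sum(erm(sigma) == 0 for sigma in sigmas) / len(sigmas)
ei_zero = -log2(p_zero)

# Empirical VC-entropy: log2 of the number of distinct labelings F induces on D.
vc_entropy = log2(len({tuple(f(x) for x in D) for f in F}))

print(ei_zero, l - vc_entropy)        # both equal 3 - log2(4) = 1.0
```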

SLIDE 31

Corollary (reformulation of error bounds in SLT)

With probability 1 − δ,

  expected output of ERM  ≤  historical output of ERM  +  discrimination of inputs by ERM  +  confidence term.

[Figure: the ERM maps HYPOTHESES to ERRORS ε₁, ε₂, ε₃; "what the ERM outputs" plays the role of the historical term and "how the ERM discriminates inputs" plays the role of the capacity term.]

SLIDE 32

Falsification

SLIDES 33-34

Karl Popper wanted to justify scientific knowledge. He was very impressed by Einstein's bold conjecture that the Sun's gravitational field bends starlight, which, when confirmed, overthrew Newtonian physics despite an enormous body of evidence in favor of Newton.

Popper's big idea: rely on theories that have been severely tested, rather than theories supported by lots of facts. Unfortunately, Popper failed to justify his big idea.

SLIDE 35

Counting falsified hypotheses. Rademacher complexity.

$$\sum_{\epsilon \in R} \epsilon \cdot p_{R_{F,D}}(\epsilon) \;=\; \frac{1}{2}\Big(1 - \mathrm{Rademacher}(F, D)\Big)$$

$$\sum_{\epsilon \in R} p_{R_{F,D}}(\epsilon) \cdot \epsilon
\;=\; \sum_{\epsilon} \underbrace{p_{R_{F,D}}(\epsilon)}_{\substack{\text{fraction of hypotheses}\\ \text{the ERM falsifies}}} \cdot \underbrace{\epsilon}_{\substack{\text{on a fraction } \epsilon \\ \text{of the data}}}
\;=\; \text{weighted count of falsified hypotheses}.$$
SLIDE 36

Counting falsified hypotheses. VC-entropy.

$$ei(R_{F,D}, 0) = l - \text{VC-entropy}(F, D)$$

$$ei(R_{F,D}, 0)
\;=\; \underbrace{\log |\Sigma_X|}_{\text{total \# hypotheses}}
\;-\; \underbrace{\log |R_{F,D}^{-1}(0)|}_{\text{\# hypotheses the ERM fits}}
\;=\; \text{logarithmic count of falsified hypotheses}.$$
slide-37
SLIDE 37

Effective information Algorithmic information Learning theory Falsification Conclusion

Back to Popper and justifying scientific knowledge. Minimal model of Popper’s question: When can we trust generalizations based on training error? Answer: If empirical risk minimizer has small capacity

slide-38
SLIDE 38

Effective information Algorithmic information Learning theory Falsification Conclusion

Back to Popper and justifying scientific knowledge. A minimal model of Popper's question: when can we trust generalizations based on training error? Answer: if the empirical risk minimizer has small capacity. And

  ERM has small capacity  ↔  ERM falsifies many hypotheses.

SLIDE 39

Conclusion

SLIDES 40-41

Philosophy

A major theme of 20th-century mathematics was the transition from set theory (a language for talking about points, i.e. elements) to category theory (a language for talking about arrows, i.e. functions). This talk substituted thinking about sets (e.g. the function class F ⊂ Σ_X) with thinking about the structure of the arrow ERM : Σ_X → R from hypothesis space to training errors. Immediate consequences:

  1. SLT ↔ algorithmic information theory
  2. SLT ↔ falsification

SLIDE 42

Conclusion

Physical processes discriminate between inputs. Effective information is a non-universal analog of Kolmogorov complexity:

  universal Turing machine → finite function.

The information generated while minimizing empirical risk

  1. controls error bounds (SLT), and
  2. can be expressed in terms of the number of falsified hypotheses.

Conjecture: effective information generated by optimizations other than ERM also controls future performance.
SLIDE 43

Thank you!