Probability, Entropy, and Inference

Based on David J.C. MacKay: Information Theory, Inference and Learning Algorithms, 2003, Chapter 2
Juha Raitio, juha.raitio@iki.fi, 5th February 2004
HUT T-61.182 Information Theory and Machine Learning

Outline

1. On notation of probabilities
2. Meaning of probability
3. Forward and inverse probabilities
4. Probabilistic inference
5. Shannon information and entropy
6. On convexity of functions
7. Exercises

Ensembles and probabilities

• An ensemble X is a triple (x, A_X, P_X), where
  – x is the outcome of a random variable,
  – A_X = {a_1, a_2, ..., a_I} are the possible values for x,
  – P_X = {p_1, p_2, ..., p_I} are the probabilities of the outcomes, P(x = a_i) = p_i,
  – p_i ≥ 0 and Σ_{a_i ∈ A_X} P(x = a_i) = 1.
• P(x = a_i) may be written as P(a_i) or P(x).
• Probability of a subset T of A_X:

    P(T) = P(x ∈ T) = Σ_{a_i ∈ T} P(x = a_i)    (1)

Joint ensembles and marginal probabilities

• Joint ensemble XY
  – The outcome is an ordered pair x, y (or xy).
  – The possible values are A_X = {a_1, a_2, ..., a_I} and A_Y = {b_1, b_2, ..., b_J}.
  – The joint probability is P(x, y).
• Marginal probabilities (a code sketch of these definitions follows below):

    P(x = a_i) ≡ Σ_{y ∈ A_Y} P(x = a_i, y)    (2)

    P(y) ≡ Σ_{x ∈ A_X} P(y, x)    (3)
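To make the definitions concrete, here is a minimal Python sketch (not part of the original slides; the joint distribution and the helper names `marginal_x`, `marginal_y`, `prob_subset` are illustrative assumptions) that stores a joint ensemble as a nested dictionary and evaluates the subset probability (1) and the marginals (2)-(3):

```python
# Joint ensemble over A_X = {'a', 'b'} and A_Y = {'c', 'd'}:
# P[x][y] holds the joint probability P(x, y). The numbers are
# made up for illustration; they sum to 1 as required.
P = {
    'a': {'c': 0.10, 'd': 0.30},
    'b': {'c': 0.25, 'd': 0.35},
}

def marginal_x(P):
    """Eq. (2): P(x) = sum over y of P(x, y)."""
    return {x: sum(row.values()) for x, row in P.items()}

def marginal_y(P):
    """Eq. (3): P(y) = sum over x of P(x, y)."""
    Py = {}
    for row in P.values():
        for y, p in row.items():
            Py[y] = Py.get(y, 0.0) + p
    return Py

def prob_subset(Px, T):
    """Eq. (1): P(T) = sum of P(x = a_i) over a_i in T."""
    return sum(Px[a] for a in T)

Px = marginal_x(P)             # {'a': 0.4, 'b': 0.6}
Py = marginal_y(P)             # {'c': 0.35, 'd': 0.65}
print(prob_subset(Px, {'a'}))  # 0.4
```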

Conditioning rules

• Conditional probability:

    P(x = a_i | y = b_j) ≡ P(x = a_i, y = b_j) / P(y = b_j),    P(y = b_j) ≠ 0    (4)

• Assumptions H
  – P(x = a_i | H): "the probability that x equals a_i, given H".
• Product (chain) rule:

    P(x, y | H) = P(x | y, H) P(y | H) = P(y | x, H) P(x | H)    (5)

• Sum rule:

    P(x | H) = Σ_y P(x, y | H) = Σ_y P(x | y, H) P(y | H)    (6)

Bayes' theorem and independence

• Bayes' theorem:

    P(y | x, H) = P(x | y, H) P(y | H) / P(x | H)    (7)
                = P(x | y, H) P(y | H) / Σ_{y'} P(x | y', H) P(y' | H)    (8)

• Two random variables X and Y are independent (X ⊥ Y) if and only if

    P(x, y) = P(x) P(y)    (9)

Two meanings for probability

• Frequentist view of probability
  – Probabilities are frequencies of outcomes in random experiments.
  – Probabilities describe random variables.
• Bayesian view of probability
  – Probabilities are degrees of belief in propositions.
  – Probabilities describe assumptions, and inferences given those assumptions.
  – Subjective interpretation of probability: "you cannot do inference without making assumptions".

Forward and inverse probabilities

• Assume a generative model describing a process that gives rise to some data.
• Forward probability
  – The task is to compute the probability distribution of some quantity that depends on the data.
• Inverse probability
  – The task is to compute the probability distribution of unobserved variables given the data.
  – This requires the use of Bayes' theorem (see the sketch below).
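As an illustration of inverse probability via eqs. (7)-(8), here is a small Python sketch. The two-urn setup and the numbers are assumptions chosen for the example, not taken from the slides: an urn y is picked with prior P(y), a ball colour x is drawn with likelihood P(x | y), and we infer which urn was used from the observed colour.

```python
prior = {'urn_A': 0.5, 'urn_B': 0.5}    # P(y | H)
likelihood = {                          # P(x | y, H)
    'urn_A': {'black': 0.8, 'white': 0.2},
    'urn_B': {'black': 0.3, 'white': 0.7},
}

def posterior(x, prior, likelihood):
    """Eq. (8): P(y | x, H) = P(x | y, H) P(y | H) / sum_y' P(x | y', H) P(y' | H)."""
    joint = {y: likelihood[y][x] * prior[y] for y in prior}
    evidence = sum(joint.values())      # P(x | H), the denominator of (8)
    return {y: p / evidence for y, p in joint.items()}

print(posterior('black', prior, likelihood))
# {'urn_A': 0.727..., 'urn_B': 0.272...}
```

The evidence term is just the sum rule (6) applied to the numerator, which is why the posterior probabilities always sum to one.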

Inference with inverse probabilities

• Inference on parameters θ given data D and hypothesis H, by Bayes' theorem:

    P(θ | D, H) = P(D | θ, H) P(θ | H) / P(D | H),    (10)

  where
  – P(θ | H) is the prior probability of the parameters,
  – P(D | θ, H) is the likelihood of the parameters given the data,
  – P(D | H) is the evidence,
  – P(θ | D, H) is the posterior probability of the parameters.
• In words:

    posterior = likelihood × prior / evidence    (11)

Shannon information and entropy

• Shannon information content of an outcome x = a_i (in bits):

    h(x = a_i) = log_2 (1 / P(x = a_i))    (12)

• Entropy of an ensemble X (in bits):

    H(X) ≡ Σ_{x ∈ A_X} P(x) log (1 / P(x))    (13)

• Joint entropy of X, Y:

    H(X, Y) ≡ Σ_{xy ∈ A_X A_Y} P(x, y) log (1 / P(x, y))    (14)

Decomposability of entropy

• The entropy of a probability distribution p = {p_1, p_2, ..., p_I} decomposes as

    H(p) = H(p_1, 1 − p_1) + (1 − p_1) H(p_2 / (1 − p_1), p_3 / (1 − p_1), ..., p_I / (1 − p_1))    (15)

• More generally,

    H(p) = H[(p_1 + ... + p_m), (p_{m+1} + ... + p_I)]
         + (p_1 + ... + p_m) H(p_1 / (p_1 + ... + p_m), ..., p_m / (p_1 + ... + p_m))
         + (p_{m+1} + ... + p_I) H(p_{m+1} / (p_{m+1} + ... + p_I), ..., p_I / (p_{m+1} + ... + p_I))    (16)

Relative entropy

• Kullback-Leibler divergence between P(x) and Q(x) over the alphabet A_X:

    D_KL(P ‖ Q) = Σ_x P(x) log (P(x) / Q(x))    (17)

• Properties of relative entropy (see the sketch below):
  – Gibbs' inequality: D_KL(P ‖ Q) ≥ 0, and D_KL(P ‖ Q) = 0 if P = Q.
  – In general, D_KL(P ‖ Q) ≠ D_KL(Q ‖ P).
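A short Python sketch can check these definitions numerically. The distributions `p` and `q` below are arbitrary illustrative choices; the functions implement eqs. (13), (15), and (17) directly.

```python
from math import log2

def H(p):
    """Entropy in bits, eq. (13); terms with p_i = 0 contribute 0."""
    return sum(pi * log2(1 / pi) for pi in p if pi > 0)

def D_KL(P, Q):
    """Relative entropy, eq. (17); assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

p = [0.5, 0.25, 0.25]
# Decomposability, eq. (15): split off p_1, then renormalise the rest.
rest = [pi / (1 - p[0]) for pi in p[1:]]
lhs = H(p)
rhs = H([p[0], 1 - p[0]]) + (1 - p[0]) * H(rest)
print(lhs, rhs)                 # both 1.5 bits

q = [1/3, 1/3, 1/3]
print(D_KL(p, q), D_KL(q, p))   # about 0.085 vs 0.082 -- unequal in general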

Convex and concave functions

• f(x) is convex over (a, b) if, for all x_1, x_2 ∈ (a, b) and 0 ≤ λ ≤ 1,

    f(λ x_1 + (1 − λ) x_2) ≤ λ f(x_1) + (1 − λ) f(x_2)    (18)

• f(x) is concave if the above holds for f with the inequality reversed.
• f(x) is strictly convex (concave) if the equality in (18) holds only for λ = 0 and λ = 1.
• Jensen's inequality for a convex function f(x) of a random variable x:

    E[f(x)] ≥ f(E[x]),    where E denotes expectation    (19)

• If f(x) is convex (concave) and ∇f(x) = 0, then f has its minimum (maximum) value at x.

Problems

1. A circular coin of diameter a is thrown onto a square grid whose squares are b × b (a < b). What is the probability that the coin will lie entirely within one square? (MacKay exercise 2.31)
2. The inhabitants of an island tell the truth one third of the time; they lie with probability 2/3. On one occasion, after one of them made a statement, you ask another, "Was the statement true?" and he says "yes". What is the probability that the statement was indeed true? (MacKay exercise 2.37)
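Not part of the slides, but a quick Monte Carlo simulation in Python can sanity-check an answer to Problem 1; the values a = 1, b = 3 below are arbitrary illustrative choices.

```python
import random

def coin_in_square(a, b, trials=1_000_000):
    """Estimate the probability that a coin of diameter a, dropped
    uniformly at random onto a grid of b x b squares, lies entirely
    within one square. By symmetry it suffices to drop the coin's
    centre uniformly in a single square: the coin fits iff the
    centre is at least a/2 from every edge of that square."""
    hits = 0
    for _ in range(trials):
        x, y = random.uniform(0, b), random.uniform(0, b)
        if a / 2 <= x <= b - a / 2 and a / 2 <= y <= b - a / 2:
            hits += 1
    return hits / trials

print(coin_in_square(a=1, b=3))  # close to the analytic answer ((b - a)/b)^2 = 4/9
```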
