  1. Probability and Information Theory. Lecture slides for Chapter 3 of Deep Learning (www.deeplearningbook.org), Ian Goodfellow, 2016-09-26.

  2. Probability Mass Function
  • The domain of P must be the set of all possible states of x.
  • $\forall x \in \mathrm{x},\ 0 \le P(x) \le 1$. An impossible event has probability 0, and no state can be less probable than that. Likewise, an event that is guaranteed to happen has probability 1, and no state can have a greater chance of occurring.
  • $\sum_{x \in \mathrm{x}} P(x) = 1$. We refer to this property as being normalized. Without this property, we could obtain probabilities greater than one by computing the probability of one of many events occurring.
  Example: uniform distribution over k states: $P(\mathrm{x} = x_i) = \frac{1}{k}$.
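  A minimal numpy sketch of these two properties for the uniform example (the variable names and the choice k = 6 are mine, not from the slides):

```python
import numpy as np

k = 6                        # number of states, e.g. a fair die
P = np.full(k, 1.0 / k)      # uniform PMF: P(x = x_i) = 1/k

assert np.all((0.0 <= P) & (P <= 1.0))  # every probability lies in [0, 1]
assert np.isclose(P.sum(), 1.0)         # normalization: the PMF sums to 1
```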

  3. Probability Density Function
  • The domain of p must be the set of all possible states of x.
  • $\forall x \in \mathrm{x},\ p(x) \ge 0$. Note that we do not require $p(x) \le 1$.
  • $\int p(x)\,dx = 1$.
  Example: uniform distribution $u(x; a, b) = \frac{1}{b - a}$ for $x \in [a, b]$, with no probability mass outside the interval; it integrates to 1.
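  To make the contrast with a PMF concrete, a sketch (my own construction) showing that a uniform density can exceed 1 pointwise when b − a < 1, yet still integrate to 1:

```python
import numpy as np

a, b = 2.0, 2.5                         # interval shorter than 1
xs = np.linspace(a, b, 10_001)
u = np.full_like(xs, 1.0 / (b - a))     # u(x; a, b) = 1 / (b - a) = 2 > 1

# Trapezoidal integration: the density exceeds 1 everywhere on [a, b],
# but the total probability mass is still 1.
integral = np.sum(0.5 * (u[1:] + u[:-1]) * np.diff(xs))
print(u[0], integral)                   # 2.0, ~1.0
```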

  4. Computing Marginal Probability with the Sum Rule
  $$\forall x \in \mathrm{x},\ P(\mathrm{x} = x) = \sum_y P(\mathrm{x} = x, \mathrm{y} = y) \quad (3.3)$$
  $$p(x) = \int p(x, y)\,dy \quad (3.4)$$
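  A small sketch of the sum rule on a discrete joint distribution stored as a table (the numbers are hypothetical, chosen only so the table is normalized):

```python
import numpy as np

# A joint PMF P(x, y) as a table: rows index x, columns index y.
P_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])
assert np.isclose(P_xy.sum(), 1.0)

P_x = P_xy.sum(axis=1)   # sum rule (3.3): marginalize y out
P_y = P_xy.sum(axis=0)   # marginal of y
print(P_x, P_y)          # [0.3 0.7] [0.4 0.6]
```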

  5. Conditional Probability
  $$P(\mathrm{y} = y \mid \mathrm{x} = x) = \frac{P(\mathrm{y} = y, \mathrm{x} = x)}{P(\mathrm{x} = x)} \quad (3.5)$$
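  Continuing the same hypothetical table from the sum-rule sketch, conditioning is just the joint divided by the relevant marginal:

```python
import numpy as np

P_xy = np.array([[0.10, 0.20],    # hypothetical joint table P(x, y)
                 [0.30, 0.40]])
P_x = P_xy.sum(axis=1)

# Conditional distribution (3.5): P(y | x) = P(y, x) / P(x).
P_y_given_x = P_xy / P_x[:, None]
print(P_y_given_x)
assert np.allclose(P_y_given_x.sum(axis=1), 1.0)   # each row sums to 1
```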

  6. Chain Rule of Probability
  $$P(\mathrm{x}^{(1)}, \ldots, \mathrm{x}^{(n)}) = P(\mathrm{x}^{(1)}) \prod_{i=2}^{n} P(\mathrm{x}^{(i)} \mid \mathrm{x}^{(1)}, \ldots, \mathrm{x}^{(i-1)}) \quad (3.6)$$
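  A sketch verifying the chain rule numerically on a random three-variable joint (the construction is mine; any normalized table works):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((2, 2, 2))
P /= P.sum()                      # a random joint P(x1, x2, x3)

P1 = P.sum(axis=(1, 2))           # P(x1)
P12 = P.sum(axis=2)               # P(x1, x2)
P2_given_1 = P12 / P1[:, None]            # P(x2 | x1)
P3_given_12 = P / P12[..., None]          # P(x3 | x1, x2)

# Chain rule (3.6): the product of conditionals recovers the joint.
reconstructed = P1[:, None, None] * P2_given_1[..., None] * P3_given_12
assert np.allclose(reconstructed, P)
```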

  7. Independence
  Two random variables x and y are independent if their joint distribution factorizes:
  $$\forall x \in \mathrm{x}, y \in \mathrm{y},\ p(\mathrm{x} = x, \mathrm{y} = y) = p(\mathrm{x} = x)\, p(\mathrm{y} = y) \quad (3.7)$$

  8. Conditional Independence
  $$\forall x \in \mathrm{x}, y \in \mathrm{y}, z \in \mathrm{z},\ p(\mathrm{x} = x, \mathrm{y} = y \mid \mathrm{z} = z) = p(\mathrm{x} = x \mid \mathrm{z} = z)\, p(\mathrm{y} = y \mid \mathrm{z} = z) \quad (3.8)$$
  We can denote independence and conditional independence with compact notation: $\mathrm{x} \perp \mathrm{y}$ and $\mathrm{x} \perp \mathrm{y} \mid \mathrm{z}$.

  9. Expectation
  $$\mathbb{E}_{\mathrm{x} \sim P}[f(x)] = \sum_x P(x) f(x) \quad (3.9)$$
  $$\mathbb{E}_{\mathrm{x} \sim p}[f(x)] = \int p(x) f(x)\,dx \quad (3.10)$$
  Linearity of expectations:
  $$\mathbb{E}_{\mathrm{x}}[\alpha f(x) + \beta g(x)] = \alpha\, \mathbb{E}_{\mathrm{x}}[f(x)] + \beta\, \mathbb{E}_{\mathrm{x}}[g(x)] \quad (3.11)$$
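  A Monte Carlo sketch of linearity (3.11); on a fixed set of samples the two sides agree up to floating-point rounding (the functions f, g and the constants are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)        # samples x ~ p
f, g = np.square, np.abs              # two arbitrary functions
alpha, beta = 2.0, -3.0

lhs = np.mean(alpha * f(x) + beta * g(x))
rhs = alpha * np.mean(f(x)) + beta * np.mean(g(x))
print(lhs, rhs)                       # equal up to floating-point error
```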

  10. Variance and Covariance
  $$\mathrm{Var}(f(x)) = \mathbb{E}\left[(f(x) - \mathbb{E}[f(x)])^2\right] \quad (3.12)$$
  $$\mathrm{Cov}(f(x), g(y)) = \mathbb{E}\left[(f(x) - \mathbb{E}[f(x)])\,(g(y) - \mathbb{E}[g(y)])\right] \quad (3.13)$$
  Covariance matrix:
  $$\mathrm{Cov}(\mathbf{x})_{i,j} = \mathrm{Cov}(x_i, x_j) \quad (3.14)$$
  The diagonal elements of the covariance matrix give the variance: $\mathrm{Cov}(x_i, x_i) = \mathrm{Var}(x_i)$.
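  A sketch computing the covariance matrix from definition (3.13) on correlated samples (the mixing matrix is a hypothetical choice); the result can be compared against np.cov:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100_000, 2)) @ np.array([[1.0, 0.5],
                                              [0.0, 1.0]])  # correlated pair

# Covariance computed from the definition:
mu = x.mean(axis=0)
cov = (x - mu).T @ (x - mu) / len(x)
print(cov)                       # close to np.cov(x.T, bias=True)
print(np.diag(cov))              # diagonal entries are the variances
```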

  11. Bernoulli Distribution
  $$P(\mathrm{x} = 1) = \phi \quad (3.16)$$
  $$P(\mathrm{x} = 0) = 1 - \phi \quad (3.17)$$
  $$P(\mathrm{x} = x) = \phi^x (1 - \phi)^{1 - x} \quad (3.18)$$
  $$\mathbb{E}_{\mathrm{x}}[\mathrm{x}] = \phi \quad (3.19)$$
  $$\mathrm{Var}_{\mathrm{x}}(\mathrm{x}) = \phi(1 - \phi) \quad (3.20)$$
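  A quick empirical check of (3.19) and (3.20), drawing Bernoulli samples by thresholding uniform noise (φ = 0.3 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.3
x = rng.random(1_000_000) < phi      # Bernoulli(phi) samples as booleans

print(x.mean())                      # ~phi                     (3.19)
print(x.var())                       # ~phi * (1 - phi) = 0.21  (3.20)
```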

  12. Gaussian Distribution
  Parametrized by variance:
  $$\mathcal{N}(x; \mu, \sigma^2) = \sqrt{\frac{1}{2\pi\sigma^2}}\, \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) \quad (3.21)$$
  Parametrized by precision:
  $$\mathcal{N}(x; \mu, \beta^{-1}) = \sqrt{\frac{\beta}{2\pi}}\, \exp\left(-\frac{\beta}{2}(x - \mu)^2\right) \quad (3.22)$$
  See Figure 3.1 for a plot of the density function.
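  A sketch evaluating (3.21) directly in numpy and sanity-checking that the density integrates to 1 and peaks at x = µ:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Density N(x; mu, sigma^2) from (3.21)."""
    return np.sqrt(1.0 / (2.0 * np.pi * sigma2)) * np.exp(
        -((x - mu) ** 2) / (2.0 * sigma2))

xs = np.linspace(-8, 8, 20_001)
p = gaussian_pdf(xs, mu=0.0, sigma2=1.0)
print(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(xs)))   # integrates to ~1
print(xs[np.argmax(p)])                               # maximum at x = mu
```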

  13. Gaussian Distribution
  Figure 3.1: Plot of the Gaussian density p(x), with its maximum at $x = \mu$ and inflection points at $x = \mu \pm \sigma$.

  14. Multivariate Gaussian
  Parametrized by covariance matrix:
  $$\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sqrt{\frac{1}{(2\pi)^n \det(\boldsymbol{\Sigma})}}\, \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) \quad (3.23)$$
  Parametrized by precision matrix:
  $$\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\beta}^{-1}) = \sqrt{\frac{\det(\boldsymbol{\beta})}{(2\pi)^n}}\, \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\beta} (\mathbf{x} - \boldsymbol{\mu})\right) \quad (3.24)$$
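  A sketch of (3.23) for a single point, using a linear solve for $\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})$ rather than an explicit inverse (µ and Σ are hypothetical values; a production implementation would typically use a Cholesky factorization for stability):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density N(x; mu, Sigma) from (3.23), for a single point x."""
    n = mu.shape[0]
    diff = x - mu
    norm = np.sqrt(1.0 / ((2.0 * np.pi) ** n * np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.zeros(2)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(mvn_pdf(np.array([0.5, -0.2]), mu, Sigma))
```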

  15. More Distributions
  Exponential:
  $$p(x; \lambda) = \lambda\, \mathbf{1}_{x \ge 0} \exp(-\lambda x) \quad (3.25)$$
  The exponential distribution uses the indicator function $\mathbf{1}_{x \ge 0}$ to assign probability zero to all negative values of x.
  Laplace:
  $$\mathrm{Laplace}(x; \mu, \gamma) = \frac{1}{2\gamma} \exp\left(-\frac{|x - \mu|}{\gamma}\right) \quad (3.26)$$
  Dirac:
  $$p(x) = \delta(x - \mu) \quad (3.27)$$

  16. Empirical Distribution
  $$\hat{p}(\mathbf{x}) = \frac{1}{m} \sum_{i=1}^{m} \delta(\mathbf{x} - \mathbf{x}^{(i)}) \quad (3.28)$$

  17. Mixture Distributions
  $$P(x) = \sum_i P(c = i)\, P(x \mid c = i) \quad (3.29)$$
  Figure 3.2: A Gaussian mixture with three components, plotted over x₁ and x₂.
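  Equation (3.29) suggests ancestral sampling: draw the component identity c first, then draw x from the chosen component. A sketch with a hypothetical three-component mixture in one dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

weights = np.array([0.5, 0.3, 0.2])   # P(c = i)
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([0.5, 1.0, 0.8])

# Ancestral sampling: first draw c, then x from P(x | c).
c = rng.choice(3, size=10_000, p=weights)
x = rng.normal(means[c], stds[c])
print(x.mean())   # ~ sum_i w_i * mu_i = -1.0 + 0.0 + 0.6 = -0.4
```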

  18. Logistic Sigmoid
  Figure 3.3: The logistic sigmoid function σ(x), commonly used to parametrize Bernoulli distributions.
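  The book defines the logistic sigmoid as $\sigma(x) = \frac{1}{1 + \exp(-x)}$. A numerically careful numpy sketch that avoids overflow for inputs of large magnitude:

```python
import numpy as np

def sigmoid(x):
    """Numerically stable logistic sigmoid sigma(x) = 1 / (1 + exp(-x))."""
    # Using -abs(x) keeps the exp argument non-positive on both branches,
    # so neither branch can overflow.
    e = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # ~[4.54e-05, 0.5, 0.99995]
```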

  19. Softplus Function
  Figure 3.4: The softplus function ζ(x).
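  The softplus is $\zeta(x) = \log(1 + \exp(x))$, which numpy's logaddexp computes stably; the sketch also checks the identity $\zeta(x) - \zeta(-x) = x$:

```python
import numpy as np

def softplus(x):
    """zeta(x) = log(1 + exp(x)), computed stably via logaddexp."""
    return np.logaddexp(0.0, x)   # log(exp(0) + exp(x)) = log(1 + exp(x))

xs = np.array([-10.0, 0.0, 10.0])
print(softplus(xs))                    # ~[4.54e-05, 0.693, 10.0000454]
print(softplus(xs) - softplus(-xs))    # identity: zeta(x) - zeta(-x) = x
```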

  20. Bayes’ Rule
  $$P(x \mid y) = \frac{P(x)\, P(y \mid x)}{P(y)} \quad (3.42)$$
  Although $P(y)$ appears in the formula, it is usually feasible to compute it as $P(y) = \sum_x P(y \mid x) P(x)$, so we do not need to begin with knowledge of $P(y)$.
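  A sketch of Bayes’ rule on a hypothetical diagnostic-test example (all numbers are mine, not from the slides), computing P(y) by marginalizing as described above:

```python
import numpy as np

# x is "disease status" [healthy, sick]; y = 1 is "test positive".
P_x = np.array([0.99, 0.01])              # prior P(x)
P_y1_given_x = np.array([0.05, 0.90])     # likelihood P(y = 1 | x)

# P(y = 1) = sum_x P(y = 1 | x) P(x), then Bayes' rule (3.42).
P_y1 = np.sum(P_y1_given_x * P_x)
P_x_given_y1 = P_y1_given_x * P_x / P_y1
print(P_x_given_y1)   # posterior given a positive test, ~[0.846, 0.154]
```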

  21. Change of Variables
  $$p_x(\mathbf{x}) = p_y(g(\mathbf{x})) \left|\det\left(\frac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\right)\right| \quad (3.47)$$
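  A scalar sanity check of (3.47), where the Jacobian determinant is just |dg/dx|: if y = g(x) = 2x and y ~ Uniform(0, 1), then $p_x(x) = p_y(2x) \cdot 2 = 2$ on [0, 1/2]. The construction is my own:

```python
import numpy as np

rng = np.random.default_rng(0)

y = rng.random(1_000_000)   # y ~ Uniform(0, 1)
x = y / 2.0                 # x = g^{-1}(y), so y = g(x) = 2x

# The histogram estimates p_x; (3.47) predicts a constant density of 2.
hist, edges = np.histogram(x, bins=50, range=(0.0, 0.5), density=True)
print(hist.mean())          # ~2.0, matching the Jacobian-corrected density
```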

  22. Information Theory
  Self-information:
  $$I(x) = -\log P(x) \quad (3.48)$$
  Entropy:
  $$H(\mathrm{x}) = \mathbb{E}_{\mathrm{x} \sim P}[I(x)] = -\mathbb{E}_{\mathrm{x} \sim P}[\log P(x)] \quad (3.49)$$
  KL divergence:
  $$D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{\mathrm{x} \sim P}\left[\log \frac{P(x)}{Q(x)}\right] = \mathbb{E}_{\mathrm{x} \sim P}[\log P(x) - \log Q(x)] \quad (3.50)$$
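  A sketch of entropy (3.49) and KL divergence (3.50) for discrete distributions, in nats (the example distributions are arbitrary choices of mine); note the two KL directions disagree, foreshadowing the asymmetry shown in Figure 3.6:

```python
import numpy as np

def entropy(P):
    """H(x) = -E[log P(x)] in nats, from (3.49)."""
    P = P[P > 0]                      # treat 0 log 0 as 0
    return -np.sum(P * np.log(P))

def kl(P, Q):
    """D_KL(P || Q) from (3.50); assumes all entries are positive."""
    return np.sum(P * (np.log(P) - np.log(Q)))

P = np.array([0.5, 0.5])
Q = np.array([0.9, 0.1])
print(entropy(P))          # ln 2 ~ 0.693 nats, the maximum for 2 states
print(kl(P, Q), kl(Q, P))  # ~0.511 vs ~0.368: KL is asymmetric
```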

  23. Entropy of a Bernoulli Variable
  Figure 3.5: Shannon entropy of a Bernoulli variable, in nats, as a function of the Bernoulli parameter.

  24. The KL Divergence is Asymmetric
  Figure 3.6: Probability densities illustrating the asymmetry of the KL divergence. One panel shows $q^* = \operatorname{argmin}_q D_{\mathrm{KL}}(p \,\|\, q)$, the other $q^* = \operatorname{argmin}_q D_{\mathrm{KL}}(q \,\|\, p)$; each plots $p(x)$ together with the resulting $q^*(x)$.

  25. Directed Model
  Figure 3.7: A directed graphical model over a, b, c, d, e.
  $$p(a, b, c, d, e) = p(a)\, p(b \mid a)\, p(c \mid a, b)\, p(d \mid b)\, p(e \mid c) \quad (3.54)$$

  26. Undirected Model
  Figure 3.8: An undirected graphical model over a, b, c, d, e.
  $$p(a, b, c, d, e) = \frac{1}{Z}\, \phi^{(1)}(a, b, c)\, \phi^{(2)}(b, d)\, \phi^{(3)}(c, e) \quad (3.56)$$
