

1. Conditional Probability Estimation
Marco Cattaneo, School of Mathematics and Physical Sciences, University of Hull
PGM 2016, Lugano, Switzerland, 7 September 2016

2–6. MLE of conditional probability
◮ given: a probabilistic model P_θ with unknown θ, past data D, and events E, Q concerning some new (independent) data
◮ MLE of P_θ(Q | E) = P_θ(Q | D ∩ E):
    P_{θ̂_D}(Q | E)      with θ̂_D = arg max_θ P_θ(D)           (wrong)
    P_{θ̂_{D∩E}}(Q | E)  with θ̂_{D∩E} = arg max_θ P_θ(D ∩ E)   (right)
◮ when P_θ is a (generalized) regression model, and E, Q describe predictors and response, respectively, there is no difference between (right) and (wrong)
◮ when P_θ is a Bayesian network, D is a training dataset, and E, Q concern some new instances, then the usual MLE is (wrong), and this partially explains the unsatisfactory performance of MLE for Bayesian networks
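The (wrong) vs. (right) distinction can be made concrete with a toy model of my own (not from the talk), where both MLEs have closed forms: i.i.d. coin flips with unknown bias θ, with E and Q referring to two new flips so that P_θ(Q | E) = θ by independence.

```python
# Toy illustration (hypothetical numbers): past data D = k heads in n flips;
# for two new independent flips, E = "first new flip is heads" and
# Q = "second new flip is heads", so P_θ(Q | E) = θ.

n, k = 10, 3

# (wrong): θ̂_D = arg max_θ P_θ(D) = k/n, then plug into P_θ(Q | E)
p_wrong = k / n                    # 0.3

# (right): θ̂_{D∩E} = arg max_θ P_θ(D ∩ E)
#        = arg max_θ θ^k (1-θ)^(n-k) · θ = (k+1)/(n+1)
p_right = (k + 1) / (n + 1)        # ≈ 0.364

print(p_wrong, p_right)            # the two MLEs differ
```

Conditioning the likelihood on E acts like one extra observed head, shifting the estimate; for saturated models the conditional probability of interest can coincide under both, which is why the difference only shows up under model constraints such as a Bayesian network's independencies.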

7–9. conditional probability estimation in Bayesian networks
◮ given: a DAG with vertices v ∈ V representing categorical variables X_v, a complete training dataset D with counts n(·), and conjugate Dirichlet priors with parameters d(·)
◮ estimates of the local probability models:
    p̂_D(x_v | x_pa(v)) = n(x_v, x_pa(v)) / n(x_pa(v))    (ML)
    p̂_D(x_v | x_pa(v)) = [n(x_v, x_pa(v)) + d(x_v, x_pa(v))] / [n(x_pa(v)) + d(x_pa(v))]    (Bayes)
◮ estimates of probabilities concerning a new instance:
    p̂_D(x_Q) = Σ_{x_{V \ Q}} Π_{v ∈ V} p̂_D(x_v | x_pa(v)) = Σ_{x_{V \ Q}} Π_{v ∈ V} n(x_v, x_pa(v)) / n(x_pa(v))    (ML)
    p̂_D(x_Q) = Σ_{x_{V \ Q}} Π_{v ∈ V} [n(x_v, x_pa(v)) + d(x_v, x_pa(v))] / [n(x_pa(v)) + d(x_pa(v))]    (Bayes)
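A minimal sketch of the two local estimates (function name and counts are mine): one CPT row for a fixed parent configuration, with the Dirichlet parameter d per state, where d = 0 recovers the ML estimate and d = 1 the Bayes–Laplace estimate.

```python
def cpt_row(counts, d=0.0):
    """counts[x] = n(x_v, x_pa(v)) for one parent configuration x_pa(v);
    d = Dirichlet parameter per state (d = 0 gives the ML estimate)."""
    total = sum(counts) + d * len(counts)   # n(x_pa(v)) + d(x_pa(v))
    return [(c + d) / total for c in counts]

counts = [8, 2]                   # hypothetical counts for a binary X_v
print(cpt_row(counts))            # ML:            [0.8, 0.2]
print(cpt_row(counts, d=1.0))     # Bayes-Laplace: [0.75, 0.25]
```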

10–12. conditional probability estimation in Bayesian networks
◮ estimates of conditional probabilities concerning a new instance:
    p̂_{D,x_E}(x_Q | x_E) = [Σ_{x_{V \ (Q∪E)}} Π_{v ∈ V} p̂_D(x_v | x_pa(v))] / [Σ_{x_{V \ E}} Π_{v ∈ V} p̂_D(x_v | x_pa(v))]    (wrong ML)
        with p̂_D(x_v | x_pa(v)) = n(x_v, x_pa(v)) / n(x_pa(v))
    p̂_{D,x_E}(x_Q | x_E) = [Σ_{x_{V \ (Q∪E)}} Π_{v ∈ V} p̂_{D,x_E}(x_v | x_pa(v))] / [Σ_{x_{V \ E}} Π_{v ∈ V} p̂_{D,x_E}(x_v | x_pa(v))]    (ML)
        with p̂_{D,x_E}(x_v | x_pa(v)) = [n(x_v, x_pa(v)) + ê_{D,x_E}(x_v, x_pa(v))] / [n(x_pa(v)) + ê_{D,x_E}(x_pa(v))]
    p̂_{D,x_E}(x_Q | x_E) = [Σ_{x_{V \ (Q∪E)}} Π_{v ∈ V} p̂_D(x_v | x_pa(v))] / [Σ_{x_{V \ E}} Π_{v ∈ V} p̂_D(x_v | x_pa(v))]    (Bayes)
        with p̂_D(x_v | x_pa(v)) = [n(x_v, x_pa(v)) + d(x_v, x_pa(v))] / [n(x_pa(v)) + d(x_pa(v))]
◮ ê_{D,x_E}(·) are the MLE of the expected counts for the new instance, obtained from the EM algorithm
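The EM-based expected counts ê can be sketched on a small network of the kind used in the talk's experiments (Y → X1, Y → X2, so X1 ⊥ X2 | Y); all counts below are hypothetical numbers of my own. The "wrong ML" answer plugs in the complete-data CPT estimates; the "ML" answer treats the new instance as one extra record with Y missing and re-estimates the CPTs by EM, whose E-step weights w[y] are exactly the expected counts ê for the new instance.

```python
# training counts n(·) from a hypothetical complete dataset D of size 100
N = 100
n_y   = {0: 70, 1: 30}               # counts of Y
n_x1y = {(1, 0): 7, (1, 1): 27}      # counts of X1 = 1 within each Y stratum
n_x2y = {(1, 0): 14, (1, 1): 24}     # counts of X2 = 1 within each Y stratum

def posterior(py, px1, px2, x1, x2):
    """p(y | x1, x2) ∝ p(y) p(x1|y) p(x2|y) for the given CPTs."""
    joint = [py[y] * (px1[y] if x1 else 1 - px1[y])
                   * (px2[y] if x2 else 1 - px2[y]) for y in (0, 1)]
    s = sum(joint)
    return [j / s for j in joint]

# (wrong ML): plug in the complete-data estimates
py  = {y: n_y[y] / N for y in (0, 1)}
px1 = {y: n_x1y[(1, y)] / n_y[y] for y in (0, 1)}
px2 = {y: n_x2y[(1, y)] / n_y[y] for y in (0, 1)}
x1, x2 = 1, 1                        # observed evidence x_E for the new instance
p_wrong = posterior(py, px1, px2, x1, x2)

# (ML): one extra record with Y missing; w[y] are the expected counts ê
w = p_wrong[:]
for _ in range(200):
    py  = {y: (n_y[y] + w[y]) / (N + 1) for y in (0, 1)}
    px1 = {y: (n_x1y[(1, y)] + x1 * w[y]) / (n_y[y] + w[y]) for y in (0, 1)}
    px2 = {y: (n_x2y[(1, y)] + x2 * w[y]) / (n_y[y] + w[y]) for y in (0, 1)}
    w = posterior(py, px1, px2, x1, x2)    # E-step: updated expected counts
p_right = w

print(p_wrong, p_right)
```

Because the network is constrained (5 free parameters instead of 7 for the full joint on three binary variables), the two estimates genuinely differ, unlike in a saturated model.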

13–14. performance comparison: √MSE
◮ given: 3 binary variables X_1, X_2, Y with X_1 ⊥ X_2 | Y and p(x_1 | y) = p(¬x_1 | ¬y) = 99%, while p(¬x_2 | y) = p(¬x_2 | ¬y) = 99%
◮ estimate p(y | x_1, x_2) on the basis of a complete training dataset of size 100:
[Figure: √MSE as a function of p(y) for the incomplete-ML estimator (when it exists), the complete-ML and Bayes–Laplace estimators (both conditional on the incomplete ML existing and unconditional), together with the probability that the incomplete ML exists.]

15–16. performance comparison: √MSE
◮ given: 3 binary variables X_1, X_2, Y with X_1 ⊥ X_2 | Y and p(x_1 | y) = p(¬x_1 | ¬y) = 99%, while p(¬x_2 | y) = p(x_2 | ¬y) = 90%
◮ estimate p(y | x_1, x_2) on the basis of a complete training dataset of size 100:
[Figure: √MSE as a function of p(y) for the incomplete-ML estimator (when it exists), the complete-ML and Bayes–Laplace estimators (both conditional on the incomplete ML existing and unconditional), together with the probability that the incomplete ML exists.]
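The experiment behind these curves can be approximated with a Monte Carlo sketch of my own (one value of p(y) only, Bayes–Laplace estimator only; all names and repetition counts are mine): sample many training datasets of size 100 from the second slide's CPTs, estimate p(y | x_1, x_2) from each, and average the squared error.

```python
import random

random.seed(0)
P_Y = 0.5                        # value of p(y) at which √MSE is evaluated
PX1 = {0: 0.01, 1: 0.99}         # p(X1 = 1 | Y = y): p(x1|y) = p(¬x1|¬y) = 99%
PX2 = {0: 0.90, 1: 0.10}         # p(X2 = 1 | Y = y): p(¬x2|y) = p(x2|¬y) = 90%

def post(py, px1, px2, x1, x2):
    """p(Y = 1 | x1, x2) under the network Y → X1, Y → X2."""
    j = [(py if y else 1 - py)
         * (px1[y] if x1 else 1 - px1[y])
         * (px2[y] if x2 else 1 - px2[y]) for y in (0, 1)]
    return j[1] / (j[0] + j[1])

truth = post(P_Y, PX1, PX2, 1, 1)

N, reps, se = 100, 2000, 0.0
for _ in range(reps):
    c_y, c_x1, c_x2 = [0, 0], [0, 0], [0, 0]
    for _ in range(N):                    # sample one training dataset D
        y = int(random.random() < P_Y)
        c_y[y] += 1
        c_x1[y] += int(random.random() < PX1[y])
        c_x2[y] += int(random.random() < PX2[y])
    py  = (c_y[1] + 1) / (N + 2)          # Bayes-Laplace estimates (d(·) = 1)
    px1 = {y: (c_x1[y] + 1) / (c_y[y] + 2) for y in (0, 1)}
    px2 = {y: (c_x2[y] + 1) / (c_y[y] + 2) for y in (0, 1)}
    se += (post(py, px1, px2, 1, 1) - truth) ** 2

rmse = (se / reps) ** 0.5
print(rmse)                               # √MSE at p(y) = 0.5
```

The Laplace smoothing keeps all estimated CPT entries strictly positive, so the plug-in posterior is always defined; an ML variant would have to discard datasets where a conditioning count is zero, which is what the "when it exists" curves account for.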

17. conclusion
◮ the following way of using Bayesian networks is in agreement with Bayes estimation, but not with ML estimation: estimate the local probability models of a Bayesian network from data, and then use the resulting global model to calculate conditional probabilities of future events
