Bayesian networks Lecture 11 David Sontag New York - PowerPoint PPT Presentation

Bayesian networks
Lecture 11
David Sontag
New York University

Outline for today
• Modeling sequential data (e.g., time series, speech processing) using hidden Markov models (HMMs)
• Bayesian networks
– Independence properties
– Examples
– Learning and inference

Example application: Tracking
• Observe noisy measurements of missile location: $Y_1, Y_2, \dots$
• Where is the missile now? Where will it be in 10 seconds?
[Figure: a radar observing the missile]

Probabilistic approach
• Our measurements of the missile location were $Y_1, Y_2, \dots, Y_n$
• Let $X_t$ be the true <missile location, velocity> at time $t$
• To keep this simple, suppose that everything is discrete, i.e. $X_t$ takes the values $1, \dots, k$
[Figure: the space divided into a grid]

Probabilistic approach
• First, we specify the conditional distribution $\Pr(X_t \mid X_{t-1})$:
– From basic physics, we can bound the distance that the missile can have traveled
• Then, we specify $\Pr(Y_t \mid X_t = \langle (10,20),\ 200 \text{ mph toward the northeast} \rangle)$:
– With probability ½, $Y_t = X_t$ (ignoring the velocity). Otherwise, $Y_t$ is a uniformly chosen grid location
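The observation model above can be sketched as a table of probabilities. This is only an illustration under assumptions not stated on the slide: the state is reduced to a grid cell index (velocity dropped), the "uniformly chosen grid location" may coincide with the true cell, and the function name `observation_model` is hypothetical.

```python
def observation_model(k, p_exact=0.5):
    """Return emit[x][y] = Pr(Y_t = y | X_t = x) over k grid cells.

    Assumption: with probability p_exact the measurement equals the true
    cell; otherwise it is drawn uniformly from all k cells (true cell included).
    """
    emit = []
    for x in range(k):
        # Uniform part: every cell gets an equal share of the remaining mass.
        row = [(1.0 - p_exact) / k] * k
        # Exact part: extra mass on the true cell.
        row[x] += p_exact
        emit.append(row)
    return emit


emit = observation_model(k=4)
# Each row is a distribution over measurements for one true cell.
row_sums = [sum(row) for row in emit]
```

Each row of `emit` sums to one, so it can be used directly as $\Pr(Y_t \mid X_t)$ in the computations that follow.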

Hidden Markov models (1960's)
• Assume that the joint distribution on $X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_n$ factors as follows:
$$\Pr(x_1, \dots, x_n, y_1, \dots, y_n) = \Pr(x_1)\Pr(y_1 \mid x_1) \prod_{t=2}^{n} \Pr(x_t \mid x_{t-1})\Pr(y_t \mid x_t)$$
• To find out where the missile is now, we do marginal inference:
$$\Pr(x_n \mid y_1, \dots, y_n)$$
• To find the most likely trajectory, we do MAP (maximum a posteriori) inference:
$$\arg\max_{x} \Pr(x_1, \dots, x_n \mid y_1, \dots, y_n)$$
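As a small illustration of the factorization, the joint probability of one state sequence and one observation sequence is a product of one prior term, $n-1$ transition terms, and $n$ observation terms. The two-state parameters below (`prior`, `trans`, `emit`) are made-up numbers, not values from the lecture.

```python
from itertools import product


def hmm_joint(prior, trans, emit, xs, ys):
    """Pr(x_1..x_n, y_1..y_n) under the HMM factorization.

    prior[x] = Pr(x_1 = x), trans[a][b] = Pr(x_t = b | x_{t-1} = a),
    emit[x][y] = Pr(y_t = y | x_t = x).
    """
    p = prior[xs[0]] * emit[xs[0]][ys[0]]
    for t in range(1, len(xs)):
        p *= trans[xs[t - 1]][xs[t]] * emit[xs[t]][ys[t]]
    return p


# Hypothetical two-state model, for illustration only.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.75, 0.25], [0.25, 0.75]]

p = hmm_joint(prior, trans, emit, xs=[0, 0, 1], ys=[0, 0, 1])

# Sanity check: the joint sums to one over all sequences of length 3.
total = sum(
    hmm_joint(prior, trans, emit, xs, ys)
    for xs in product(range(2), repeat=3)
    for ys in product(range(2), repeat=3)
)
```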

Inference
• Recall, to find out where the missile is now, we do marginal inference:
$$\Pr(x_n \mid y_1, \dots, y_n)$$
• How does one compute this?
• Applying the rule of conditional probability, we have:
$$\Pr(x_n \mid y_1, \dots, y_n) = \frac{\Pr(x_n, y_1, \dots, y_n)}{\Pr(y_1, \dots, y_n)} = \frac{\Pr(x_n, y_1, \dots, y_n)}{\sum_{\hat{x}_n = 1}^{k} \Pr(\hat{x}_n, y_1, \dots, y_n)}$$
• Naively, this would seem to require $k^{n-1}$ summations:
$$\Pr(x_n, y_1, \dots, y_n) = \sum_{x_1, \dots, x_{n-1}} \Pr(x_1, \dots, x_n, y_1, \dots, y_n)$$
• Is there a more efficient algorithm?

Marginal inference in HMMs
• Use dynamic programming
$$
\begin{aligned}
\Pr(x_n, y_1, \dots, y_n) &= \sum_{x_{n-1}} \Pr(x_{n-1}, x_n, y_1, \dots, y_n) \\
&= \sum_{x_{n-1}} \Pr(x_{n-1}, y_1, \dots, y_{n-1}) \Pr(x_n, y_n \mid x_{n-1}, y_1, \dots, y_{n-1}) \\
&= \sum_{x_{n-1}} \Pr(x_{n-1}, y_1, \dots, y_{n-1}) \Pr(x_n, y_n \mid x_{n-1}) \\
&= \sum_{x_{n-1}} \Pr(x_{n-1}, y_1, \dots, y_{n-1}) \Pr(x_n \mid x_{n-1}) \Pr(y_n \mid x_n, x_{n-1}) \\
&= \sum_{x_{n-1}} \Pr(x_{n-1}, y_1, \dots, y_{n-1}) \Pr(x_n \mid x_{n-1}) \Pr(y_n \mid x_n)
\end{aligned}
$$
Justification for each step, in order:
– Marginalization: $\Pr(A = a) = \sum_{b} \Pr(B = b, A = a)$
– Chain rule: $\Pr(\vec{A} = \vec{a}, \vec{B} = \vec{b}) = \Pr(\vec{A} = \vec{a}) \Pr(\vec{B} = \vec{b} \mid \vec{A} = \vec{a})$
– Conditional independence in HMMs
– Chain rule: $\Pr(A = a, B = b) = \Pr(A = a) \Pr(B = b \mid A = a)$
– Conditional independence in HMMs
• For n=1, initialize $\Pr(x_1, y_1) = \Pr(x_1) \Pr(y_1 \mid x_1)$
• Total running time is linear in $n$: each of the $n$ steps sums over $k$ values of $x_{n-1}$ for each of the $k$ values of $x_n$, giving $O(nk^2)$ for a dense transition table
• Easy to do filtering
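The recursion above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `forward_filter` and the two-state parameters are assumptions made for the example, and the table `alpha[x]` stands for $\Pr(x_t, y_1, \dots, y_t)$.

```python
def forward_filter(prior, trans, emit, ys):
    """Return Pr(x_n | y_1..y_n) as a list indexed by state.

    alpha[x] holds Pr(x_t = x, y_1..y_t) and is updated once per observation.
    """
    k = len(prior)
    # Base case: Pr(x_1, y_1) = Pr(x_1) Pr(y_1 | x_1).
    alpha = [prior[x] * emit[x][ys[0]] for x in range(k)]
    for y in ys[1:]:
        # Pr(x_t, y_1..y_t) = Pr(y_t | x_t) * sum_{x'} alpha[x'] Pr(x_t | x').
        alpha = [
            emit[x][y] * sum(alpha[xp] * trans[xp][x] for xp in range(k))
            for x in range(k)
        ]
    # Normalize by Pr(y_1..y_n) = sum_x alpha[x].
    z = sum(alpha)
    return [a / z for a in alpha]


# Hypothetical two-state model, for illustration only.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.75, 0.25], [0.25, 0.75]]

posterior = forward_filter(prior, trans, emit, ys=[0, 0, 1])
```

Each update touches every pair of states once, so the sketch costs on the order of $nk^2$ operations instead of enumerating all $k^{n-1}$ state sequences.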

MAP inference in HMMs
• MAP inference in HMMs can also be solved in time linear in $n$!
$$
\begin{aligned}
\arg\max_{x} \Pr(x_1, \dots, x_n \mid y_1, \dots, y_n) &= \arg\max_{x} \Pr(x_1, \dots, x_n, y_1, \dots, y_n) \\
&= \arg\max_{x} \log \Pr(x_1, \dots, x_n, y_1, \dots, y_n) \\
&= \arg\max_{x} \log\big[\Pr(x_1) \Pr(y_1 \mid x_1)\big] + \sum_{i=2}^{n} \log\big[\Pr(x_i \mid x_{i-1}) \Pr(y_i \mid x_i)\big]
\end{aligned}
$$
• Formulate as a shortest paths problem
– Graph: a source $s$, then $k$ nodes per variable $X_1, X_2, \dots, X_{n-1}, X_n$, then a sink $t$
– Weight for edge $(s, x_1)$ is $-\log\big[\Pr(x_1) \Pr(y_1 \mid x_1)\big]$
– Weight for edge $(x_{i-1}, x_i)$ is $-\log\big[\Pr(x_i \mid x_{i-1}) \Pr(y_i \mid x_i)\big]$
– Weight for edge $(x_n, t)$ is 0
– The shortest path from $s$ to $t$ gives the MAP assignment
• Called the Viterbi algorithm
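A minimal sketch of this dynamic program follows. It maximizes the sum of log terms directly, which is equivalent to finding the shortest path under the negated weights. The function name `viterbi` and the two-state parameters are assumptions for illustration, and the sketch assumes every probability it touches is strictly positive so that the logarithms are defined.

```python
import math


def viterbi(prior, trans, emit, ys):
    """Return the most likely state sequence x_1..x_n given y_1..y_n."""
    k = len(prior)
    # score[x] = best log-probability of any partial path ending in state x.
    score = [math.log(prior[x] * emit[x][ys[0]]) for x in range(k)]
    back = []  # back[t][x] = best predecessor of state x at step t + 1
    for y in ys[1:]:
        new_score, ptr = [], []
        for x in range(k):
            # Choose the predecessor that maximizes the path score into x.
            best_prev = max(
                range(k), key=lambda xp: score[xp] + math.log(trans[xp][x])
            )
            ptr.append(best_prev)
            new_score.append(
                score[best_prev] + math.log(trans[best_prev][x] * emit[x][y])
            )
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    x = max(range(k), key=lambda s: score[s])
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    path.reverse()
    return path


# Hypothetical two-state model, for illustration only.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.75, 0.25], [0.25, 0.75]]

map_path = viterbi(prior, trans, emit, ys=[0, 0, 1])
```

With these particular numbers the final observation is outweighed by the strong tendency to stay in the same state, so the most likely trajectory need not match the observations step by step.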

Applications of HMMs
• Speech recognition
– Predict phonemes from the sounds forming words (i.e., the actual signals)
• Natural language processing
– Predict parts of speech (verb, noun, determiner, etc.) from the words in a sentence
• Computational biology
– Predict intron/exon regions from DNA
– Predict protein structure from DNA (locally)
• And many many more!

HMMs as a graphical model
• We can represent a hidden Markov model with a graph:
[Figure: a chain $X_1 \to X_2 \to X_3 \to X_4 \to X_5 \to X_6$ with an edge $X_t \to Y_t$ for each $t$; the nodes $Y_1, \dots, Y_6$ are shaded. Shading denotes observed variables (e.g. what is available at test time)]
$$\Pr(x_1, \dots, x_n, y_1, \dots, y_n) = \Pr(x_1)\Pr(y_1 \mid x_1) \prod_{t=2}^{n} \Pr(x_t \mid x_{t-1})\Pr(y_t \mid x_t)$$
• There is a 1-1 mapping between the graph structure and the factorization of the joint distribution

Naïve Bayes as a graphical model
• We can represent a naïve Bayes model with a graph:
[Figure: the label $Y$ with an edge to each of the features $X_1, X_2, X_3, \dots, X_n$; the feature nodes are shaded. Shading denotes observed variables (e.g. what is available at test time)]
$$\Pr(y, x_1, \dots, x_n) = \Pr(y) \prod_{i=1}^{n} \Pr(x_i \mid y)$$
• There is a 1-1 mapping between the graph structure and the factorization of the joint distribution
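To illustrate how the naïve Bayes factorization is used when the features are observed, the sketch below computes the posterior over the label by multiplying the prior by each feature's conditional probability and normalizing. The function name `naive_bayes_posterior` and all numbers are hypothetical.

```python
def naive_bayes_posterior(prior, cond, xs):
    """Return Pr(y | x_1..x_n), proportional to Pr(y) * prod_i Pr(x_i | y).

    prior[y] = Pr(Y = y); cond[i][y][v] = Pr(X_i = v | Y = y).
    """
    joint = []
    for y, p_y in enumerate(prior):
        p = p_y
        for i, v in enumerate(xs):
            p *= cond[i][y][v]
        joint.append(p)
    # Normalize by Pr(x_1..x_n) = sum_y Pr(y, x_1..x_n).
    z = sum(joint)
    return [p / z for p in joint]


# Hypothetical model: binary label, two binary features.
prior = [0.5, 0.5]
cond = [
    [[0.8, 0.2], [0.4, 0.6]],  # Pr(X_1 | Y = 0), Pr(X_1 | Y = 1)
    [[0.5, 0.5], [0.1, 0.9]],  # Pr(X_2 | Y = 0), Pr(X_2 | Y = 1)
]

posterior = naive_bayes_posterior(prior, cond, xs=[1, 1])
```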

Bayesian networks
• A Bayesian network is specified by a directed acyclic graph $G = (V, E)$ with:
– One node $i$ for each random variable $X_i$
– One conditional probability distribution (CPD) per node, $p(x_i \mid x_{\mathrm{Pa}(i)})$, specifying the variable's probability conditioned on its parents' values
• Corresponds 1-1 with a particular factorization of the joint distribution:
$$p(x_1, \dots, x_n) = \prod_{i \in V} p(x_i \mid x_{\mathrm{Pa}(i)})$$
• Powerful framework for designing algorithms to perform probability computations

The 2011 Turing award was for Bayesian networks

Example
• Consider the following Bayesian network (example from Koller & Friedman, Probabilistic Graphical Models, 2009):

Graph: Difficulty → Grade ← Intelligence, Intelligence → SAT, Grade → Letter

Difficulty      d0      d1
                0.6     0.4

Intelligence    i0      i1
                0.7     0.3

Grade           g1      g2      g3
  i0, d0        0.3     0.4     0.3
  i0, d1        0.05    0.25    0.7
  i1, d0        0.9     0.08    0.02
  i1, d1        0.5     0.3     0.2

SAT             s0      s1
  i0            0.95    0.05
  i1            0.2     0.8

Letter          l0      l1
  g1            0.1     0.9
  g2            0.4     0.6
  g3            0.99    0.01

• What is its joint distribution?
$$p(x_1, \dots, x_n) = \prod_{i \in V} p(x_i \mid x_{\mathrm{Pa}(i)})$$
$$p(d, i, g, s, l) = p(d)\, p(i)\, p(g \mid i, d)\, p(s \mid i)\, p(l \mid g)$$
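The factorization can be checked numerically. The sketch below encodes the CPDs of this example as Python tables and multiplies one entry from each. It reads the third and fourth Grade rows as the $i^1$ rows and the last Letter row as $g^3$, as in Koller & Friedman's student network; the variable names and the zero-based indexing ($g^1 \to 0$, and so on) are choices made for this illustration.

```python
from itertools import product

p_d = [0.6, 0.4]                      # p(d)
p_i = [0.7, 0.3]                      # p(i)
p_g = {                               # p(g | i, d), keyed by (i, d)
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}
p_s = [[0.95, 0.05], [0.2, 0.8]]      # p(s | i), indexed [i][s]
p_l = [[0.1, 0.9], [0.4, 0.6], [0.99, 0.01]]  # p(l | g), indexed [g][l]


def joint(d, i, g, s, l):
    """p(d, i, g, s, l) = p(d) p(i) p(g | i, d) p(s | i) p(l | g)."""
    return p_d[d] * p_i[i] * p_g[(i, d)][g] * p_s[i][s] * p_l[g][l]


# Probability of (d0, i1, g2, s1, l0); g2 has index 1.
p = joint(d=0, i=1, g=1, s=1, l=0)

# Sanity check: the joint sums to one over all 2*2*3*2*2 assignments.
total = sum(
    joint(d, i, g, s, l)
    for d, i, g, s, l in product(range(2), range(2), range(3), range(2), range(2))
)
```

Five small tables with 15 independent parameters in total stand in for a joint table with 47 independent entries, which is the practical payoff of the factorization.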

Example
• Consider the same Bayesian network as in the previous example (Difficulty, Intelligence, Grade, SAT, Letter; from Koller & Friedman, Probabilistic Graphical Models, 2009)
• What is this model assuming?
$$\text{SAT} \not\perp \text{Grade}$$
$$\text{SAT} \perp \text{Grade} \mid \text{Intelligence}$$
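Both statements can be verified by brute-force enumeration of the joint distribution. The sketch below uses the CPD values of the example (with the Grade rows read as $i^0 d^0$, $i^0 d^1$, $i^1 d^0$, $i^1 d^1$ and the Letter rows as $g^1, g^2, g^3$, following Koller & Friedman); the helper `prob` is a hypothetical name introduced only for this check.

```python
from itertools import product

p_d = [0.6, 0.4]
p_i = [0.7, 0.3]
p_g = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],
       (1, 0): [0.9, 0.08, 0.02], (1, 1): [0.5, 0.3, 0.2]}  # keyed by (i, d)
p_s = [[0.95, 0.05], [0.2, 0.8]]
p_l = [[0.1, 0.9], [0.4, 0.6], [0.99, 0.01]]


def joint(d, i, g, s, l):
    return p_d[d] * p_i[i] * p_g[(i, d)][g] * p_s[i][s] * p_l[g][l]


def prob(**fixed):
    """Probability of a partial assignment, summing the joint over the rest."""
    total = 0.0
    for d, i, g, s, l in product(range(2), range(2), range(3), range(2), range(2)):
        assignment = {"d": d, "i": i, "g": g, "s": s, "l": l}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(d, i, g, s, l)
    return total


# Largest violation of Pr(s, g | i) = Pr(s | i) Pr(g | i) over all values.
max_cond_gap = max(
    abs(prob(s=s, g=g, i=i) / prob(i=i)
        - (prob(s=s, i=i) / prob(i=i)) * (prob(g=g, i=i) / prob(i=i)))
    for s, g, i in product(range(2), range(3), range(2))
)

# Largest violation of Pr(s, g) = Pr(s) Pr(g) over all values.
max_marg_gap = max(
    abs(prob(s=s, g=g) - prob(s=s) * prob(g=g))
    for s, g in product(range(2), range(3))
)
```

The conditional gap is zero up to floating-point rounding, while the marginal gap is clearly nonzero: SAT and Grade are correlated through Intelligence, and become independent once Intelligence is known.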
