 
4190.408 Artificial Intelligence (2016-Spring)
Bayesian Networks
Inference with Probabilistic Graphical Models
Byoung-Tak Zhang
Biointelligence Lab, Seoul National University

Machine Learning?
• Learning System:
  – A system which autonomously improves its performance (P) by automatically forming a model (M) based on experiential data (D) obtained from interaction with the environment (E)
• Self-improving Systems (Perspective of AI)
• Knowledge Discovery (Perspective of Data Mining)
• Data-Driven Software Design (Perspective of Software Engineering)
• Automatic Programming (Perspective of Computer Engineering)

Machine Learning as Automatic Programming
• Traditional Programming: Data + Program → Computer → Output
• Machine Learning: Data + Output → Computer → Program

Machine Learning (ML): Three Tasks
• Supervised Learning
  – Estimate an unknown mapping from known input and target output pairs
  – Learn $f_w$ from training set $D = \{(x, y)\}$ s.t. $f_w(x) \approx y = f(x)$
  – Classification: $y$ is discrete
  – Regression: $y$ is continuous
• Unsupervised Learning
  – Only input values are provided
  – Learn $f_w$ from $D = \{(x)\}$ s.t. $f_w(x) \approx x$
  – Density estimation and compression
  – Clustering, dimension reduction
• Sequential (Reinforcement) Learning
  – No targets are provided; rewards (critiques) arrive “sequentially”
  – Learn a heuristic function $f_w$ from $D_t = \{(s_t, a_t, r_t) \mid t = 1, 2, \ldots\}$ s.t. $f_w(s_t, a_t, r_t)$
  – With respect to the future, not just the past
  – Sequential decision-making
  – Action selection and policy learning

Machine Learning Models
• Supervised Learning
  – Neural Nets
  – Decision Trees
  – K-Nearest Neighbors
  – Support Vector Machines
• Unsupervised Learning
  – Self-Organizing Maps
  – Clustering Algorithms
  – Manifold Learning
  – Evolutionary Learning
• Probabilistic Graph
  – Bayesian Networks
  – Markov Networks
  – Hidden Markov Models
  – Hypernetworks
• Dynamic System
  – Kalman Filters
  – Sequential Monte Carlo
  – Particle Filters
  – Reinforcement Learning

Outline
• Bayesian Inference
  – Monte Carlo
  – Importance Sampling
  – MCMC
• Probabilistic Graphical Models
  – Bayesian Networks
  – Markov Random Fields
• Hypernetworks
  – Architecture and Algorithms
  – Application Examples
• Discussion

Bayes Theorem

MAP vs. ML
• What is the most probable hypothesis given data?
• From Bayes Theorem
• MAP (Maximum A Posteriori)
• ML (Maximum Likelihood)

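The formulas for this slide did not survive extraction. As a reminder, the standard definitions can be written as follows. The symbols are an assumed notation, not necessarily the one on the original slide: $h$ is a hypothesis, $H$ the hypothesis space, and $D$ the data.

```latex
% Bayes theorem
P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}

% MAP: maximize the posterior; P(D) does not depend on h and can be dropped
h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)

% ML: maximize the likelihood only (equals MAP when the prior P(h) is uniform)
h_{ML} = \arg\max_{h \in H} P(D \mid h)
```
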
Bayesian Inference

Prof. Schrater’s Lecture Notes (Univ. of Minnesota)

Monte Carlo (MC) Approximation

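The slide body was not recovered, so the following is only a minimal sketch of the general idea: an expectation $E_p[f(x)]$ is approximated by the average of $f$ over samples drawn from $p$. The function name `mc_expectation`, the choice of target $N(0, 1)$, and the test function $f(x) = x^2$ are illustrative assumptions, not taken from the lecture.

```python
import random


def mc_expectation(f, sampler, n_samples, seed=0):
    """Approximate E[f(x)] by averaging f over samples drawn by `sampler`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += f(sampler(rng))
    return total / n_samples


# Hypothetical example: x ~ N(0, 1) and f(x) = x^2, whose exact expectation is 1.
estimate = mc_expectation(
    f=lambda x: x * x,
    sampler=lambda rng: rng.gauss(0.0, 1.0),
    n_samples=100_000,
)
print(estimate)  # close to 1; the error shrinks roughly like 1/sqrt(n_samples)
```
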
Markov chain Monte Carlo

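The slide body was not recovered. As one possible illustration of MCMC, here is a sketch of a random-walk Metropolis sampler, which needs the target density only up to a normalizing constant. The choice of algorithm, the function name `metropolis`, the step size, and the standard normal target are all assumptions made for this example.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis: symmetric Gaussian proposals, accept/reject step."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        if i >= burn_in:  # discard the initial transient
            samples.append(x)
    return samples


# Hypothetical target: unnormalized standard normal, log p(x) = -x^2 / 2.
chain = metropolis(lambda x: -0.5 * x * x, x0=3.0, n_samples=50_000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((x - chain_mean) ** 2 for x in chain) / len(chain)
print(chain_mean, chain_var)  # should be near 0 and 1 respectively
```

Successive samples are correlated, so the chain carries less information than the same number of independent draws.
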
MC with Importance Sampling

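The slide body was not recovered. A minimal sketch of self-normalized importance sampling follows: samples are drawn from a proposal $q$ and reweighted by $w = p(x)/q(x)$ to estimate an expectation under the target $p$. The target $N(0, 1)$, the proposal $N(0, 2^2)$, and all function names are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_estimate(f, target_pdf, proposal_pdf, proposal_sampler,
                        n_samples, seed=0):
    """Self-normalized importance sampling estimate of E_target[f(x)]."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        x = proposal_sampler(rng)
        w = target_pdf(x) / proposal_pdf(x)  # importance weight p(x) / q(x)
        weighted_sum += w * f(x)
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical setup: target N(0, 1), wider proposal N(0, 2^2), f(x) = x^2.
is_estimate = importance_estimate(
    f=lambda x: x * x,
    target_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    proposal_sampler=lambda rng: rng.gauss(0.0, 2.0),
    n_samples=100_000,
)
print(is_estimate)  # close to the exact value 1
```

The proposal is deliberately wider than the target so the weights stay bounded; a proposal with lighter tails than the target can make the estimate unstable.
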
Graphical Models
[Figure: taxonomy of graphical models (GM). Families shown: Causal Models, Chain Graphs, Dependency Networks, Other Semantics, Directed GMs, Undirected GMs. Further labels in the figure: Bayesian Networks, Markov Random Fields / Markov networks, FST, DBNs, Mixture Models, Decision Trees, Simple Models, HMMs, Kalman, Segment Models, Gibbs/Boltzmann Distributions, Factorial HMM, Mixed Memory Markov Models, PCA, BMMs, LDA.]

BAYESIAN NETWORKS

Bayesian Networks
• Bayesian network
  – DAG (Directed Acyclic Graph)
  – Express dependence relations between variables
  – Can use prior knowledge on the data (parameters)

$P(X) = \prod_{i=1}^{n} P(X_i \mid pa_i)$

[Figure: example DAG over the nodes A, B, C, D, E]
$P(A, B, C, D, E) = P(A)\, P(B \mid A)\, P(C \mid B)\, P(D \mid A, B)\, P(E \mid B, C, D)$

Representing Probability Distributions
• Probability distribution = probability for each combination of values of these attributes
• Hospital patients described by
  – Background: age, gender, history of diseases, …
  – Symptoms: fever, blood pressure, headache, …
  – Diseases: pneumonia, heart attack, …
• Naïve representations (such as tables) run into trouble
  – 20 attributes require more than $2^{20} \approx 10^6$ parameters
  – Real applications usually involve hundreds of attributes

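The parameter count above can be checked with a few lines of arithmetic. The sketch below compares a full joint table with a factored representation in which each variable depends on only a few parents. The binary variables and the limit of three parents per node are hypothetical choices for illustration; the function names are not from the lecture.

```python
def full_table_parameters(n_vars, arity=2):
    """Free parameters of a full joint table: one per combination, minus 1
    because the probabilities must sum to 1."""
    return arity ** n_vars - 1


def bayes_net_parameters(parent_counts, arity=2):
    """Free parameters of a factored model: each node needs (arity - 1)
    numbers for every configuration of its parents."""
    return sum((arity - 1) * arity ** k for k in parent_counts)


full = full_table_parameters(20)
# Hypothetical DAG: node i has min(i, 3) parents, i.e. at most three each.
sparse = bayes_net_parameters([min(i, 3) for i in range(20)])
print(full)    # 1048575
print(sparse)  # 143
```
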
Bayesian Networks - Key Idea
Exploit regularities!
• Utilize conditional independence
• Graphical representation of conditional independence or of “causal” dependencies

Bayesian Networks
1. Finite, directed acyclic graph
2. Nodes: (discrete) random variables
3. Edges: direct influences
4. Associated with each node: a table representing a conditional probability distribution (CPD), quantifying the effect the parents have on the node
[Figure: example network with the nodes E, B, A, J, M]

Bayesian Networks
[Figure: network over X1, X2, X3 with a probability table attached to each node]

X1: (0.2, 0.8)
X2: (0.6, 0.4)

X3, one distribution per combination of parent values:
X1       X2    X3
true     1     (0.2, 0.8)
true     2     (0.5, 0.5)
false    1     (0.23, 0.77)
false    2     (0.53, 0.47)

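The numbers on this slide are enough to compute a joint probability and a marginal, as in the sketch below. It assumes that X1 and X2 are the parents of X3, that the X1 pair is ordered (true, false), that the X2 pair is ordered (1, 2), and that the two values of X3 can simply be indexed 0 and 1; none of these orderings is stated explicitly in the recovered text.

```python
from itertools import product

# Tables from the slide; the value orderings are assumptions (see above).
p_x1 = {"true": 0.2, "false": 0.8}
p_x2 = {1: 0.6, 2: 0.4}
p_x3_given = {
    ("true", 1): (0.2, 0.8),
    ("true", 2): (0.5, 0.5),
    ("false", 1): (0.23, 0.77),
    ("false", 2): (0.53, 0.47),
}


def joint(x1, x2, x3):
    """P(X1, X2, X3) = P(X1) P(X2) P(X3 | X1, X2)."""
    return p_x1[x1] * p_x2[x2] * p_x3_given[(x1, x2)][x3]


# The factored joint sums to 1 over all eight configurations.
total = sum(joint(a, b, c) for a, b, c in product(p_x1, p_x2, (0, 1)))

# Marginal of X3: sum the joint over the parents.
p_x3 = [sum(joint(a, b, c) for a, b in product(p_x1, p_x2)) for c in (0, 1)]
print(total)  # 1.0 up to floating-point rounding
print(p_x3)   # approximately [0.344, 0.656]
```
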
Example: Use a DAG to model causality
[Figure: DAG over the nodes Train Strike, Martin Oversleep, Norman Oversleep, Martin Late, Norman Late, Norman untidy, Boss Failure-in-Love, Project Delay, Office Dirty, Boss Angry]

Example: Attach prior probabilities to all root nodes
[Figure: the example DAG with a prior table attached to each root node]

Train Strike            Probability
T                       0.1
F                       0.9

Norman Oversleep        Probability
T                       0.2
F                       0.8

Martin Oversleep        Probability
T                       0.01
F                       0.99

Boss failure-in-love    Probability
T                       0.01
F                       0.99

Example: Attach conditional probabilities to non-root nodes
Each column sums to 1.
[Figure: the example DAG with the tables below attached to Norman untidy and Martin Late]

Norman oversleep      T       F
Norman untidy = T     0.6     0.2
Norman untidy = F     0.4     0.8

Train strike          T       T       F       F
Martin oversleep      T       F       T       F
Martin Late = T       0.95    0.8     0.7     0.05
Martin Late = F       0.05    0.2     0.3     0.95

Example: Attach conditional probabilities to non-root nodes
Each column sums to 1.
[Figure: the example DAG with the table below attached to Boss Angry]

Boss Failure-in-love    T       T       T       T       F       F       F       F
Project Delay           T       T       F       F       T       T       F       F
Office Dirty            T       F       T       F       T       F       T       F
Boss Angry = very       0.98    0.85    0.6     0.5     0.3     0.2     0       0.01
Boss Angry = mid        0.02    0.15    0.3     0.25    0.5     0.5     0.2     0.02
Boss Angry = little     0       0       0.1     0.25    0.2     0.3     0.7     0.07
Boss Angry = no         0       0       0       0       0       0       0.1     0.9

Inference

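The slide body was not recovered. As a small illustration of inference, the sketch below uses the Train Strike, Martin Oversleep, and Martin Late numbers from the example slides to answer a diagnostic query by enumeration: given that Martin is late, how probable is a train strike? It assumes the Martin Late columns are ordered (strike, oversleep) = (T, T), (T, F), (F, T), (F, F). Because both parents of Martin Late are root nodes, the rest of the network can be ignored for this query. Enumeration is only one possible inference method and may not be the one the lecture presented.

```python
# Priors for the two root nodes (from the example slides).
p_strike = {True: 0.1, False: 0.9}
p_oversleep = {True: 0.01, False: 0.99}

# P(Martin Late = T | Train strike, Martin oversleep); column order assumed.
p_late_true = {
    (True, True): 0.95,
    (True, False): 0.8,
    (False, True): 0.7,
    (False, False): 0.05,
}


def joint(strike, oversleep, late):
    """P(strike, oversleep, late) from the network factorization."""
    p_late = p_late_true[(strike, oversleep)]
    return (p_strike[strike] * p_oversleep[oversleep]
            * (p_late if late else 1.0 - p_late))


def posterior_strike(late=True):
    """Return P(strike = T | late) and the evidence P(late) by enumeration."""
    numerator = sum(joint(True, o, late) for o in (True, False))
    evidence = sum(joint(s, o, late)
                   for s in (True, False) for o in (True, False))
    return numerator / evidence, evidence


p_strike_given_late, p_late = posterior_strike(late=True)
print(p_late)               # approximately 0.131
print(p_strike_given_late)  # approximately 0.612, up from the prior 0.1
```
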
MARKOV RANDOM FIELDS (MARKOV NETWORKS)

Graphical Models
• Directed Graph (e.g. Bayesian Network)
• Undirected Graph (e.g. Markov Random Field)

Bayesian Image Analysis
[Figure: Original Image → Degradation Process (Transmission, Noise) → Degraded (observed) Image]

$\Pr(\text{Original Image} \mid \text{Degraded Image}) = \dfrac{\Pr(\text{Degraded Image} \mid \text{Original Image}) \, \Pr(\text{Original Image})}{\Pr(\text{Degraded Image})}$

• A Posteriori Probability: $\Pr(\text{Original Image} \mid \text{Degraded Image})$
• A Priori Probability: $\Pr(\text{Original Image})$
• Marginal Likelihood: $\Pr(\text{Degraded Image})$

Image Analysis
• We could thus represent both the observed image (X) and the true image (Y) as Markov random fields.
  – X: observed image
  – Y: true image
• And invoke the Bayesian framework to find P(Y|X)

Details
• Remember: $P(Y \mid X) = \dfrac{P(X \mid Y)\, P(Y)}{P(X)} \propto P(X \mid Y)\, P(Y)$
• $P(Y \mid X)$ is proportional to $P(X \mid Y)\, P(Y)$
  – $P(X \mid Y)$ is the data model.
  – $P(Y)$ models the label interaction.
• Next we need to compute the prior $P(Y = y)$ and the likelihood $P(X \mid Y)$.

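To make the roles of the prior and the likelihood concrete, here is a sketch for binary images with pixel values in {-1, +1}. Everything specific in it is an assumption, not something the slides state: the prior is an Ising-style model, $\log P(Y) = \beta \sum_{i \sim j} y_i y_j + \text{const}$; the data model is $\log P(X \mid Y) = \eta \sum_i x_i y_i + \text{const}$; and the search procedure is iterated conditional modes (ICM), which greedily increases $P(X \mid Y) P(Y)$ one pixel at a time. The name `icm_denoise` is hypothetical.

```python
def icm_denoise(noisy, beta=1.0, eta=1.0, sweeps=5):
    """Greedy MAP search for a binary image under an assumed Ising prior.

    noisy: list of rows, each a list of pixel values in {-1, +1}.
    """
    rows, cols = len(noisy), len(noisy[0])
    labels = [row[:] for row in noisy]  # start from the observed image
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Sum of the current labels of the 4-connected neighbors.
                neighbor_sum = 0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbor_sum += labels[rr][cc]
                # The local log-posterior is y * score, so the best label
                # is the sign of score; ties keep the current label.
                score = beta * neighbor_sum + eta * noisy[r][c]
                if score > 0:
                    new_label = 1
                elif score < 0:
                    new_label = -1
                else:
                    new_label = labels[r][c]
                if new_label != labels[r][c]:
                    labels[r][c] = new_label
                    changed = True
        if not changed:  # stop early once a full sweep changes nothing
            break
    return labels


# Hypothetical test image: all +1 with one corrupted pixel in the center.
clean = [[1] * 5 for _ in range(5)]
observed = [row[:] for row in clean]
observed[2][2] = -1
restored = icm_denoise(observed)
print(restored == clean)  # True: the isolated flip is outvoted by its neighbors
```

ICM only finds a local maximum of the posterior, so it serves here as an illustration of how the two terms interact rather than as a recommended estimator.
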