

SLIDE 1

Lecture 20
More on learning graphical models

Prof. Julia Hockenmaier
juliahmr@illinois.edu
http://cs.illinois.edu/fa11/cs440
CS440/ECE448: Intro to Artificial Intelligence

Bayes Nets

A Bayes Net defines a joint distribution P(X1…Xn)
over a set of random variables X1…Xn.

  • Using the chain rule, we can factor P(X1…Xn) into a product of n conditional distributions:
    P(X1…Xn) = Πi P(Xi | X1…Xi-1)

  • A Bayes Net makes a number of (conditional) independence assumptions:
    P(X1…Xn) =def Πi P(Xi | Parents(Xi)), where Parents(Xi) ⊆ {X1…Xi-1}
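The factorization above can be sketched in a few lines of Python. The three-node network (Rain → WetGrass ← Sprinkler) and all of its numbers are a made-up illustration, not from the slides:

```python
# A minimal sketch of a Bayes net as CPT dictionaries. The network
# (Rain -> WetGrass <- Sprinkler) and its probabilities are hypothetical.

P_rain = {True: 0.2, False: 0.8}          # P(Rain)
P_sprinkler = {True: 0.1, False: 0.9}     # P(Sprinkler)
# P(WetGrass = True | Rain, Sprinkler), keyed by (rain, sprinkler)
P_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """P(Rain, Sprinkler, WetGrass) = P(Rain) P(Sprinkler) P(WetGrass | Rain, Sprinkler)."""
    p_w = P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * (p_w if wet else 1 - p_w)

# The factored joint sums to 1 over all eight assignments:
total = sum(joint(r, s, w) for r in (True, False)
            for s in (True, False) for w in (True, False))
print(round(total, 10))  # 1.0
```

Each factor is exactly one CPT lookup for P(Xi | Parents(Xi)), which is the point of the independence assumptions: the full joint never has to be stored explicitly.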

  • Learning Bayes Nets

Parameter estimation: Given some data D over a set of random variables X and a Bayes Net (with empty CPTs), estimate the parameters (i.e. fill in the CPTs) of the Bayes Net.

  • Structure learning: Given some data D over a set of random variables X, find a Bayes Net (define its structure) and estimate its parameters. (This is much harder… we won't deal with it here.)

Bayes Rule

  • P(h): prior probability of hypothesis
  • P(h | D): posterior probability of hypothesis
  • P(D | h): likelihood of data, given hypothesis

  • Posterior ∝ prior × likelihood


CS440/ECE448: Intro AI

P(h | D) ∝ P(D | h) P(h)
P(h | D) = P(D | h) P(h) / P(D)
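Bayes' rule is easy to check numerically. The sketch below uses made-up numbers for a rare-hypothesis/noisy-evidence scenario (none of these values come from the slides):

```python
# Illustrating P(h | D) = P(D | h) P(h) / P(D) with hypothetical numbers.
p_h = 0.01             # prior P(h)
p_d_given_h = 0.9      # likelihood P(D | h)
p_d_given_not_h = 0.05 # likelihood P(D | not h)

# P(D) by marginalizing over the two hypotheses:
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
posterior = p_d_given_h * p_h / p_d  # P(h | D)
print(round(posterior, 4))  # 0.1538
```

Note how the small prior keeps the posterior low even though the likelihood P(D | h) is high.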

SLIDE 2

Three kinds of estimation techniques

  • Bayes optimal: Marginalize out the hypotheses: P(X | D) = Σi P(X | hi) P(hi | D)
  • MAP (maximum a posteriori): Pick the hypothesis with the highest posterior: hMAP = argmaxh P(h | D)
  • ML (maximum likelihood): Pick the hypothesis that assigns the highest likelihood to the data: hML = argmaxh P(D | h)

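The three techniques can be compared on a small hypothesis space over candy flavors (a sketch in the spirit of the textbook's candy-bag example; the five hypotheses and their priors here are illustrative numbers):

```python
# Hypotheses h are possible values of P(cherry); priors are made-up numbers.
hyps = {1.0: 0.1, 0.75: 0.2, 0.5: 0.4, 0.25: 0.2, 0.0: 0.1}  # {P(cherry|h): P(h)}

def posterior(data):
    """P(h | D) ∝ P(D | h) P(h) for i.i.d. candy draws."""
    unnorm = {}
    for p_cherry, prior in hyps.items():
        lik = 1.0
        for candy in data:
            lik *= p_cherry if candy == "cherry" else 1 - p_cherry
        unnorm[p_cherry] = lik * prior
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

data = ["lime"] * 3
post = posterior(data)

# Bayes optimal: marginalize over all hypotheses.
p_cherry_next = sum(h * p for h, p in post.items())
# MAP: the single hypothesis with the highest posterior.
h_map = max(post, key=post.get)
# ML: the hypothesis with the highest likelihood (prior ignored).
h_ml = max(hyps, key=lambda h: (1 - h) ** len(data))
```

After three limes, both MAP and ML commit to the all-lime hypothesis, while the Bayes-optimal prediction still hedges across all hypotheses.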

Maximum likelihood learning

Given data D, we want to find the parameters that maximize P(D | θ).

  • We have a data set with N candies: c are cherry, l = (N - c) are lime.
  • Parameter θ = probability of cherry.
  • Maximum likelihood estimate: θ = c/N
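The single-parameter estimate is literally just counting; a tiny sketch with toy data:

```python
# ML estimate of theta = P(cherry): count cherries and divide by N (toy data).
candies = ["cherry"] * 3 + ["lime"] * 7  # N = 10, c = 3
c, N = candies.count("cherry"), len(candies)
theta = c / N
print(theta)  # 0.3
```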

A more complex model

Now the candy has two kinds of wrappers (red or green). The wrapper is chosen probabilistically, depending on the flavor of the candy.


[Diagram: two-node network Flavor → Wrapper. P(F = cherry) = θ; wrapper CPT: P(red | F = cherry) = θ1, P(red | F = lime) = θ2.]

Out of N candies, c are cherry; rc are cherry with a red wrapper, rl are lime with a red wrapper.

  • The likelihood of this data set:

P(d | θ, θ1, θ2) = θ^c (1-θ)^(N-c) · θ1^rc (1-θ1)^(c-rc) · θ2^rl (1-θ2)^((N-c)-rl)

  • The log likelihood of this data set:

L(d | θ, θ1, θ2) = [c log θ + (N-c) log(1-θ)]
  + [rc log θ1 + (c-rc) log(1-θ1)]
  + [rl log θ2 + (N-c-rl) log(1-θ2)]

The ML parameter estimates: θ = c/N, θ1 = rc/c, θ2 = rl/(N-c)

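A quick sketch of the two-wrapper estimates, with made-up counts, plus a sanity check that the closed-form answers do maximize the log likelihood from the slide:

```python
import math

# Toy counts (made up): 100 candies, 60 cherry; 45 red-wrapped cherry, 10 red-wrapped lime.
N, c = 100, 60
rc, rl = 45, 10

theta = c / N          # ML estimate of P(cherry)
theta1 = rc / c        # ML estimate of P(red | cherry)
theta2 = rl / (N - c)  # ML estimate of P(red | lime)

def log_lik(t, t1, t2):
    """The log likelihood L(d | theta, theta1, theta2) from the slide."""
    return (c * math.log(t) + (N - c) * math.log(1 - t)
            + rc * math.log(t1) + (c - rc) * math.log(1 - t1)
            + rl * math.log(t2) + (N - c - rl) * math.log(1 - t2))

# Nudging any parameter away from the closed form lowers the log likelihood.
best = log_lik(theta, theta1, theta2)
assert all(log_lik(theta + d, theta1, theta2) < best for d in (-0.01, 0.01))
print(theta, theta1, theta2)  # 0.6 0.75 0.25
```

Because the log likelihood decomposes into three independent bracketed terms, each parameter can be maximized separately, which is why each estimate is a simple ratio of counts.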

SLIDE 3

Medical diagnosis

Patients see a doctor and complain about a number of symptoms (headache, 100°F fever, …).

  • What is the most likely disease di, given the set of symptoms S the patient has?


argmaxdi P(di | S)

The Naïve Bayes classifier

Assume the items in your data set have a number of attributes A1…An.

  • Each item also belongs to one of a number of given classes C1…Ck.
  • Which attributes an item has depends on its class.
  • If you only observe the attributes of an item, can you predict the class?


The Naïve Bayes classifier

[Diagram: class node C with attribute children A1, A2, …, An]

Naïve Bayes

[Diagram: Naive Bayes network for medical diagnosis. Root node Disease (values 1, 2, 3, …) with prior probabilities P(d1), P(d2), P(d3), …; child nodes Symptom1, Symptom2, Symptom3 (each T/F) with CPT entries P(si | dj) for every symptom–disease pair.]
SLIDE 4

Naïve Bayes

argmaxC P(C | A1…An)
  = argmaxC P(A1…An | C) P(C)
  = argmaxC Πj P(Aj | C) P(C)

We need to estimate:
  – the multinomial P(C)
  – for each attribute Aj and class c: P(Aj | c)
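The decision rule above is usually computed in log space so the product of many small probabilities does not underflow. A minimal sketch with two hypothetical classes and made-up CPT numbers:

```python
import math

# Hypothetical Naive Bayes parameters (all numbers are made up).
priors = {"flu": 0.3, "cold": 0.7}           # P(C)
p_attr = {"flu":  [0.9, 0.8, 0.2],           # P(A_j = True | C) for three
          "cold": [0.6, 0.3, 0.4]}           # boolean attributes (symptoms)

def predict(symptoms):
    """argmax_C  log P(C) + sum_j log P(A_j | C)."""
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c])
        for p, present in zip(p_attr[c], symptoms):
            score += math.log(p if present else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best

print(predict([True, True, False]))  # flu
```

Taking logs changes none of the argmax decisions, since log is monotonically increasing.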


Maximum likelihood estimation

If we have a set of training data where the class of each item is given:

  – the multinomial: P(C = c) = freq(c)/N
  – for each attribute Aj and class c: P(Aj = a | c) = freq(a, c)/freq(c)

where freq(c) = the number of items in the training data that have class c, and freq(a, c) = the number of items in the training data that have attribute a and class c.

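The counting scheme above fits in a few lines. The tiny labeled data set (class plus two boolean attributes per item) is invented for illustration:

```python
# ML estimation of Naive Bayes parameters by counting (toy labeled data).
from collections import Counter, defaultdict

data = [("flu",  (True, True)), ("flu",  (True, False)),
        ("cold", (False, True)), ("cold", (False, False)), ("cold", (True, False))]

class_freq = Counter(c for c, _ in data)
N = len(data)
prior = {c: f / N for c, f in class_freq.items()}  # P(C = c) = freq(c)/N

# freq(A_j = True, c), for the two attributes in this toy data set
attr_freq = defaultdict(lambda: [0, 0])
for c, attrs in data:
    for j, a in enumerate(attrs):
        if a:
            attr_freq[c][j] += 1

# P(A_j = True | c) = freq(A_j = True, c) / freq(c)
cond = {c: [f / class_freq[c] for f in attr_freq[c]] for c in class_freq}
print(prior, cond)
```

In practice these raw frequency ratios are usually smoothed (e.g. add-one counts) so that an attribute never seen with a class does not force a zero probability, though the slide's estimator is the unsmoothed version shown here.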