1. CS440/ECE448: Intro to Artificial Intelligence
Bayes Nets: More on learning graphical models
Lecture 20
Prof. Julia Hockenmaier
juliahmr@illinois.edu
http://cs.illinois.edu/fa11/cs440

Bayes Nets
A Bayes Net defines a joint distribution P(X_1…X_n) over a set of random variables X_1…X_n.
Using the chain rule, we can factor P(X_1…X_n) into a product of n conditional distributions:
P(X_1…X_n) = ∏_i P(X_i | X_1…X_{i-1})
A Bayes Net makes a number of (conditional) independence assumptions:
P(X_1…X_n) =def ∏_i P(X_i | Parents(X_i) ⊆ {X_1…X_{i-1}})

Learning Bayes Nets
Parameter estimation: Given some data D over a set of random variables X and a Bayes Net (with empty CPTs), estimate the parameters (= fill in the CPTs) of the Bayes Net.
Structure learning: Given some data D over a set of random variables X, find a Bayes Net (define its structure) and estimate its parameters. (This is much harder; we won't deal with it here.)

Bayes Rule
P(h | D) = P(D | h) P(h) / P(D)
P(h): prior probability of the hypothesis
P(h | D): posterior probability of the hypothesis
P(D | h): likelihood of the data, given the hypothesis
Posterior ∝ likelihood × prior: P(h | D) ∝ P(D | h) P(h)
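To make Bayes' rule concrete: a minimal Python sketch, assuming an invented two-hypothesis space with made-up priors and likelihoods, that normalizes P(D | h) P(h) over the hypotheses to obtain the posterior:

```python
# Bayes' rule over a discrete hypothesis space.
# The priors and likelihoods are invented for illustration.

def posterior(priors, likelihoods):
    """P(h | D) = P(D | h) P(h) / P(D), with P(D) = sum over h of P(D | h) P(h)."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(unnormalized.values())  # P(D)
    return {h: p / evidence for h, p in unnormalized.items()}

priors = {"h1": 0.7, "h2": 0.3}        # P(h), hypothetical values
likelihoods = {"h1": 0.1, "h2": 0.4}   # P(D | h), hypothetical values

print(posterior(priors, likelihoods))  # {'h1': 0.368..., 'h2': 0.631...}
```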

2. Three kinds of estimation techniques
Bayes optimal: Marginalize out the hypotheses:
P(X | D) = Σ_i P(X | h_i) P(h_i | D)
MAP (maximum a posteriori): Pick the hypothesis with the highest posterior:
h_MAP = argmax_h P(h | D)
ML (maximum likelihood): Pick the hypothesis that assigns the highest likelihood to the data:
h_ML = argmax_h P(D | h)

Maximum likelihood learning
Given data D, we want to find the parameters θ that maximize P(D | θ).
We have a data set with N candies: c are cherry, and l = N-c are lime.
Parameter θ = probability of cherry.
Maximum likelihood estimate: θ = c/N

A more complex model
Now the candy has two kinds of wrappers (red or green). The wrapper is chosen probabilistically, depending on the flavor of the candy: flavor → wrapper.

F      | P(red | F)
cherry | θ_1
lime   | θ_2

Out of N candies, c are cherry; r_c are cherry with a red wrapper, and r_l are lime with a red wrapper.
The likelihood of this data set:
P(d | θ, θ_1, θ_2) = θ^c (1-θ)^(N-c) · θ_1^(r_c) (1-θ_1)^(c-r_c) · θ_2^(r_l) (1-θ_2)^((N-c)-r_l)
The log likelihood of this data set:
L(d | θ, θ_1, θ_2) = [c log θ + (N-c) log(1-θ)] + [r_c log θ_1 + (c-r_c) log(1-θ_1)] + [r_l log θ_2 + ((N-c)-r_l) log(1-θ_2)]
The ML parameter estimates:
θ = c/N,  θ_1 = r_c/c,  θ_2 = r_l/(N-c)
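Each bracketed term of the log likelihood depends on only one parameter, so each can be maximized independently, which is why the estimates are simple count ratios. A short sketch with hypothetical counts (N, c, r_c, r_l are invented) that computes the estimates and the log likelihood defined above:

```python
import math

# ML estimation for the candy/wrapper model on the slides.
# The counts are hypothetical; only the formulas come from the lecture.
N, c = 100, 60     # N candies, c cherry (so N - c = 40 lime)
r_c, r_l = 45, 10  # red-wrapped cherry / red-wrapped lime counts

# ML parameter estimates: theta = c/N, theta1 = r_c/c, theta2 = r_l/(N-c)
theta = c / N
theta1 = r_c / c
theta2 = r_l / (N - c)

def log_likelihood(theta, theta1, theta2):
    """L(d | theta, theta1, theta2), as defined on the slide."""
    return (c * math.log(theta) + (N - c) * math.log(1 - theta)
            + r_c * math.log(theta1) + (c - r_c) * math.log(1 - theta1)
            + r_l * math.log(theta2) + ((N - c) - r_l) * math.log(1 - theta2))

print(theta, theta1, theta2)  # 0.6 0.75 0.25
# Nudging any parameter away from its ML estimate lowers the log likelihood:
print(log_likelihood(0.5, theta1, theta2) < log_likelihood(theta, theta1, theta2))  # True
```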

3. Medical diagnosis
Patients see a doctor and complain about a number of symptoms (headache, 100°F fever, …).
What is the most likely disease d_i, given the set of symptoms S the patient has?
argmax_{d_i} P(d_i | S)

The Naïve Bayes classifier
Assume the items in your data set have a number of attributes A_1…A_n.
Each item also belongs to one of a number of given classes C_1…C_k.
Which attributes an item has depends on its class.
If you only observe the attributes of an item, can you predict the class?

Naïve Bayes
[Diagram: a Disease node (values 1, 2, 3, …) with priors P(d_1), P(d_2), P(d_3) points to nodes Symptom1, Symptom2, Symptom3 (each T/F), with CPTs P(s_1 | d_i), P(s_2 | d_i), P(s_3 | d_i); generically, a class node C points to attribute nodes A_1…A_n.]
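To make the diagram concrete, here is a minimal sketch of that diagnosis network, assuming three hypothetical diseases and invented CPT values, and using the naïve Bayes factorization derived on the next slide:

```python
# Diagnosis network from the diagram: a Disease node with prior P(d),
# and a CPT P(s_j = True | d) for each symptom. All numbers are invented;
# the network structure is the point.

prior = {"d1": 0.5, "d2": 0.3, "d3": 0.2}  # P(d), hypothetical
p_symptom = {                               # P(s_j = True | d), hypothetical
    "d1": {"headache": 0.9, "fever": 0.2, "cough": 0.1},
    "d2": {"headache": 0.3, "fever": 0.8, "cough": 0.4},
    "d3": {"headache": 0.1, "fever": 0.1, "cough": 0.9},
}

def most_likely_disease(observed):
    """argmax_d P(d) * prod_j P(s_j | d) for observed True/False symptoms."""
    scores = {}
    for d in prior:
        score = prior[d]
        for s, present in observed.items():
            p = p_symptom[d][s]
            score *= p if present else (1 - p)
        scores[d] = score
    return max(scores, key=scores.get)

print(most_likely_disease({"headache": True, "fever": True, "cough": False}))  # "d1"
```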

4. Naïve Bayes
argmax_C P(C | A_1…A_n)
= argmax_C P(A_1…A_n | C) P(C)
= argmax_C ∏_j P(A_j | C) P(C)
We need to estimate:
– the multinomial P(C)
– for each attribute A_j and class c: P(A_j | c)

Maximum likelihood estimation
If we have a set of training data where the class of each item is given:
– the multinomial: P(C = c) = freq(c)/N
– for each attribute A_j and class c: P(A_j = a | c) = freq(a, c)/freq(c)
where freq(c) = the number of items in the training data that have class c, and freq(a, c) = the number of items in the training data that have attribute value a and class c.
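Putting both slides together: a minimal sketch of training by frequency counting and classifying with the argmax rule above, over a made-up toy data set (the items, attributes, and class names are invented):

```python
from collections import Counter, defaultdict

# ML estimates from the slide: P(C = c) = freq(c)/N and
# P(A_j = a | c) = freq(a, c)/freq(c), counted from labeled training data.
# Each item: (attribute dict, class label); the data is a toy example.
data = [
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "yellow", "shape": "long"}, "banana"),
    ({"color": "yellow", "shape": "round"}, "apple"),
]

N = len(data)
class_freq = Counter(label for _, label in data)  # freq(c)
attr_freq = defaultdict(Counter)                  # freq(a, c), keyed by (attribute, class)
for attrs, label in data:
    for j, a in attrs.items():
        attr_freq[(j, label)][a] += 1

def p_class(c):
    return class_freq[c] / N                      # P(C = c)

def p_attr(j, a, c):
    return attr_freq[(j, c)][a] / class_freq[c]   # P(A_j = a | c)

def classify(attrs):
    """argmax_c P(c) * prod_j P(A_j = a_j | c)."""
    def score(c):
        s = p_class(c)
        for j, a in attrs.items():
            s *= p_attr(j, a, c)
        return s
    return max(class_freq, key=score)

print(classify({"color": "red", "shape": "round"}))  # "apple"
```

Note that with pure ML estimates, an attribute value never seen with a class gets probability 0 for that class; the slides do not cover smoothing, so none is applied here.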
