SLIDE 1

CS440/ECE448 Lecture 14: Naïve Bayes

Mark Hasegawa-Johnson, 2/2020. Including slides by Svetlana Lazebnik, 9/2016. License: CC-BY 4.0: you are free to redistribute or remix if you give attribution.

https://www.xkcd.com/1132/

SLIDE 2

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Maximum A Posteriori estimation of parameters
  • Laplace Smoothing
SLIDE 3

Bayes’ Rule

  • The product rule gives us two ways to factor a joint probability:

P(A, B) = P(B|A) P(A) = P(A|B) P(B)

  • Therefore,

P(A|B) = P(B|A) P(A) / P(B)

  • Why is this useful?
  • “A” is something we care about, but P(A|B) is really really hard to measure (example: the sun exploded)
  • “B” is something less interesting, but P(B|A) is easy to measure (example: the amount of light falling on a solar cell)
  • Bayes’ rule tells us how to compute the probability we want (P(A|B)) from probabilities that are much, much easier to measure (P(B|A)).

  • Rev. Thomas Bayes (1702-1761). Image: unknown author, Public Domain, https://commons.wikimedia.org/w/index.php?curid=14532025

SLIDE 4

Bayes Rule example

Eliot & Karson are getting married tomorrow, at an outdoor ceremony in the desert. Unfortunately, the weatherman has predicted rain for tomorrow.

  • In recent years, it has rained (event R) only 5 days each year (5/365 = 0.014).

P(R) = 0.014

  • When it actually rains, the weatherman forecasts rain (event F) 90% of the time.

P(F|R) = 0.9

  • When it doesn't rain, he forecasts rain (event F) only 10% of the time.

P(F|¬R) = 0.1

  • What is the probability that it will rain on Eliot’s wedding?

P(R|F) = P(F|R) P(R) / P(F) = P(F, R) / (P(F, R) + P(F, ¬R)) = P(F|R) P(R) / (P(F|R) P(R) + P(F|¬R) P(¬R)) = (0.9)(0.014) / ((0.9)(0.014) + (0.1)(0.986)) ≈ 0.113
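As a quick sanity check, this calculation can be reproduced in a few lines of Python (a minimal sketch; the variable names are ours, not part of the slides):

    # Rain-on-the-wedding example: P(R|F) by Bayes' rule with the expanded denominator.
    p_rain = 0.014                 # P(R): prior probability of rain on any given day
    p_forecast_given_rain = 0.9    # P(F|R): forecaster says "rain" when it actually rains
    p_forecast_given_dry = 0.1     # P(F|not R): forecaster says "rain" when it doesn't

    p_forecast = (p_forecast_given_rain * p_rain
                  + p_forecast_given_dry * (1 - p_rain))   # P(F), by total probability
    p_rain_given_forecast = p_forecast_given_rain * p_rain / p_forecast

    print(round(p_rain_given_forecast, 3))   # 0.113: rain is still unlikely despite the forecast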

SLIDE 5

The More Useful Version of Bayes’ Rule

P(A|B) = P(B|A) P(A) / P(B)

  • Remember, P(B|A) is easy to measure (the probability that light hits our solar cell, if the sun still exists and it’s daytime). Let’s assume we also know P(A) (the probability the sun still exists).
  • But suppose we don’t really know P(B) (what is the probability light hits our solar cell, if we don’t really know whether the sun still exists or not?)

P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|¬A) P(¬A))

  • Rev. Thomas Bayes (1702-1761). Image: unknown author, Public Domain, https://commons.wikimedia.org/w/index.php?curid=14532025

The first version (with P(B) in the denominator) is the one you memorize. The second version (with the expanded denominator) is the one you actually use.

SLIDE 6

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Maximum A Posteriori estimation of parameters
  • Laplace Smoothing
SLIDE 7

The Misdiagnosis Problem

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive)
 = P(positive | cancer) P(cancer) / [P(positive | cancer) P(cancer) + P(positive | ¬cancer) P(¬cancer)]
 = (0.8 × 0.01) / (0.8 × 0.01 + 0.096 × 0.99) = 0.008 / (0.008 + 0.095) ≈ 0.0776
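The same expanded form of Bayes’ rule can be wrapped in a tiny helper and applied to the numbers above (a minimal sketch in Python; the function name is ours):

    def posterior(p_e_given_h, p_h, p_e_given_not_h):
        """P(H|E) from P(E|H), P(H), and P(E|not H), using the expanded denominator."""
        p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
        return p_e_given_h * p_h / p_e

    # 1% prevalence, 80% sensitivity, 9.6% false-positive rate
    print(posterior(0.80, 0.01, 0.096))   # about 0.0776: most positive mammograms are false alarms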

SLIDE 8

(Slide image: screenshot of a WebMD page on second opinions.)

“If your doctor tells you that you have a health problem or suggests a treatment for an illness or injury, you might want a second opinion. This is especially true when you're considering surgery or major procedures. Asking another doctor to review your case can be useful for many reasons.”

Considering Treatment for Illness, Injury? Get a Second Opinion: https://www.webmd.com/health-insurance/second-opinions#1

SLIDE 9

The Bayesian Decision

The agent is given some evidence, E. The agent has to make a decision about the value of an unobserved variable Y. Y is called the “query variable” or the “class variable” or the “category.”

  • Partially observable, stochastic, episodic environment
  • Example: Y ∈ {spam, not spam}, E = email message.
  • Example: Y ∈ {zebra, giraffe, hippo}, E = image features
SLIDE 10

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Maximum A Posteriori estimation of parameters
  • Laplace Smoothing
SLIDE 11

Classification using probabilities

  • Suppose you know that you have a toothache.
  • Should you conclude that you have a cavity?
  • Goal: make a decision that minimizes your probability of error.
  • Equivalent: make a decision that maximizes the probability of being correct. This is called a MAP (maximum a posteriori) decision. You decide that you have a cavity if and only if

P(Cavity | Toothache) > P(¬Cavity | Toothache)

SLIDE 12

Bayesian Decisions

  • What if we don’t know P(Cavity | Toothache)? Instead, we only know P(Toothache | Cavity), P(Cavity), and P(Toothache)?
  • Then we choose to believe we have a Cavity if and only if

P(Cavity | Toothache) > P(¬Cavity | Toothache)

which can be re-written as

P(Toothache | Cavity) P(Cavity) / P(Toothache) > P(Toothache | ¬Cavity) P(¬Cavity) / P(Toothache)

SLIDE 13

MAP decision

The action, “a”, should be the value of the query variable Y that has the highest posterior probability given the observation E = e:

a = argmax P(Y = a | E = e)                       (posterior)
  = argmax P(E = e | Y = a) P(Y = a) / P(E = e)   (likelihood × prior / evidence)
  = argmax P(E = e | Y = a) P(Y = a)

P(Y = a | E = e) ∝ P(E = e | Y = a) P(Y = a), i.e., posterior ∝ likelihood × prior
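A MAP decision is just an argmax of prior × likelihood over the possible classes; the evidence term P(E = e) can be dropped because it is the same for every class. A minimal Python sketch (the numbers below are made up for illustration):

    # Hypothetical priors P(Y = y) and likelihoods P(E = toothache | Y = y).
    prior = {"cavity": 0.2, "no cavity": 0.8}
    likelihood_of_toothache = {"cavity": 0.6, "no cavity": 0.1}

    # Unnormalized posteriors: P(E = e | Y = y) * P(Y = y).
    scores = {y: likelihood_of_toothache[y] * prior[y] for y in prior}
    decision = max(scores, key=scores.get)
    print(decision)   # "cavity", since 0.6*0.2 = 0.12 > 0.1*0.8 = 0.08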

SLIDE 14

The Bayesian Terms

  • P(Y = y) is called the “prior” (a priori, in Latin) because it represents your belief about the query variable before you see any observation.
  • P(Y = y | E = e) is called the “posterior” (a posteriori, in Latin), because it represents your belief about the query variable after you see the observation.
  • P(E = e | Y = y) is called the “likelihood” because it tells you how much the observation, E = e, is like the observations you expect if Y = y.
  • P(E = e) is called the “evidence distribution” because E is the evidence variable, and P(E = e) is its marginal distribution.

P(y | e) = P(e | y) P(y) / P(e)

SLIDE 15

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Maximum A Posteriori estimation of parameters
  • Laplace Smoothing
SLIDE 16

Naïve Bayes model

  • Suppose we have many different types of observations (symptoms, features) E1, …, En that we want to use to obtain evidence about an underlying hypothesis Y
  • MAP decision:

P(Y = y | E1 = e1, …, En = en) ∝ P(Y = y) P(E1 = e1, …, En = en | Y = y)

  • If each feature Ej can take on k values, how many entries are in the probability table P(E1 = e1, …, En = en | Y = y)?

SLIDE 17

Naïve Bayes model

Suppose we have many different types of observations (symptoms, features) E1, …, En that we want to use to obtain evidence about an underlying hypothesis Y. The Naïve Bayes decision:

a = argmax P(Y = a | E1 = e1, …, En = en)
  = argmax P(Y = a) P(E1 = e1, …, En = en | Y = a)
  ≈ argmax P(Y = a) P(E1 = e1 | Y = a) ⋯ P(En = en | Y = a)
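In code, the naïve assumption turns the joint likelihood into a product of per-feature likelihoods; products of many small probabilities underflow, so sums of logs are used instead. A minimal Python sketch with hypothetical feature tables (not from the slides):

    import math

    prior = {"spam": 0.3, "ham": 0.7}                 # hypothetical P(Y = y)
    feature_likelihoods = [                           # hypothetical P(Ei = ei | Y = y)
        {"spam": 0.8, "ham": 0.1},
        {"spam": 0.5, "ham": 0.4},
        {"spam": 0.9, "ham": 0.3},
    ]

    def naive_bayes_decision(prior, feature_likelihoods):
        # argmax over y of log P(y) + sum_i log P(ei | y)
        scores = {y: math.log(p) + sum(math.log(t[y]) for t in feature_likelihoods)
                  for y, p in prior.items()}
        return max(scores, key=scores.get)

    print(naive_bayes_decision(prior, feature_likelihoods))   # "spam"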

SLIDE 18

Case study: Text document classification

  • MAP decision: assign a document to the class with the highest posterior P(class | document)

  • Example: spam classification
  • Classify a message as spam if P(spam | message) > P(¬spam | message)
SLIDE 19

Case study: Text document classification

  • MAP decision: assign a document to the class with the highest posterior P(class | document)
  • We have P(class | document) ∝ P(document | class) P(class)
  • To enable classification, we need to be able to estimate the likelihoods P(document | class) for all classes and the priors P(class)

SLIDE 20

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Maximum A Posteriori estimation of parameters
  • Laplace Smoothing
SLIDE 21

Naïve Bayes Representation

  • Goal: estimate likelihoods P(document | class) and priors P(class)
  • Likelihood: bag of words representation
  • The document is a sequence of words (w1, …, wn)
  • The order of the words in the document is not important
  • Each word is conditionally independent of the others given document class

SLIDE 22

Naïve Bayes Representation

  • Goal: estimate likelihoods P(document | class) and priors P(class)
  • Likelihood: bag of words representation
  • The document is a sequence of words (E1 = w1, …, En = wn)
  • The order of the words in the document is not important
  • Each word is conditionally independent of the others given document class
  • Thus, the problem is reduced to estimating marginal likelihoods of individual words P(wi | class):

P(document | class) = P(w1, …, wn | class) = ∏i=1…n P(wi | class)
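Given per-word likelihood tables, the document likelihood under this model is a single loop; as before, log-probabilities are summed to avoid underflow. A minimal sketch with hypothetical word probabilities:

    import math

    # Hypothetical per-class word likelihoods P(word | class).
    word_likelihood = {
        "pos": {"pretty": 0.010, "pathetic": 0.001, "warning": 0.002},
        "neg": {"pretty": 0.004, "pathetic": 0.008, "warning": 0.006},
    }

    def log_document_likelihood(words, cls):
        # log P(document | class) = sum_i log P(wi | class); word order never enters.
        return sum(math.log(word_likelihood[cls][w]) for w in words)

    doc = ["warning", "pretty", "pathetic"]
    print(log_document_likelihood(doc, "pos"), log_document_likelihood(doc, "neg"))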

SLIDE 23

Parameter estimation

  • Model parameters: feature likelihoods P(word | class) and priors P(class)
  • How do we obtain the values of these parameters?

(Slide figure: a prior table with P(spam) = 0.33 and P(¬spam) = 0.67, and likelihood tables P(word | spam) and P(word | ¬spam).)

SLIDE 24

Bag of words illustration

US Presidential Speeches Tag Cloud http://chir.ag/projects/preztags/

SLIDE 25

Bag of words illustration

US Presidential Speeches Tag Cloud http://chir.ag/projects/preztags/

SLIDE 26

Bag of words illustration

US Presidential Speeches Tag Cloud http://chir.ag/projects/preztags/

SLIDE 27

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Maximum A Posteriori estimation of parameters
  • Laplace Smoothing
SLIDE 28

Bag of words representation of a document

Consider the following movie review (8114_3.txt in your dataset): “I’m warning you, it’s pretty pathetic.” What is its BoW representation?

Word                # times it occurs
I’m                 1
it’s                1
pathetic            1
pretty              1
warning             1
you                 1
(every other word)  0

  • “pathetic” probably means it’s a negative review.
  • …but “pretty” means it’s a positive review, right?
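Building the BoW counts is a one-liner with collections.Counter; a minimal Python sketch (the tokenization here is deliberately crude):

    from collections import Counter

    review = "I'm warning you, it's pretty pathetic"
    # Crude tokenization: lowercase, strip punctuation other than apostrophes.
    tokens = [w.strip(",.!?").lower() for w in review.split()]
    bow = Counter(tokens)
    print(bow)   # Counter({"i'm": 1, "warning": 1, "you": 1, "it's": 1, "pretty": 1, "pathetic": 1})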

SLIDE 29

Bigram representation of a document

A “bigram” is just a pair of words that occur together, in sequence. For example, the following review has the bigrams shown below: “I’m warning you, it’s pretty pathetic.”

Bigram                # times it occurs
I’m warning           1
it’s pretty           1
pretty pathetic       1
warning you           1
you it’s              1
(every other bigram)  0

{ “I’m warning”, “warning you”, “pretty pathetic” } == negative
{ “it’s pretty” } == positive, but maybe we can ignore that.
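Bigram counts are built the same way, by pairing each token with the token that follows it (a minimal Python sketch, tokenized as above):

    from collections import Counter

    review = "I'm warning you, it's pretty pathetic"
    tokens = [w.strip(",.!?").lower() for w in review.split()]
    # zip the token list against itself shifted by one to get adjacent pairs.
    bigrams = Counter(zip(tokens, tokens[1:]))
    print(bigrams)   # Counter({("i'm", "warning"): 1, ("warning", "you"): 1, ...})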

SLIDE 30

Naïve Bayes with Bigrams

  • Goal: estimate likelihoods P(document | class) and priors P(class)
  • Likelihood: bigram representation
  • The document is a sequence of bigrams (E1 = b1, …, En = bn)
  • The order of the bigrams in the document is not important
  • Each bigram is conditionally independent of the others given document class:

P(document | class) = P(b1, …, bn | class) = ∏i=1…n P(bi | class)

  • Thus, the problem is reduced to estimating marginal likelihoods of individual bigrams P(bi | class)

SLIDE 31

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Laplace Smoothing
SLIDE 32

Bayesian Learning

  • Model parameters: feature likelihoods P(word | class) and priors P(class)
  • How do we obtain the values of these parameters?
  • Need a training set of labeled samples from both classes:

P(word | class) = (# of occurrences of this word in docs from this class) / (total # of words in docs from this class)

  • This is the maximum likelihood (ML) estimate. It is the estimate that maximizes the probability of the training data, which is defined as:

∏d=1…D ∏i=1…nd P(wd,i | classd)

(d: index of training document, i: index of a word)

SLIDE 33

Bayesian Learning

The data likelihood

P(training data) = ∏d=1…D ∏i=1…(# words in d) P(E = wi | Y = cd)

is maximized (subject to the constraint that Σw P(w|c) = 1) if we choose:

P(E = w | Y = c) = (# occurrences of word w in documents of type c) / (total number of words in all documents of type c)

P(Y = c) = (# documents of type c) / (total number of documents)
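These maximum-likelihood estimates are just normalized counts over the labeled training set; a minimal Python sketch with a two-document toy corpus (the documents are made up for illustration):

    from collections import Counter, defaultdict

    # Toy labeled corpus: (list of tokens, class label).
    training_data = [
        (["pretty", "good", "movie"], "pos"),
        (["pretty", "pathetic", "movie"], "neg"),
    ]

    class_counts = Counter()              # number of documents of each class
    word_counts = defaultdict(Counter)    # word_counts[c][w] = count of w in class-c docs
    for tokens, c in training_data:
        class_counts[c] += 1
        word_counts[c].update(tokens)

    # ML estimates P(Y = c) and P(E = w | Y = c).
    p_class = {c: n / sum(class_counts.values()) for c, n in class_counts.items()}
    p_word_given_class = {c: {w: n / sum(cnt.values()) for w, n in cnt.items()}
                          for c, cnt in word_counts.items()}
    print(p_class["pos"], p_word_given_class["pos"]["pretty"])   # prints 0.5 and 0.333...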

SLIDE 34

Bayesian Learning

The data likelihood

P(training data) = ∏d=1…D ∏i=1…(# unique words in d) P(E = wi | Y = cd)

is maximized (subject to the constraint that Σw P(w|c) = 1) if we choose:

P(E = w | Y = c) = (# documents of type c containing word w) / (total number of documents of type c)

P(Y = c) = (# documents of type c) / (total number of documents)

SLIDE 35

Bayesian Inference and Bayesian Learning

  • Bayes Rule
  • Bayesian Inference
  • Misdiagnosis
  • The Bayesian “Decision”
  • The “Naïve Bayesian” Assumption
  • Bag of Words (BoW)
  • Bigrams
  • Bayesian Learning
  • Maximum Likelihood estimation of parameters
  • Laplace Smoothing
SLIDE 36

What is the probability that the sun will fail to rise tomorrow?

  • # times we have observed the sun to rise = 100,000,000
  • # times we have observed the sun not to rise = 0
  • Estimated probability the sun will not rise = 0 / (0 + 100,000,000) = 0

Oops….

SLIDE 37

Laplace Smoothing

  • The basic idea: add 1 “unobserved observation” to every possible event
  • # times the sun has risen or might have ever risen = 100,000,000 + 1 = 100,000,001
  • # times the sun has failed to rise or might have ever failed to rise = 0 + 1 = 1
  • Estimated probability the sun will not rise = 1 / (1 + 100,000,001) ≈ 0.0000000099999998

SLIDE 38

Parameter estimation

  • ML (Maximum Likelihood) parameter estimate:

P(word | class) = (# of occurrences of this word in docs from this class) / (total # of words in docs from this class)

  • How can you estimate the probability of a word you never saw in the training set? (Hint: what happens if you give it probability 0, then it actually occurs in a test document?)
  • Laplacian Smoothing estimate: pretend you have seen every vocabulary word one more time than you actually did

P(word | class) = (# of occurrences of this word in docs from this class + 1) / (total # of words in docs from this class + V)

(V: total number of unique words)
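Laplace smoothing changes only the normalization step: add 1 to every word's count and add the vocabulary size V to the denominator. A minimal Python sketch (the function name and toy data are ours):

    from collections import Counter

    def smoothed_likelihoods(docs_in_class, vocabulary):
        """Laplace-smoothed P(word | class) from the documents of one class."""
        counts = Counter(w for doc in docs_in_class for w in doc)
        total = sum(counts.values())
        V = len(vocabulary)
        # Every vocabulary word is pretended to occur one extra time.
        return {w: (counts[w] + 1) / (total + V) for w in vocabulary}

    vocab = {"pretty", "good", "movie", "pathetic"}
    neg_docs = [["pretty", "pathetic", "movie"]]
    p = smoothed_likelihoods(neg_docs, vocab)
    print(p["good"])   # (0 + 1) / (3 + 4), about 0.143: unseen words no longer get probability 0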

SLIDE 39

Summary: Naïve Bayes for Document Classification

  • Naïve Bayes model: assign the document to the class with the highest posterior

P(class | document) ∝ P(class) ∏i=1…n P(wi | class)

  • Model parameters:

Prior: P(class1), …, P(classK)
Likelihood of class 1: P(w1 | class1), P(w2 | class1), …, P(wn | class1)
…
Likelihood of class K: P(w1 | classK), P(w2 | classK), …, P(wn | classK)
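Putting the pieces together, training estimates the priors and Laplace-smoothed word likelihoods from labeled documents, and classification takes an argmax over log posteriors. A minimal end-to-end sketch in Python (toy data and our own names, not the course's reference implementation):

    import math
    from collections import Counter, defaultdict

    def train(labeled_docs):
        """labeled_docs: list of (tokens, class). Returns priors, likelihoods, vocabulary."""
        class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
        for tokens, c in labeled_docs:
            class_counts[c] += 1
            word_counts[c].update(tokens)
            vocab.update(tokens)
        n_docs = sum(class_counts.values())
        priors = {c: n / n_docs for c, n in class_counts.items()}
        likelihoods = {}
        for c, counts in word_counts.items():
            total = sum(counts.values())
            # Laplace smoothing: +1 per word, +|vocab| in the denominator.
            likelihoods[c] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
        return priors, likelihoods, vocab

    def classify(tokens, priors, likelihoods, vocab):
        scores = {}
        for c in priors:
            score = math.log(priors[c])
            for w in tokens:
                if w in vocab:                     # ignore words never seen in training
                    score += math.log(likelihoods[c][w])
            scores[c] = score
        return max(scores, key=scores.get)

    docs = [(["pretty", "good", "movie"], "pos"),
            (["pretty", "pathetic", "movie"], "neg")]
    priors, likelihoods, vocab = train(docs)
    print(classify(["pathetic", "movie"], priors, likelihoods, vocab))   # "neg"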

SLIDE 40

Review: Bayesian decision making

  • Suppose the agent has to make decisions about the value of an unobserved query variable Y based on the values of an observed evidence variable E
  • Inference problem: given some observation E = e, what is P(Y | E = e)?
  • Learning problem: estimate the parameters of the probabilistic model P(y | e) given a training sample {(e1, y1), …, (en, yn)}