Naïve Bayesian Learning

Sven Koenig, USC
12/18/2019

Russell and Norvig, 3rd Edition, Sections 13.5.2 and 20.2.2. These slides are new and can contain mistakes and typos. Please report them to Sven (skoenig@usc.edu).

Naïve Bayesian Learning

  • We now apply what we have learned to machine learning.


Inductive Learning for Classification

  • Labeled examples:

    Feature_1  Feature_2  Class
    true       true       true
    true       false      false
    false      true       false

  • Unlabeled examples:

    Feature_1  Feature_2  Class
    false      false      ?
    true       true       ?

  Learn f(Feature_1, Feature_2) = Class from f(true, true) = true, f(true, false) = false, and f(false, true) = false. The function needs to be consistent with all labeled examples and should make the fewest mistakes on the unlabeled examples.

Naïve Bayesian Learning

  • Assume that the features are conditionally independent of each other given the class.
  • This naïve (= potentially wrong) assumption keeps the number of parameters to be learned small.

  [Naïve Bayesian network: Class is the parent of each of Feature_1, …, Feature_n.]
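The parameter savings can be made concrete with a small calculation (my own illustration, not from the slides): a full joint distribution over a binary class and n binary features needs exponentially many parameters, while the naïve Bayesian network needs only 1 + 2n.

```python
# Compare parameter counts for a binary class and n binary features.

def full_joint_params(n):
    # A full joint table over n+1 binary variables has 2^(n+1) entries,
    # minus 1 because the probabilities must sum to 1.
    return 2 ** (n + 1) - 1

def naive_bayes_params(n):
    # 1 parameter for P(Class) plus, for each feature, one parameter
    # each for P(Feature_i | Class) and P(Feature_i | NOT Class).
    return 1 + 2 * n

for n in (2, 10, 20):
    print(n, full_joint_params(n), naive_bayes_params(n))
```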


Naïve Bayesian Learning

  • Use maximum-likelihood estimates to learn the probabilities in the conditional probability tables from the labeled examples, that is, use frequencies to estimate the probabilities.

    Feature_1  Feature_2  Class
    true       true       true
    true       false      false
    false      true       false

  Learned probabilities:
    P(Class) = 1/3
    P(Feature_1 | Class) = 1,  P(Feature_1 | NOT Class) = 1/2
    P(Feature_2 | Class) = 1,  P(Feature_2 | NOT Class) = 1/2

  There are two examples whose class is false: Feature_1 is true in one of them and false in the other, which gives P(Feature_1 | NOT Class) = 1/2.
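These frequency estimates can be reproduced with a few lines of counting. A minimal sketch (the tuple encoding of the examples and the use of Fraction are my own choices, not from the slides):

```python
from fractions import Fraction

# Labeled examples: (Feature_1, Feature_2, Class)
examples = [(True, True, True), (True, False, False), (False, True, False)]

def mle(examples):
    # Maximum-likelihood estimation = relative frequencies.
    pos = [e for e in examples if e[2]]      # examples with Class = true
    neg = [e for e in examples if not e[2]]  # examples with Class = false
    p_class = Fraction(len(pos), len(examples))
    # P(Feature_i = true | Class = value), estimated within each class.
    p_f_given = {}
    for i in (0, 1):
        p_f_given[(i, True)] = Fraction(sum(e[i] for e in pos), len(pos))
        p_f_given[(i, False)] = Fraction(sum(e[i] for e in neg), len(neg))
    return p_class, p_f_given

p_class, p_f = mle(examples)
print(p_class)          # P(Class) = 1/3
print(p_f[(0, True)])   # P(Feature_1 | Class) = 1
print(p_f[(0, False)])  # P(Feature_1 | NOT Class) = 1/2
```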

Naïve Bayesian Learning

  • Calculate the probabilities of the class values given the feature values for unlabeled examples.
  • Either make a probabilistic prediction by outputting P(Class | NOT Feature_1, NOT Feature_2) or a deterministic prediction by outputting the more likely class.

    Feature_1  Feature_2  Class
    false      false      ?

  Learned probabilities:
    P(Class) = 1/3
    P(Feature_1 | Class) = 1,  P(Feature_1 | NOT Class) = 1/2
    P(Feature_2 | Class) = 1,  P(Feature_2 | NOT Class) = 1/2


Naïve Bayesian Learning

  • P(Class, NOT Feature_1, NOT Feature_2) = P(Class) P(NOT Feature_1 | Class) P(NOT Feature_2 | Class) = 1/3 · 0 · 0 = 0
  • P(NOT Class, NOT Feature_1, NOT Feature_2) = P(NOT Class) P(NOT Feature_1 | NOT Class) P(NOT Feature_2 | NOT Class) = 2/3 · 1/2 · 1/2 = 1/6
  • P(NOT Feature_1, NOT Feature_2) = P(Class, NOT Feature_1, NOT Feature_2) + P(NOT Class, NOT Feature_1, NOT Feature_2) = 0 + 1/6 = 1/6
  • P(Class | NOT Feature_1, NOT Feature_2) = P(Class, NOT Feature_1, NOT Feature_2) / P(NOT Feature_1, NOT Feature_2) = 0 / (1/6) = 0
  • P(NOT Class | NOT Feature_1, NOT Feature_2) = P(NOT Class, NOT Feature_1, NOT Feature_2) / P(NOT Feature_1, NOT Feature_2) = (1/6) / (1/6) = 1

  Prediction for the unlabeled example (Feature_1 = false, Feature_2 = false): P(Class | NOT Feature_1, NOT Feature_2) = 0, or deterministically Class = false.

Naïve Bayesian Learning

  • Calculate the probabilities of the class values given the feature values for unlabeled examples.
  • Either make a probabilistic prediction by outputting P(Class | Feature_1, Feature_2) or a deterministic prediction by outputting the more likely class.

    Feature_1  Feature_2  Class
    true       true       ?

  Learned probabilities:
    P(Class) = 1/3
    P(Feature_1 | Class) = 1,  P(Feature_1 | NOT Class) = 1/2
    P(Feature_2 | Class) = 1,  P(Feature_2 | NOT Class) = 1/2


Naïve Bayesian Learning

  • P(Class, Feature_1, Feature_2) = P(Class) P(Feature_1 | Class) P(Feature_2 | Class) = 1/3 · 1 · 1 = 1/3
  • P(NOT Class, Feature_1, Feature_2) = P(NOT Class) P(Feature_1 | NOT Class) P(Feature_2 | NOT Class) = 2/3 · 1/2 · 1/2 = 1/6
  • P(Feature_1, Feature_2) = P(Class, Feature_1, Feature_2) + P(NOT Class, Feature_1, Feature_2) = 1/3 + 1/6 = 1/2
  • P(Class | Feature_1, Feature_2) = P(Class, Feature_1, Feature_2) / P(Feature_1, Feature_2) = (1/3) / (1/2) = 2/3
  • P(NOT Class | Feature_1, Feature_2) = P(NOT Class, Feature_1, Feature_2) / P(Feature_1, Feature_2) = (1/6) / (1/2) = 1/3

  Prediction for the unlabeled example (Feature_1 = true, Feature_2 = true): P(Class | Feature_1, Feature_2) = 2/3, or deterministically Class = true.
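Both worked examples can be reproduced with a short script. A sketch under my own encoding choices (the dictionaries holding the conditional probability tables are not from the slides):

```python
from fractions import Fraction

# Maximum-likelihood estimates learned from the labeled examples.
p_class = Fraction(1, 3)                           # P(Class)
p_f1 = {True: Fraction(1), False: Fraction(1, 2)}  # P(Feature_1 | Class = key)
p_f2 = {True: Fraction(1), False: Fraction(1, 2)}  # P(Feature_2 | Class = key)

def posterior(f1, f2):
    """Return (P(Class | f1, f2), P(NOT Class | f1, f2))."""
    def joint(c):
        # Naive Bayesian factorization: prior times one factor per feature.
        prior = p_class if c else 1 - p_class
        t1 = p_f1[c] if f1 else 1 - p_f1[c]
        t2 = p_f2[c] if f2 else 1 - p_f2[c]
        return prior * t1 * t2
    evidence = joint(True) + joint(False)  # P(f1, f2)
    return joint(True) / evidence, joint(False) / evidence

print(posterior(False, False))  # P(Class | NOT F_1, NOT F_2) = 0, P(NOT Class | ...) = 1
print(posterior(True, True))    # P(Class | F_1, F_2) = 2/3, P(NOT Class | ...) = 1/3
```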

Naïve Bayesian Learning

  • For inductive learning, we typically demand that the learned function is consistent with all labeled examples (if possible). However, then we should have calculated P(Class | Feature_1, Feature_2) = 1.
  • This is not possible because the naïve Bayesian assumption does not hold for the labeled examples (see next slide).
  • Thus, a naïve Bayesian network cannot represent the labeled examples correctly and thus cannot represent all Boolean functions correctly.
  • Just like for single perceptrons, this does not mean that they should not be used. They will make some mistakes for some Boolean functions, but they often work well, that is, make few mistakes on the labeled and unlabeled examples.


Naïve Bayesian Learning

  • The assumption that the features are conditionally independent of each other given the class does not hold for the labeled examples.
  • For example, P(Feature_1 | NOT Class) = 1/2 but P(Feature_1 | Feature_2, NOT Class) = 0.

    Feature_1  Feature_2  Class
    true       true       true
    true       false      false
    false      true       false

Naïve Bayesian Learning

  • Properties (some in comparison to decision trees)
    • Are very tolerant of noise in the feature and class values of examples
    • Can make deterministic or probabilistic predictions
    • Learn quickly, even for large problems
    • Cannot represent all Boolean functions (since the naïve Bayesian assumption does not hold for all of them)
  • Early application
    • Email spam detectors (where Feature_i = “How often does the ith word in a dictionary appear in the email?” and Class = “Is the email spam?”)
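A toy version of such a spam detector can be sketched in a few lines. The corpus below is invented for illustration, and the add-one (Laplace) smoothing — which keeps unseen words from zeroing out a whole product — goes beyond the plain frequency estimates on these slides:

```python
import math
from collections import Counter

# Invented toy corpus: (words in email, is_spam).
emails = [
    (["win", "money", "now"], True),
    (["win", "prize", "money"], True),
    (["meeting", "agenda", "now"], False),
    (["project", "meeting", "notes"], False),
]

vocab = sorted({w for words, _ in emails for w in words})
spam_counts = Counter(w for words, s in emails if s for w in words)
ham_counts = Counter(w for words, s in emails if not s for w in words)
p_spam = sum(1 for _, s in emails if s) / len(emails)

def log_likelihood(counts, words):
    # Word likelihood with add-one smoothing; log-space sums avoid
    # underflow for long emails.
    total = sum(counts.values()) + len(vocab)
    return sum(math.log((counts[w] + 1) / total) for w in words)

def is_spam(words):
    # Compare log P(spam) + log P(words | spam) against the ham score.
    spam_score = math.log(p_spam) + log_likelihood(spam_counts, words)
    ham_score = math.log(1 - p_spam) + log_likelihood(ham_counts, words)
    return spam_score > ham_score

print(is_spam(["win", "money"]))      # True
print(is_spam(["meeting", "notes"]))  # False
```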
