10-701 Machine Learning: Naïve Bayes classifiers
Types of classifiers
- We can divide the large variety of classification approaches into three major
types
- 1. Instance based classifiers
- Use observations directly (no models)
- e.g. K nearest neighbors
- 2. Generative:
- build a generative statistical model
- e.g., Bayesian networks
- 3. Discriminative
- directly estimate a decision rule/boundary
- e.g., decision tree
Bayes decision rule
- If we know the conditional probability P(X | y) we
can determine the appropriate class by using Bayes rule:
$$P(y = i \mid X) \;=\; \frac{P(X \mid y = i)\,P(y = i)}{P(X)} \;\stackrel{\text{def}}{=}\; q_i(X)$$
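A minimal numerical sketch of this rule in Python (the likelihood and prior values below are made-up for illustration, not from the slides):

```python
# Bayes rule: P(y = i | X) = P(X | y = i) * P(y = i) / P(X)
# Hypothetical class-conditional likelihoods and priors for a two-class problem.
likelihood = {0: 0.30, 1: 0.05}   # P(X | y = i) for one observed X
prior = {0: 0.40, 1: 0.60}        # P(y = i)

evidence = sum(likelihood[i] * prior[i] for i in likelihood)             # P(X)
posterior = {i: likelihood[i] * prior[i] / evidence for i in likelihood}

print(posterior)                          # P(y = i | X) for each class
print(max(posterior, key=posterior.get))  # Bayes decision: most probable class
```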
But how do we determine p(X|y)?
Computing p(X|y)
- Consider a dataset with 16 attributes (let's assume they are all binary). How many parameters do we need to estimate to fully determine p(X|y)? For the full joint distribution this is 2^16 - 1 = 65,535 parameters per class.
[Table: sample of a census dataset with attributes such as age, employment, education, education-num, marital status, occupation, relationship, race, gender, hours per week, country, and a wealth label (poor/rich).]
Learning the values for the full conditional probability table would require enormous amounts of data. Recall: y is the class label and X denotes the input attributes (features).
Naïve Bayes Classifier
- Naïve Bayes classifiers assume that given the class label (y) the attributes are conditionally independent of each other:

$$p(X \mid y) \;=\; \prod_j p_j(x_j \mid y)$$

where each factor $p_j(x_j \mid y)$ is a specific model for attribute j, and the likelihood is a product of probability terms.

- Using this idea the full classification rule becomes:

$$\hat{y} \;=\; \arg\max_v \, p(y = v \mid X) \;=\; \arg\max_v \frac{p(X \mid y = v)\,p(y = v)}{p(X)} \;=\; \arg\max_v \prod_j p_j(x_j \mid y = v)\; p(y = v)$$

where v ranges over the classes we have.
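A small Python sketch of this classification rule for binary attributes (the probability tables and priors below are illustrative assumptions, not learned parameters):

```python
import numpy as np

# Illustrative model: p_x1[v][j] = p(x_j = 1 | y = v) for 3 binary attributes.
p_x1 = {0: np.array([0.2, 0.7, 0.5]),
        1: np.array([0.9, 0.1, 0.4])}
prior = {0: 0.5, 1: 0.5}

def predict(x):
    """Return argmax_v p(y = v) * prod_j p(x_j | y = v) for a binary vector x."""
    x = np.asarray(x)
    scores = {}
    for v in prior:
        per_attr = np.where(x == 1, p_x1[v], 1.0 - p_x1[v])  # p(x_j | y = v)
        scores[v] = prior[v] * per_attr.prod()
    return max(scores, key=scores.get)

print(predict([1, 0, 1]))  # -> 1 for this particular vector
```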
Conditional likelihood: Full version
Note the following:
1. We assume conditional independence between attributes given the class label.
2. We learn a different set of parameters for the two classes (class 1 and class 2).

$$L(X_i \mid y_i = 1, \theta) \;=\; \prod_j p(x_{ij} \mid y_i = 1, \theta_{j1})$$

where $X_i$ is the vector of binary attributes for sample i, $\theta$ is the set of all parameters in the NB model, and $\theta_{j1}$ are the specific parameters for attribute j in class 1.
Learning parameters
$$L(X_i \mid y_i = 1, \theta) \;=\; \prod_j p(x_{ij} \mid y_i = 1, \theta_{j1})$$
- Let X_1 … X_{k1} be the set of input samples with label y = 1
- Assume all attributes are binary
- To determine the MLE parameters for $p(x_j = 1 \mid y = 1)$ we simply count how many times the j'th entry of those samples in class 1 is 0 (termed n0) and how many times it is 1 (termed n1). Then we set:

$$p(x_j = 1 \mid y = 1) \;=\; \frac{n_1}{n_0 + n_1}$$
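A sketch of this counting estimate on made-up binary data (a real implementation would typically also smooth zero counts):

```python
import numpy as np

# Toy binary data: rows are samples, columns are attributes.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
y = np.array([1, 1, 1, 0])

X1 = X[y == 1]                   # the samples with label y = 1
n1 = X1.sum(axis=0)              # per-attribute count of entries equal to 1
n0 = X1.shape[0] - n1            # per-attribute count of entries equal to 0
p_x1_given_y1 = n1 / (n0 + n1)   # MLE of p(x_j = 1 | y = 1)
print(p_x1_given_y1)             # [0.667 0.333 1.   ]
```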
Final classification
- Once we have computed all parameters for the attributes in both classes, we can easily decide on the label of a new sample X.
$$\hat{y} \;=\; \arg\max_v \, p(y = v \mid X) \;=\; \arg\max_v \frac{p(X \mid y = v)\,p(y = v)}{p(X)} \;=\; \arg\max_v \prod_j p_j(x_j \mid y = v)\; p(y = v)$$

Perform this computation for both class 1 and class 2 and select the class that leads to the higher probability as your decision. Here p(y = v) is the prior on the prevalence of samples from each class.
Example: Text classification
- What is the major topic of this article?
Example: Text classification
- Text classification is
all around us
Feature transformation
- How do we encode the set of features (words) in the document?
- What type of information do we wish to represent? What can we
ignore?
- Most common encoding: 'Bag of Words'
- Treat document as a collection of words and encode each document
as a vector based on some dictionary
- The vector can either be binary (present / absent information for
each word) or discrete (number of appearances)
- Google is a good example
- Other applications include job search ads, spam filtering and many more.
Feature transformation: Bag of Words
- In this example we will use a binary vector
- For document X_i we will use a vector of m* indicator features {φ_j(X_i)} for whether a word appears in the document
- φ_j(X_i) = 1 if word j appears in document X_i; φ_j(X_i) = 0 if it does not appear in the document
- φ(X_i) = [φ_1(X_i) … φ_m(X_i)]^T is the resulting feature vector over the entire dictionary for document X_i
- For notational simplicity we will replace each document X_i with a fixed-length vector φ_i = [φ_1 … φ_m]^T, where φ_j = φ_j(X_i).
*The size of the vector for English is usually ~10000 words
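A minimal sketch of this binary bag-of-words encoding (the short dictionary and document below are hypothetical; a real English dictionary would hold roughly 10,000 words):

```python
# Hypothetical 5-word dictionary; a real one would have ~10,000 entries.
dictionary = ["washington", "congress", "romney", "obama", "nader"]

def bag_of_words(document, dictionary):
    """Binary indicator vector: entry j is 1 iff dictionary word j appears in the document."""
    words = set(document.lower().split())
    return [1 if word in words else 0 for word in dictionary]

doc = "Obama and Romney debate in Washington"
print(bag_of_words(doc, dictionary))  # [1, 0, 1, 1, 0]
```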
Example
Dictionary
- Washington
- Congress
…
- 54. Romney
- 55. Obama
- 56. Nader
φ_54 = φ_54(X_i) = 1, φ_55 = φ_55(X_i) = 1, φ_56 = φ_56(X_i) = 0
Assume we would like to classify documents as election related or not.
Example: cont.
- Given a collection of documents with their labels (usually termed 'training data') we learn the parameters for our model.
- For example, if we see the word 'Obama' in n1 out of the n documents labeled as 'election' we set p('obama' | 'election') = n1/n
- Similarly we compute the priors
(p(‘election’)) based on the proportion of the documents from both classes.
We would like to classify documents as election related or not.
Example: Classifying Election (E) or Sports (S)
Assume we learned the following model:

P(romney = 1 | E) = 0.8, P(romney = 1 | S) = 0.1
P(obama = 1 | E) = 0.9, P(obama = 1 | S) = 0.05
P(clinton = 1 | E) = 0.9, P(clinton = 1 | S) = 0.05
P(football = 1 | E) = 0.1, P(football = 1 | S) = 0.7
P(E) = 0.5, P(S) = 0.5

For a specific document we have the following feature vector: romney = 1, obama = 1, clinton = 1, football = 0

P(y = E | 1,1,1,0) ∝ 0.8 * 0.9 * 0.9 * 0.9 * 0.5 = 0.2916
P(y = S | 1,1,1,0) ∝ 0.1 * 0.05 * 0.05 * 0.3 * 0.5 = 0.0000375

So the document is classified as 'Election'.
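The same computation as a short Python sketch, using the model parameters from this slide (the scores are unnormalized since p(X) is dropped):

```python
# p(word = 1 | class) and class priors, taken from the slide above.
p_word = {"E": {"romney": 0.8, "obama": 0.9, "clinton": 0.9, "football": 0.1},
          "S": {"romney": 0.1, "obama": 0.05, "clinton": 0.05, "football": 0.7}}
prior = {"E": 0.5, "S": 0.5}

doc = {"romney": 1, "obama": 1, "clinton": 1, "football": 0}  # feature vector

scores = {}
for c in prior:
    score = prior[c]
    for word, present in doc.items():
        p = p_word[c][word]
        score *= p if present else (1.0 - p)   # use p(word = 0 | c) when absent
    scores[c] = score

print(scores)                       # approx {'E': 0.2916, 'S': 3.75e-05}
print(max(scores, key=scores.get))  # 'E' -> the document is classified as Election
```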
Naïve Bayes classifiers for continuous values
- So far we assumed a binomial or discrete distribution for the data
given the model (p(Xi|y))
- However, in many cases the data contains continuous features:
- Height, weight
- Levels of genes in cells
- Brain activity
- For these types of data we often use a Gaussian model
- In this model we assume that the observed input vector X is
generated from the following distribution: X ~ N(μ, Σ)
Gaussian Bayes Classifier Assumption
- The i'th record in the database is created using the following algorithm:
1. Generate the output (the "class") by drawing y_i ~ Multinomial(p_1, p_2, …, p_{Ny})
2. Generate the inputs from a Gaussian PDF that depends on the value of y_i: x_i ~ N(μ_{y_i}, Σ_{y_i})
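A sketch of this generative process (the class priors, means, and covariance matrices below are made-up placeholder values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: two classes, two-dimensional inputs.
class_priors = np.array([0.4, 0.6])                    # p_1, p_2
means = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]   # mu per class
covs = [np.eye(2), np.array([[1.0, 0.3],
                             [0.3, 2.0]])]             # Sigma per class

def generate_record():
    y = rng.choice(len(class_priors), p=class_priors)  # draw the class label
    x = rng.multivariate_normal(means[y], covs[y])     # draw x_i ~ N(mu_y, Sigma_y)
    return x, y

print(generate_record())
```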
Gaussian Bayes Classification
$$P(y = v \mid X) \;=\; \frac{p(X \mid y = v)\,P(y = v)}{p(X)}$$

- To determine the class when using the Gaussian assumption we need to compute p(X | y):

$$p(X \mid y) \;=\; \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(X - \mu)^T \Sigma^{-1} (X - \mu)\right)$$

Once again, we need lots of data to compute the values of the mean and the covariance matrix.
Gaussian Bayes Classification
- Here we can also use the Naïve Bayes assumption: attributes are independent given the class label
- In the Gaussian model this means that the covariance matrix becomes a diagonal matrix with zeros everywhere except for the diagonal
- Thus, we only need to learn the variance term for each attribute, with separate means and variances for each class, i.e. $x_j \sim N(\mu_{jv}, \sigma_{jv}^2)$:

$$P(X \mid y = v) \;=\; \prod_j \frac{1}{(2\pi\sigma_{jv}^2)^{1/2}} \exp\!\left(-\frac{(x_j - \mu_{jv})^2}{2\sigma_{jv}^2}\right)$$
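A sketch of this product of per-attribute Gaussians (the means, variances, and input below are placeholder values):

```python
import numpy as np

def naive_gaussian_likelihood(x, mu, var):
    """p(X | y = v) as a product of univariate Gaussians (diagonal covariance)."""
    x, mu, var = (np.asarray(a, dtype=float) for a in (x, mu, var))
    per_attr = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return per_attr.prod()

# Placeholder per-class parameters and one input vector.
print(naive_gaussian_likelihood(x=[1.2, -0.3], mu=[1.0, 0.0], var=[0.5, 1.0]))
```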
MLE for Gaussian Naïve Bayes Classifier
- For each class we need to estimate one global value (prior) and
two values for each feature (mean and variance)
- The prior is computed in the same way we did before (counting), which is the MLE estimate
- For each feature: let the number of input samples in class 1 be k1. The MLE for the mean and variance is computed by setting:
$$\mu_{j1} \;=\; \frac{1}{k_1} \sum_{i \,:\, y_i = 1} x_{ij}$$

$$\sigma_{j1}^2 \;=\; \frac{1}{k_1} \sum_{i \,:\, y_i = 1} (x_{ij} - \mu_{j1})^2$$
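A sketch of these estimates on made-up continuous data (note the variance uses the 1/k1 MLE form from the slide, not the unbiased 1/(k1-1) form):

```python
import numpy as np

# Toy continuous data: rows are samples, columns are attributes.
X = np.array([[1.8, 0.2],
              [2.1, -0.1],
              [1.5, 0.4],
              [-0.5, 1.0]])
y = np.array([1, 1, 1, 2])

X1 = X[y == 1]                                # the k1 samples with label y = 1
mu_j1 = X1.mean(axis=0)                       # MLE mean for each attribute
var_j1 = ((X1 - mu_j1) ** 2).mean(axis=0)     # MLE variance (1/k1 normalization)
print(mu_j1, var_j1)
```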
Example: Classifying gene expression data
- Measures the levels (up or down) of
genes in our cells
- Differs between healthy and sick people
and between different disease types
- Given measurement of patients with two
different types of cancer we would like to generate a classifier to distinguish between them
Classifying cancer types
- We select a subset of the
genes (more in our 'feature selection' class later in the course).
- We compute the mean and
variance for each of the genes in each of the classes
- Compute the class priors based on the input samples

[Figure: fitted class-conditional Gaussians for one example gene. Class 1 (ALL): μ = 1.8, σ² = 1.1; Class 2 (AML): μ = -0.6, σ² = 0.4]
Classification accuracy
- The figure shows the value of the discriminant function across the test examples
- The only test error is also the
decision with the lowest confidence
$$f(X) \;=\; \log \frac{p(y = 1 \mid X)}{p(y = 2 \mid X)}$$
FDA Approves Gene-Based Breast Cancer Test*
"MammaPrint is a DNA microarray-based test that measures the activity of 70 genes... The test measures each of these genes in a sample of a woman's breast-cancer tumor and then uses a specific formula to determine whether the patient is deemed low risk or high risk for the spread of the cancer to another site."
*Washington Post, 2/06/2007
Possible problems with Naïve Bayes classifiers: Assumptions
- In most cases, the assumption of conditional independence given
the class label is violated
- e.g., we are much more likely to find the word 'Barack' if we saw the word 'Obama', regardless of the class
- This is, unfortunately, a major shortcoming which makes these
classifiers inferior in many real world applications (though not always)
- There are models that can improve upon this assumption without using the full conditional model (one such model is the Bayesian network, which we will discuss later in this class).
Possible problems with Naïve Bayes classifiers: Parameter estimation
- Even though we need far less
data than the full Bayes model, there may be cases when the data we have is not enough
- For example, what is p(S = 1, N = 1 | E = 2) in the table below?
- This can get worse. Assume we have 20 variables, almost all pointing in the direction of the same class, except for one for which we have no record for this class.
- Solutions?
[Table: training samples with binary attributes Summer? (S) and Num > 20 (N), and an Evaluation label (E) taking values 1 to 3.]
Decision trees and Naïve Bayes
- What are the relationships between the assumptions the
two classifiers make?
- How does this affect their ability to model different input
datasets?
- Number of features?
- Number of samples?
- How does this affect the way they handle the different
features?
Important points
- Problems with estimating full joints
- Advantages of Naïve Bayes assumptions
- Applications to discrete and continuous cases
- Problems with Naïve Bayes classifiers