
SLIDE 1

2: Naive Bayes Classification

Machine Learning and Real-world Data Simone Teufel and Ann Copestake

Computer Laboratory University of Cambridge

Lent 2017

SLIDE 2

Last session: an algorithmic solution to sentiment detection

You built a symbolic system. The information source in your system was the sentiment lexicon. It was based on human intuition and required much human labour to build. You evaluated it in terms of accuracy. Accuracy is an adequate metric because the data was balanced. Is there a way to achieve a higher accuracy?

SLIDE 3

Machine Learning

We will start today with a simple machine learning (ML) application.

Definition of ML: a program that learns from data, i.e., adapts its behaviour after having been exposed to new data.

Hypothesis: we can learn which words (out of all the words we encounter in reviews) express sentiment, rather than relying on a fixed set of words decided independently of the data and before the experiment (the sentiment lexicon approach).

SLIDE 4

Two tasks in ML – classification vs prediction

Classification: Which class (label) should the data I see have?

This is what we are doing here.

Prediction: Which data is likely to occur in the given situation?

SLIDE 5

Features and classes

Input: easily observable data [often not obviously meaningful] – features $f_i$ (or observations $o_i$)

Output: meaningful label associated with the data [cannot be algorithmically determined] – class $c_n$

A classification algorithm is a function that maps from the features $f_i$ to a target class $c_n$.

SLIDE 6

Statistical Machine Learning

Your system from Task 1 is already a classification algorithm, but it is not an ML algorithm.

A statistical classifier maximises the probability that a class $c$ is associated with the observations $o$, and returns the maximising class $\hat{c}$:

$$\hat{c} = \operatorname*{argmax}_{c \in C} P(c \mid o)$$

Here $c$ is a class, $c \in C = \{c_1, \ldots, c_m\}$, the set of classes. In our case, the observations $o$ are the entire document $d$.

SLIDE 7

Testing and Training

A machine learning algorithm has two phases: training and testing. Training: the process of making observations about some known data set

You are allowed to manipulate the $f_i$ (and maybe look at the $c_n$ while you do that).

Testing: the process of applying the knowledge obtained in the training stage to some new, unseen data.

Important principle: never test on data that you trained a system on.

SLIDE 8

Supervised vs unsupervised ML

Supervised ML: you use the classes that come with the data in the training and the testing phase. Unsupervised ML: you use the classes only in the testing phase.

SLIDE 9

Naive Bayes Classifier

$$c_{NB} = \operatorname*{argmax}_{c \in C} P(c \mid d) = \operatorname*{argmax}_{c \in C} P(c) \prod_{i \in positions} P(w_i \mid c)$$

Document $d$ is represented by word positions: $w_i$ is the word encountered at position $i$ in the test document; $positions$ is the set of indices into the words in the document.

In the training phase, you will collect whatever information you need to calculate $P(w_i \mid c)$ and $P(c)$. In the testing phase, you will apply the above formula to derive $c_{NB}$, the classifier's decision. This is supervised ML because you use information about the classes during training.
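As a concrete illustration, here is a minimal sketch of this decision rule in Python. The data structures `prior` and `likelihood` are assumptions standing in for whatever you compute during training; they are not part of the course code.

```python
# A sketch of the Naive Bayes decision rule. Assumed inputs:
#   prior[c]         = P(c)
#   likelihood[c][w] = P(w|c)
def classify_nb(document, prior, likelihood):
    """Return the class maximising P(c) * product over positions of P(w_i|c)."""
    best_class, best_score = None, float("-inf")
    for c in prior:
        score = prior[c]
        for word in document:                      # one factor per word position
            score *= likelihood[c].get(word, 0.0)  # unseen words get count 0 (see slide 13)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Example with made-up numbers:
prior = {"POS": 0.5, "NEG": 0.5}
likelihood = {"POS": {"great": 0.01, "film": 0.05},
              "NEG": {"great": 0.001, "film": 0.05}}
print(classify_nb(["great", "film"], prior, likelihood))  # -> POS
```

In practice one sums log probabilities rather than multiplying raw probabilities, to avoid numerical underflow on long documents.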

SLIDE 10

NB classifier

How did we get from $\hat{c} = \operatorname*{argmax}_{c \in C} P(c \mid d)$ to $c_{NB} = \operatorname*{argmax}_{c \in C} P(c) \prod_{i \in positions} P(w_i \mid c)$?

We got there in three steps:

Bayes' Rule: $P(c \mid d) = \dfrac{P(c)\,P(d \mid c)}{P(d)}$

$P(d)$ does not affect $\hat{c}$: it is the same for every class, so it can be dropped from the argmax.

Independence assumption: $P(w_1, w_2, \ldots, w_n \mid c) = P(w_1 \mid c) \times P(w_2 \mid c) \times \cdots \times P(w_n \mid c)$
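As a worked illustration with made-up numbers (the same toy figures as in the sketch above), take the two-word document "great film" with $P(POS) = P(NEG) = 0.5$:

$$P(POS)\,P(\text{great} \mid POS)\,P(\text{film} \mid POS) = 0.5 \times 0.01 \times 0.05 = 2.5 \times 10^{-4}$$
$$P(NEG)\,P(\text{great} \mid NEG)\,P(\text{film} \mid NEG) = 0.5 \times 0.001 \times 0.05 = 2.5 \times 10^{-5}$$

so $c_{NB} = POS$. Note that $P(d)$ never needs to be computed.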

SLIDE 11

Data Split

From last time, you have the 1800 documents which you used for evaluation. We now perform a data split: 200 documents for testing, 1600 for training. You may later want to compare how well the NB system is doing in comparison to the symbolic system.

The NB system is evaluated on only 200 documents; you should therefore rerun your symbolic system on the same 200 documents.
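A minimal sketch of such a split in Python (the document list here is a placeholder; your actual data structures will differ):

```python
import random

# Placeholder stand-in for the 1800 labelled reviews from Task 1.
documents = [f"review_{i}" for i in range(1800)]

random.seed(0)             # fixed seed so the split is reproducible
random.shuffle(documents)  # shuffle first, in case the corpus is ordered by class
test_set, training_set = documents[:200], documents[200:]
```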

SLIDE 12

Maximum Likelihood Estimates (MLE): $\hat{P}(w_i \mid c)$, $\hat{P}(c)$

Maximum Likelihood Estimation (MLE) = finding the parameter values that maximise the likelihood of making the observations, given the parameters:

$$\hat{P}(w_i \mid c) = \frac{count(w_i, c)}{\sum_{w \in V} count(w, c)} \qquad \hat{P}(c) = \frac{N_c}{N_{doc}}$$

$N_c$: number of documents with class $c$
$N_{doc}$: total number of documents
$count(w_i, c)$: number of word positions at which $w_i$ occurs together with class $c$
$V$: vocabulary of distinct words
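A minimal training sketch of these estimates (the input format, a list of (words, class) pairs, is an assumption, not the course's prescribed data structure):

```python
from collections import Counter, defaultdict

def train_nb(training_set):
    """MLE estimates. `training_set` is assumed to be a list of
    (list_of_words, class_label) pairs."""
    class_counts = Counter()            # N_c for each class
    word_counts = defaultdict(Counter)  # count(w, c)
    for words, c in training_set:
        class_counts[c] += 1
        word_counts[c].update(words)

    n_doc = sum(class_counts.values())                       # N_doc
    prior = {c: n / n_doc for c, n in class_counts.items()}  # P̂(c) = N_c / N_doc

    likelihood = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())    # sum over w in V of count(w, c)
        likelihood[c] = {w: n / total for w, n in counts.items()}  # P̂(w_i|c)
    return prior, likelihood
```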

SLIDE 13

A problem you might run into

A certain word may not have occurred together with one of the classes in the training data, so the count is 0. Part of your task today:

understand why this is a problem
work out what you could do to deal with it
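To make the first part concrete, here is a small illustration (with made-up probabilities) of what a single zero estimate does to the product in the classifier:

```python
# One word never seen with this class gives P(w|c) = 0 for that factor.
factors = [0.04, 0.3, 0.0, 0.2]  # made-up values of P(w_i|c) for one class

score = 1.0
for p in factors:
    score *= p
print(score)  # 0.0 -- a single zero wipes out all the other evidence
```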

SLIDE 14

Your task for today

Task 2: Write code that calculates the MLEs $\hat{P}(w_i \mid c)$ and $\hat{P}(c)$, using only the training set.

That covers the training phase. Then write code for testing, i.e., apply your classifier to the validation set, and measure accuracy on the 200 documents. When you design your data structures, you may want to consider that in later sessions you will dynamically split the data into training and test sets.
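A minimal sketch of the testing step, reusing the hypothetical `train_nb` and `classify_nb` sketches from earlier:

```python
def evaluate(test_set, prior, likelihood):
    """Accuracy of the classifier over (words, gold_label) pairs."""
    correct = sum(1 for words, gold in test_set
                  if classify_nb(words, prior, likelihood) == gold)
    return correct / len(test_set)

# prior, likelihood = train_nb(training_set)
# print(evaluate(test_set, prior, likelihood))  # accuracy on the 200 held-out documents
```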

SLIDE 15

Ticking today

Task 1 – Symbolic Classifier

SLIDE 16

Literature

Textbook: Jurafsky and Martin, 2nd edition, Chapter 6.2: Naive Bayes Classifier