Naïve Bayes & Maxent Models
CMSC 473/673 UMBC September 18th, 2017
Some slides adapted from 3SLP
Announcements: Assignment 1
Due 11:59 AM, Wednesday 9/20 (< 2 days)
Use the submit utility with:
* class id: cs473_ferraro
* assignment id: a1
We must be able to run it on GL!
Common pitfall #1: forgetting files
Common pitfall #2: incorrect paths to files
Common pitfall #3: 3rd-party libraries
Announcements: Course Project
Official handout will be out Wednesday 9/20
Until then, focus on assignment 1
Teams of 1-3
Mixed undergrad/grad is encouraged but not required
Some novel aspect is needed:
* Ex 1: reimplement an existing technique and apply it to a new domain
* Ex 2: reimplement an existing technique and apply it to a new (human) language
* Ex 3: explore a novel technique on an existing problem
Recap from last time…
Two Different Philosophical Frameworks
Bayes rule: P(X | Y) = P(Y | X) P(X) / P(Y)
* P(X | Y): posterior probability
* P(Y | X): likelihood
* P(X): prior probability
* P(Y): marginal likelihood (probability)

1. Posterior classification/decoding: maximum a posteriori
2. Noisy channel model decoding
there are others too (CMSC 478/678)
Posterior Decoding: Probabilistic Text Classification
* Assigning subject categories, topics, or genres
* Spam detection
* Authorship identification
* Age/gender identification
* Language identification
* Sentiment analysis
* …
P(class | data) ∝ P(data | class) · P(class)
* P(data | class): class-based likelihood (language model)
* P(class): prior probability of class
Noisy Channel Model
What I want to tell you: "sports"
What you actually see: "The Os lost again…"
Decode: hypothesized intents ("sad stories", "sports", …)
Rerank: reweight according to what's likely
Result: "sports"
Noisy Channel
* Machine translation
* Speech-to-text
* Spelling correction
* Text normalization
* Part-of-speech tagging
* Morphological analysis
* Image captioning
* …
Decode: find the best possible (clean) X for the (noisy) text Y
* P(Y | X): translation/decode model
* P(X): (clean) language model
Classify or Decode with Bayes Rule

X̂ = argmax over X of P(X | Y) = argmax over X of P(Y | X) · P(X)
(P(Y) is constant with respect to X, so it can be dropped.)

* P(Y | X): how well does text Y represent label X?
* P(X): how likely is label X overall?

For "simple" or "flat" labels:
* iterate through labels
* evaluate the score for each label, keeping only the best (or n best)
* return the best (or n best) label and score

The same questions apply when X and Y are both text (complex input and output): how well does text Y represent text X, and how likely is text X overall? But then the argmax can be complicated; we'll come back to this in October.
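The iterate/score/return loop for flat labels can be sketched as follows. This is a minimal illustration, not code from the course; the labels, priors, and likelihood values are hypothetical.

```python
import math

def classify(y, labels, log_prior, log_likelihood, n_best=1):
    """Score each label X by log P(Y|X) + log P(X); P(Y) is constant, so skip it."""
    scored = [(log_likelihood(y, x) + log_prior[x], x) for x in labels]
    scored.sort(reverse=True)          # best score first
    return scored[:n_best]             # list of (score, label) pairs

# Toy example with made-up probabilities for two labels.
log_prior = {"sports": math.log(0.7), "politics": math.log(0.3)}

def log_likelihood(y, x):
    p = {"sports": 0.02, "politics": 0.001}   # hypothetical P(y | x)
    return math.log(p[x])

best = classify("The Os lost again...", ["sports", "politics"],
                log_prior, log_likelihood)
print(best[0][1])   # sports
```

Working in log space avoids numeric underflow when the likelihood is a long product of small probabilities.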
Classification Evaluation: the 2-by-2 contingency table

                             Actually Correct      Actually Incorrect
Selected / guessed           True Positive (TP)    False Positive (FP)
Not selected / not guessed   False Negative (FN)   True Negative (TN)
Classification Evaluation: Accuracy, Precision, and Recall

Accuracy: % of items correct = (TP + TN) / (TP + FP + FN + TN)
Precision: % of selected items that are correct = TP / (TP + FP)
Recall: % of correct items that are selected = TP / (TP + FN)
A combined measure: F

Weighted (harmonic) average of Precision & Recall:
F_β = (1 + β²) · P · R / (β² · P + R)

Balanced F1 measure: β = 1, giving F1 = 2PR / (P + R)
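These metrics fall straight out of the four contingency-table counts. A small sketch (the counts below are hypothetical, chosen to match the class-1 table in the next example):

```python
def metrics(tp, fp, fn, tn, beta=1.0):
    """Accuracy, precision, recall, and F-beta from 2x2 contingency counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return accuracy, precision, recall, f

acc, p, r, f1 = metrics(tp=10, fp=10, fn=10, tn=970)
print(acc, p, r, f1)   # 0.98 0.5 0.5 0.5
```

Note that accuracy can be high (0.98) while precision and recall are mediocre (0.5) when the negative class dominates.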
Micro- vs. Macro-Averaging
If we have more than one class, how do we combine multiple performance measures into one quantity?
* Macroaveraging: compute performance for each class, then average.
* Microaveraging: collect decisions for all classes, compute one contingency table, evaluate.
Micro- vs. Macro-Averaging: Example
Class 1:
                 Truth: yes   Truth: no
Classifier: yes      10           10
Classifier: no       10          970

Class 2:
                 Truth: yes   Truth: no
Classifier: yes      90           10
Classifier: no       10          890

Micro-average table (counts summed):
                 Truth: yes   Truth: no
Classifier: yes     100           20
Classifier: no       20         1860
Macroaveraged precision: (0.5 + 0.9) / 2 = 0.7
Microaveraged precision: 100/120 ≈ 0.83
The microaveraged score is dominated by performance on the common classes.
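The macro/micro precision numbers above can be reproduced directly from the two per-class tables:

```python
# Per-class (tp, fp) counts from the example tables.
tables = [
    {"tp": 10, "fp": 10},   # class 1: precision 10/20 = 0.5
    {"tp": 90, "fp": 10},   # class 2: precision 90/100 = 0.9
]

# Macro: average the per-class precisions.
macro = sum(t["tp"] / (t["tp"] + t["fp"]) for t in tables) / len(tables)
# Micro: pool the counts into one table, then compute precision once.
micro = sum(t["tp"] for t in tables) / sum(t["tp"] + t["fp"] for t in tables)

print(round(macro, 2), round(micro, 2))   # 0.7 0.83
```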
Language Modeling as Naïve Bayes Classifier
P(class | data) ∝ P(data | class) · P(class)
* P(class | data): posterior probability (maximum a posteriori classification/decoding; noisy channel model decoding)
* P(data | class): class-based likelihood (language model)
* P(class): prior probability of class
The Bag of Words Representation

Represent the document by its word counts, ignoring order and position:

seen 2
sweet 1
whimsical 1
recommend 1
happy 1
...

These counts are the input to the classifier.
Language Modeling as Naïve Bayes Classifier

Start with Bayes Rule:
P(X | Y) ∝ P(Y | X) · P(X)

Adopt the naïve bag-of-words representation with features Y_i, assume position doesn't matter, and assume the feature probabilities are independent given the class X:

P(Y | X) = P(Y_1 | X) · P(Y_2 | X) · … · P(Y_n | X)
Multinomial Naïve Bayes: Learning

From the training corpus, extract the Vocabulary.

Calculate the P(c_j) terms:
* For each c_j in C:
  * docs_j = all docs with class = c_j
  * P(c_j) = |docs_j| / (total # of docs)

Calculate the P(w_k | c_j) terms (the class language model):
* Text_j = single doc containing all of docs_j
* For each word w_k in Vocabulary:
  * n_k = # of occurrences of w_k in Text_j
  * P(w_k | c_j) = n_k / |Text_j|
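The learning loop above can be sketched as follows. This is a minimal unsmoothed (maximum-likelihood) version on a tiny hypothetical two-document corpus; a real implementation would add smoothing for unseen words.

```python
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, class) pairs. Returns P(c) and P(w | c) tables."""
    classes = {c for _, c in docs}
    vocab = {w for toks, _ in docs for w in toks}
    prior, cond = {}, {}
    for c in classes:
        docs_c = [toks for toks, lab in docs if lab == c]
        prior[c] = len(docs_c) / len(docs)               # P(c)
        text_c = [w for toks in docs_c for w in toks]    # one "mega-document"
        counts = Counter(text_c)
        n = len(text_c)
        cond[c] = {w: counts[w] / n for w in vocab}      # MLE P(w | c)
    return prior, cond

# Hypothetical toy corpus.
docs = [("I love this film".split(), "pos"),
        ("I hate this film".split(), "neg")]
prior, cond = train_nb(docs)
print(prior["pos"], cond["pos"]["love"])   # 0.5 0.25
```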
Naïve Bayes and Language Modeling

Naïve Bayes classifiers can use any sort of feature. But if, as in the previous slides:
* we use only word features
* we use all of the words in the text (not a subset)

then Naïve Bayes has an important similarity to language modeling.
Naïve Bayes as a Language Model (Sec. 13.2.1)

Which class assigns the higher probability to s = "I love this fun film"?

            I      love    this    fun     film
Positive    0.1    0.1     0.01    0.05    0.1
Negative    0.2    0.001   0.01    0.005   0.1

P(s | pos) = 0.1 · 0.1 · 0.01 · 0.05 · 0.1 = 5e-7
P(s | neg) = 0.2 · 0.001 · 0.01 · 0.005 · 0.1 = 1e-9

5e-7 ≈ P(s | pos) > P(s | neg) ≈ 1e-9
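The two class products above can be checked directly by multiplying the per-word unigram probabilities:

```python
# Per-class unigram models from the worked example.
pos = {"I": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1}
neg = {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1}

s = "I love this fun film".split()

def prob(model, words):
    """P(s | class) under a unigram model: product of per-word probabilities."""
    p = 1.0
    for w in words:
        p *= model[w]
    return p

print(prob(pos, s), prob(neg, s))   # ~5e-07 vs ~1e-09: the positive class wins
```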
Brill and Banko (2001): with enough data, the choice of classifier may not matter.
Summary: Naïve Bayes is Not So Naïve
* Very fast, low storage requirements
* Robust to irrelevant features
* Very good in domains with many equally important features
* Optimal if the independence assumptions hold
* Dependable baseline for text classification (but often not the best)
But: Naïve Bayes Isn’t Without Issue
* Model the posterior in one go?
* Are the features really uncorrelated?
* Are plain counts always appropriate?
* Are there "better" (automated, more principled) ways of handling missing/noisy data?
Maximum Entropy (Log-linear) Models

* a more general language model
* classify in one go
Document Classification

Observed document: "Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region."

Label: ATTACK

Phrases in the document such as "shot," "fatally shot," "seriously wounded," and "Shining Path" all signal the ATTACK label. We need to score the different combinations.
Score and Combine Our Possibilities

score1(fatally shot, ATTACK)
score2(seriously wounded, ATTACK)
score3(Shining Path, ATTACK)
…
scorek(department, ATTACK)

COMBINE → posterior probability of ATTACK

Are all of these uncorrelated?
Q: What are the score and combine functions for Naïve Bayes?
Scoring Our Possibilities

Document: "Three people have been fatally shot, … central Peruvian mountain region." → ATTACK

score1(fatally shot, ATTACK)
score2(seriously wounded, ATTACK)
score3(Shining Path, ATTACK)
…

Learn these scores… but how?
Maxent Modeling

What function is never less than 0? f(x) = exp(x)
Maxent Modeling

Learn the scores (but we'll declare what combinations should be looked at):

weight1 * applies1(fatally shot, ATTACK)
weight2 * applies2(seriously wounded, ATTACK)
weight3 * applies3(Shining Path, ATTACK)
…
Maxent Modeling

For each label X, exponentiate the weighted sum and normalize by Z:

p(X | doc) = (1/Z) · exp( weight1 * applies1(fatally shot, X)
                        + weight2 * applies2(seriously wounded, X)
                        + weight3 * applies3(Shining Path, X)
                        + … )

Q: How do we define Z?
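The score-and-normalize step can be sketched as below. The weights, feature phrases, and the second label "OTHER" are hypothetical; "applies" features here are simple binary phrase-presence checks.

```python
import math

# Hypothetical learned weights, keyed by (phrase, label).
weights = {("fatally shot", "ATTACK"): 2.0,
           ("Shining Path", "ATTACK"): 1.5,
           ("fatally shot", "OTHER"): -1.0,
           ("Shining Path", "OTHER"): -0.5}

def score(doc, label):
    """Weighted sum of the binary 'applies' features that fire for this label."""
    return sum(w for (phrase, lab), w in weights.items()
               if lab == label and phrase in doc)

def posterior(doc, labels):
    """exp(score) for each label, normalized by Z = sum over labels."""
    exp_scores = {x: math.exp(score(doc, x)) for x in labels}
    Z = sum(exp_scores.values())
    return {x: v / Z for x, v in exp_scores.items()}

doc = "Three people have been fatally shot ... a Shining Path attack ..."
p = posterior(doc, ["ATTACK", "OTHER"])
print(max(p, key=p.get))   # ATTACK
```

For classification, Z sums over the label set, which answers the question above for the simple-label case.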
Normalization for Classification

p(x | y) ∝ exp(θ · f(x, y)), with Z(y) = Σ over labels x' of exp(θ · f(x', y))

Classify doc y with label x in one go.
Normalization for Language Model

A general class-based (X) language model of doc y. Computing Z can be significantly harder in the general case: it sums over all possible documents y.

Simplifying assumption: maxent n-grams!