 
Maxent Models (II)
CMSC 473/673, UMBC
September 20th, 2017
Announcements: Assignment 1
* Due 11:59 PM, Saturday 9/23 (~3.5 days)
* Use the submit utility with class id cs473_ferraro and assignment id a1
* We must be able to run it on GL!
* Common pitfall #1: forgetting files
* Common pitfall #2: incorrect paths to files
* Common pitfall #3: 3rd-party libraries
Announcements: Question 6
$$p^{(n)}(y_i \mid y_{i-n+1} : y_{i-1}) = \lambda_n f^{(n)}(y_{i-n+1} : y_i) + (1 - \lambda_n)\, p^{(n-1)}(y_i \mid y_{i-n+2} : y_{i-1})$$
$$p^{(n)}(y_i \mid y_{i-n+1} : y_{i-1}) = \lambda_{n,n} f^{(n)}(y_{i-n+1} : y_i) + \lambda_{n,n-1} f^{(n-1)}(y_{i-n+2} : y_i) + \cdots + \lambda_{n,0} f^{(0)}$$
$$\lambda_{n,0} = 1 - \sum_{m=0}^{n-1} \lambda_{n,n-m}$$
Announcements: Course Project
* Official handout will be out Friday 9/22
* Until then, focus on Assignment 1
* Teams of 1-3
* Mixed undergrad/grad is encouraged but not required
* Some novel aspect is needed
* Ex 1: reimplement an existing technique and apply it to a new domain
* Ex 2: reimplement an existing technique and apply it to a new (human) language
* Ex 3: explore a novel technique on an existing problem
Recap from last time…
Classify or Decode with Bayes Rule
Prior, p(X): how likely is label X overall?
Likelihood, p(Y | X): how well does text Y represent label X?
For "simple" or "flat" labels:
* iterate through labels
* evaluate score for each label, keeping only the best (n best)
* return the best (or n best) label and score
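The three steps for flat labels can be sketched in a few lines. This is a minimal illustration, not the course's reference code: the function name `classify`, the toy prior, and the toy per-label likelihoods are all hypothetical, and scores are kept in log space so the product of prior and likelihood becomes a sum.

```python
import math

def classify(text, log_prior, log_likelihood, n_best=1):
    """Return the n_best (label, score) pairs, best first.

    log_prior: dict mapping label -> log p(label)
    log_likelihood: function (text, label) -> log p(text | label)
    """
    scored = []
    for label in log_prior:                      # iterate through labels
        score = log_prior[label] + log_likelihood(text, label)
        scored.append((label, score))            # evaluate score for each label
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_best]                       # return the best (or n best)

# Hypothetical toy numbers, for illustration only.
TOY_LOG_PRIOR = {"ATTACK": math.log(0.3), "OTHER": math.log(0.7)}
TOY_LIKELIHOOD = {"ATTACK": 0.01, "OTHER": 0.0005}

def toy_log_likelihood(text, label):
    # A stand-in for a real model of p(text | label).
    return math.log(TOY_LIKELIHOOD[label])

best = classify("fatally shot", TOY_LOG_PRIOR, toy_log_likelihood)
top_two = classify("fatally shot", TOY_LOG_PRIOR, toy_log_likelihood, n_best=2)
```

Sorting every label is the simplest choice for a small label set; for very large label sets one would keep only a running top-n instead.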
Classification Evaluation: the 2-by-2 contingency table

                            Actually Correct       Actually Incorrect
Selected/Guessed            True Positive (TP)     False Positive (FP)
Not selected/not guessed    False Negative (FN)    True Negative (TN)
Classification Evaluation: Accuracy, Precision, and Recall
* Accuracy: % of items correct
* Precision: % of selected items that are correct
* Recall: % of correct items that are selected

                            Actually Correct       Actually Incorrect
Selected/Guessed            True Positive (TP)     False Positive (FP)
Not selected/not guessed    False Negative (FN)    True Negative (TN)
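The three definitions translate directly into ratios of the contingency-table counts. The sketch below assumes the counts are already tallied; the example counts are made up, and returning 0.0 when a denominator is zero is one convention among several.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all items that were handled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Fraction of selected items that are actually correct."""
    selected = tp + fp
    return tp / selected if selected else 0.0  # assumed convention for 0/0

def recall(tp, fn):
    """Fraction of actually correct items that were selected."""
    correct = tp + fn
    return tp / correct if correct else 0.0    # assumed convention for 0/0

# Hypothetical counts for illustration.
tp, fp, fn, tn = 8, 2, 4, 86
acc = accuracy(tp, fp, fn, tn)   # (8 + 86) / 100
prec = precision(tp, fp)         # 8 / 10
rec = recall(tp, fn)             # 8 / 12
```

With these counts accuracy is high mostly because of the large true-negative cell, which is why precision and recall are reported alongside it.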
Language Modeling as Naïve Bayes Classifier
* Adopt a naïve bag-of-words representation Y_i
* Assume position doesn't matter
* Assume the feature probabilities are independent given the class X
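Under these assumptions the score for a class is the log prior plus one log-probability term per word, and reordering the words cannot change it. The sketch below illustrates that; the function name and the word probabilities are hypothetical, and unseen words are not handled (a real model would need smoothing).

```python
import math
from collections import Counter

def naive_bayes_log_score(tokens, label, log_prior, word_probs):
    """log p(X) + sum over words of count(word) * log p(word | X)."""
    bag = Counter(tokens)                 # bag of words: order is discarded
    score = log_prior[label]
    for word, count in bag.items():
        # Independence given the class: each word contributes its own factor.
        score += count * math.log(word_probs[label][word])
    return score

# Hypothetical parameters for illustration.
LOG_PRIOR = {"ATTACK": math.log(0.3), "OTHER": math.log(0.7)}
WORD_PROBS = {
    "ATTACK": {"shot": 0.20, "mayor": 0.05, "people": 0.10},
    "OTHER": {"shot": 0.01, "mayor": 0.05, "people": 0.10},
}

tokens = ["people", "shot", "mayor", "people"]
shuffled = ["mayor", "people", "people", "shot"]
score_a = naive_bayes_log_score(tokens, "ATTACK", LOG_PRIOR, WORD_PROBS)
score_b = naive_bayes_log_score(shuffled, "ATTACK", LOG_PRIOR, WORD_PROBS)
score_other = naive_bayes_log_score(tokens, "OTHER", LOG_PRIOR, WORD_PROBS)
```

Here `score_a` and `score_b` agree up to floating-point rounding, which is the "position doesn't matter" assumption in action.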
Naïve Bayes Summary
Potential Advantages
* Very fast, low storage requirements
* Robust to irrelevant features
* Very good in domains with many equally important features
* Optimal if the independence assumptions hold
* Dependable baseline for text classification (but often not the best)
Potential Issues
* Model the posterior in one go?
* Are the features really uncorrelated?
* Are plain counts always appropriate?
* Are there "better" ways of handling missing/noisy data? (automated, more principled)
Naïve Bayes Summary
Model the posterior in one go? Relevant for classification… and language modeling
Maximum Entropy Models
* a more general language model: argmax_X p(Y | X) * p(X)
* classify in one go: argmax_X p(X | Y)
Document Classification
Label: ATTACK
Document: "Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region."
We need to score the different combinations.
Score and Combine Our Possibilities
* score_1(fatally shot, ATTACK)
* score_2(seriously wounded, ATTACK)
* score_3(Shining Path, ATTACK)
* …
* score_k(department, ATTACK)
* …
COMBINE the scores into the posterior probability of ATTACK.
Are all of these uncorrelated?
Score and Combine Our Possibilities
Q: What are the score and combine functions for Naïve Bayes?
Scoring Our Possibilities
score(doc, ATTACK) is built from:
* score_1(fatally shot, ATTACK)
* score_2(seriously wounded, ATTACK)
* score_3(Shining Path, ATTACK)
* …
(doc is the "Three people have been fatally shot…" passage.)
Learn these scores… but how? What do we optimize?
Maxent Modeling
p(ATTACK | doc) ∝ SNAP(score(doc, ATTACK))
(doc is the "Three people have been fatally shot…" passage.)
Maxent Modeling
p(ATTACK | doc) ∝ exp(score(doc, ATTACK))
exp gives a positive, unnormalized probability: f(x) = exp(x)
Maxent Modeling
p(ATTACK | doc) ∝ exp(score_1(fatally shot, ATTACK) + score_2(seriously wounded, ATTACK) + score_3(Shining Path, ATTACK) + …)
Learn the scores (but we'll declare what combinations should be looked at)
Maxent Modeling
p(ATTACK | doc) ∝ exp(weight_1 * applies_1(fatally shot, ATTACK) + weight_2 * applies_2(seriously wounded, ATTACK) + weight_3 * applies_3(Shining Path, ATTACK) + …)
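One way to read the weight-times-applies form is that each `applies` function is a 0/1 indicator we declare by hand, while the weights are what gets learned. The sketch below uses made-up weights and a hypothetical helper `make_indicator`; it computes only the unnormalized score.

```python
import math

DOC = ("Three people have been fatally shot, and five people, including a "
       "mayor, were seriously wounded as a result of a Shining Path attack "
       "today against a community in Junin department, central Peruvian "
       "mountain region.")

def make_indicator(phrase, target_label):
    """Build an applies(doc, label) function that fires (returns 1.0)
    only when the phrase occurs in doc and the label matches."""
    def applies(doc, label):
        return 1.0 if phrase in doc and label == target_label else 0.0
    return applies

# Declared combinations to look at (from the slide).
FEATURES = [
    make_indicator("fatally shot", "ATTACK"),
    make_indicator("seriously wounded", "ATTACK"),
    make_indicator("Shining Path", "ATTACK"),
]
# Hypothetical weights; in practice these are learned from data.
WEIGHTS = [1.5, 0.8, 2.0]

def unnormalized_score(doc, label):
    """exp of the weighted sum of the features that apply."""
    total = sum(w * f(doc, label) for w, f in zip(WEIGHTS, FEATURES))
    return math.exp(total)

attack_score = unnormalized_score(DOC, "ATTACK")
other_score = unnormalized_score(DOC, "OTHER")   # no feature applies
```

Note that `other_score` comes out as exactly 1.0: with no applicable features the weighted sum is 0 and exp(0) = 1.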
Q: What if none of our features apply?
Guiding Principle for Log-Linear Models
"[The log-linear estimate] is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information."
Edwin T. Jaynes, 1957
Guiding Principle for Log-Linear Models
When no features apply, f = 0: exp(θ · f) → exp(θ · 0) = 1
Easier-to-write form
exp(θ_1 * f_1(fatally shot, ATTACK) + θ_2 * f_2(seriously wounded, ATTACK) + θ_3 * f_3(Shining Path, ATTACK) + …)
K weights θ_1 … θ_K, K features f_1 … f_K
Easier-to-write form
exp(θ · f(doc, ATTACK))
* θ · f: dot product
* θ: K-dimensional weight vector
* f(doc, ATTACK): K-dimensional feature vector
Log-Linear Models
Log-Linear Models
Names for f: feature function(s), sufficient statistics, "strength" function(s)
Log-Linear Models
Names for θ: feature weights, natural parameters, distribution parameters
Log-Linear Models How do we normalize?
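The slide leaves the question open. The usual answer for a log-linear model over a finite label set is to divide each exp(θ · f(doc, label)) by the sum of that quantity over all labels. The sketch below assumes that approach; the two-label set, the weights, and the phrase-based feature vector are hypothetical.

```python
import math

LABELS = ["ATTACK", "OTHER"]           # hypothetical label set
PHRASES = ["fatally shot", "seriously wounded", "Shining Path"]
THETA = [1.5, 0.8, 2.0]                # hypothetical K-dimensional weights

def feature_vector(doc, label):
    """K-dimensional f(doc, label): entry k is 1.0 when phrase k occurs
    in doc and the label is ATTACK, else 0.0."""
    return [1.0 if p in doc and label == "ATTACK" else 0.0 for p in PHRASES]

def dot(theta, f):
    return sum(t * x for t, x in zip(theta, f))

def posterior(doc):
    """p(label | doc) = exp(theta . f(doc, label)) / Z(doc)."""
    unnorm = {lab: math.exp(dot(THETA, feature_vector(doc, lab)))
              for lab in LABELS}
    z = sum(unnorm.values())           # normalizer: sum over all labels
    return {lab: score / z for lab, score in unnorm.items()}

probs = posterior("Three people have been fatally shot by Shining Path")
no_features = posterior("The weather was mild today")
```

When no feature applies to any label, every unnormalized score is exp(0) = 1, so `no_features` is the uniform distribution over labels, matching the "maximally noncommittal" principle quoted earlier.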