

  1. CS 4501 Machine Learning for NLP. Text Classification (I): Logistic Regression. Yangfeng Ji, Department of Computer Science, University of Virginia

  2. Overview: 1. Problem Definition; 2. Bag-of-Words Representation; 3. Case Study: Sentiment Analysis; 4. Logistic Regression; 5. $L_2$ Regularization; 6. Demo Code

  3. Problem Definition

  4. Case I: Sentiment Analysis [Pang et al., 2002]

  5. Case II: Topic Classification. Example topics: ◮ Business ◮ Arts ◮ Technology ◮ Sports ◮ · · ·

  7. Classification
     ◮ Input: a text $x$; e.g., a product review on Amazon
     ◮ Output: $y \in \mathcal{Y}$, where $\mathcal{Y}$ is the predefined category set (sample space); e.g., $\mathcal{Y} = \{\text{Positive}, \text{Negative}\}$
     The pipeline of text classification: Text → Numeric Vector $x$ → Classifier → Category $y$
     (In this course, we use $x$ for both a text and its representation, with no distinction.)

  9. Probabilistic Formulation
     With the conditional probability $P(Y \mid X)$, the prediction on $Y$ for a given text $X = x$ is
     $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} P(Y = y \mid X = x)$  (1)
     Or, for simplicity,
     $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} P(y \mid x)$  (2)
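     A minimal sketch of this decision rule in Python: given conditional probabilities for each label, prediction is just an argmax over the label set. The probability values below are made up purely for illustration.

        # Decision rule of Eq. (1)/(2): predict the label with the highest
        # conditional probability. The probabilities here are illustrative only.
        probs = {"Positive": 0.83, "Negative": 0.17}   # hypothetical P(y | x)
        y_hat = max(probs, key=probs.get)              # argmax over the label set Y
        print(y_hat)                                   # -> Positive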

  14. Key Questions
      Recall
      ◮ the formulation defined in the previous slide: $\hat{y} = \operatorname{argmax}_{y \in \mathcal{Y}} P(Y = y \mid X = x)$  (3)
      ◮ the pipeline of text classification: Text → Numeric Vector $x$ → Classifier → Category $y$
      Building a text classifier is about answering the following two questions:
      1. How to represent a text as $x$? ◮ Bag-of-words representation
      2. How to estimate $P(y \mid x)$? ◮ Logistic regression models ◮ Neural network classifiers

  15. Bag-of-Words Representation

  18. Bag-of-Words Representation
      Example texts. Text 1: I love coffee. Text 2: I don’t like tea.
      Step I: convert a text into a collection of tokens. Tokenized text 1: I love coffee. Tokenized text 2: I don t like tea.
      Step II: build a dictionary/vocabulary: { I, love, coffee, don, t, like, tea }
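      A rough Python sketch of Steps I and II, assuming a naive tokenizer that splits on non-letter characters (which is what turns “don’t” into the two tokens “don” and “t”); a real pipeline would typically use a proper tokenizer such as those in NLTK or spaCy.

         import re

         texts = ["I love coffee.", "I don't like tea."]

         def tokenize(text):
             # Step I: split on anything that is not a letter and drop empty strings.
             return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]

         tokenized = [tokenize(t) for t in texts]

         # Step II: build the vocabulary in order of first occurrence.
         vocab = []
         for tokens in tokenized:
             for tok in tokens:
                 if tok not in vocab:
                     vocab.append(tok)

         print(tokenized)  # [['I', 'love', 'coffee'], ['I', 'don', 't', 'like', 'tea']]
         print(vocab)      # ['I', 'love', 'coffee', 'don', 't', 'like', 'tea']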

  20. Bag-of-Words Representations
      Step III: based on the vocab, convert each text into a numeric representation. With the vocabulary order (I, love, coffee, don, t, like, tea):
      $x^{(1)} = [1\;1\;1\;0\;0\;0\;0]^{\mathsf{T}}$
      $x^{(2)} = [1\;0\;0\;1\;1\;1\;1]^{\mathsf{T}}$
      The pipeline of text classification: Text → Numeric Vector $x$ (bag-of-words representation) → Classifier → Category $y$
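      Continuing the sketch above, Step III maps each tokenized text onto a count vector indexed by the vocabulary:

         vocab = ["I", "love", "coffee", "don", "t", "like", "tea"]

         def bag_of_words(tokens, vocab):
             # One entry per vocabulary word, counting its occurrences in the text.
             vec = [0] * len(vocab)
             for tok in tokens:
                 if tok in vocab:
                     vec[vocab.index(tok)] += 1
             return vec

         x1 = bag_of_words(["I", "love", "coffee"], vocab)
         x2 = bag_of_words(["I", "don", "t", "like", "tea"], vocab)
         print(x1)  # [1, 1, 1, 0, 0, 0, 0]
         print(x2)  # [1, 0, 0, 1, 1, 1, 1]

      In practice one would use an off-the-shelf vectorizer (e.g., scikit-learn's CountVectorizer) rather than this toy loop, but the resulting representation is the same idea.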

  22. Preprocessing for Building Vocab
      1. Convert all characters to lowercase: UVa, UVA → uva
      2. Map low-frequency words to a special token unk. By Zipf's law, $f(w_t) \propto 1/r_t$: a word's frequency is roughly inversely proportional to its frequency rank, so the vocabulary has a long tail of rare words that are better collapsed into unk.
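      A minimal sketch of both preprocessing steps, assuming a hypothetical count threshold min_count for deciding which words count as "low frequency" (in practice this threshold is tuned per corpus):

         from collections import Counter

         def build_vocab(tokenized_texts, min_count=2):
             # Count lowercased tokens and keep only the frequent ones, plus <unk>.
             counts = Counter(tok.lower() for tokens in tokenized_texts for tok in tokens)
             return {w for w, c in counts.items() if c >= min_count} | {"<unk>"}

         def preprocess(tokens, vocab):
             # Lowercase every token and replace out-of-vocabulary words with <unk>.
             return [tok.lower() if tok.lower() in vocab else "<unk>" for tok in tokens]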

  23. Information Embedded in BoW Representations
      It is critical to keep in mind what information is preserved in bag-of-words representations.
      ◮ Kept: which words occur in a text (and how often)
      ◮ Lost: word order, sentence boundaries, paragraph boundaries, · · ·

  24. Case Study: Sentiment Analysis

  28. A Simple Predictor
      Consider the following toy example. Tokenized text 1: I love coffee. Tokenized text 2: I don t like tea.
      With the vocabulary order (I, love, coffee, don, t, like, tea):
      $x^{(1)} = [1\;1\;1\;0\;0\;0\;0]^{\mathsf{T}}$
      $w_{\text{Pos}} = [0\;1\;0\;0\;0\;1\;0]^{\mathsf{T}}$
      $w_{\text{Neg}} = [0\;0\;0\;1\;0\;0\;0]^{\mathsf{T}}$
      The prediction of sentiment polarity can be formulated as
      $w_{\text{Pos}}^{\mathsf{T}} x^{(1)} = 1 > w_{\text{Neg}}^{\mathsf{T}} x^{(1)} = 0$  (4)
      Essentially, this way of making predictions is counting the positive and negative words in the text.
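      A sketch of this word-counting predictor in Python, using the toy indicator vectors from the slide; each weight vector simply marks which vocabulary entries appear on a hand-built positive or negative word list.

         # Vocabulary order: I, love, coffee, don, t, like, tea
         w_pos = [0, 1, 0, 0, 0, 1, 0]        # marks "love" and "like"
         w_neg = [0, 0, 0, 1, 0, 0, 0]        # marks "don"

         def dot(u, v):
             return sum(ui * vi for ui, vi in zip(u, v))

         def predict(x):
             # Eq. (4): compare the counts of positive and negative words in x.
             return "Positive" if dot(w_pos, x) > dot(w_neg, x) else "Negative"

         print(predict([1, 1, 1, 0, 0, 0, 0]))   # "I love coffee" -> Positive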

  31. Another Example
      The limitation of word counting. For the second text, with the same vocabulary order (I, love, coffee, don, t, like, tea):
      $x^{(2)} = [1\;0\;0\;1\;1\;1\;1]^{\mathsf{T}}$
      $w_{\text{Pos}} = [0\;1\;0\;0\;0\;1\;0]^{\mathsf{T}}$
      $w_{\text{Neg}} = [0\;0\;0\;1\;0\;0\;0]^{\mathsf{T}}$
      Here $w_{\text{Pos}}^{\mathsf{T}} x^{(2)} = w_{\text{Neg}}^{\mathsf{T}} x^{(2)} = 1$, so simple counting cannot identify the negative sentiment.
      ◮ Different words should contribute differently, e.g., not vs. dislike
      ◮ Sentiment word lists are not complete
      Example II (Positive): "Din Tai Fung, every time I go eat at anyone of the locations around the King County area, I keep being reminded on why I have to keep coming back to this restaurant. · · ·"

  32. Logistic Regression

  34. Log-linear Models
      Directly model a linear classifier as
      $h_y(x) = w_y^{\mathsf{T}} x + b_y$  (5)
      with
      ◮ $x \in \mathbb{N}^V$: vector, the bag-of-words representation
      ◮ $w_y \in \mathbb{R}^V$: vector, the classification weights associated with label $y$
      ◮ $b_y \in \mathbb{R}$: scalar, the bias of label $y$ in the training set
      About label bias: consider a case where we have 90 positive examples and 10 negative examples in the training set. With $b_{\text{Pos}} > b_{\text{Neg}}$, a classifier can get 90% of its predictions correct without even looking at the texts.
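      A small NumPy sketch of the scoring function in Eq. (5), with one weight row and one bias per label; the bias values below are made-up numbers used only to show their effect.

         import numpy as np

         def scores(x, W, b):
             # h_y(x) = w_y^T x + b_y for every label y;
             # W has shape (num_labels, V), b has shape (num_labels,).
             return W @ x + b

         # Rows of W correspond to (Pos, Neg); vocabulary order as before.
         W = np.array([[0., 1., 0., 0., 0., 1., 0.],    # w_Pos
                       [0., 0., 0., 1., 0., 0., 0.]])   # w_Neg
         b = np.array([0.1, -0.1])                      # illustrative label biases
         x = np.array([1, 1, 1, 0, 0, 0, 0])            # "I love coffee"
         print(scores(x, W, b))                         # [ 1.1 -0.1]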

  36. Logistic Regression
      Rewrite the linear decision function in the log-probabilistic form
      $\log P(y \mid x) \propto \underbrace{w_y^{\mathsf{T}} x + b_y}_{h_y(x)}$  (6)
      or, in the probabilistic form,
      $P(y \mid x) \propto \exp(w_y^{\mathsf{T}} x + b_y)$  (7)
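      Since Eq. (7) only defines $P(y \mid x)$ up to a constant, the probabilities are obtained by normalizing the exponentiated scores over the label set (the softmax). A minimal NumPy sketch, reusing the illustrative weights and biases from the previous sketch:

         import numpy as np

         def predict_proba(x, W, b):
             h = W @ x + b          # scores h_y(x) = w_y^T x + b_y for every label
             h = h - h.max()        # subtract the max for numerical stability
             p = np.exp(h)
             return p / p.sum()     # normalize so the probabilities sum to 1

         W = np.array([[0., 1., 0., 0., 0., 1., 0.],    # w_Pos
                       [0., 0., 0., 1., 0., 0., 0.]])   # w_Neg
         b = np.array([0.1, -0.1])                      # illustrative biases
         x = np.array([1, 1, 1, 0, 0, 0, 0])            # "I love coffee"
         print(predict_proba(x, W, b))                  # ≈ [0.77 0.23] -> Positive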
