Text Classification Contd + Document Representations

SLIDE 1

Text Classification Contd + Document Representations

  • Prof. Sameer Singh

CS 295: STATISTICAL NLP WINTER 2017

January 17, 2017

Based on slides from Nathan Schneider, Noah Smith, Dan Klein and everyone else they copied from.

SLIDE 2

Outline

  • Logistic Regression
  • Brief Intro to Neural Networks
  • Document Representations

SLIDE 3

Outline

  • Logistic Regression
  • Brief Intro to Neural Networks
  • Document Representations

SLIDE 4

Text Classification

Paper Title: Human machine interface for ABC computer applications

CS Area:

  • Human Computer Interaction
  • Theory
  • Artificial Intelligence
  • Systems

SLIDE 5

Linear Models

Human machine interface for ABC computer applications

SLIDE 6

Matrix/Neural View

SLIDE 7

Naïve Bayes as a Linear Model

SLIDE 8

Joint vs Conditional Likelihood

SLIDE 9

Logistic Regression Model

SLIDE 10

Logistic Regression: 2 classes
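
The equations for this slide did not survive in the transcript. As a reminder, and assuming standard notation (the symbols w_y for per-class weights and f(x) for the feature vector are assumptions, not necessarily the slide's), the multiclass model and its two-class special case can be written as:

```latex
P(y \mid x) = \frac{\exp\big(w_y \cdot f(x)\big)}{\sum_{y'} \exp\big(w_{y'} \cdot f(x)\big)}

% Two classes, with w = w_1 - w_0:
P(y = 1 \mid x)
  = \frac{\exp\big(w_1 \cdot f(x)\big)}{\exp\big(w_0 \cdot f(x)\big) + \exp\big(w_1 \cdot f(x)\big)}
  = \frac{1}{1 + \exp\big(-w \cdot f(x)\big)}
  = \sigma\big(w \cdot f(x)\big)
```

With two classes only the difference of the weight vectors matters, which is why a single weight vector and the sigmoid are enough.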

SLIDE 11

Estimating the parameters

SLIDE 12

Gradient Descent
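
A minimal sketch of batch gradient descent for two-class logistic regression, assuming the usual objective (mean negative conditional log-likelihood), whose gradient is the average of (P(y=1|x) - y) * x over the training set. The function names, learning rate, and toy features are hypothetical choices for illustration, not taken from the slides.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """P(y = 1 | x) under weights w."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean negative log-likelihood.

    X is a list of feature vectors, y a list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # (prediction - label)
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Step against the gradient.
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy data (hypothetical): features are [bias, count("interface"), count("graph")],
# label 1 = Human Computer Interaction, label 0 = Theory.
X = [[1, 1, 0], [1, 2, 0], [1, 0, 1], [1, 0, 2]]
y = [1, 1, 0, 0]
w = train_logreg(X, y)
print(w, predict_proba(w, [1, 1, 0]), predict_proba(w, [1, 0, 1]))
```

On this separable toy set the weights keep growing the longer training runs, which is one symptom of the overfitting problem that regularization addresses.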

SLIDE 13

Tips and Tricks: TF-IDF

Sparsity of Words

  • Remember Zipf’s Law? Lots of rare words
  • For classification, they can be more informative!
SLIDE 14

Tips and Tricks: TF-IDF

Why use log(proportion)?

  • It works…
  • Importance is not a linear function
  • IDF is an additive function
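
The TF-IDF formula itself is not preserved in this transcript. The sketch below assumes one common variant, tf(t, d) * log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t; other variants add smoothing or use a different log base. The function name and the three-document corpus are illustrative only.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf(t, d) * log(N / df(t)).
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted

docs = [
    "a survey of user opinion of computer system response time".split(),
    "the generation of random binary ordered trees".split(),
    "graph minors a survey".split(),
]
weights = tfidf(docs)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:<10}{weight:.3f}")
```

Under this variant a term that appears in every document gets weight 0, and a term that appears in only one document (such as "user" here) outweighs one shared across documents (such as "survey").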
SLIDE 15

Tips and Tricks: Regularization

Overfitting

  • Training data is finite, so it contains spurious correlations
  • Rare words that occur with only one label!
  • Or that don’t occur often enough
  • The curse of Zipf’s Law continues…

For a word that occurs 10 times… There are many that occur ~10 times!

SLIDE 16

Tips and Tricks: Regularization

Fixing Overfitting

  • Ignore rare words (opposite of TF-IDF)
  • Penalize really high weights…

[Plot: Accuracy vs. Regularization Strength]
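
One standard way to "penalize really high weights" is an L2 penalty added to the logistic regression objective. The form below is an assumption about what the slide intends (the symbol λ for regularization strength and the factor 1/2 are conventions, not taken from the slides):

```latex
\hat{w} = \arg\min_{w} \; -\sum_{i} \log P(y_i \mid x_i; w) \;+\; \frac{\lambda}{2}\,\lVert w \rVert_2^2

% The penalty adds a shrinkage term to each gradient step (step size \eta):
w \leftarrow w - \eta \Big( \nabla_{w}\Big[-\sum_{i} \log P(y_i \mid x_i; w)\Big] + \lambda\, w \Big)
```

With λ = 0 this is plain maximum conditional likelihood; larger λ pulls every weight toward zero, so a rare word seen with a single label can no longer receive an arbitrarily large weight.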

SLIDE 17

Tips and Tricks: Featurizing

SLIDE 18

Outline

  • Logistic Regression
  • Brief Intro to Neural Networks
  • Document Representations

SLIDE 19

Neural View of Log. Regression

SLIDE 20

Linear vs Non-linear Model

SLIDE 21

Introducing a Hidden Layer
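
The slide's diagram is not preserved. As a small illustration of what a hidden layer buys, the sketch below hand-sets the weights of a network with two ReLU hidden units so that it computes XOR, a function no single linear threshold can represent. The weights and function names are hypothetical and hand-picked, not learned and not from the slides.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)

def xor_net(x1, x2):
    """One hidden layer with two ReLU units, followed by a linear output."""
    h1 = relu(x1 + x2)        # fires when at least one input is on
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are on
    return h1 - 2 * h2        # subtract the "both on" case

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The hidden units re-represent the inputs so that the final linear layer can separate the classes.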

SLIDE 22

What is Deep Learning?

  • Many hidden layers
  • In NLP, utilize unlabeled data to learn representations… (next lecture)

SLIDE 23

Outline

  • Logistic Regression
  • Brief Intro to Neural Networks
  • Document Representations

SLIDE 24

Document Similarity

  • Relation of user perceived response time to error measurement
  • A survey of user opinion of computer system response time
  • The generation of random, binary, ordered trees

SLIDE 25

Cosine Distance

Advantages

  • Cosine similarity is between -1 and 1 (0 means no overlap)
  • If all entries are non-negative, it is between 0 and 1
  • The magnitude of the vectors doesn’t matter
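
A minimal sketch of the cosine measure these bullets describe, assuming dense vectors stored as plain Python lists. The function names are illustrative, and returning 0 for an all-zero vector is a convention chosen here, not something the slides specify.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for empty documents (an assumption)
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """One common definition of cosine distance: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

print(cosine_similarity([1, 0], [0, 1]))      # no overlap
print(cosine_similarity([1, 2], [10, 20]))    # same direction, different size
print(cosine_similarity([1, 0], [-1, 0]))     # opposite direction
```

Scaling either vector leaves the result unchanged, which is the sense in which vector size does not matter.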
SLIDE 26

Term Document Matrix

SLIDE 27

Local and Global Weighting

Local Weighting

  • Binary:
  • Term Freq:
  • Log:

Global Weighting

  • Binary:
  • Normal:
  • IDF:
SLIDE 28

Example: Documents

c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

From http://lsa.colorado.edu/papers/dp1.LSAintro.pdf

SLIDE 29

Example: Term-Doc Matrix

Columns (documents): c1 c2 c3 c4 c5 m1 m2 m3 m4

Rows (terms): human, interface, computer, user, system, response, time, EPS, survey, trees, graph, minors
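
The matrix entries are not preserved in this transcript, but they can be recomputed from the nine example titles. The sketch below assumes raw term counts and a simple lowercase, letters-only tokenizer; both are assumptions, and the helper names are hypothetical.

```python
import re

DOCS = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}
TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "eps", "survey", "trees", "graph", "minors"]

def tokenize(text):
    """Lowercase and keep runs of letters only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs, terms):
    """Rows are terms, columns are documents, entries are raw counts."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    return [[tokens[name].count(term) for name in docs] for term in terms]

matrix = term_document_matrix(DOCS, TERMS)
print(" " * 10, *DOCS)
for term, row in zip(TERMS, matrix):
    print(f"{term:<10}", *row, sep="  ")
```

Each column is the sparse vector for one document; "system" is the only term here with a count above 1 (it occurs twice in c4).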

SLIDE 30

Example: Distance Matrix

[Figure: pairwise distance matrix; rows and columns are c1 c2 c3 c4 c5 m1 m2 m3 m4]

SLIDE 31

Problems with Sparse Vectors

c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
m4: Graph minors: A survey
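
One way to see the problem with these three titles is to compare them with cosine similarity over raw counts of the twelve index terms from the example. This is a sketch under those assumptions (the slide's own argument is not preserved), and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

INDEX_TERMS = {"human", "interface", "computer", "user", "system", "response",
               "time", "eps", "survey", "trees", "graph", "minors"}

def bag_of_words(text):
    """Sparse count vector restricted to the index terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in INDEX_TERMS)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

c1 = bag_of_words("Human machine interface for ABC computer applications")
c2 = bag_of_words("A survey of user opinion of computer system response time")
m4 = bag_of_words("Graph minors: A survey")

print("c1 vs c2:", cosine(c1, c2))  # share only "computer"
print("c2 vs m4:", cosine(c2, m4))  # share only "survey"
print("c1 vs m4:", cosine(c1, m4))  # share nothing
```

Under these assumptions c2 comes out exactly as similar to the graph-theory title m4 as to the HCI title c1, because each pair shares a single term and "human" and "user" are treated as unrelated dimensions.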

SLIDE 32

Example: Distance Matrix

[Figure: pairwise distance matrix; rows and columns are c1 c2 c3 c4 c5 m1 m2 m3 m4]

SLIDE 33

Option 1: Clustering

SLIDE 34

Example: Clustering

[Figure: rows and columns are c1 c2 c3 c4 c5 m1 m2 m3 m4]

SLIDE 35

Upcoming…

Homework

  • Homework 1 is up!
  • No more material will be covered
  • Due: January 26, 2017

Project

  • Project pitch is due January 23, 2017!
  • Start assembling teams now
  • Tons of datasets on the “projects” page on website