

SLIDE 1

Introduction to Artificial Intelligence: CoreNLP, Semantic Analysis, Naive Bayes Classifier

Janyl Jumadinova November 18, 2016

SLIDE 2

CoreNLP

◮ Reference: http://stanfordnlp.github.io/CoreNLP/
◮ Package available in /opt/corenlp/
◮ Run:

  java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g
    edu.stanford.nlp.pipeline.StanfordCoreNLP
    -annotators tokenize,ssplit,pos,lemma,ner -file input.txt

SLIDES 3-5

CoreNLP Annotators

http://stanfordnlp.github.io/CoreNLP/annotators.html

◮ tokenize: Creates tokens from the given text.
◮ ssplit: Separates a sequence of tokens into sentences.
◮ pos: Creates Parts of Speech (POS) tags for tokens.
◮ ner: Performs Named Entity Recognition classification.
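The slides do not show the annotators' output. As a rough illustration only (not CoreNLP's implementation or API), a toy Python sketch of what the tokenize and ssplit stages produce:

```python
import re

def tokenize(text):
    # tokenize: split raw text into word and punctuation tokens
    return re.findall(r"\w+|[^\w\s]", text)

def ssplit(tokens):
    # ssplit: group a token sequence into sentences at ., !, ?
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences

text = "Stanford is in California. CoreNLP is useful."
print(tokenize(text))
print(len(ssplit(tokenize(text))))  # two sentences
```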

SLIDES 6-8

CoreNLP Annotators

http://stanfordnlp.github.io/CoreNLP/annotators.html

◮ lemma: Creates word lemmas for tokens.
  – The goal of lemmatization (as of stemming) is to reduce related forms of a word to a common base form.
  – Lemmatization usually uses a vocabulary and morphological analysis of words to:
    • remove inflectional endings only, and
    • return the base or dictionary form of a word, which is known as the lemma.

SLIDE 9

Sentiment Analysis

SLIDE 10

Sentiment Analysis

◮ https://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
◮ http://www.alchemyapi.com/developers/getting-started-guide/twitter-sentiment-analysis
◮ www.sentiment140.com

SLIDE 11

Sentiment analysis has many other names

◮ Opinion extraction
◮ Opinion mining
◮ Sentiment mining
◮ Subjectivity analysis

SLIDE 12

Sentiment analysis is the detection of attitudes

◮ “enduring, affectively colored beliefs, dispositions towards objects or persons”

SLIDES 13-15

Attitudes

◮ Holder (source) of attitude
◮ Target (aspect) of attitude
◮ Type of attitude
  • From a set of types: like, love, hate, value, desire, etc.
  • Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
◮ Text containing the attitude
  • Sentence or entire document

SLIDES 16-18

Sentiment analysis

◮ Simplest task: Is the attitude of this text positive or negative?
◮ More complex: Rank the attitude of this text from 1 to 5
◮ Advanced: Detect the target, source, or complex attitude types

SLIDE 19

Baseline Algorithm

◮ Tokenization
◮ Feature Extraction
◮ Classification using different classifiers
  – Naive Bayes
  – MaxEnt
  – SVM
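The feature-extraction step of this baseline can be sketched as a bag-of-words counter (a minimal illustration; the function name and whitespace tokenization are assumptions for the example, not part of the slides):

```python
from collections import Counter

def extract_features(text):
    # Bag-of-words features: lowercase, split on whitespace,
    # and count how often each word occurs in the document.
    return Counter(text.lower().split())

feats = extract_features("Great movie great acting")
print(feats)  # each word mapped to its count
```

These counts are exactly what a multinomial classifier such as Naive Bayes consumes.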

SLIDE 20

Sentiment Tokenization Issues

◮ Deal with HTML and XML markup
◮ Twitter/Facebook/... mark-up (names, hash tags)
◮ Capitalization (preserve for words in all caps)
◮ Phone numbers, dates
◮ Emoticons
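A sketch of a sentiment-aware tokenizer covering a few of these cases (the patterns below are illustrative assumptions, nowhere near a complete Twitter grammar):

```python
import re

# Alternatives are tried in order: emoticons and mark-up before plain words.
TOKEN_RE = re.compile(r"""
    [:;=][-o]?[)(DP]      # common emoticons like :) ;-( =D
  | [@\#]\w+              # @mentions and #hashtags
  | \d{3}-\d{3}-\d{4}     # simple US-style phone numbers
  | \w+                   # ordinary words
  | [^\w\s]               # any other punctuation mark
""", re.VERBOSE)

def tweet_tokenize(text):
    # Preserve case only for all-caps words (e.g. "GREAT"); lowercase the rest.
    tokens = TOKEN_RE.findall(text)
    return [t if t.isupper() else t.lower() for t in tokens]

print(tweet_tokenize("GREAT movie @bob #oscars :)"))
```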

SLIDES 21-22

Extracting Features for Sentiment Classification

◮ How to handle negation:
  “I didn’t like this movie” vs. “I really like this movie”
◮ Which words to use?
  – Only adjectives
  – All words

SLIDE 23

Negation

Add NOT to every word between negation and following punctuation
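This heuristic is commonly implemented with a NOT_ prefix; a sketch (the negation and punctuation sets below are small illustrative samples):

```python
NEGATIONS = {"not", "no", "never", "didn't", "isn't", "don't"}
PUNCT = {".", ",", "!", "?", ";", ":"}

def mark_negation(tokens):
    # Prefix NOT_ to every token between a negation word
    # and the next punctuation mark.
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCT:
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                negating = True
    return out

print(mark_negation(["I", "didn't", "like", "this", "movie", "."]))
```

NOT_like and NOT_this then become distinct features from like and this, so the classifier can tell the two example sentences apart.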

SLIDE 24

Naive Bayes Algorithm

◮ Simple (“naive”) classification method based on Bayes rule
◮ Relies on a very simple representation of the document:
  • Bag of words

SLIDES 25-28

Naive Bayes Algorithm

For a document d and a class c:

[Equations on these slides were rendered as images and are not captured in this extraction.]
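The equations themselves did not survive the extraction; the standard multinomial Naive Bayes derivation for a document d and class c, which slides of this kind typically present, is:

```latex
\begin{align*}
c_{MAP} &= \arg\max_{c \in C} P(c \mid d)
        && \text{MAP = most likely class} \\
        &= \arg\max_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)}
        && \text{Bayes rule} \\
        &= \arg\max_{c \in C} P(d \mid c)\, P(c)
        && \text{$P(d)$ is constant across classes} \\
        &= \arg\max_{c \in C} P(x_1, \dots, x_n \mid c)\, P(c)
        && \text{represent $d$ by features $x_1, \dots, x_n$} \\
        &= \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(x_i \mid c)
        && \text{``naive'' conditional independence assumption}
\end{align*}
```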

SLIDES 29-31

Naive Bayes Algorithm

[Slide content was rendered as images and is not captured in this extraction.]
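A minimal end-to-end sketch of multinomial Naive Bayes with add-1 (Laplace) smoothing, filling in the algorithm these slides walk through (the tiny training set and function names are invented for the example; this is not a production classifier):

```python
import math
from collections import Counter, defaultdict

def train(docs):
    # docs: list of (tokens, label) pairs
    class_counts = Counter(label for _, label in docs)   # for priors P(c)
    word_counts = defaultdict(Counter)                   # label -> word -> count
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(tokens, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, float("-inf")
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)             # log prior P(c)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:                             # skip unseen words
                # add-1 smoothed likelihood P(token | class)
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

docs = [(["great", "fun", "great"], "pos"),
        (["boring", "awful"], "neg"),
        (["fun", "enjoyable"], "pos"),
        (["awful", "boring", "dull"], "neg")]
model = train(docs)
print(classify(["great", "fun"], *model))   # -> pos
```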

SLIDES 32-33

Binarized (Boolean feature) Multinomial Naive Bayes

Intuition:

◮ Word occurrence may matter more than word frequency
◮ The occurrence of the word fantastic tells us a lot
◮ The fact that it occurs 5 times may not tell us much more

Boolean Multinomial Naive Bayes: clips all the word counts in each document at 1
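Clipping counts at 1 amounts to deduplicating each document's tokens before counting; a one-function sketch (training then proceeds exactly as in ordinary multinomial Naive Bayes, just on the deduplicated tokens):

```python
def binarize(tokens):
    # Boolean multinomial NB: clip each word's count in a document at 1
    # by keeping only the first occurrence of each token (order-preserving).
    seen, out = set(), []
    for tok in tokens:
        if tok not in seen:
            seen.add(tok)
            out.append(tok)
    return out

print(binarize(["fantastic", "plot", "fantastic", "fantastic", "cast"]))
# -> ['fantastic', 'plot', 'cast']
```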

SLIDE 34

Neural Networks and Deep Learning: Next!

◮ http://nlp.stanford.edu/sentiment/
◮ Run:

  java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g
    edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt