
Introduction to Artificial Intelligence: CoreNLP, Semantic Analysis, Naive Bayes Classifier



  1. Introduction to Artificial Intelligence: CoreNLP, Semantic Analysis, Naive Bayes Classifier. Janyl Jumadinova, November 18, 2016

  2. CoreNLP
     ◮ Reference: http://stanfordnlp.github.io/CoreNLP/
     ◮ Package available in /opt/corenlp/
     ◮ Run: java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt

  5. CoreNLP Annotators http://stanfordnlp.github.io/CoreNLP/annotators.html
     ◮ tokenize: Creates tokens from the given text.
     ◮ ssplit: Separates a sequence of tokens into sentences.
     ◮ pos: Creates part-of-speech (POS) tags for tokens.
     ◮ ner: Performs Named Entity Recognition classification.

  8. CoreNLP Annotators http://stanfordnlp.github.io/CoreNLP/annotators.html
     ◮ lemma: Creates word lemmas for tokens.
       – The goal of lemmatization (as of stemming) is to reduce related forms of a word to a common base form.
       – Lemmatization usually uses a vocabulary and morphological analysis of words to:
         - remove inflectional endings only, and
         - return the base or dictionary form of a word, which is known as the lemma.
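The difference between stemming and lemmatization can be sketched with a toy example. This is not how CoreNLP's lemma annotator works internally; the lookup table `LEMMA_TABLE`, the suffix list, and both function names are hypothetical and exist only to show that a stemmer chops endings blindly while a lemmatizer consults a vocabulary.

```python
# Toy illustration only: real lemmatizers use a full vocabulary and
# morphological analysis, not a hand-written table like this one.

# Hypothetical mini-vocabulary mapping inflected forms to dictionary forms.
LEMMA_TABLE = {
    "am": "be", "are": "be", "is": "be", "was": "be",
    "cars": "car",
    "better": "good",
    "studies": "study", "studying": "study",
}

# Crude suffix list, ordered longest first.
SUFFIXES = ("ing", "ies", "es", "s")


def crude_stem(word: str) -> str:
    """Chop a known suffix off the end of the word, with no vocabulary check."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require a few leading characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemma(word: str) -> str:
    """Look the word up in the vocabulary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["studies", "are", "better", "cars"]:
    print(w, "stem:", crude_stem(w), "lemma:", toy_lemma(w))
```

The stemmer turns "studies" into the non-word "stud", while the lookup returns the dictionary form "study"; irregular forms such as "are" and "better" are only recoverable with a vocabulary.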

  9. Sentiment Analysis

  10. Sentiment Analysis
      ◮ https://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
      ◮ http://www.alchemyapi.com/developers/getting-started-guide/twitter-sentiment-analysis
      ◮ www.sentiment140.com

  11. Sentiment analysis has many other names
      ◮ Opinion extraction
      ◮ Opinion mining
      ◮ Sentiment mining
      ◮ Subjectivity analysis

  12. Sentiment analysis is the detection of attitudes
      ◮ “enduring, affectively colored beliefs, dispositions towards objects or persons”

  15. Attitudes
      ◮ Holder (source) of attitude
      ◮ Target (aspect) of attitude
      ◮ Type of attitude
        - From a set of types: like, love, hate, value, desire, etc.
        - Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
      ◮ Text containing the attitude
        - Sentence or entire document

  18. Sentiment analysis
      ◮ Simplest task: Is the attitude of this text positive or negative?
      ◮ More complex: Rank the attitude of this text from 1 to 5
      ◮ Advanced: Detect the target, source, or complex attitude types

  19. Baseline Algorithm
      ◮ Tokenization
      ◮ Feature Extraction
      ◮ Classification using different classifiers
        – Naive Bayes
        – MaxEnt
        – SVM

  20. Sentiment Tokenization Issues
      ◮ Deal with HTML and XML markup
      ◮ Twitter/Facebook/... mark-up (names, hash tags)
      ◮ Capitalization (preserve for words in all caps)
      ◮ Phone numbers, dates
      ◮ Emoticons
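The issues listed above can be sketched as a small regex-based tokenizer. This is a minimal sketch under stated assumptions, not the tokenizer used in the course or in CoreNLP: the pattern `TOKEN_PATTERN`, the function name `tokenize`, the choice to drop markup tags, and the tiny emoticon set are all illustrative.

```python
import re

# Order matters: earlier alternatives win, so markup, names, and emoticons
# are matched before the generic word pattern can split them apart.
TOKEN_PATTERN = re.compile(
    r"""
      <[^>]+>                          # HTML/XML tags
    | [@\#]\w+                         # @names and #hashtags
    | [:;=][\-o*']?[)\](\[dDpP/\\|]    # a few Western-style emoticons
    | \d{1,2}/\d{1,2}/\d{2,4}          # dates such as 11/18/2016
    | \+?\d[\d\-. ]{6,}\d              # phone-number-like digit runs
    | \w+(?:['’]\w+)?                  # words, optionally with an apostrophe part
    | [^\w\s]                          # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list[str]:
    """Split text into tokens, dropping markup and preserving ALL-CAPS words."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group()
        # Assumption: markup carries no sentiment, so tags are discarded.
        if len(token) > 2 and token.startswith("<") and token.endswith(">"):
            continue
        # Lowercase ordinary words, but keep words written entirely in caps.
        if token[0].isalpha() and not (len(token) > 1 and token.isupper()):
            token = token.lower()
        tokens.append(token)
    return tokens


print(tokenize("<b>I LOVED it</b> :) #great @bob"))
```

Hashtags, @names, emoticons, dates, and phone-like numbers come out as single tokens, and "LOVED" keeps its capitalization as a possible emphasis signal.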

  22. Extracting Features for Sentiment Classification
      ◮ How to handle negation: "I didn’t like this movie" vs. "I really like this movie"
      ◮ Which words to use?
        – Only adjectives
        – All words

  23. Negation
      ◮ Add a "NOT_" prefix to every word between a negation and the following punctuation mark
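A minimal sketch of this negation-marking step, assuming the text is already tokenized. The list of negation cues, the set of punctuation marks that close the scope, and the name `mark_negation` are assumptions chosen for illustration.

```python
import re

# Assumed negation cues: "not", "no", "never", and contractions ending in n't.
NEGATION_CUE = re.compile(r"^(?:not|no|never|\w*n['’]t)$", re.IGNORECASE)
# Assumed punctuation marks that end the scope of a negation.
CLAUSE_END = re.compile(r"^[.,:;!?]$")


def mark_negation(tokens):
    """Prefix NOT_ to each token between a negation cue and the next punctuation."""
    marked = []
    in_scope = False
    for token in tokens:
        if CLAUSE_END.match(token):
            in_scope = False          # punctuation closes the negated span
            marked.append(token)
        elif NEGATION_CUE.match(token):
            in_scope = True           # the cue itself is left unchanged
            marked.append(token)
        elif in_scope:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return marked


print(mark_negation("I didn't like this movie , but I liked the cast".split()))
```

After marking, "like" and "NOT_like" are distinct features, so a bag-of-words classifier can learn opposite weights for them.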

  24. Naive Bayes Algorithm
      ◮ Simple (“naive”) classification method based on Bayes rule
      ◮ Relies on very simple representation of document:
        - Bag of words

  28. Naive Bayes Algorithm
      ◮ For a document d and a class c
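The equations on these slides did not survive extraction. The standard textbook formulation is given here as a sketch, not as a reconstruction of the slide: it assumes the document d is represented as a bag of words w_1, ..., w_n and that C is the set of classes, and both symbols are assumed notation.

```latex
\begin{align*}
P(c \mid d) &= \frac{P(d \mid c)\,P(c)}{P(d)} \\
c_{\mathrm{MAP}} &= \operatorname*{argmax}_{c \in C} P(c \mid d)
                  = \operatorname*{argmax}_{c \in C} P(d \mid c)\,P(c) \\
c_{\mathrm{NB}} &= \operatorname*{argmax}_{c \in C} P(c) \prod_{i=1}^{n} P(w_i \mid c)
\end{align*}
```

The denominator P(d) is dropped in the second line because it is the same for every class, and the last line applies the "naive" assumption that words are conditionally independent given the class.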

  33. Binarized (Boolean feature) Multinomial Naive Bayes
      Intuition:
      ◮ Word occurrence may matter more than word frequency
      ◮ The occurrence of the word "fantastic" tells us a lot
      ◮ The fact that it occurs 5 times may not tell us much more.
      Boolean Multinomial Naive Bayes
      ◮ Clips all the word counts in each document at 1
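A minimal sketch of multinomial Naive Bayes with an optional binarized mode. The class name `NaiveBayes`, the `binarized` flag, and the use of add-one (Laplace) smoothing are assumptions for illustration; the slides do not show which smoothing, if any, was used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing (assumed)."""

    def __init__(self, binarized=False):
        self.binarized = binarized

    def _features(self, tokens):
        # Binarized variant: clip each word's count within a document to 1.
        return list(set(tokens)) if self.binarized else list(tokens)

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            features = self._features(tokens)
            self.word_counts[label].update(features)
            self.vocabulary.update(features)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + vocab_size
            score = math.log(doc_count / total_docs)       # log prior
            for word in self._features(tokens):
                if word in self.vocabulary:                 # skip unseen words
                    score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


docs = [
    "fantastic fantastic fantastic film".split(),
    "great fun".split(),
    "boring plot".split(),
    "awful boring film".split(),
]
labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes(binarized=True).fit(docs, labels)
print(model.predict("fantastic fun".split()))
```

With `binarized=True`, the three occurrences of "fantastic" in the first training document count once; with the default `binarized=False`, they count three times.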

  34. Neural Networks and Deep Learning: Next!
      ◮ http://nlp.stanford.edu/sentiment/
      ◮ java -cp "/opt/corenlp/stanford-corenlp-3.7.0/*" -Xmx2g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt
