
Introduction to Artificial Intelligence: Natural Language Processing



  1. Introduction to Artificial Intelligence: Natural Language Processing
     Janyl Jumadinova, November 14, 2016
     Credit: Stanford NLP

  2. Question Answering: IBM’s Watson

  3. Information Extraction

  4. Sentiment Extraction (Source: Washington Post)

  5. Machine Translation

  6. Language Technology

  7. Ambiguity makes NLP hard
     ◮ Teacher Strikes Idle Kids
     ◮ Red Tape Holds Up New Bridges
     ◮ Juvenile Court to Try Shooting Defendant
     ◮ Local High School Dropouts Cut in Half

  8. Other NLP Difficulties

  9. Progress
     ◮ What tools do we need?
        - Knowledge about language
        - Knowledge about the world
        - A way to combine knowledge sources
     ◮ How we generally do this: probabilistic models built from language data
        - P(“maison” → “house”) is high
        - P(“L’avocat général” → “the general avocado”) is low (the phrase means “the advocate general”; “avocat” can also mean “avocado”)
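
A minimal sketch of the “probabilistic models built from language data” idea, in Python: estimate P(English word | French word) as a relative frequency over aligned word pairs. The pairs and counts below are invented toy data (an assumption, not from the slides); real systems estimate such probabilities from far larger parallel corpora, and over phrases rather than single words.

```python
from collections import Counter

# Invented toy data: (French word, English word) pairs as they might be
# extracted from a word-aligned parallel corpus. These are NOT real counts.
aligned_pairs = [
    ("maison", "house"), ("maison", "house"), ("maison", "house"),
    ("maison", "home"),
    ("avocat", "lawyer"), ("avocat", "lawyer"), ("avocat", "lawyer"),
    ("avocat", "avocado"),
]

pair_counts = Counter(aligned_pairs)
french_counts = Counter(french for french, _ in aligned_pairs)


def translation_prob(french: str, english: str) -> float:
    """Maximum-likelihood estimate of P(english | french) from the counts."""
    total = french_counts[french]
    if total == 0:
        return 0.0  # unseen source word: no evidence at all
    return pair_counts[(french, english)] / total


print(translation_prob("maison", "house"))    # 0.75
print(translation_prob("avocat", "avocado"))  # 0.25
```

Unseen words get probability 0 in this sketch; practical systems smooth such estimates.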

  10. Basic Text Processing: Regular Expressions
     ◮ A formal language for specifying text strings
     ◮ How can we search for any of these? woodchuck, woodchucks, Woodchuck, Woodchucks

  11. Regular Expressions: Disjunctions
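
Brackets express a disjunction over single characters, which is one way to answer the woodchuck question above. The short Python sketch below uses the standard `re` module; the word list and sample string are illustrative. It also shows a range, which abbreviates a longer disjunction.

```python
import re

words = ["woodchuck", "woodchucks", "Woodchuck", "Woodchucks", "groundhog"]

# [wW] matches either "w" or "W" in the first position.
# re.search looks for the pattern anywhere in the string, so the plural
# forms match too: they contain "woodchuck"/"Woodchuck" as a substring.
found = [w for w in words if re.search(r"[wW]oodchuck", w)]
print(found)  # ['woodchuck', 'woodchucks', 'Woodchuck', 'Woodchucks']

# A range inside brackets is shorthand for a longer disjunction:
# [0-9] means the same as [0123456789].
print(re.findall(r"[0-9]", "Chapter 1, page 42"))  # ['1', '4', '2']
```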

  12. Regular Expressions: Negation in Disjunction
     ◮ Negations: [^Ss]
     ◮ The caret means negation only when it is the first character inside []
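
The caret rule can be checked directly with Python's `re` module; the sample strings are made up for illustration.

```python
import re

# A caret as the FIRST character inside [] negates the class:
# [^Ss] matches any single character that is neither "S" nor "s".
print(re.findall(r"[^Ss]", "Sass"))    # ['a']

# Ranges can be negated too: [^A-Z] matches anything but an uppercase letter.
print(re.findall(r"[^A-Z]", "NLP!"))   # ['!']

# Anywhere else inside [] the caret is an ordinary character:
# [e^] matches either "e" or a literal "^".
print(re.findall(r"[e^]", "x^2 = e"))  # ['^', 'e']
```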

  13. Regular Expressions: More Disjunction
     ◮ Woodchuck is another name for groundhog!
     ◮ The pipe | is used for disjunction
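
A small illustration of the pipe with Python's `re` module (the sample strings are invented for this sketch). For single characters the pipe and a bracket class are interchangeable, but the pipe can also separate whole words.

```python
import re

text = "A groundhog is also called a woodchuck."

# The pipe separates alternatives: "groundhog" OR "woodchuck".
print(re.findall(r"groundhog|woodchuck", text))  # ['groundhog', 'woodchuck']

# For single characters, a|b|c behaves like the bracket class [abc].
print(re.findall(r"a|b|c", "cab"), re.findall(r"[abc]", "cab"))

# Brackets and the pipe combine freely.
print(re.findall(r"[gG]roundhog|[wW]oodchuck", "Groundhog or Woodchuck?"))
# ['Groundhog', 'Woodchuck']
```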

  14. Regular Expressions: ? * + .
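
The slide's content for these four operators did not survive extraction; the sketch below shows their standard meanings using Python's `re.fullmatch`. The example words and the helper name `matches` are illustrative assumptions.

```python
import re


def matches(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]


# ?  : the previous item is optional (zero or one occurrence)
print(matches(r"colou?r", ["color", "colour", "colouur"]))
# *  : zero or more of the previous item (Kleene star)
print(matches(r"oo*h!", ["oh!", "ooh!", "oooh!", "h!"]))
# +  : one or more of the previous item (Kleene plus)
print(matches(r"o+h!", ["oh!", "ooh!", "h!"]))
# .  : any single character except a newline
print(matches(r"beg.n", ["begin", "began", "begun", "beg3n", "begn"]))
```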

  15. Regular Expressions: Example
     ◮ Find all instances of the word “the” in a text
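
One way to refine the search for “the”, sketched in Python on an invented sentence: each pattern fixes a failure of the previous one. `\b` is Python's word-boundary assertion; a more portable alternative is to require a non-letter on each side, e.g. `[^a-zA-Z][tT]he[^a-zA-Z]`, although that version misses a match at the very start or end of the text.

```python
import re

text = "The other day the theater showed them the film."

# 1. Literal match: misses the capitalized "The" and fires inside
#    "other", "theater" and "them".
print(re.findall(r"the", text))
# 2. Bracket disjunction catches "The" but still over-matches inside words.
print(re.findall(r"[tT]he", text))
# 3. Word boundaries (\b) rule out matches embedded in longer words.
print(re.findall(r"\b[tT]he\b", text))  # ['The', 'the', 'the']
```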

  16. Basic Text Processing: Word Tokenization
     ◮ Every NLP task needs to do text normalization:
        1. Segmenting/tokenizing words in running text
        2. Normalizing word formats
        3. Segmenting sentences in running text
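
The three normalization steps can be chained into a toy pipeline. This Python sketch is an assumption about how they might fit together, not the course's implementation: it segments sentences first (naively, splitting after . ! or ?), then tokenizes, then case-folds each token as a simple form of word-format normalization. Running sentence segmentation first is a design choice of the sketch.

```python
import re


def segment_sentences(text):
    # Naive rule: a sentence ends at . ! or ? followed by whitespace.
    # (This breaks on abbreviations such as "Dr.".)
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # A token is a run of letters, digits or apostrophes,
    # or any single other non-space character (punctuation).
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", sentence)


def normalize(token):
    # Case folding is one simple way to normalize word formats.
    return token.lower()


text = "Every NLP task needs normalization. Does it? Yes!"
normalized = [[normalize(t) for t in tokenize(s)] for s in segment_sentences(text)]
print(normalized)
```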

  17. How Many Words?
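
The slide's content is not preserved, but a common way to make “how many words?” precise is to distinguish tokens (running words, N) from types (distinct words, the vocabulary V). A Python sketch on a sample sentence; ignoring punctuation and folding case are assumptions, and both change the counts.

```python
import re

sentence = ("They picnicked by the pool, then lay back on the grass "
            "and looked at the stars.")

# Tokens: every running word (punctuation is dropped in this sketch).
tokens = re.findall(r"[A-Za-z]+", sentence)
# Types: distinct words, case-folded so "The" and "the" would count once.
types = {t.lower() for t in tokens}

N = len(tokens)  # number of tokens
V = len(types)   # vocabulary size (number of types)
print(N, V)      # 16 14
```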

  18. Simple Tokenization in UNIX
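
A common UNIX recipe for tokenizing and counting words is `tr -sc 'A-Za-z' '\n' < input.txt | sort | uniq -c | sort -n -r`, where `input.txt` is a placeholder file name. The Python function below (`unix_style_counts` is a hypothetical name) mirrors each stage of that pipeline so the steps are easy to inspect.

```python
import re
from collections import Counter


def unix_style_counts(text):
    # tr -sc 'A-Za-z' '\n': every run of non-letters becomes a single
    # line break, so each remaining piece is one alphabetic token.
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    # sort | uniq -c: count how often each distinct token occurs.
    counts = Counter(tokens)
    # sort -n -r: most frequent first; ties are broken alphabetically here,
    # which may differ from the order the shell pipeline prints.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(unix_style_counts("the cat and the hat; the end"))
# [('the', 3), ('and', 1), ('cat', 1), ('end', 1), ('hat', 1)]
```

Like `tr`, this is case-sensitive: "The" and "the" are counted separately unless the text is lowercased first.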

  19. Basic Text Processing: Normalization
     ◮ Every NLP task needs to do text normalization:
        1. Segmenting/tokenizing words in running text
        2. Normalizing word formats
        3. Segmenting sentences in running text

  20. Issues in Tokenization
     ◮ Finland’s capital → Finland / Finlands / Finland’s?
     ◮ what’re, I’m, isn’t → what are, I am, is not
     ◮ Hewlett-Packard → Hewlett Packard
     ◮ state-of-the-art → state of the art
     ◮ Lowercase → lower-case / lowercase / lower case
     ◮ San Francisco → one token or two?
     ◮ Language issues: French, German, Japanese, Chinese, ...
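
A sketch of how a tokenizer might handle some of these cases, in Python. The clitic table and the choice to keep hyphenated words whole by default are assumptions made for illustration; each bullet above is a policy decision, and different tokenizers decide differently.

```python
import re

# Hypothetical, case-sensitive expansion table covering only the slide's
# examples; a real tokenizer would need a much larger list.
CLITICS = {"what're": ["what", "are"], "I'm": ["I", "am"], "isn't": ["is", "not"]}

# A word is letters, optionally joined by hyphens or apostrophes
# (Hewlett-Packard, Finland's); any other non-space character is its own token.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\S")


def tokenize(text, split_hyphens=False):
    text = text.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    tokens = []
    for raw in TOKEN_RE.findall(text):
        if raw in CLITICS:
            tokens.extend(CLITICS[raw])
        elif split_hyphens and "-" in raw:
            tokens.extend(raw.split("-"))
        else:
            tokens.append(raw)
    return tokens


sentence = "I’m sure Hewlett-Packard isn’t state-of-the-art."
print(tokenize(sentence))
# ['I', 'am', 'sure', 'Hewlett-Packard', 'is', 'not', 'state-of-the-art', '.']
print(tokenize(sentence, split_hyphens=True))
```

Note that "San Francisco" still comes out as two tokens here; treating it as one would need a list of multiword expressions.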

  21. Basic Text Processing: Stemming
     ◮ Every NLP task needs to do text normalization:
        1. Segmenting/tokenizing words in running text
        2. Normalizing word formats
        3. Segmenting sentences in running text

  22. Stemming
     ◮ Reduce terms to their stems in information retrieval
     ◮ Stemming is crude chopping of affixes, and it is language dependent
     ◮ Example: automate(s), automatic, and automation are all reduced to automat
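
The crudeness of affix chopping is easy to see in code. This Python sketch uses a hypothetical suffix list chosen only to reproduce the automat example; it is not Porter's algorithm or any standard stemmer.

```python
# Hypothetical suffix list, picked only to reproduce the slide's example.
SUFFIXES = ["ion", "ic", "es", "e", "s"]


def crude_stem(word, min_stem=3):
    """Chop the longest matching suffix, keeping at least `min_stem` letters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["automate", "automates", "automatic", "automation"]:
    print(w, "->", crude_stem(w))  # each line ends in "-> automat"

# The same rules mangle unrelated words: "music" loses its "ic".
print(crude_stem("music"))         # mus
```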

  23. Porter’s Algorithm
     ◮ The most common English stemmer
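
Porter's algorithm applies several ordered steps of suffix-rewriting rules, many of them conditioned on the measure m of the remaining stem. The Python sketch below implements only the definition of m and Step 1a (plural endings); the later steps are omitted, so this is not a complete stemmer.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # y is a consonant at the start of a word or after a vowel,
        # and a vowel when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Porter's m: the number of VC sequences in the form [C](VC)^m[V]."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    # Collapse runs such as "ccvv" to "cv" before counting VC pairs.
    collapsed = "".join(ch for i, ch in enumerate(pattern) if i == 0 or ch != pattern[i - 1])
    return collapsed.count("vc")


def step_1a(word):
    """Step 1a: rewrite plural endings."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


print([step_1a(w) for w in ["caresses", "ponies", "caress", "cats"]])
print([measure(w) for w in ["tree", "trouble", "oaten"]])  # [0, 1, 2]
```

For the full algorithm, an existing implementation such as NLTK's `PorterStemmer` is a safer choice than a hand-written one.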

  24. Sentence Segmentation
     ◮ ! and ? are relatively unambiguous
     ◮ The period “.” is quite ambiguous. It can mark:
        - A sentence boundary
        - Abbreviations like Inc. or Dr.
        - Numbers like .02 or 4.3
     ◮ Build a binary classifier (end of sentence or not)
        - Classifiers: hand-written rules, regular expressions, or machine learning
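
A minimal hand-written-rules classifier for the period cases listed above, sketched in Python. The abbreviation list is a hypothetical, deliberately tiny set, and the rules are assumptions; for instance, a sentence that really ends with “Inc.” will not be split.

```python
import re

# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "inc.", "st."}


def is_sentence_end(text, i):
    """Hand-written rules: does the punctuation mark at text[i] end a sentence?"""
    if text[i] in "!?":
        return True                  # relatively unambiguous
    if text[i] != ".":
        return False
    # Numbers like .02 or 4.3: a digit directly after the period.
    if i + 1 < len(text) and text[i + 1].isdigit():
        return False
    # Abbreviations like Inc. or Dr.: inspect the word that ends here.
    start = text.rfind(" ", 0, i) + 1
    if text[start : i + 1].lower() in ABBREVIATIONS:
        return False
    return True                      # otherwise treat it as a boundary


def split_sentences(text):
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]", text):
        if is_sentence_end(text, match.start()):
            sentences.append(text[start : match.end()].strip())
            start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


print(split_sentences("Dr. Smith paid 4.3 dollars to Acme Inc. yesterday. Really? Yes!"))
# ['Dr. Smith paid 4.3 dollars to Acme Inc. yesterday.', 'Really?', 'Yes!']
```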

  25. Determining if a word is end-of-sentence: a decision tree
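
The tree's actual questions are not preserved on the slide, so the Python sketch below is a hypothetical tree built from the cues discussed above (? and !, periods, abbreviations) plus two assumed features: the number of blank lines after the word and the capitalization of the next word. In a machine-learning setup, the questions and their order would be learned from labeled examples rather than written by hand.

```python
# Hypothetical abbreviation list (an assumption, kept tiny on purpose).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "inc.", "st."}


def is_end_of_sentence(word, next_word=None, blank_lines_after=0):
    """Hand-built decision tree: does `word` (including any trailing
    punctuation) end a sentence? The questions and their order are an
    illustrative assumption, not a learned tree."""
    if blank_lines_after >= 2:         # paragraph break follows?
        return True
    if word.endswith(("?", "!")):      # final punctuation is ? or !
        return True
    if not word.endswith("."):         # no period at all
        return False
    if word.lower() in ABBREVIATIONS:  # period belongs to an abbreviation
        return False
    if not next_word:                  # period at the very end of the text
        return True
    return next_word[0].isupper()      # next word starts a new sentence?


print(is_end_of_sentence("Dr.", "Smith"))        # False
print(is_end_of_sentence("yesterday.", "The"))   # True
```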
