Introduction to Artificial Intelligence Natural Language Processing
Janyl Jumadinova November 14, 2016
Credit: NLP Stanford
Question Answering: IBM's Watson
Information Extraction
Sentiment Extraction
Source: Washington Post
◮ Teacher Strikes Idle Kids
◮ Red Tape Holds Up New Bridges
◮ Juvenile Court to Try Shooting Defendant
◮ Local High School Dropouts Cut in Half
◮ What tools do we need?
  ◮ Knowledge about language
  ◮ Knowledge about the world
  ◮ A way to combine knowledge sources
◮ How we generally do this:
  ◮ Probabilistic models built from language data
  ◮ P(“maison” → “house”) → high
  ◮ P(“L’avocat général” → “the general avocado”) → low
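To make "probabilistic models built from language data" concrete, here is a minimal sketch that estimates translation probabilities by relative frequency. The aligned word pairs and the names `ALIGNED_PAIRS` and `translation_probs` are hypothetical and chosen only for illustration; a real system would learn from a large parallel corpus with a far richer model.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (source, target) word pairs, invented for this sketch.
ALIGNED_PAIRS = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("avocat", "lawyer"),
    ("avocat", "lawyer"),
    ("avocat", "avocado"),
]


def translation_probs(pairs):
    """Estimate P(target | source) as a relative frequency over the pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    probs = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        probs[source] = {t: c / total for t, c in target_counts.items()}
    return probs


PROBS = translation_probs(ALIGNED_PAIRS)
print(PROBS["maison"]["house"])  # 0.75
print(PROBS["avocat"])           # "lawyer" outweighs "avocado"
```

With counts like these, the likely translation gets a high score and the implausible one a low score, which is the behavior the two P(...) bullets describe.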
◮ A formal language for specifying text strings.
◮ How can we search for any of these?
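As a sketch of how one pattern can cover several spellings, the example below uses Python's `re` module. The target strings (lower-case and capitalized, singular and plural forms of "woodchuck") are an assumption made for illustration, as are the names `PATTERN`, `find_variants`, and `SAMPLE`.

```python
import re

# [wW] matches either case of the first letter; s? makes the plural optional.
PATTERN = re.compile(r"[wW]oodchucks?")


def find_variants(text):
    """Return every match of the pattern, in order of appearance."""
    return PATTERN.findall(text)


SAMPLE = "A woodchuck met two Woodchucks; other woodchucks watched a Woodchuck."
print(find_variants(SAMPLE))
# ['woodchuck', 'Woodchucks', 'woodchucks', 'Woodchuck']
```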
◮ Negations: [^Ss]
◮ Caret means negation only when first in []
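The position rule for the caret can be checked with a short sketch in Python's `re` module. The helper `first_match` and the sample strings are hypothetical.

```python
import re

# Caret first inside the brackets: any single character except 'S' or 's'.
NOT_S = re.compile(r"[^Ss]")
# Caret not first: it is an ordinary character, so this matches 'e' or '^'.
E_OR_CARET = re.compile(r"[e^]")


def first_match(pattern, text):
    """Return the first matching substring, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


print(first_match(NOT_S, "Sssnake"))               # n
print(first_match(NOT_S, "sSs"))                   # None
print(first_match(E_OR_CARET, "look up a^b now"))  # ^
```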
◮ Woodchuck is another name for groundhog!
◮ The pipe | for disjunction
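A minimal sketch of disjunction with the pipe, again using Python's `re` module. The second pattern combines the pipe with bracketed character classes to accept either capitalization. The sample sentence and the names used here are assumptions for illustration.

```python
import re

# The pipe matches either alternative as a whole word form.
ANIMAL = re.compile(r"groundhog|woodchuck")
# Brackets plus pipe: either animal, with either case of the first letter.
ANIMAL_ANY_CASE = re.compile(r"[gG]roundhog|[wW]oodchuck")


def find_animals(pattern, text):
    """Return all matches of the given pattern in the text."""
    return pattern.findall(text)


TEXT = "Groundhog Day honors the groundhog, also called a woodchuck."
print(find_animals(ANIMAL, TEXT))           # ['groundhog', 'woodchuck']
print(find_animals(ANIMAL_ANY_CASE, TEXT))  # ['Groundhog', 'groundhog', 'woodchuck']
```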
◮ Finland’s capital → Finland, Finlands, or Finland’s?
◮ what’re, I’m, isn’t → what are, I am, is not
◮ Hewlett-Packard → Hewlett Packard?
◮ state-of-the-art → state of the art?
◮ Lowercase → lower-case, lowercase, or lower case?
◮ San Francisco → one token or two?
◮ Language issues: French, German, Japanese, Chinese, ...
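The choices listed above can be made explicit in a small tokenizer sketch. Everything here is an assumption for illustration: the `CONTRACTIONS` table covers only the three contractions mentioned, the function lower-cases its input, and the `split_hyphens` flag is a hypothetical switch between the two treatments of hyphenated words. It does not address multiword names or languages without whitespace between words.

```python
import re

# Hypothetical expansion table, limited to the contractions named above.
CONTRACTIONS = {"what're": "what are", "i'm": "i am", "isn't": "is not"}

# A token is a run of letters or digits, optionally joined by ' or -.
TOKEN = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokenize(text, split_hyphens=True):
    """Lower-case, expand known contractions, then extract tokens."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    if split_hyphens:
        text = text.replace("-", " ")
    return TOKEN.findall(text)


print(tokenize("I\u2019m sure it isn\u2019t state-of-the-art"))
# ['i', 'am', 'sure', 'it', 'is', 'not', 'state', 'of', 'the', 'art']
print(tokenize("Hewlett-Packard is state-of-the-art", split_hyphens=False))
# ['hewlett-packard', 'is', 'state-of-the-art']
```

Neither setting of `split_hyphens` is correct in general; which one fits depends on the application, which is the point the bullets raise.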
◮ Reduce terms to their stems in information retrieval
◮ Stemming is crude chopping of affixes and is language dependent
◮ Example: automate(s), automatic, automation are all reduced to a common stem
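The "crude chopping" idea can be sketched as a toy suffix stripper. The suffix list `SUFFIXES` and the minimum stem length are hypothetical choices that happen to handle the example words; this is not the Porter stemmer or any other published algorithm.

```python
# Hypothetical English suffix list, chosen only to cover the example words.
SUFFIXES = ("ion", "es", "ic", "e", "s")
MIN_STEM_LENGTH = 3  # assumed guard against chopping very short words


def crude_stem(word):
    """Strip the longest matching suffix, keeping at least MIN_STEM_LENGTH letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


WORDS = ["automate", "automates", "automatic", "automation"]
print({crude_stem(w) for w in WORDS})  # {'automat'}
```

Because the suffix list is specific to English, the same function would do nothing useful for another language, which illustrates why stemming is language dependent.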
◮ !, ? are relatively unambiguous
◮ Period “.” is quite ambiguous
◮ Build a binary classifier
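One way to picture the binary decision for each period is a hand-written rule classifier, sketched below. This is an assumption about what such a classifier might look like, not the method the slides prescribe: the rules, the `ABBREVIATIONS` list, and the function names are all hypothetical, and a learned classifier could use the same cues as features.

```python
# Hypothetical abbreviation list; a real one would be much longer.
ABBREVIATIONS = {"dr", "mr", "mrs", "inc", "etc"}


def is_sentence_end(text, i):
    """Binary decision for the period at text[i]: True means end of sentence."""
    if text[i + 1:i + 2].isdigit():
        return False  # a period inside a number such as 4.3
    words_before = text[:i].split()
    if words_before and words_before[-1].lower() in ABBREVIATIONS:
        return False  # a period closing a known abbreviation
    rest = text[i + 1:].lstrip()
    return rest == "" or rest[0].isupper()


def split_sentences(text):
    """Split at ! and ?, and at periods classified as sentence ends."""
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch in "!?" or (ch == "." and is_sentence_end(text, i)):
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith paid 4.3 dollars. Was it fair? Yes!"))
# ['Dr. Smith paid 4.3 dollars.', 'Was it fair?', 'Yes!']
```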