Introduction to Artificial Intelligence: Natural Language Processing


slide-1
SLIDE 1

Introduction to Artificial Intelligence Natural Language Processing

Janyl Jumadinova November 14, 2016

Credit: NLP Stanford

slide-2
SLIDE 2

Question Answering: IBM’s Watson

2/25

slide-3
SLIDE 3

Information Extraction

3/25

slide-4
SLIDE 4

Sentiment Extraction

Source: Washington Post

4/25

slide-5
SLIDE 5

Machine Translation

5/25

slide-6
SLIDE 6

Language Technology

6/25

slide-8
SLIDE 8

Ambiguity makes NLP hard

◮ Teacher Strikes Idle Kids
◮ Red Tape Holds Up New Bridges
◮ Juvenile Court to Try Shooting Defendant
◮ Local High School Dropouts Cut in Half

7/25

slide-9
SLIDE 9

Other NLP Difficulties

8/25

slide-11
SLIDE 11

Progress

◮ What tools do we need?
  • Knowledge about language
  • Knowledge about the world
  • A way to combine knowledge sources
◮ How we generally do this:
  • Probabilistic models built from language data
  • P(“maison” → “house”) → high
  • P(“L’avocat général” → “the general avocado”) → low

9/25
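One way to read "probabilistic models built from language data" is as relative-frequency estimation over aligned examples. The sketch below is only an illustration: the aligned pairs are made-up toy data, `translation_probs` is a hypothetical helper, and real translation models condition on context rather than on single words.

```python
from collections import Counter, defaultdict

# Toy, hand-made aligned word pairs (hypothetical data, for illustration only).
aligned_pairs = [
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "house"),
    ("maison", "home"),
    ("avocat", "lawyer"),
    ("avocat", "lawyer"),
    ("avocat", "avocado"),
]

def translation_probs(pairs):
    """Estimate P(target | source) by relative frequency of aligned pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {t: c / sum(targets.values()) for t, c in targets.items()}
        for source, targets in counts.items()
    }

probs = translation_probs(aligned_pairs)
print(probs["maison"]["house"])  # 0.75
```

With more data, frequent pairings such as “maison” → “house” get high probability and rare ones get low probability, which is the behavior the slide describes.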

slide-13
SLIDE 13

Basic Text Processing

Regular Expressions

◮ A formal language for specifying text strings.
◮ How can we search for any of these?

woodchuck woodchucks Woodchuck Woodchucks

10/25
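A minimal sketch of one possible answer, using Python's standard `re` module (the choice of Python and the sample sentence are assumptions, not part of the slides). A character class covers the capitalization and an optional `s` covers the plural.

```python
import re

# [Ww] matches either capitalization; s? makes the plural "s" optional.
pattern = re.compile(r"[Ww]oodchucks?")

text = "Woodchucks dig; a woodchuck digs. Two woodchucks met a Woodchuck."
matches = pattern.findall(text)
print(matches)
# ['Woodchucks', 'woodchuck', 'woodchucks', 'Woodchuck']
```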

slide-14
SLIDE 14

Regular Expressions: Disjunctions

11/25

slide-15
SLIDE 15

Regular Expressions: Negation in Disjunction

◮ Negations: [^Ss]
◮ Caret means negation only when first in []

12/25
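The caret rule can be checked with a small sketch in Python's `re` module (the sample strings are assumptions chosen for illustration). A caret that opens the brackets negates the class; anywhere else inside the brackets it is an ordinary character.

```python
import re

# Caret first inside []: match any character that is NOT S or s.
not_s = re.findall(r"[^Ss]", "Sister")
print(not_s)  # ['i', 't', 'e', 'r']

# Caret not first inside []: it is a literal caret, so this matches "e" or "^".
e_or_caret = re.findall(r"[e^]", "look up a^b here")
print(e_or_caret)  # ['^', 'e', 'e']
```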

slide-16
SLIDE 16

Regular Expressions: More Disjunction

◮ Woodchuck is another name for groundhog!
◮ The pipe | for disjunction

13/25
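A short sketch of pipe disjunction in Python's `re` module, with sample sentences that are assumptions for illustration. The pipe can be combined with bracket classes to cover capitalization as well.

```python
import re

# Either alternative may match at each position.
either = re.findall(r"groundhog|woodchuck", "A groundhog is a woodchuck.")
print(either)  # ['groundhog', 'woodchuck']

# Disjunction combined with character classes for the first letter.
any_case = re.findall(r"[Gg]roundhog|[Ww]oodchuck", "Woodchuck or Groundhog?")
print(any_case)  # ['Woodchuck', 'Groundhog']
```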

slide-17
SLIDE 17

Regular Expressions: ? * + .

14/25
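The slide body did not survive extraction, so the following is only a sketch of the standard meaning of these four operators in Python's `re` module; the patterns and sample strings are assumptions for illustration.

```python
import re

# ?  : the previous character is optional
optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']

# *  : zero or more of the previous character
star = re.findall(r"oo*h!", "oh! ooh! oooh!")
print(star)  # ['oh!', 'ooh!', 'oooh!']

# +  : one or more of the previous character
plus = re.findall(r"o+h!", "oh! ooh! h!")
print(plus)  # ['oh!', 'ooh!']

# .  : any single character (except a newline by default)
dot = re.findall(r"beg.n", "begin begun beg3n")
print(dot)  # ['begin', 'begun', 'beg3n']
```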

slide-18
SLIDE 18

Regular Expressions: Example

Find all instances of the word “the” in a text

15/25
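A sketch of how this exercise might go in Python's `re` module. The sample text and the final pattern are assumptions; the slide's own solution is not in the extracted text. The naive pattern both misses capitalized "The" and wrongly matches inside longer words, which word boundaries and a character class fix.

```python
import re

text = "The other day, the cat saw them there. Then the dog left."

# Naive: misses "The" and also matches inside "other", "them", "there".
naive = re.findall(r"the", text)
print(len(naive))  # 5

# Refined: allow either capitalization and require word boundaries.
refined = re.findall(r"\b[Tt]he\b", text)
print(refined)  # ['The', 'the', 'the']
```

The two failure modes correspond to false negatives (missed "The") and false positives (matches inside "other"), and tightening a pattern usually trades one against the other.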

slide-19
SLIDE 19

Basic Text Processing

Word tokenization

Every NLP task needs to do text normalization:

  1. Segmenting/tokenizing words in running text
  2. Normalizing word formats
  3. Segmenting sentences in running text

16/25

slide-20
SLIDE 20

How Many Words?

17/25

slide-21
SLIDE 21

Simple Tokenization in UNIX

18/25
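The slide's UNIX commands were not captured in the extracted text. As a stand-in, here is a rough Python sketch of the same idea: split text into alphabetic tokens and count them. The `word_counts` helper and the sample sentence are assumptions for illustration.

```python
import re
from collections import Counter

def word_counts(text):
    """Split text into maximal runs of ASCII letters and count each token."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return Counter(tokens)

sample = "The cat sat. The cat ran! A dog sat?"
counts = word_counts(sample)

num_tokens = sum(counts.values())  # running words
num_types = len(counts)            # distinct words
print(num_tokens, num_types)       # 9 6

for word, n in counts.most_common(3):
    print(n, word)
```

The same counts also answer "how many words?" in two ways: 9 tokens but only 6 types.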

slide-22
SLIDE 22

Basic Text Processing

Normalization

Every NLP task needs to do text normalization:

  1. Segmenting/tokenizing words in running text
  2. Normalizing word formats
  3. Segmenting sentences in running text

19/25

slide-24
SLIDE 24

Issues in Tokenization

◮ Finland’s capital → Finland Finlands Finland’s
◮ what’re, I’m, isn’t → What are, I am, is not
◮ Hewlett-Packard → Hewlett Packard
◮ state-of-the-art → state of the art
◮ Lowercase → lower-case lowercase lower case
◮ San Francisco → one token or two?
◮ Language Issues: French, German, Japanese, Chinese,...

20/25
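To make these choices concrete, here is a minimal sketch of one possible tokenizer policy in Python. Every decision in it is an assumption, not a recommendation from the slides: it lowercases, expands a tiny hypothetical contraction table, and keeps possessives and hyphenated words as single tokens.

```python
import re

# Tiny hypothetical lookup table; a real system would need a far larger one.
CONTRACTIONS = {"what're": "what are", "i'm": "i am", "isn't": "is not"}

def tokenize(text):
    """Lowercase, expand known contractions, keep word-internal ' and -."""
    text = text.replace("\u2019", "'").lower()  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Letters, optionally joined by an apostrophe or hyphen to more letters.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text)

tokens = tokenize("What\u2019re Finland\u2019s state-of-the-art plans? "
                  "I\u2019m sure it isn\u2019t Hewlett-Packard.")
print(tokens)
# ['what', 'are', "finland's", 'state-of-the-art', 'plans', 'i', 'am',
#  'sure', 'it', 'is', 'not', 'hewlett-packard']
```

A different policy (splitting on hyphens, stripping possessives, treating "San Francisco" as one token) would be equally defensible; the right choice depends on the downstream task.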

slide-25
SLIDE 25

Basic Text Processing

Stemming

Every NLP task needs to do text normalization:

  1. Segmenting/tokenizing words in running text
  2. Normalizing word formats
  3. Segmenting sentences in running text

21/25

slide-26
SLIDE 26

Stemming

◮ Reduce terms to their stems in information retrieval
◮ Stemming is crude chopping of affixes, and it is language dependent
◮ Example: automate(s), automatic, automation all reduced to automat.

22/25
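The "crude chopping" idea can be sketched in a few lines of Python. This is not Porter's algorithm: the suffix list and the minimum stem length are assumptions picked only so that the slide's example works.

```python
# Hypothetical suffix list, checked in order; longer suffixes come first.
SUFFIXES = ("ion", "ic", "es", "e", "s")

def crude_stem(word):
    """Chop the first matching suffix, keeping at least 3 characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["automate", "automates", "automatic", "automation"]
stems = [crude_stem(w) for w in words]
print(stems)  # ['automat', 'automat', 'automat', 'automat']
```

The same rules also turn "house" into "hous", which shows why such chopping is called crude and why the rules do not carry over to other languages.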

slide-27
SLIDE 27

Porter’s Algorithm

Most common English stemmer.

23/25

slide-30
SLIDE 30

Sentence Segmentation

◮ !, ? are relatively unambiguous
◮ Period “.” is quite ambiguous
  • Sentence boundary
  • Abbreviations like Inc. or Dr.
  • Numbers like .02 or 4.3
◮ Build a binary classifier
  • Classifiers: hand-written rules, regular expressions, or machine learning

24/25
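As an illustration of the hand-written-rules option, here is a minimal Python sketch of a binary end-of-sentence classifier. The abbreviation list, the capitalization heuristic, and both function names are assumptions; a practical system would need many more rules or a trained model.

```python
# Small hypothetical abbreviation list, lowercased.
ABBREVIATIONS = {"inc.", "dr.", "mr.", "mrs.", "etc."}

def is_sentence_end(token, next_token):
    """Binary decision: does this whitespace-delimited token end a sentence?"""
    if token.endswith(("!", "?")):
        return True   # relatively unambiguous
    if not token.endswith("."):
        return False  # also covers 4.3 and .02, whose period is not final
    if token.lower() in ABBREVIATIONS:
        return False  # period belongs to an abbreviation
    # Otherwise accept if the text ends here or the next word is capitalized.
    return next_token is None or next_token[:1].isupper()

def split_sentences(text):
    """Group whitespace tokens into sentences using is_sentence_end."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if is_sentence_end(token, next_token):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

result = split_sentences(
    "Dr. Smith paid 4.3 dollars to Acme Inc. yesterday. "
    "Was it worth .02 cents? Yes!"
)
print(result)
```

These rules fail when an abbreviation genuinely ends a sentence, which is one reason to prefer a learned classifier over hand-written rules.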

slide-31
SLIDE 31

Determining if a word is end-of-sentence: a Decision Tree

25/25