SLIDE 1

Natural Language Processing

George Konidaris gdk@cs.brown.edu

Fall 2019

SLIDE 2

SLIDE 3

SLIDE 4

Natural Language Processing

Understanding spoken/written sentences in a natural language. A major area of research in AI. Why?

  • Humans use language to communicate.
    • It is our most natural interface.
  • Huge amounts of NLP “knowledge” are around.
    • E.g., books, the entire internet.
  • Generative power.
  • Key to intelligence?
    • Hints as to the underlying mechanism.
    • A key indicator of intelligence.
SLIDE 5

Natural Language Processing

It is also incredibly hard. Why?

  • I saw a bat.
  • Lucy owns a parrot that is larger than a cat.
  • John kissed his wife, and so did Sam.
  • Mary invited Sue for a visit, but she told her she had to go to work.
  • I went to the hospital, and they told me to go home and rest.
  • The price of tomatoes in Des Moines has gone through the roof.
  • Mozart was born in Salzburg and Beethoven, in Bonn.

Each example turns on a different difficulty: word-sense ambiguity, pronoun reference, ellipsis, idiom.

(examples via Ernest Davis, NYU)

SLIDE 6

Natural Language Processing

“If you are a fan of the justices who fought throughout the Rehnquist years to pull the Supreme Court to the right, Alito is a home run - a strong and consistent conservative with the skill to craft opinions that make radical results appear inevitable and the ability to build trusting professional relationships across ideological lines.” (TNR, Nov. 2005)

(examples via Ernest Davis, NYU)

SLIDE 7

Component Problems

“the cat sat on the mat” (output of perception)

  ↓ syntactic analysis

  S
  ├─ NP
  │  ├─ Article: the
  │  └─ Noun: cat
  └─ VP
     ├─ VP
     │  └─ Verb: sat
     └─ PP
        ├─ Prep: on
        └─ NP
           ├─ Article: the
           └─ Noun: mat

  ↓ semantic analysis

  SatOn(x = Cat, y = Mat)

  ↓ disambiguation (which Cat? which Mat?)

  SatOn(cat3, mat16)

  ↓ incorporation (add the result to the KB)

SLIDE 8

Perception

“The cat sat on the mat.”

SLIDE 9

Major Challenges

  • Speaker accent, volume, tone.
  • No pauses: where are the word boundaries?
  • Noise.
  • Variation.

SLIDE 10

Speech Recognition

Audio → phoneme sequence: “th ah ca t”

SLIDE 11

Speech Recognition Using HMMs

HMM structure: hidden phoneme states St → St+1, each emitting an observation Ot (the audio features at time t).

Must store:

  • P(O | S): observation model (prob. of observed audio given phoneme)
  • P(St+1 | St): transition model (prob. of one phoneme following another)
SLIDE 12

Issues

Phoneme sequence not Markov

  • Must introduce memory for context
  • k-Markov Models

People speak faster or slower

  • “Window” does not have fixed length
  • Dynamic Time Warping

Quite a simplistic model for a complex phenomenon.

Nevertheless, speech recognition technology based on HMMs became commercially viable in the mid-1990s.
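The Dynamic Time Warping idea mentioned above can be sketched in a few lines: find the cheapest alignment between two feature sequences, allowing either sequence to be stretched. The scalar features here stand in for real audio feature vectors:

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic Time Warping cost: best alignment of two sequences that
    may be stretched or compressed in time (fast vs. slow speech)."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: skip in a, skip in b, or advance both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A slowly-spoken version of the same feature sequence aligns at zero cost:
print(dtw([1, 2, 3], [1, 1, 2, 2, 3, 3]))   # 0.0
```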
SLIDE 13

Speech Recognition with Deep Nets

Mid-to-late 2000s: replace HMM with Deep Net.

[Diagram: a deep feedforward network mapping audio features x1, x2, … through hidden layers h11 … hn3 to a probability for each phoneme, e.g. th: 0.1, ah: 0.3, ca: 0.1, …]

SLIDE 14

Speech Recognition with Deep Nets

How to deal with dependency on prior states and observations?

[Diagram: a recurrent network; the hidden units h1, h2, h3 feed back into themselves, carrying information across time steps.]

Recurrent nets: form of memory.
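A minimal recurrent step shows where the memory lives: the hidden state mixes the current frame with the previous hidden state, so earlier observations influence later outputs. Sizes and weights below are toy assumptions (random, untrained); real systems learn them from transcribed speech:

```python
import math
import random

random.seed(0)
n_in, n_hidden, n_phonemes = 4, 8, 3   # toy sizes

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

W_xh = rand_matrix(n_hidden, n_in)      # input -> hidden
W_hh = rand_matrix(n_hidden, n_hidden)  # hidden -> hidden (the memory)
W_hy = rand_matrix(n_phonemes, n_hidden)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x, h):
    """One time step: new hidden state depends on frame x AND old state h."""
    h = [math.tanh(a + b) for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
    scores = matvec(W_hy, h)
    z = sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / z for s in scores]   # softmax over phonemes
    return h, probs

h = [0.0] * n_hidden
for _ in range(5):                              # five fake audio frames
    frame = [random.gauss(0, 1) for _ in range(n_in)]
    h, probs = rnn_step(frame, h)
```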

SLIDE 15

Component Problems

“the cat sat on the mat” (output of perception)

  ↓ syntactic analysis

  S
  ├─ NP
  │  ├─ Article: the
  │  └─ Noun: cat
  └─ VP
     ├─ VP
     │  └─ Verb: sat
     └─ PP
        ├─ Prep: on
        └─ NP
           ├─ Article: the
           └─ Noun: mat

  ↓ semantic analysis

  SatOn(x = Cat, y = Mat)

  ↓ disambiguation (which Cat? which Mat?)

  SatOn(cat3, mat16)

  ↓ incorporation (add the result to the KB)

SLIDE 16

Syntactic Analysis

Syntax: characteristic of language.

  • Structure.
  • Composition.

But syntax is only observed as a linear sequence of words.

  S
  ├─ NP
  │  ├─ Article: the
  │  └─ Noun: cat
  └─ VP
     ├─ VP
     │  └─ Verb: sat
     └─ PP
        ├─ Prep: on
        └─ NP
           ├─ Article: the
           └─ Noun: mat

SLIDE 17

Syntactic Analysis

How to describe this structure? Formal grammar.

  • A set of rules for generating sentences.
  • Varying power (the Chomsky hierarchy):
    • Recursively enumerable (equivalent to Turing machines)
    • Context-sensitive
    • Context-free
    • Regular

Each uses a set of rewrite rules to generate syntactically correct sentences.

Colorless green ideas sleep furiously. (Chomsky’s example of a sentence that is syntactically correct but semantically meaningless.)

SLIDE 18

Formal Grammars

Two types of symbols:

  • Terminals (stop and output this)
  • Non-terminals (one is a start symbol)

Production (rewrite) rules modify a string of symbols by matching the expression on the left and replacing it with the one on the right.

  S → AB
  A → AA
  A → a
  B → BBB
  B → b

Example strings generated: ab, aaaaaab, abbb, aabbbbb
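Generation from rewrite rules is easy to sketch: expand non-terminals until only terminals remain. The grammar below is the one on the slide; the weights biasing the choice toward terminals are an assumption added so random expansion finishes quickly:

```python
import random

# S -> AB, A -> AA | a, B -> BBB | b, with assumed expansion weights.
rules = {
    "S": [(["A", "B"], 1.0)],
    "A": [(["A", "A"], 0.3), (["a"], 0.7)],
    "B": [(["B", "B", "B"], 0.3), (["b"], 0.7)],
}

def generate(symbol, rng):
    """Rewrite non-terminals recursively until only terminals remain."""
    if symbol not in rules:                      # terminal: output it
        return symbol
    options, weights = zip(*rules[symbol])
    production = rng.choices(options, weights=weights)[0]
    return "".join(generate(s, rng) for s in production)

# Every sentence this grammar generates is some a's followed by some b's:
print(generate("S", random.Random(0)))
```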

SLIDE 19

Context-Free Grammars

Rules must be of the form

  A → B

where A is a single non-terminal and B is any sequence of terminals and non-terminals. Why is this called context-free? Because A can be rewritten as B regardless of the symbols surrounding it.
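Parsing with a CFG can be done bottom-up with the CYK algorithm. Here is a minimal recogniser for a tiny grammar in Chomsky normal form; the grammar and lexicon are assumptions for illustration (VP → V PP stands in for the slide's VP → VP PP tree):

```python
lexicon = {"the": "Det", "cat": "N", "mat": "N", "sat": "V", "on": "P"}
binary = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "PP"): "VP",
    ("P", "NP"): "PP",
}

def cyk(words):
    """True iff the word sequence is derivable from S."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):               # length-1 spans: lexicon
        chart[i][i + 1].add(lexicon[w])
    for span in range(2, n + 1):                # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # try every split point
                for (left, right), parent in binary.items():
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].add(parent)
    return "S" in chart[0][n]

print(cyk("the cat sat on the mat".split()))   # True
```

Keeping backpointers in each chart cell, rather than just category sets, recovers the parse tree itself.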

SLIDE 20

Probabilistic CFGs

Attach a probability to each rewrite rule:

  A → B [0.3]
  A → AA [0.6]
  A → a [0.1]

Probabilities for the same left-hand symbol sum to 1. Why do this?

  • More vs. less likely sentences.
  • A probability distribution over valid sentences.
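In a PCFG, the probability of a derivation is simply the product of the probabilities of the rules applied. A short sketch using the slide's rules (the B → b [1.0] rule is an assumption added so B can terminate):

```python
pcfg = {
    ("A", ("B",)): 0.3,
    ("A", ("A", "A")): 0.6,
    ("A", ("a",)): 0.1,
    ("B", ("b",)): 1.0,   # assumed terminal rule, not on the slide
}

def derivation_prob(steps):
    """Probability of a derivation = product of its rule probabilities."""
    p = 1.0
    for lhs, rhs in steps:
        p *= pcfg[(lhs, rhs)]
    return p

# A => AA => aA => aB => ab
steps = [("A", ("A", "A")), ("A", ("a",)), ("A", ("B",)), ("B", ("b",))]
print(derivation_prob(steps))   # 0.6 * 0.1 * 0.3 * 1.0 = 0.018
```

Summing this quantity over all derivations of a string gives its total probability under the grammar.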

SLIDE 21

E0

Lexicon

(R&N)

SLIDE 22

E0

(R&N)

Grammar

SLIDE 23

Parse tree for “the cat sat on the mat”:

  S
  ├─ NP
  │  ├─ Article: the
  │  └─ Noun: cat
  └─ VP
     ├─ VP
     │  └─ Verb: sat
     └─ PP
        ├─ Prep: on
        └─ NP
           ├─ Article: the
           └─ Noun: mat

SLIDE 24

Component Problems

“the cat sat on the mat” perception

S NP VP Article Noun VP PP Prep NP Article Noun Verb

  • n

sat cat mat the the

syntactic analysis

SatOn(x = Cat, y = Mat)

semantic analysis Cat? Mat? disambiguation

SatOn(cat3, mat16)

incorporation

SLIDE 25

Semantic Analysis

Semantics: what the sentence actually means, eventually in terms of symbols available to the agent (e.g., a KB).

“the cat sat on the mat”

SatOn(x = Cat, y = Mat) → (disambiguation) → SatOn(cat3, mat16)

SLIDE 26

Semantic Analysis

Key idea: compositional semantics. The semantics of sentences are built out of the semantics of their constituent parts. “The cat sat on the mat.” Therefore there is a clear relationship between syntactic analysis and semantic analysis.

SLIDE 27

Semantic Analysis

Useful step:

  • Probability of parse depends on words
  • Lexicalized PCFGs

  VP(v) → Verb(v) NP(n) [P1(v, n)]

The rule's probability P1 depends on the bindings of the variables v and n: “ate banana” should be far more likely than “ate bandanna”.

SLIDE 28

Semantic Analysis

“John loves Mary” Desired output: Loves(John, Mary) Semantic parsing:

  • Exploit compositionality of parsing to build semantics.

(R&N)

SLIDE 29

Semantic Analysis

  S(Loves(John, Mary))
  ├─ NP(John)
  │  └─ Name(John): John
  └─ VP(λx Loves(x, Mary))
     ├─ Verb(λy λx Loves(x, y)): loves
     └─ NP(Mary)
        └─ Name(Mary): Mary

λ-expressions label the internal nodes; symbols in the KB label the leaves; the root carries the sentence to add to the KB.
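The tree above composes by function application, which can be sketched directly with Python lambdas. Representing logical forms as tuples is an assumption made for illustration:

```python
# Each word contributes a λ-expression; parse nodes combine them.
loves = lambda y: lambda x: ("Loves", x, y)   # Verb: λy λx Loves(x, y)
john, mary = "John", "Mary"                   # Names denote KB symbols

vp = loves(mary)        # VP: λx Loves(x, Mary)   (Verb applied to object NP)
sentence = vp(john)     # S:  Loves(John, Mary)   (VP applied to subject NP)
print(sentence)         # ('Loves', 'John', 'Mary')
```

The same mechanism scales: each grammar rule specifies how to combine the λ-expressions of its children, so semantics is built alongside the parse.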

SLIDE 30

Machine Translation

Major goal of NLP research for decades.

Document in Russian → Document in English

SLIDE 31

Competing Approaches

Document in Russian → Formal Language → Document in English

SLIDE 32

Competing Approaches

Document in Russian → Document in English

SLIDE 33

Google Translate

Supports about 100 languages; used by 200 million people daily.