Natural Language Processing, Lecture 13, 2/26/2015, Martha Palmer - PowerPoint PPT Presentation



SLIDE 1

Natural Language Processing

Lecture 13—2/26/2015 Martha Palmer

SLIDE 2

2/26/15

Speech and Language Processing - Jurafsky and Martin

2

Today

Start on Parsing

Top-down vs. Bottom-up

SLIDE 3

Summary

Context-free grammars can be used to model various facts about the syntax of a language. When paired with parsers, such grammars constitute a critical component in many applications. Constituency is a key phenomenon easily captured with CFG rules.

But agreement and subcategorization do pose significant problems

Treebanks pair sentences in a corpus with their corresponding trees.

SLIDE 4

Parsing

Parsing with CFGs refers to the task of assigning proper trees to input strings. Proper here means a tree that covers all and only the elements of the input and has an S at the top. It doesn't actually mean that the system can select the correct tree from among all the possible trees.

SLIDE 5

Automatic Syntactic Parse

SLIDE 6

For Now

Assume…

You have all the words already in some buffer
The input is not POS tagged prior to parsing
We won't worry about morphological analysis
All the words are known
These are all problematic in various ways, and would have to be addressed in real applications.

SLIDE 7

Top-Down Search

Since we're trying to find trees rooted with an S (Sentence), why not start with the rules that give us an S? Then we can work our way down from there to the words.
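This top-down strategy can be sketched as a small recursive-descent recognizer. The code is illustrative, not from the lecture: the toy grammar, lexicon, and function names are my own.

```python
# Top-down (recursive-descent) recognition with backtracking.
# Toy grammar, lexicon, and names are illustrative, not from the lecture.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "saw": "V"}

def expand(symbol, words, i):
    """Yield every position where `symbol`, expanded at i, can end."""
    if symbol in GRAMMAR:                      # non-terminal: try each rule
        for rhs in GRAMMAR[symbol]:
            yield from match(rhs, words, i)
    elif i < len(words) and LEXICON.get(words[i]) == symbol:
        yield i + 1                            # terminal: consume one word

def match(symbols, words, i):
    """Match a rule right-hand side left to right."""
    if not symbols:
        yield i
        return
    for j in expand(symbols[0], words, i):     # generators give backtracking
        yield from match(symbols[1:], words, j)

def recognize(sentence):
    words = sentence.split()
    return any(j == len(words) for j in expand("S", words, 0))
```

Starting from S, the recognizer works down toward the words; when one alternative fails, the generators simply move on to the next, which is the backtracking regime the Control slide describes.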

SLIDE 8

Top Down Space

SLIDE 9

Bottom-Up Parsing

Of course, we also want trees that cover the input words. So we might also start with trees that link up with the words in the right way, then work our way up from there to larger and larger trees.
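A bottom-up regime can be sketched as a shift-reduce recognizer: either shift the next word's POS tag onto a stack, or reduce the top of the stack by a grammar rule, backtracking over both choices. Grammar, lexicon, and names are illustrative, not from the lecture.

```python
# Bottom-up (shift-reduce) recognition with backtracking.
# Toy grammar, lexicon, and names are illustrative, not from the lecture.
RULES = [            # (lhs, rhs): rhs on top of the stack reduces to lhs
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("S",  ("NP", "VP")),
]
LEXICON = {"the": "Det", "cat": "N", "mat": "N", "saw": "V"}

def parse(stack, buffer):
    if stack == ["S"] and not buffer:
        return True                        # one S covering all the input
    for lhs, rhs in RULES:                 # choice 1: reduce the stack top
        if tuple(stack[-len(rhs):]) == rhs:
            if parse(stack[:-len(rhs)] + [lhs], buffer):
                return True
    if buffer:                             # choice 2: shift the next POS tag
        return parse(stack + [buffer[0]], buffer[1:])
    return False                           # dead end: backtrack

def recognize(sentence):
    return parse([], [LEXICON[w] for w in sentence.split()])
```

Note the search starts from the words and only later checks whether the pieces assemble into an S, the mirror image of the top-down sketch.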

SLIDE 10

Bottom-Up Search

SLIDE 11

Bottom-Up Search

SLIDE 12

Bottom-Up Search

SLIDE 13

Bottom-Up Search

SLIDE 14

Bottom-Up Search

SLIDE 15

Control

Of course, in both cases we left out how to keep track of the search space and how to make choices:

Which node to try to expand next
Which grammar rule to use to expand a node

One approach is called backtracking:

Make a choice; if it works out, then fine
If not, then back up and make a different choice

Same as with ND-Recognize

SLIDE 16

Problems

Even with the best filtering, backtracking methods are doomed because of two inter-related problems:

Ambiguity and search control (choice)
Shared subproblems

SLIDE 17

Ambiguity

SLIDE 18

Structural Ambiguities

It's very important to separate PPs that are part of the verb's subcategorization frame from PPs that modify the entire event.

The man saw the woman on the hill with the telescope. (the woman has the telescope)
The man saw the woman on the hill with the telescope. (the man has the telescope)

SLIDE 19

Shared Sub-Problems

No matter what kind of search (top-down or bottom-up or mixed) we choose...

We can't afford to redo work we've already done. Without some help, naïve backtracking will lead to such duplicated work.
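The standard fix is dynamic programming: store every sub-result in a chart so each span is analyzed exactly once. Below is a minimal CKY-style recognizer, sketched with an illustrative grammar in Chomsky normal form; it is not code from the lecture.

```python
# CKY recognition: each sub-span is analyzed once and cached in a chart,
# so shared sub-problems are never recomputed.  Grammar (already in
# Chomsky normal form) and lexicon are illustrative, not from the lecture.
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
          ("PP", "P", "NP"), ("VP", "V", "NP"), ("VP", "VP", "PP")]
LEXICON = {"the": "Det", "man": "N", "woman": "N", "hill": "N",
           "telescope": "N", "saw": "V", "on": "P", "with": "P"}

def cky(sentence):
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j] holds every symbol that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])
    for width in range(2, n + 1):              # smaller spans first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # every split point
                for lhs, b, c in BINARY:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(lhs)
    return "S" in chart[0][n]
```

Both PP attachments of "the man saw the woman on the hill with the telescope" fall out of the same chart, without ever re-deriving a constituent like "the woman".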

SLIDE 20

Sample L1 Grammar

SLIDE 21

CSE391 – 2005

NLP

21

State space representations: Recursive transition nets

s :- np, vp.
np :- pronoun ; noun ; det, adj, noun ; np, pp.

[Transition network diagrams: an S network (states S1-S3) with NP and VP arcs, and an NP network (states S4-S6) with arcs for det, adj, noun, pronoun, and a recursive NP, PP path]

SLIDE 22

State space representations: Recursive transition nets, cont.

VP :- VP, PP.
VP :- V ; V, NP ; V, NP, NP ; V, NP, PP.

[Transition network diagrams: a VP network (states S13-S14) with VP and PP arcs, and a VP network (states S7-S12) with arcs for aux, V, NP, NP, and PP]

SLIDE 23

Parses

The cat sat on the mat
S1: S → NP, VP
[Partial parse tree: S expanded to NP and VP]

SLIDE 24

Parses

The cat sat on the mat
S1: S → NP, VP
S2: NP → Det, N
[Partial parse tree: NP expanded to Det ("the") and N ("cat")]

SLIDE 25

Parses

The cat sat on the mat
S1: S → NP, VP
S2: NP → Det, N
S3: VP → V
[Partial parse tree: VP expanded to V ("sat")]

SLIDE 26

Parses

The cat sat on the mat
S1: S → NP, VP
S2: NP → Det, N
S3: VP → V
S4: VP → VP, PP
[Complete parse tree: the PP "on the mat" expanded to Prep ("on") and NP (Det "the", N "mat")]

SLIDE 27

Multiple parses for a single sentence

Time flies like an arrow.
[Parse 1: NP ("time"/N), VP with V ("flies") and PP with Prep ("like") and NP (Det "an", N "arrow")]

SLIDE 28

Multiple Parses for a single sentence

Time flies like an arrow.
[Parse 2: NP ("time"/N, "flies"/N), VP with V ("like") and NP (Det "an", N "arrow")]
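The two readings can be checked mechanically with a CKY chart that counts derivations. The CNF grammar below is illustrative and not from the lecture; in particular, the unary NP → N rule is folded into the lexicon by tagging bare nouns as NP as well.

```python
# Counting parses with a CKY chart.  The CNF grammar and lexicon are
# illustrative; the unary NP -> N rule is folded into the lexicon by
# tagging bare nouns as NP as well.
from collections import defaultdict

LEXICON = {"time": {"N", "NP"}, "flies": {"N", "V"},
           "like": {"V", "P"}, "an": {"Det"}, "arrow": {"N"}}
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "N", "N"),
          ("VP", "V", "NP"), ("VP", "V", "PP"), ("PP", "P", "NP")]

def count_parses(sentence):
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[i][j][sym] = number of derivations of sym over words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for sym in LEXICON[w]:
            chart[i][i + 1][sym] = 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lhs, b, c in BINARY:
                    chart[i][j][lhs] += chart[i][k][b] * chart[k][j][c]
    return chart[0][n]["S"]
```

Under this grammar, "Time flies like an arrow." gets exactly two parses: the ones sketched on these two slides.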

SLIDE 29

Lexicon

noun(cat). noun(mat). noun(flies). noun(time). noun(arrow).
det(the). det(a). det(an).
verb(sat). verb(flies). verb(time).
prep(on). prep(like).

SLIDE 30

Lexicon with Roots

noun(cat,cat). noun(mat,mat). noun(flies,fly). noun(time,time). noun(arrow,arrow).
det(the,the). det(a,a). det(an,an).
verb(sat,sit). verb(flies,fly). verb(time,time).
prep(on,on). prep(like,like).

SLIDE 31

Parses

The old can can hold the water.
[Parse tree: NP (the/det, old/adj, can/noun), VP with aux ("can"), V ("hold"), and NP (the/det, water/noun)]

SLIDE 32

Structural ambiguities

That factory can can tuna. That factory cans cans of tuna and salmon.

SLIDE 33

Lexicon The old can can hold the water.

Noun(can,can) Noun(cans,can)
Noun(water,water)
Noun(hold,hold) Noun(holds,hold)
Noun(old,old)
Det(the,the)
Verb(hold,hold) Verb(holds,hold)
Verb(can,can)
Aux(can,can)
Adj(old,old)
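The same lexicon can be written as a mapping from surface form to its (POS, root) analyses. The entries mirror the slide's Prolog facts; the dict representation itself is my own.

```python
# The slide's lexicon for "The old can can hold the water" as a dict:
# surface form -> list of (POS, root) analyses.  Entries mirror the
# slide; the representation itself is illustrative.
LEXICON = {
    "can":   [("noun", "can"), ("verb", "can"), ("aux", "can")],
    "cans":  [("noun", "can")],
    "water": [("noun", "water")],
    "hold":  [("noun", "hold"), ("verb", "hold")],
    "holds": [("noun", "hold"), ("verb", "hold")],
    "the":   [("det", "the")],
    "old":   [("adj", "old"), ("noun", "old")],
}

def analyses(word):
    """All POS/root analyses of a surface form (case-insensitive)."""
    return LEXICON.get(word.lower(), [])
```

A form like "can" is three-ways ambiguous, which is exactly what forces the parser to search among alternatives.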

SLIDE 34

Simple Context Free Grammar in BNF

S → NP VP
NP → Pronoun | Noun | Det Adj Noun | NP PP
PP → Prep NP
V → Verb | Aux Verb
VP → V | V NP | V NP NP | V NP PP | VP PP
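This BNF grammar transcribes directly into a Python structure (the representation is mine) of the kind a top-down parser like the one traced on the following slides would consult:

```python
# The slide's BNF grammar, transcribed as a dict from non-terminal to
# its alternative right-hand sides.  The representation is illustrative.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Pronoun"], ["Noun"], ["Det", "Adj", "Noun"], ["NP", "PP"]],
    "PP": [["Prep", "NP"]],
    "V":  [["Verb"], ["Aux", "Verb"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "NP"],
           ["V", "NP", "PP"], ["VP", "PP"]],
}
```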


SLIDE 35

Top-down parse in progress

[The, old, can, can, hold, the, water]

S → NP VP
NP → Pronoun? Pronoun? fail
NP → Noun? Noun? fail
NP → Det Adj Noun? Det? the Adj? old Noun? can
Succeed. Succeed. VP?

SLIDE 36

Top-down parse in progress

[can, hold, the, water]

VP → V?
V → Verb? Verb? fail
V → Aux Verb? Aux? can Verb? hold
succeed, succeed
fail: [the, water] remains unconsumed

SLIDE 37

Top-down parse in progress

[can, hold, the, water]
VP → V NP?
V → Verb? Verb? fail
V → Aux Verb? Aux? can Verb? hold
NP → Pronoun? Pronoun? fail
NP → Noun? Noun? fail
NP → Det Adj Noun? Det? the Noun? water
Succeed. Succeed.

SLIDE 38

Top-down approach

Start with the goal of a sentence:

S → NP VP
S → Wh-word Aux NP VP

Will try to find an NP 4 different ways before trying a parse where the verb comes first. What would be better?

SLIDE 39

Bottom-up approach

Start with the words in the sentence. What structures do they correspond to? Once a structure is built, it is kept on a CHART.

SLIDE 40

Bottom-up parse in progress

The old can can hold the water.
Correct tags:   det  adj   noun      aux        verb  det  noun
Candidate tags: det  noun  aux/verb  noun/verb  noun  det  noun

SLIDE 41

Bottom-up parse in progress

The old can can hold the water.
Correct tags:   det  adj   noun      aux        verb  det  noun
Candidate tags: det  noun  aux/verb  noun/verb  noun  det  noun
[Chart: candidate NP, VP, V, and S constituents built over these spans]

SLIDE 42

Bottom-up parse in progress – what is wrong with the bottom parse?

The old can can hold the water.
Correct tags:   det  adj   noun      aux        verb  det  noun
Candidate tags: det  noun  aux/verb  noun/verb  noun  det  noun/verb

SLIDE 43

Bottom-up parse, corrected

The old can can hold the water.
det noun verb noun noun det noun/verb.
[Chart: NP, VP, V, and S constituents built over these spans]

SLIDE 44

Headlines

Police Begin Campaign To Run Down Jaywalkers
Iraqi Head Seeks Arms
Teacher Strikes Idle Kids
Miners Refuse To Work After Death
Juvenile Court To Try Shooting Defendant

SLIDE 45

Headlines

Drunk Gets Nine Months in Violin Case
Enraged Cow Injures Farmer with Ax
Hospitals are Sued by 7 Foot Doctors
Milk Drinkers Turn to Powder
Lung Cancer in Women Mushrooms

SLIDE 46

Top-down vs. Bottom-up

Top-down:
Helps with POS ambiguities – only considers relevant POS
Rebuilds the same structure repeatedly
Spends a lot of time on impossible parses (trees that are not consistent with any of the words)

Bottom-up:
Has to consider every POS
Builds each structure once
Spends a lot of time on useless structures (trees that make no sense globally)

What would be better?