Discovery of Linguistic Relations Using Lexical Attraction
Deniz Yuret



SLIDE 1

Discovery of Linguistic Relations Using Lexical Attraction Deniz Yuret

SLIDE 2

Overview

  • Motivation
  • Demonstration
  • Theory, Learning, Algorithm
  • Evaluation
  • Contributions
SLIDE 3

Syntax and Semantics independently constrain linguistic relations

  • I saw the Statue of Liberty flying over New York. – Lenat, 1984
  • I hit the boy with the girl with long hair with a hammer with vengeance. – Schank, 1973
  • Colorless green ideas sleep furiously. – Chomsky, 1956

SLIDE 4

Contributions of this thesis

  • Opening a door for the use of common sense knowledge in language processing and acquisition.
  • A learning paradigm that bootstraps by interdigitating learning with processing.

SLIDE 5

Bringing common sense into language

[Figure: "John eats ice-cream" and its semantic network: eat linked to John by S and to ice-cream by O]

SLIDE 6

Bootstrapping by interdigitating learning and processing

[Figure: the processor (P) and the memory (M)]

SLIDE 7

Phrase structure versus dependency structure

[Figure: the phrase-structure tree (Determiner, Adjective, Noun, Aux, Verb, Prep with NP, VP, PP, S nodes) versus the dependency structure for "The glorious sun will shine in the winter"]

SLIDE 8

Discovery of Linguistic Relations: An Example
Simple Sentence 1/5 (Before training)

* these people also want more government money for education . *

SLIDE 9

Simple Sentence 2/5 (After 1000 words of training)

* these people also want more government money for education . *

SLIDE 10

Simple Sentence 3/5 (After 10,000 words of training)

* these people also want more government money for education . *

SLIDE 11

Simple Sentence 4/5 (After 100,000 words of training)

* these people also want more government money for education . *

SLIDE 12

Simple Sentence 5/5 (After 1,000,000 words of training)

* these people also want more government money for education . *

SLIDE 13

Bringing common sense into language: The theory

[Figure: "John eats ice-cream" and its semantic network: eat linked to John by S and to ice-cream by O]

SLIDE 14

A Theory of Syntactic Relations

  • Lexical attraction is the likelihood of a syntactic relation
  • The context of a word is given by its syntactic relations
  • Syntactic relations can be formalized as a graph
  • Entropy is determined by syntactic relations

SLIDE 15

H = − Σi pi log pi

The information content of a word:

The   IRA    is    fighting  British  rule   in    Northern  Ireland
4.20  15.85  7.33  13.27     12.38    13.20  5.80  12.60     14.65

Total: 99.28 bits
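The per-word numbers above are surprisals, −log2 p(w), and under an independence assumption the sentence cost is their sum. A minimal sketch; the probabilities here are back-derived from the slide's bit values, not taken from the actual AP-corpus model:

```python
import math

def surprisal(p):
    """Information content -log2 p of an event with probability p, in bits."""
    return -math.log2(p)

# Unigram probabilities back-derived from the slide's per-word costs.
costs = [4.20, 15.85, 7.33, 13.27, 12.38, 13.20, 5.80, 12.60, 14.65]
probs = [2.0 ** -c for c in costs]

total = sum(surprisal(p) for p in probs)  # ≈ 99.28 bits for the sentence
```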

SLIDE 16

The word pair and relative information:

[Figure: word-pair information. Alone, Northern costs 12.60 bits and Ireland 14.65 bits; given Northern, Ireland costs only 3.53 bits, and given Ireland, Northern costs only 1.48 bits]

SLIDE 17

The lexical attraction link:

[Figure: the lexical attraction link between Northern (12.60 bits) and Ireland (14.65 bits) carries 11.12 bits of mutual information]
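The value on the link is the pointwise mutual information of the pair. A sketch using the slide's numbers; since the real corpus counts are not in the slides, the joint probability is back-derived so that the link carries 11.12 bits:

```python
import math

def lexical_attraction(p_xy, p_x, p_y):
    """Pointwise mutual information log2( p(x,y) / (p(x) p(y)) ), in bits:
    how many bits a link between x and y saves over treating them alone."""
    return math.log2(p_xy / (p_x * p_y))

p_northern = 2.0 ** -12.60
p_ireland = 2.0 ** -14.65
# Joint probability back-derived so the link carries 11.12 bits.
p_pair = 2.0 ** -(12.60 + 14.65 - 11.12)

mi = lexical_attraction(p_pair, p_northern, p_ireland)  # ≈ 11.12 bits
```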

SLIDE 18

Language Model Determines the Context

The   IRA    is    fighting  British  rule  in    Northern  Ireland
4.20  12.90  3.73  10.54     8.66     5.96  3.57  9.25      3.53
[each word conditioned on the previous word, left to right]

Total: 99.28 → 62.34 bits

SLIDE 19

Context should be determined by syntactic relations:

The man with the dog spoke

[Figure: two alternative linkings of the sentence, with a question mark on the ambiguous link]

SLIDE 20

Context should be determined by syntactic relations:

The   IRA   is    fighting  British  rule  in    Northern  Ireland
1.25  6.60  4.60  13.27     5.13     8.13  2.69  1.48      6.70
[each word conditioned on its syntactic neighbor; link directions < < < > < > < <]

Total: 62.34 → 49.85 bits

SLIDE 21

Dependency structure is acyclic:

  • Mathematically: cannot use all the lexical attraction links in a cycle.
  • Linguistically: cannot construct a consistent head-modifier structure.

[Figure: a cycle over words A, B, C]

SLIDE 22

Syntactic relations form a planar tree: (Links do not cross)

I met the woman in the red dress in the afternoon
I met the woman in the afternoon in the red dress

[Figure: dependency links for the two word orders, with a question mark on a crossing link]

SLIDE 23

Syntactic relations form a planar tree: (Links do not cross)

  • Hays and Lecerf (1960) discovered that (almost) all sentences in a language are planar.
  • Gaifman (1965) proved that a planar dependency grammar can generate the same set of languages as a context free grammar.
  • Planar trees can be encoded with a constant number of bits per word.

SLIDE 24

Cayley’s formula for counting trees: T(n) = n^(n−2). Planar trees are polynomial in n:

The IRA is fighting British rule in Northern Ireland

[link directions: < < < > < > < <]

Encoding: LPLLPPRLPRLPLPPP (L:10, R:11, P:0). Upper bound: 3 bits per word.
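The contrast can be made concrete: Cayley's count costs about (n−2) log2 n bits to pick one unrestricted labeled tree, while the slide's L/R/P code spends a flat 3 bits per word. A sketch; n = 100 is chosen arbitrarily for illustration:

```python
import math

def cayley(n):
    """Number of labeled trees on n nodes: n^(n-2) (Cayley's formula)."""
    return n ** (n - 2)

n = 100
# Bits to pick one unrestricted labeled tree: log2 n^(n-2) = (n-2) log2 n.
general = math.log2(cayley(n))  # ≈ 651 bits, grows like n log n
# The slide's fixed-rate planar code (L:10, R:11, P:0): at most 3 bits/word.
planar = 3 * n                  # 300 bits, constant per word
```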

SLIDE 25

Lexical attraction is symmetric

[Figure: three alternative linkings of "The IRA is fighting British rule"]

SLIDE 26

Lexical attraction is symmetric

S = (W, L, w0)    W = { wi }    L = { (wi, wj) }

P(S) = P(L) P(w0) ∏(wi,wj)∈L P(wj | wi)

     = P(L) P(w0) ∏(wi,wj)∈L P(wi, wj) / P(wi)

     = P(L) ∏wi∈W P(wi) ∏(wi,wj)∈L P(wi, wj) / (P(wi) P(wj))

SLIDE 27

Dependency structure is an undirected, acyclic, planar graph:

The   IRA    is    fighting  British  rule   in    Northern  Ireland
4.20  15.85  7.33  13.27     12.38    13.20  5.80  12.60     14.65
[link mutual informations: 2.95 9.25 2.73 5.07 7.25 7.95 3.11 11.12]

SLIDE 28

Information in a Sentence = Information in Words + Information in the Tree − Mutual Information in Syntactic Relations
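This identity can be checked against the earlier slides: word costs from slide 15, link mutual informations from slide 27, and the resulting 49.85 bits from slide 20. A sketch with the tree-encoding term left as an explicit parameter:

```python
def sentence_information(word_bits, link_mis, tree_bits=0.0):
    """Bits to encode a sentence: per-word information plus the tree
    encoding, minus the mutual information captured by the links."""
    return sum(word_bits) + tree_bits - sum(link_mis)

# Word costs (slide 15) and link mutual informations (slide 27).
word_bits = [4.20, 15.85, 7.33, 13.27, 12.38, 13.20, 5.80, 12.60, 14.65]
link_mis = [2.95, 9.25, 2.73, 5.07, 7.25, 7.95, 3.11, 11.12]

# Ignoring the tree term: 99.28 - 49.43 = 49.85 bits, slide 20's total.
saved = sentence_information(word_bits, link_mis)
```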
SLIDE 29

The Memory

[Figure: the processor (P) and the memory (M)]

SLIDE 30

The memory observes the processor

[Figure: the processor links "kick the ball now"; the memory records the resulting word pairs]

SLIDE 31

Learning simple structures

[Figure: frequent adjacent pairs such as the–ball are learned first; ball also co-occurs with throw, at, with, in]

SLIDE 32

Simple structures help see complex structures

[Figure: the known the–ball link makes the longer kick–ball and kick–now links visible]

SLIDE 33

Learning complex structures

[Figure: the complete linkage of "kick the ball now" is learned]

SLIDE 34

The Processor

[Figure: the processor (P) and the memory (M)]

SLIDE 35
  • We need to discover the best linkage.

* these people also want more government money for education . *

SLIDE 36
  • Words are read in left to right order.

* these

[link scores: 118]

SLIDE 37
  • New word considers links with previous

words.

* these people

[link scores: 118 348]

SLIDE 38
  • Cycles are not allowed.
  • Link with minimum score gets rejected.

* these people

[link scores: 118 348 55]

SLIDE 39
  • Link with negative value not accepted.

* these people also

[link scores: 118 348 −164]

SLIDE 40
  • Link crossing not allowed.
  • Link with minimum score gets eliminated.

* these people also want

[link scores: 118 348 178 143 315]

SLIDE 41

* these people also want

[link scores: 118 348 143 315 261]

SLIDE 42
  • The two constraints straighten out previous mistakes by eliminating bad links.

* these people also want more government money

[link scores: 118 348 143 315 126 53 43 401]

SLIDE 43
  • Eliminating bad links 2/3

* these people also want more government money

[link scores: 118 348 143 315 126 43 401 209]

SLIDE 44
  • Eliminating bad links 3/3

* these people also want more government money

[link scores: 118 348 143 315 43 401 209 66]

SLIDE 45
  • New link can knock off old link in cycle.

* these people also want more government money for education

[link scores: 118 348 143 315 43 401 209 261 258 392]

SLIDE 46
  • The final result.

* these people also want more government money for education .

[link scores: 118 348 143 315 43 401 209 261 392 107]
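The processing steps above can be sketched as a greedy left-to-right linker. This is a simplified reconstruction, not the thesis implementation: `attraction` is a hypothetical stand-in for the learned memory, and every conflict (a crossing link or a closed cycle) is resolved by eliminating the minimum-score link among the offenders, as on the slides:

```python
def parse(words, attraction):
    """Greedy sketch of the processor. Words arrive left to right; each new
    word proposes links to every earlier word; negative-value links are
    never accepted; crossings and cycles are resolved by eliminating the
    minimum-score link (which may be the candidate itself)."""
    links = {}  # (i, j) -> score, with i < j

    def path(i, j, seen):
        """Edges on a path from word i to word j in the link graph, or None."""
        if i == j:
            return []
        seen.add(i)
        for (a, b), s in links.items():
            nxt = b if a == i else a if b == i else None
            if nxt is not None and nxt not in seen:
                rest = path(nxt, j, seen)
                if rest is not None:
                    return [((a, b), s)] + rest
        return None

    for j in range(1, len(words)):
        for i in range(j):
            score = attraction(words[i], words[j])
            if score <= 0:
                continue  # link with negative value not accepted
            while True:
                # links that would cross the candidate (i, j)
                conflicts = [(e, s) for e, s in links.items()
                             if e[0] < i < e[1] < j or i < e[0] < j < e[1]]
                # links that would form a cycle with the candidate
                conflicts += path(i, j, set()) or []
                if not conflicts:
                    links[(i, j)] = score  # accept the candidate link
                    break
                worst, wscore = min(conflicts, key=lambda c: c[1])
                if wscore >= score:
                    break  # the candidate is the weakest link: reject it
                del links[worst]  # eliminate the weakest existing link
    return links

# Hypothetical scores taken from the demonstration slides.
att = {("*", "these"): 118, ("these", "people"): 348, ("*", "people"): 55}
result = parse(["*", "these", "people"],
               lambda a, b: att.get((a, b), att.get((b, a), -1)))
# the cycle-closing 55 link is eliminated, leaving *-these and these-people
```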

SLIDE 47

Discovery of Linguistic Relations Using Lexical Attraction: A demonstration

  • Long distance link
  • Complex noun phrase
  • Syntactic ambiguity
SLIDE 48

Long Distance Link 1/3 (After 1,000 words of training)

* the cause of his death friday was not given . *

SLIDE 49

Long Distance Link 2/3 (After 100,000 words of training)

* the cause of his death friday was not given . *

SLIDE 50

Long Distance Link 3/3 (After 10,000,000 words of training)

* the cause of his death friday was not given . *

SLIDE 51

Complex Noun Phrase 1/4 (After 10,000 words of training)

* the new york stock exchange composite index fell . *

SLIDE 52

Complex Noun Phrase 2/4 (After 100,000 words of training)

* the new york stock exchange composite index fell . *

SLIDE 53

Complex Noun Phrase 3/4 (After 1,000,000 words of training)

* the new york stock exchange composite index fell . *

SLIDE 54

Complex Noun Phrase 4/4 (After 10,000,000 words of training)

* the new york stock exchange composite index fell . *

SLIDE 55

Syntactic Ambiguity 1/3 (After 1,000,000 words of training)

* many people died in the clashes in the west in september . *

SLIDE 56

Syntactic Ambiguity 1/3 (After 10,000,000 words of training)

* many people died in the clashes in the west in september . *

SLIDE 57

Syntactic Ambiguity 2/3 (After 500,000 words of training)

* a number of people protested . *
* the number of people increased . *

SLIDE 58

Syntactic Ambiguity 2/3 (After 5,000,000 words of training)

* a number of people protested . *
* the number of people increased . *

SLIDE 59

Syntactic Ambiguity 3/3 (After 1,000,000 words of training)

* the driver saw the airplane flying over washington . * * the pilot saw the train flying over washington . *

SLIDE 60

Syntactic Ambiguity 3/3 (After 10,000,000 words of training)

* the driver saw the airplane flying over washington . * * the pilot saw the train flying over washington . *

SLIDE 61

Results

  • Evaluation criteria
  • Upper and lower bounds
  • Link accuracy
  • Related work
SLIDE 62

Evaluation criteria: Content-word links

I saw the mountains flying over New York

People want more money for education

[Figure: the content-word links to be evaluated are marked with question marks]

SLIDE 63

Training

  • Up to 100 million words of Associated Press material.

Testing

  • 200 out-of-sample sentences.
  • Selected from a 5000 word vocabulary (90% of all the words seen in the corpus).
  • 3152 words (15.76 words per sentence).
  • Hand parsed with 1287 content-word links.
SLIDE 64

Accuracy: n1 = human links, n2 = program links, n12 = common links

  • Precision = n12 / n2
  • Recall = n12 / n1
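Both measures follow directly from the link sets. A minimal sketch treating links as unordered index pairs; the example counts are toy values, not the thesis test set:

```python
def precision_recall(human_links, program_links):
    """Precision = n12/n2 and recall = n12/n1 over content-word link sets."""
    human = {frozenset(l) for l in human_links}      # links are undirected
    program = {frozenset(l) for l in program_links}
    n12 = len(human & program)
    return n12 / len(program), n12 / len(human)

# Toy example: 3 human links, 2 program links, 1 link in common.
p, r = precision_recall([(0, 1), (1, 2), (2, 4)], [(1, 0), (1, 3)])
# p == 0.5, r == 1/3
```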
SLIDE 65

Lower bound: Random lexical attraction → 8.9% precision, 5.4% recall. Linking every adjacent word → 41% recall.

Upper bound: 85% of syntactically related pairs have positive lexical attraction.

SLIDE 66

Recording adjacent pairs

[Plot: precision and recall vs. number of words trained (1 to 10^8), Procedure 1: recording adjacent pairs]

Precision = 67% Recall = 41%

SLIDE 67

Recording all pairs

[Plot: precision and recall vs. number of words trained (1 to 10^8), Procedure 2: recording all pairs]

Precision = 55% Recall = 48%

SLIDE 68

Using feedback from processor

[Plot: precision and recall vs. number of words trained (1 to 10^8), Procedure 3: recording pairs selected by processor]

Precision = 62% Recall = 52%

SLIDE 69

Related work

  • Magerman and Marcus, 1990
  • Lari and Young, 1990
  • Pereira and Schabes, 1992
  • Briscoe and Waegner, 1992
  • Carroll and Charniak, 1992
  • Stolcke, 1994
  • Chen, 1996
  • de Marcken, 1996
SLIDE 70

de Marcken, 1995

[Figure: two phrase-structure analyses of "A B C", one with rules AP => A BP, BP => B, CP => AP C, the other with AP => A, BP => AP B, CP => BP C]

SLIDE 71

Lessons learned

  • Training with words instead of parts of speech enables the program to learn common but idiosyncratic usages of words.
  • Not committing to early generalizations prevents the program from making irrecoverable mistakes early.
  • Using a representation that makes the relevant features (such as syntactic relations) explicit simplifies learning.

SLIDE 72

Contributions

  • Opening a door for common sense in language
  • Bootstrapping from zero by interdigitating learning and processing

SLIDE 73

Future Work

  • Second degree models
  • History mechanism
  • Categorization and generalization