Linguists get the abstraction, machines get the details (Hal Daumé III)



SLIDE 1

Symbol Pushing Slide 1 Hal Daumé III (me@hal3.name)

Linguists get the abstraction, machines get the details

Hal Daumé III

Computer Science / Linguistics

University of Maryland, College Park

me@hal3.name

SLIDE 2

NLP's use of linguists, a caricature

Linguists develop theory

Linguists richly annotate data (e.g., a treebank)

NLP people train systems (e.g., a parser)

Parser output fed into machine translation system

Machine translation system has no idea what the input symbols mean

NP, VP, VBD, … might as well be X1, X2, X3, …
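A minimal sketch of the point above, not taken from the talk: if a downstream system treats labels as opaque symbols, consistently renaming them to X1, X2, X3 loses nothing it could use. The function name `anonymize_labels` and the toy label sequence are assumptions made for illustration.

```python
def anonymize_labels(labels):
    """Rename each distinct label to X1, X2, ... in order of first appearance."""
    mapping = {}
    renamed = []
    for label in labels:
        if label not in mapping:
            mapping[label] = f"X{len(mapping) + 1}"
        renamed.append(mapping[label])
    return renamed, mapping


# Toy label sequence (hypothetical, for illustration only).
labels = ["NP", "VP", "VBD", "NP"]
renamed, mapping = anonymize_labels(labels)
print(renamed)   # ['X1', 'X2', 'X3', 'X1']
print(mapping)   # {'NP': 'X1', 'VP': 'X2', 'VBD': 'X3'}
```

The renamed sequence keeps every equality and co-occurrence pattern of the original, which is all a purely data-driven consumer can see.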

SLIDE 3

Where does this model work?

Works when entire pipeline is learned from data

And we make no use of prior knowledge

Where does this model not work?

Pretty much any other time

SLIDE 4

Inferring Tags from the Structure

➢ INPUT: The man ate a big sandwich

➢ OUTPUT: D N V D J N

➢ Baseline:

  ➢ Random guessing: 4% accuracy
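The random-guessing baseline can be sketched as follows. Guessing uniformly among K tags is right with probability 1/K per token, whatever the gold tag distribution is. The slide does not state the size of the tag set; 25 is used here only because 1/25 matches the reported 4%, and it is an assumption. The helper names and the all-noun tagger are also hypothetical.

```python
from fractions import Fraction


def uniform_guess_accuracy(num_tags):
    """Expected per-token accuracy when guessing uniformly among num_tags tags."""
    return Fraction(1, num_tags)


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return Fraction(correct, len(gold))


# Gold tags from the slide's example sentence "The man ate a big sandwich".
gold = "D N V D J N".split()

# Hypothetical trivial tagger that labels every word as a noun.
all_nouns = ["N"] * len(gold)

print(accuracy(all_nouns, gold))            # 1/3
print(float(uniform_guess_accuracy(25)))    # 0.04 (assumed tag set size of 25)
```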

SLIDE 5

Sources of Knowledge

➢ Seeds (frequent words for each tag)

  ➢ N: membro, milhoes, obras

  ➢ D: as [the,2f], o [the,1m], os [the,2m]

  ➢ V: afector, gasta, juntar

  ➢ P: com, como, de, em

➢ Typological rules:

  ➢ Art ← Noun

  ➢ Prp → Noun

➢ Tag knowledge:

  ➢ Open class

  ➢ Closed class
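One way these three knowledge sources might be written down as data, given as a rough sketch and not as the talk's actual model. The seed words are copied from the slide with their spelling unchanged. The assignment of tags to open and closed classes follows the usual linguistic convention and is an assumption, since the slide does not list it. The typological rules are stored exactly as written, without interpreting the arrows. All names (`SEEDS`, `seed_tag`, `tag_class`) are hypothetical.

```python
# Seed words per tag, copied from the slide (Portuguese, spelling as transcribed).
SEEDS = {
    "N": ["membro", "milhoes", "obras"],
    "D": ["as", "o", "os"],
    "V": ["afector", "gasta", "juntar"],
    "P": ["com", "como", "de", "em"],
}

# Typological rules stored as written on the slide: (left symbol, arrow, right symbol).
TYPOLOGICAL_RULES = [("Art", "<-", "Noun"), ("Prp", "->", "Noun")]

# Assumption: conventional open/closed split; the slide does not spell it out.
OPEN_CLASS = {"N", "V", "J"}
CLOSED_CLASS = {"D", "P"}

# Invert the seed lists into a word -> tag lookup table.
WORD_TO_TAG = {word: tag for tag, words in SEEDS.items() for word in words}


def seed_tag(word):
    """Return the seed tag for a word, or None if the word is not a seed."""
    return WORD_TO_TAG.get(word.lower())


def tag_class(tag):
    """Return 'open', 'closed', or None for a tag outside both sets."""
    if tag in OPEN_CLASS:
        return "open"
    if tag in CLOSED_CLASS:
        return "closed"
    return None


# Hypothetical token list for illustration, not a real sentence.
tokens = ["os", "membro", "gasta", "xyz"]
print([seed_tag(t) for t in tokens])       # ['D', 'N', 'V', None]
print(tag_class("D"), tag_class("N"))      # closed open
```

Seeds only cover a handful of frequent words, so most tokens come back as `None`; the other knowledge sources would have to constrain those.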

SLIDE 6

Preliminary Results

[Bar chart comparing No Seeds and Seeds, with and without open/closed class knowledge (No O/C vs. Open/Closed); axis ticks from 10 to 60]

SLIDE 7

Preliminary Results: Open/Closed

[Two bar charts, NO SEEDS and SEEDS, each comparing No Rules, Art<-N, Prp->N, and Both; axis ticks from 20 to 60]

SLIDE 8

I'd like NLP to use more linguistics, but...

Linguistic models are often developed without any reference to computation

Many NLP students do not learn (or appreciate) much beyond Syntax I

Linguistic theories seem to be good in the abstract, but (perhaps) not so much in the details