Injecting Linguistics into NLP by Annotation (Eduard Hovy)



SLIDE 1

Injecting Linguistics into NLP by Annotation

Eduard Hovy

Information Sciences Institute University of Southern California

SLIDE 2

Lesson 1: Banko and Brill, HLT-01

  • Confusion set disambiguation task:

{you're | your}, {to | too | two}, {its | it's}

  • 5 Algorithms: ngram table, winnow, perceptron, transformation-based learning, decision trees

  • Training: 10^6 to 10^9 words
  • Lessons:

– All methods improved to almost same point
– Simple method can end above complex one
– Don't waste your time with algorithms and optimization

SLIDE 3

Lesson 1: Banko and Brill, HLT-01

  • Confusion set disambiguation task:

{you're | your}, {to | too | two}, {its | it's}

  • 5 Algorithms: ngram table, winnow, perceptron, transformation-based learning, decision trees

  • Training: 10^6 to 10^9 words
  • Lessons:

– All methods improved to almost same point
– Simple method can end above complex one
– Don't waste your time with algorithms and optimization

You don't have to be smart, you just need enough training data
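
A minimal sketch of the simplest of the five methods, the n-gram (context-word) table, for one confusion set; the function names, sentinel tokens, and toy corpus are illustrative, not from Banko and Brill.

```python
from collections import Counter, defaultdict

# One of the paper's confusion sets; {your, you're} and {to, too, two} work the same way.
CONFUSION_SET = {"its", "it's"}

def train(sentences):
    """Count which confusion-set member appears between each (previous word, next word) pair."""
    context_counts = defaultdict(Counter)
    prior = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok.lower() in CONFUSION_SET:
                prev_w = tokens[i - 1].lower() if i > 0 else "<s>"
                next_w = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
                context_counts[(prev_w, next_w)][tok.lower()] += 1
                prior[tok.lower()] += 1
    return context_counts, prior

def disambiguate(tokens, i, context_counts, prior):
    """Pick the most frequent member for position i's context, backing off to the corpus prior."""
    prev_w = tokens[i - 1].lower() if i > 0 else "<s>"
    next_w = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    counts = context_counts.get((prev_w, next_w))
    return counts.most_common(1)[0][0] if counts else prior.most_common(1)[0][0]

# Toy usage; with 10^6 to 10^9 words of real text the table alone approaches the learners' ceiling.
corpus = [["I", "like", "its", "color"], ["I", "think", "it's", "great"]]
table, prior = train(corpus)
print(disambiguate(["we", "like", "its", "color"], 2, table, prior))  # -> "its"
```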

SLIDE 4

Lesson 2: Och, ACL-02

  • Best MT system in world (Arabic→English, by BLEU and NIST, 2002–2005): Och's work

  • Method: learn ngram correspondence patterns (alignment templates) using MaxEnt (log-linear translation model), trained to maximize BLEU score

  • Approximately: EBMT + Viterbi search
  • Lesson: the more you store, the better your MT

[Diagram: alignment templates pairing source word sequences with target word sequences, e.g. w1 w2 w3 w4 w5 → w4 w3 w2 w1 (reordered) and w1 w2 w3 w4 w5 → w1 w2 w3 w4]

SLIDE 5

Lesson 2: Och, ACL-02

  • Best MT system in world (Arabic→English, by BLEU and NIST, 2002–2005): Och's work

  • Method: learn ngram correspondence patterns (alignment templates) using MaxEnt (log-linear translation model), trained to maximize BLEU score

  • Approximately: EBMT + Viterbi search
  • Lesson: the more you store, the better your MT

[Diagram: alignment templates pairing source word sequences with target word sequences, e.g. w1 w2 w3 w4 w5 → w4 w3 w2 w1 (reordered) and w1 w2 w3 w4 w5 → w1 w2 w3 w4]

You don't have to be smart, you just need enough storage
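
A minimal sketch of the "more storage" idea as table lookup: a phrase table applied with greedy longest-match, monotone decoding. The tiny table, language pair, and decoder are illustrative stand-ins; Och's actual system learns alignment templates with a log-linear (MaxEnt) model trained to maximize BLEU.

```python
# MT as lookup: a stored table maps source n-grams to target phrases; decoding is a
# greedy longest-match walk over the source sentence. Entries below are made up.
PHRASE_TABLE = {
    ("das", "haus"): "the house",
    ("ist",): "is",
    ("klein",): "small",
}
MAX_PHRASE_LEN = 3

def translate(source_tokens):
    """Greedy longest-match decoding over a stored phrase table (monotone, no reordering)."""
    output, i = [], 0
    while i < len(source_tokens):
        for span in range(min(MAX_PHRASE_LEN, len(source_tokens) - i), 0, -1):
            key = tuple(source_tokens[i:i + span])
            if key in PHRASE_TABLE:
                output.append(PHRASE_TABLE[key])
                i += span
                break
        else:  # unknown word: pass it through untranslated
            output.append(source_tokens[i])
            i += 1
    return " ".join(output)

print(translate(["das", "haus", "ist", "klein"]))  # -> "the house is small"
```

The lesson carries over directly: the bigger the stored table, the more of the input it covers and the better the output, with no cleverness at decode time.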

SLIDE 6

Lesson 3: Chiang et al., HLT-2009

  • 11,001 New Features for Statistical MT. David Chiang, Kevin Knight, Wei Wang. 2009. Proc. NAACL HLT. Best paper award

  • Learn MT rules: NP-C(x0:NPB PP(IN(of) x1:NPB)) ↔ x1 de x0
  • Several hundred count features of various kinds: reward rules seen more often; punish rules that partly overlap; punish rules that insert is, the, etc. into English …

  • 10,000 word context features: for each triple (f, e, f+1), a feature that counts the number of times that f is aligned to e and f+1 occurs to the right of f; and similarly for triples (f, e, f-1) with f-1 occurring to the left of f. Restrict words to the 100 most frequent in training data

SLIDE 7

Lesson 3: Chiang et al., HLT-2009

  • 11,001 New Features for Statistical MT. David Chiang, Kevin Knight, Wei Wang. 2009. Proc. NAACL HLT. Best paper award

  • Learn MT rules: NP-C(x0:NPB PP(IN(of) x1:NPB)) ↔ x1 de x0
  • Several hundred count features of various kinds: reward rules seen more often; punish rules that partly overlap; punish rules that insert is, the, etc. into English …

  • 10,000 word context features: for each triple (f, e, f+1), a feature that counts the number of times that f is aligned to e and f+1 occurs to the right of f; and similarly for triples (f, e, f-1) with f-1 occurring to the left of f. Restrict words to the 100 most frequent in training data

You don't have to know anything, you just need enough features
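
A rough sketch of how the word context features described above could be counted from word-aligned sentence pairs; the data structures, sentinel tokens, and toy bitext are assumptions for illustration, not Chiang et al.'s implementation.

```python
from collections import Counter

def word_context_features(sentence_pairs, vocab_limit=100):
    """Count the (f, e, f+1) / (f, e, f-1) context features from word-aligned data.

    Each item in sentence_pairs is (source_tokens, target_tokens, alignment), where
    alignment is a set of (i, j) pairs meaning source word i is aligned to target word j.
    """
    # Restrict source words to the most frequent ones, as the slide says (100 in the paper).
    freq = Counter(f for src, _, _ in sentence_pairs for f in src)
    frequent = {w for w, _ in freq.most_common(vocab_limit)}

    features = Counter()
    for src, tgt, alignment in sentence_pairs:
        for i, j in alignment:
            f, e = src[i], tgt[j]
            right = src[i + 1] if i + 1 < len(src) else "</s>"
            left = src[i - 1] if i > 0 else "<s>"
            if f in frequent and right in frequent:
                features[("right", f, e, right)] += 1
            if f in frequent and left in frequent:
                features[("left", f, e, left)] += 1
    return features

# Toy usage with one aligned pair; real feature values come from the full training bitext.
pairs = [(["le", "chat", "dort"], ["the", "cat", "sleeps"], {(0, 0), (1, 1), (2, 2)})]
print(word_context_features(pairs).most_common(3))
```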

SLIDE 8

Lesson 4: Fleischman and Hovy, ACL-03

  • Text mining: classify locations and people from free text into fine-grained classes

– Simple appositive IE patterns
– 2+ million examples, collapsed into 1 million instances (avg: 2 mentions/instance, 40+ for George W. Bush)

  • Test: QA on "Who is X?":

– 100 questions from AskJeeves
– System 1: Table of instances
– System 2: ISI's TextMap QA system
– Table system scored 25% better
– Over half of questions that TextMap got wrong could have benefited from information in the concept-instance pairs
– This method took 10 seconds, TextMap took ~9 hours

[Chart: Performance on a Question Answering Task; % correct (correct / partial / incorrect) for the state-of-the-art system vs. the extraction system]

SLIDE 9

Lesson 4: Fleischman and Hovy, ACL-03

  • Text mining: classify locations and people from free text into fine-grained classes

– Simple appositive IE patterns
– 2+ million examples, collapsed into 1 million instances (avg: 2 mentions/instance, 40+ for George W. Bush)

  • Test: QA on "Who is X?":

– 100 questions from AskJeeves
– System 1: Table of instances
– System 2: ISI's TextMap QA system
– Table system scored 25% better
– Over half of questions that TextMap got wrong could have benefited from information in the concept-instance pairs
– This method took 10 seconds, TextMap took ~9 hours

[Chart: Performance on a Question Answering Task; % correct (correct / partial / incorrect) for the state-of-the-art system vs. the extraction system]

You don't have to reason, you just need to collect the knowledge beforehand
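
A minimal sketch of the two stages: mine concept-instance pairs offline with an appositive pattern, then answer "Who is X?" by table lookup. The single regex and toy document stand in for the paper's IE patterns and web-scale corpus.

```python
import re
from collections import defaultdict

# Very rough appositive pattern: "Bill Clinton, former US president," -> (Bill Clinton, former US president).
# This regex is only illustrative; the paper uses a small set of hand-built IE patterns.
APPOSITIVE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)+), ([a-z][^,.;]+)[,.;]")

def mine_concept_instances(documents):
    """Offline step: scan text once and store name -> descriptions in a table."""
    table = defaultdict(list)
    for doc in documents:
        for name, description in APPOSITIVE.findall(doc):
            table[name].append(description.strip())
    return table

def who_is(name, table):
    """Online step: answering 'Who is X?' is just a table lookup, no reasoning at query time."""
    descriptions = table.get(name)
    return "; ".join(descriptions) if descriptions else "unknown"

docs = ["In 2001 Bill Clinton, former US president, spoke in Madrid."]
table = mine_concept_instances(docs)
print(who_is("Bill Clinton", table))  # -> "former US president"
```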

SLIDE 10

Four lessons

  • You don't have to be smart, you just need enough training data
  • You don't have to be smart, you just need enough memory
  • You don't have to be smart, you just need enough features
  • You don't have to be smart, you just need to collect the knowledge beforehand

  • Conclusion:

– the web has all you need
– memory gets cheaper
– computers get faster
…we are moving to a new world: NLP as table lookup

SLIDE 11

So you may be happy with this, but I am not …

I want to understand what's going on in language and thought

  • We have no theory of language or even of language processing in NLP

  • Our general approach is:

– Goal: Transform notation 1 into notation 2 (maybe adding tags…)
– Learn how to do this automatically
– Design an algorithm to beat the other guy

  • How can one inject understanding?
SLIDE 12
  • Generally, to reduce the size of a transformation table / statistical model, you introduce a generalization step:

– POS tags, syntactic trees, modality labels…

  • If you're smart, the theory behind the generalization actually 'explains' or 'captures' the phenomenon

– Classes of the phenomenon + rules linking them

  • 'Good' NLP can test the adequacy of a theory by determining the table reduction factor

  • How can you introduce the generalization info?
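
A toy worked example of the table reduction factor: map words to (invented) POS classes and compare how many distinct bigram patterns remain; the tag dictionary and sentences are made up for illustration.

```python
# How many distinct bigram patterns remain once words are generalized to POS-like classes?
POS = {"the": "DET", "a": "DET", "cat": "N", "dog": "N", "sleeps": "V", "barks": "V"}

sentences = [
    ["the", "cat", "sleeps"],
    ["a", "cat", "sleeps"],
    ["the", "dog", "barks"],
    ["a", "dog", "barks"],
]

def bigrams(seq):
    return {(seq[i], seq[i + 1]) for i in range(len(seq) - 1)}

word_patterns = set().union(*(bigrams(s) for s in sentences))
pos_patterns = set().union(*(bigrams([POS[w] for w in s]) for s in sentences))

# 6 word-level bigrams collapse to 2 class-level bigrams: a reduction factor of 3.
print(len(word_patterns), len(pos_patterns), len(word_patterns) / len(pos_patterns))
```

The better the generalization captures the phenomenon, the larger this factor, which is one way 'good' NLP can test a theory.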
SLIDE 13

Annotation!

  • 1. Preparation

– Choose the corpus
– Build the interfaces

  • 2. Instantiating the theory

– Create the annotation choices
– Test-run them for stability

  • 3. Annotation

– Annotate
– Reconcile among annotators

  • 4. Validation

– Measure inter-annotator agreement
– Possibly adjust theory instantiation

  • 5. Delivery

– Wrap the result

Open questions at each step ('annotation science'): Which corpus? Interface design issues. How to remain true to the theory? How many annotators? Which procedure? Which measures?
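
Step 4 asks which measures to use; a minimal sketch of one standard choice, Cohen's kappa for two annotators labeling the same items. The labels and toy annotations are invented.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(freq_a[lab] / n * freq_b[lab] / n for lab in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

# Toy usage: two annotators labeling ten instances with a three-way annotation scheme.
a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "pos", "neu", "pos"]
b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neu", "pos"]
print(round(cohens_kappa(a, b), 3))  # observed 0.8, chance 0.38 -> kappa ~0.677
```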

SLIDE 14

The new NLP world

  • Fundamental methodological assumptions of NLP:

– Old-style NLP: process is deterministic; manually written rules will exactly generate desired product
– Statistical NLP: process is (somewhat) nondeterministic; probabilities predict likelihood of products
– Underlying assumption: as long as annotator consistency can be achieved, there is systematicity, and systems will learn to find it

  • Theory creation (and testing!) through corpus annotation

– But we (still) have to manually identify generalizations (= equivalence classes of individual instances of phenomena) to obtain expressive generality/power
– This is the 'theory'
– (and we need to understand how to do annotation properly)

SLIDE 15

Who are the people with the 'theory'?

Not us!

  • Our 'theory' of sentiment
  • Our 'theory' of entailment
  • Our 'theory' of MT
  • Our 'theory' of IR
  • Our 'theory' of QA
SLIDE 16

A fruitful cycle

  • Each one influences the others
  • Different people like different work

[Diagram: a cycle of three kinds of work and the people drawn to each: analysis, theorizing, and annotation (linguists, psycholinguists, cognitive linguists…) produce an annotated corpus; machine learning of transformations (current NLP researchers) produces an automated creation method; storage in large tables and optimization (NLP companies) feeds evaluation, whose problems and low performance flow back into analysis]

SLIDE 17

Toward a theory of NLP?

  • Basic tenets:
  • 1. NLP is notation transformation
  • 2. There exists a natural and optimal set of transformation steps, each involving a dedicated and distinct representation
  • Problem: syntax-semantics and semantics-pragmatics interfaces
  • 3. Each representation is based on a suitable (family of) theories in linguistics, philosophy, rhetoric, social interaction studies, etc.
  • Problem: which theory/ies? Why?
  • 4. Except for a few circumscribed phenomena (morphology, number expressions, etc.), the phenomena being represented are too complex and interrelated for human-built rules to handle them well
  • Puzzle: but they can (usually) be annotated in corpora: why?
  • 5. A set of machine learning algorithms and a set of features can be used to learn the transformations from suitably annotated corpora
  • Problem: which algorithms and features? Why?
  • Observation: We (almost) completely lack the theoretical framework to describe and measure the informational content and complexity of the representation levels we use: a challenge for the future

SLIDE 18

The face of NLP tomorrow

Three (and a Half) Trends: the Near Future of NLP

  • 1. Machine learning transformations
  • 2. Analysis and corpus construction
  • 3. Table construction and use
  • 4. Evaluation frameworks

Who are you???

SLIDE 19

Thank you!