Injecting Linguistics into NLP by Annotation
Eduard Hovy
Information Sciences Institute, University of Southern California
SLIDE 1
Lesson 1: Banko and Brill, HLT-01
- Confusion set disambiguation task:
  {you're | your}, {to | too | two}, {its | it's}
- 5 algorithms: ngram table, winnow, perceptron,
  transformation-based learning, decision trees
- Training: 10⁶–10⁹ words
- Lessons:
  – All methods improved to almost the same point
  – A simple method can end up above a complex one
  – Don't waste your time on algorithms and optimization
You don't have to be smart, you just need enough training data
SLIDE 4
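As an illustration of the ngram-table approach named on the slide above, here is a minimal Python sketch of a confusion-set disambiguator: count (left word, candidate, right word) contexts in training text and pick the candidate whose context was seen most often. The window size, smoothing-free counting, and toy corpus are assumptions for illustration, not details of Banko and Brill's setup.

```python
# A minimal sketch of an ngram-table disambiguator in the spirit of
# Banko and Brill's simplest learner. The one-word context window and
# the tiny corpus are illustrative assumptions, not paper details.
from collections import Counter

CONFUSION_SET = {"to", "too", "two"}

def train(tokens):
    """Count (left word, candidate, right word) contexts for each candidate."""
    table = Counter()
    for i, tok in enumerate(tokens):
        if tok in CONFUSION_SET:
            left = tokens[i - 1] if i > 0 else "<s>"
            right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            table[(left, tok, right)] += 1
    return table

def disambiguate(table, left, right):
    """Pick the candidate whose context was seen most often in training."""
    return max(CONFUSION_SET, key=lambda c: table[(left, c, right)])

corpus = "i want to go home . she is far too busy . two dogs ran".split()
table = train(corpus)
print(disambiguate(table, "want", "go"))   # -> 'to'
print(disambiguate(table, "far", "busy"))  # -> 'too'
```

The point of the lesson survives even in this toy: the method is trivial, and everything depends on how much training text feeds the table.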
Lesson 2: Och, ACL-02
- Best MT system in the world (Arabic→English, by
  BLEU and NIST, 2002–2005): Och's work
- Method: learn ngram correspondence patterns
  (alignment templates) using MaxEnt (a log-linear
  translation model), trained to maximize BLEU score
- Approximately: EBMT + Viterbi search
- Lesson: the more you store, the better your MT
[Diagram: alignment templates linking source word sequences
w1 w2 w3 w4 w5 to reordered target word sequences]
You don't have to be smart, you just need enough storage
SLIDE 6
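The "EBMT + Viterbi search" characterization on the slide above can be illustrated with a toy table-lookup translator: store source-target phrase correspondences and greedily consume the longest stored source phrase. The phrase table, the greedy decoding, and the copy-through handling of unknown words are illustrative assumptions; Och's actual system scores alignment templates with a log-linear model and searches for the best-scoring translation.

```python
# A toy phrase-table lookup in the EBMT spirit of the slide's lesson:
# translation quality grows with the number of stored correspondences.
# Phrase pairs and greedy longest-match decoding are assumptions made
# for illustration, not Och's method.
PHRASE_TABLE = {
    ("el", "gato"): ["the", "cat"],
    ("se", "sienta"): ["sits"],
    ("en",): ["on"],
    ("la", "alfombra"): ["the", "mat"],
}
MAX_PHRASE_LEN = max(len(src) for src in PHRASE_TABLE)

def translate(source_tokens):
    """Greedily consume the longest source phrase found in the table."""
    out, i = [], 0
    while i < len(source_tokens):
        for n in range(MAX_PHRASE_LEN, 0, -1):
            phrase = tuple(source_tokens[i:i + n])
            if phrase in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[phrase])
                i += n
                break
        else:  # unknown word: copy it through
            out.append(source_tokens[i])
            i += 1
    return out

print(" ".join(translate("el gato se sienta en la alfombra".split())))
# -> the cat sits on the mat
```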
Lesson 3: Chiang et al., HLT-2009
- "11,001 New Features for Statistical MT." David Chiang, Kevin
  Knight, and Wei Wang. 2009. Proc. NAACL HLT. Best paper award
- Learn MT rules: NP-C(x0:NPB PP(IN(of) x1:NPB)) ↔ x1 de x0
- Several hundred count features of various kinds: reward rules
  seen more often; punish rules that partly overlap; punish rules
  that insert is, the, etc. into English …
- 10,000 word context features: for each triple (f, e, f+1), a
  feature that counts the number of times that f is aligned to e
  and f+1 occurs to the right of f; and similarly for triples
  (f, e, f−1), with f−1 occurring to the left of f. Words are
  restricted to the 100 most frequent in the training data
You don't have to know anything, you just need enough features
SLIDE 8
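A minimal sketch of the word context features described on the slide above: for each aligned word pair (f, e), fire a count feature keyed by f's right neighbor f+1, and symmetrically by its left neighbor f-1, with neighbors restricted to a frequent-word vocabulary. The data structures and the toy sentence pair are assumptions for illustration.

```python
# A minimal sketch of the (f, e, f+1) / (f, e, f-1) context count
# features. The toy sentence pair and the stand-in vocabulary are
# illustrative assumptions, not data from the paper.
from collections import Counter

def context_features(src, tgt, alignment, vocab):
    """alignment: set of (i, j) pairs meaning src[i] is aligned to tgt[j]."""
    feats = Counter()
    for i, j in alignment:
        f, e = src[i], tgt[j]
        right = src[i + 1] if i + 1 < len(src) else "</s>"
        left = src[i - 1] if i > 0 else "<s>"
        # the paper restricts neighbor words to the 100 most frequent
        if right in vocab:
            feats[("right", f, e, right)] += 1
        if left in vocab:
            feats[("left", f, e, left)] += 1
    return feats

src = "la casa azul".split()
tgt = "the blue house".split()
alignment = {(0, 0), (1, 2), (2, 1)}
vocab = {"la", "casa", "azul", "<s>", "</s>"}  # stand-in for the top-100 list
print(context_features(src, tgt, alignment, vocab))
```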
Lesson 4: Fleischman and Hovy, ACL-03
- Text mining: classify locations and people from
  free text into fine-grained classes
  – Simple appositive IE patterns
  – 2+ million examples, collapsed into 1 million instances
    (avg: 2 mentions/instance; 40+ for George W. Bush)
- Test: QA on "Who is X?":
  – 100 questions from AskJeeves
  – System 1: table of instances
  – System 2: ISI's TextMap QA system
  – The table system scored 25% better
  – Over half of the questions that TextMap got wrong could have
    benefited from information in the concept-instance pairs
  – This method took 10 seconds; TextMap took ~9 hours
[Chart: performance on a question answering task; % correct,
partially correct, and incorrect answers for the state-of-the-art
system vs. the extraction system]
You don't have to reason, you just need to collect the knowledge beforehand
SLIDE 10
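The table-lookup flavor of this result can be sketched in a few lines: mine concept-instance pairs offline with a simple appositive pattern, then answer "Who is X?" by lookup alone. The regex and the two-sentence corpus below are illustrative assumptions, far cruder than the paper's IE patterns.

```python
# A minimal sketch of "QA as table lookup" over concept-instance pairs
# mined with an appositive pattern. The regex and tiny corpus are
# illustrative assumptions.
import re

# crude appositive pattern: "<Name>, the <description>,"
APPOSITIVE = re.compile(r"([A-Z][a-z]+(?: [A-Z][a-z]+)*), (the [A-Za-z ]+?),")

def mine(corpus):
    table = {}
    for name, description in APPOSITIVE.findall(corpus):
        table.setdefault(name, []).append(description)
    return table

corpus = ("Bill Gates, the founder of Microsoft, spoke today. "
          "Grace Hopper, the inventor of the compiler, was honored.")
table = mine(corpus)

def who_is(name):
    """Answering is pure table lookup: no reasoning at question time."""
    return table.get(name, ["unknown"])[0]

print(who_is("Grace Hopper"))  # -> the inventor of the compiler
```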
Four lessons
- You don't have to be smart, you just need enough training data
- You don't have to be smart, you just need enough memory
- You don't have to be smart, you just need enough features
- You don't have to be smart, you just need to collect the
  knowledge beforehand
- Conclusion:
  – the web has all you need
  – memory gets cheaper
  – computers get faster
  …we are moving to a new world: NLP as table lookup
SLIDE 11
So you may be happy with this, but I am not …
I want to understand what's going on in language and thought
- We have no theory of language or even of
  language processing in NLP
- Our general approach is:
  – Goal: transform notation 1 into notation 2 (maybe adding tags…)
  – Learn how to do this automatically
  – Design an algorithm to beat the other guy
- How can one inject understanding?
SLIDE 12
- Generally, to reduce the size of a transformation
  table / statistical model, you introduce a generalization step:
  – POS tags, syntactic trees, modality labels…
- If you're smart, the theory behind the
  generalization actually 'explains' or 'captures' the phenomenon
  – Classes of the phenomenon + rules linking them
- 'Good' NLP can test the adequacy of a theory by
  determining the table reduction factor
- How can you introduce the generalization info?
SLIDE 13
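A worked example of the table reduction factor mentioned on the slide above, under the assumption of a 50,000-word vocabulary and a 45-tag Penn Treebank-style POS set: replacing words with POS classes shrinks a bigram table from V² to C² cells.

```python
# A worked example of the table-reduction idea: generalizing words to
# POS classes shrinks a bigram table from V^2 to C^2 cells. The
# vocabulary and tag-set sizes are illustrative assumptions.
V = 50_000   # word types
C = 45       # Penn Treebank-style POS tags

word_bigrams = V ** 2
class_bigrams = C ** 2
print(f"word-level table:  {word_bigrams:,} cells")   # 2,500,000,000
print(f"class-level table: {class_bigrams:,} cells")  # 2,025
print(f"reduction factor:  {word_bigrams / class_bigrams:,.0f}x")
```

The better the generalization captures the phenomenon, the more of the original table's predictive power survives the reduction.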
Annotation!
- 1. Preparation
  – Choose the corpus
  – Build the interfaces
- 2. Instantiating the theory
  – Create the annotation choices
  – Test-run them for stability
- 3. Annotation
  – Annotate
  – Reconcile among annotators
- 4. Validation
  – Measure inter-annotator agreement
  – Possibly adjust the theory instantiation
- 5. Delivery
  – Wrap the result
Open questions of 'annotation science': Which corpus? Interface
design issues? How to remain true to the theory? How many
annotators? Which procedure? Which measures?
SLIDE 14
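For the "measure inter-annotator agreement" step above, one standard choice for two annotators is Cohen's kappa (the slide does not name a specific measure): observed agreement corrected for the agreement expected by chance. A minimal sketch:

```python
# A minimal sketch of Cohen's kappa for two annotators. The toy labels
# are illustrative assumptions; the slide does not prescribe a measure.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # fraction of items the two annotators labeled identically
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # chance agreement: probability both independently pick the same label
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(a, b), 3))  # -> 0.333
```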
The new NLP world
- Fundamental methodological assumptions of NLP:
  – Old-style NLP: the process is deterministic; manually written
    rules will exactly generate the desired product
  – Statistical NLP: the process is (somewhat) nondeterministic;
    probabilities predict the likelihood of products
  – Underlying assumption: as long as annotator consistency can be
    achieved, there is systematicity, and systems will learn to find it
- Theory creation (and testing!) through corpus annotation
  – But we (still) have to manually identify generalizations
    (= equivalence classes of individual instances of phenomena)
    to obtain expressive generality/power
  – This is the 'theory'
  – (and we need to understand how to do annotation properly)
SLIDE 15
Who are the people with the 'theory'?
Not us!
- Our 'theory' of sentiment
- Our 'theory' of entailment
- Our 'theory' of MT
- Our 'theory' of IR
- Our 'theory' of QA
- …
SLIDE 16
A fruitful cycle
- Each one influences the others
- Different people like different work
[Diagram: a cycle with three stages. Analysis, theorizing, and
annotation (linguists, psycholinguists, cognitive linguists…) produce
an annotated corpus; machine learning of transformations (current NLP
researchers) produces an automated creation method; storage in large
tables and optimization (NLP companies) surfaces problems of low
performance and evaluation, which feed back into analysis]
SLIDE 17
Toward a theory of NLP?
- Basic tenets:
- 1. NLP is notation transformation
- 2. There exists a natural and optimal set of transformation steps,
  each involving a dedicated and distinct representation
  – Problem: the syntax-semantics and semantics-pragmatics interfaces
- 3. Each representation is based on a suitable (family of) theories
  in linguistics, philosophy, rhetoric, social interaction studies, etc.
  – Problem: which theory/ies? Why?
- 4. Except for a few circumscribed phenomena (morphology, number
  expressions, etc.), the phenomena being represented are too complex
  and interrelated for human-built rules to handle them well
  – Puzzle: but they can (usually) be annotated in corpora: why?
- 5. A set of machine learning algorithms and a set of features can be
  used to learn the transformations from suitably annotated corpora
  – Problem: which algorithms and features? Why?
- Observation: we (almost) completely lack the theoretical framework
  to describe and measure the informational content and complexity of
  the representation levels we use: a challenge for the future
SLIDE 18
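Tenet 1 above, "NLP is notation transformation", can be made concrete as a pipeline of functions, each mapping one representation into the next. The toy tokenizer, lexicon tagger, and composition helper below are illustrative assumptions, not a claim about the "natural and optimal" set of steps.

```python
# A minimal sketch of NLP as a chain of notation transformations.
# Each step is a function from one representation to the next; the
# tagger's tiny lexicon is an illustrative stand-in for a real analyzer.
from functools import reduce

def tokenize(text):
    """notation 1 -> notation 2: string to token list"""
    return text.lower().split()

def pos_tag(tokens):
    """notation 2 -> notation 3: tokens to (token, tag) pairs"""
    lexicon = {"the": "DET", "cat": "NOUN", "sat": "VERB",
               "on": "ADP", "mat": "NOUN"}
    return [(t, lexicon.get(t, "X")) for t in tokens]

def pipeline(*steps):
    """Compose transformation steps left to right."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)

analyze = pipeline(tokenize, pos_tag)
print(analyze("The cat sat on the mat"))
# [('the', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'),
#  ('on', 'ADP'), ('the', 'DET'), ('mat', 'NOUN')]
```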
The face of NLP tomorrow
Three (and a Half) Trends: The Near Future of NLP
- 1. Machine learning transformations
- 2. Analysis and corpus construction
- 3. Table construction and use
- 4. Evaluation frameworks
Who are you???
SLIDE 19