
Injecting Linguistics into NLP by Annotation
Eduard Hovy



  1. Injecting Linguistics into NLP by Annotation. Eduard Hovy, Information Sciences Institute, University of Southern California

  2. Lesson 1: Banko and Brill, HLT-01
  • Confusion set disambiguation task: {you're | your}, {to | too | two}, {its | it's}
  • 5 algorithms: ngram table, winnow, perceptron, transformation-based learning, decision trees
  • Training: 10^6 → 10^9 words
  • Lessons:
  – All methods improved to almost the same point
  – A simple method can end up above a complex one
  – Don't waste your time on algorithms and optimization
  • Takeaway: you don't have to be smart, you just need enough training data (see the sketch below)
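  To make the "ngram table" baseline above concrete, here is a minimal sketch; it is not from the paper, and the toy corpus, the train/disambiguate helpers, and the one-word-of-context window are all my assumptions. It simply picks the confusion-set member seen most often between the same neighboring words in training data.

```python
from collections import Counter

CONFUSION_SETS = [{"you're", "your"}, {"to", "too", "two"}, {"its", "it's"}]

def train(tokenized_sentences):
    """Count (left_word, candidate, right_word) trigrams around confusion-set members."""
    members = set().union(*CONFUSION_SETS)
    counts = Counter()
    for sent in tokenized_sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            if padded[i] in members:
                counts[(padded[i - 1], padded[i], padded[i + 1])] += 1
    return counts

def disambiguate(counts, left, right, confusion_set):
    """Choose the member seen most often in this one-word context."""
    return max(confusion_set, key=lambda cand: counts[(left, cand, right)])

# The point of the paper: with enough training sentences, this trivial
# table lookup climbs to roughly the same accuracy as cleverer learners.
counts = train([["hang", "on", "to", "your", "hat"],
                ["you're", "late", "again"]])
print(disambiguate(counts, "on", "your", {"to", "too", "two"}))  # -> 'to'
```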

  3. Lesson 2: Och, ACL-02
  • Best MT system in the world (Arabic → English, by BLEU and NIST, 2002–2005): Och's work
  • Method: learn ngram correspondence patterns (alignment templates) using MaxEnt (a log-linear translation model), trained to maximize the BLEU score
  • (figure: an alignment template mapping a source ngram w1 … w5 onto a reordered target ngram w1 … w4)
  • Approximately: EBMT + Viterbi search
  • Lesson: the more you store, the better your MT
  • Takeaway: you don't have to be smart, you just need enough storage (a scoring sketch follows)
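  A minimal sketch of log-linear scoring, the model family named above: the feature names, values, and weights here are invented for illustration, and the real system tunes its weights to maximize BLEU rather than setting them by hand. Each candidate translation is scored as a weighted sum of feature values, and the argmax wins.

```python
import math

def loglinear_score(features, weights):
    """Log-linear model: score(e|f) = sum_k w_k * h_k(f, e)."""
    return sum(weights[name] * value for name, value in features.items())

def best_translation(candidates, weights):
    """Pick the candidate whose feature vector maximizes the weighted sum."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))

# Hypothetical feature values for two candidate translations of one source sentence.
candidates = [
    {"text": "the house is small",
     "features": {"log_p_lm": math.log(0.002), "log_p_tm": math.log(0.01), "length": 4}},
    {"text": "small the house is",
     "features": {"log_p_lm": math.log(0.00001), "log_p_tm": math.log(0.02), "length": 4}},
]
# Weights like these would be tuned toward BLEU on held-out data.
weights = {"log_p_lm": 1.0, "log_p_tm": 0.8, "length": 0.1}
print(best_translation(candidates, weights)["text"])  # -> 'the house is small'
```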

  4. Lesson 3: Chiang et al., HLT-2009
  • 11,001 New Features for Statistical MT. David Chiang, Kevin Knight, Wei Wang. 2009. Proc. NAACL HLT. Best paper award
  • Learn MT rules: NP-C(x0:NPB PP(IN(of) x1:NPB)) <-> x1 de x0
  • Several hundred count features of various kinds: reward rules seen more often; punish rules that partly overlap; punish rules that insert is, the, etc. into English …
  • 10,000 word context features: for each triple (f, e, f+1), a feature that counts the number of times that f is aligned to e and f+1 occurs to the right of f; similarly for triples (f, e, f-1) with f-1 occurring to the left of f. Restrict words to the 100 most frequent in the training data
  • Takeaway: you don't have to know anything, you just need enough features (see the sketch below)
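  A minimal sketch of those word-context features: the data layout (tokenized sentence pairs plus alignment links) and the sentinel tokens are my assumptions, while the triple counting follows the description above. For each alignment link f→e it fires one feature for the source word immediately to the right of f and one for the word to its left, restricted to a frequent-word list.

```python
from collections import Counter

def context_features(aligned_pairs, frequent):
    """For each link f->e, count (f, e, f+1) and (f, e, f-1) triples,
    keeping only words in the frequent list (the paper uses the top 100)."""
    feats = Counter()
    for src, tgt, links in aligned_pairs:  # links: list of (src_idx, tgt_idx)
        for i, j in links:
            f, e = src[i], tgt[j]
            if f not in frequent or e not in frequent:
                continue
            right = src[i + 1] if i + 1 < len(src) else "</s>"
            left = src[i - 1] if i > 0 else "<s>"
            if right in frequent or right == "</s>":
                feats[("R", f, e, right)] += 1
            if left in frequent or left == "<s>":
                feats[("L", f, e, left)] += 1
    return feats

# Usage with one toy French-English pair and a tiny 'frequent' vocabulary.
pairs = [(["la", "maison", "bleue"], ["the", "blue", "house"], [(0, 0), (1, 2), (2, 1)])]
print(context_features(pairs, frequent={"la", "maison", "bleue", "the", "blue", "house"}))
```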

  5. Lesson 4: Fleischman and Hovy, ACL-03
  • Text mining: classify locations and people from free text into fine-grained classes
  – Simple appositive IE patterns
  – 2+ million examples, collapsed into 1 million instances (avg: 2 mentions/instance, 40+ for George W. Bush)
  • Test: QA on "who is X?":
  – 100 questions from AskJeeves
  – System 1: a table of instances
  – System 2: ISI's TextMap QA system
  – The table system scored 25% better
  – Over half of the questions that TextMap got wrong could have benefited from information in the concept-instance pairs
  – This method took 10 seconds; TextMap took ~9 hours
  • (chart: "Performance on a Question Answering Task", % correct / partial / incorrect, state-of-the-art system vs. extraction system)
  • Takeaway: you don't have to reason, you just need to collect the knowledge beforehand (sketch below)
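  A minimal sketch of the table-lookup idea: the appositive regex, the single-word descriptor, and the toy sentences are all my assumptions, not the paper's actual patterns. Concept-instance pairs are harvested offline, and "who is X?" is then answered by lookup rather than runtime reasoning.

```python
import re
from collections import Counter, defaultdict

# Toy appositive pattern: one lowercase descriptor word before a capitalized name.
APPOSITIVE = re.compile(r"\b([a-z]+) ((?:[A-Z][a-z]*\.? ){1,}[A-Z][a-z]*)")

def harvest(texts):
    """Build a concept-instance table; repeated mentions raise confidence."""
    table = defaultdict(Counter)
    for text in texts:
        for m in APPOSITIVE.finditer(text):
            concept, instance = m.group(1), m.group(2)
            table[instance][concept] += 1
    return table

def who_is(table, name):
    """Answer 'who is X?' by lookup: the most frequent concept for the name."""
    if name in table and table[name]:
        return table[name].most_common(1)[0][0]
    return None

texts = ["Yesterday president George W. Bush spoke.",
         "Critics attacked president George W. Bush over the plan."]
table = harvest(texts)
print(who_is(table, "George W. Bush"))  # -> 'president'
```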

  6. Four lessons
  • You don't have to be smart, you just need enough training data (the web has all you need)
  • You don't have to be smart, you just need enough memory (memory gets cheaper)
  • You don't have to be smart, you just need enough features (computers get faster)
  • You don't have to be smart, you just need to collect the knowledge beforehand
  • Conclusion: we are moving to a new world: NLP as table lookup

  7. So you may be happy with this, but I am not … I want to understand what's going on in language and thought
  • We have no theory of language, or even of language processing, in NLP
  • Our general approach is:
  – Goal: transform notation 1 into notation 2 (maybe adding tags …)
  – Learn how to do this automatically
  – Design an algorithm to beat the other guy
  • How can one inject understanding?

  8. • Generally, to reduce the size of a transformation table / statistical model, you introduce a generalization step:
  – POS tags, syntactic trees, modality labels …
  • If you're smart, the theory behind the generalization actually 'explains' or 'captures' the phenomenon
  – Classes of the phenomenon + rules linking them
  • 'Good' NLP can test the adequacy of a theory by determining the table reduction factor (illustrated below)
  • How can you introduce the generalization info?
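  The table reduction factor can be made concrete with a small sketch; the measurement below (distinct table entries before vs. after generalization) and the toy POS lexicon are my assumptions about what the slide intends.

```python
def table_reduction_factor(pairs, generalize):
    """Ratio of distinct table entries before vs. after a generalization step.
    A theory that truly captures the phenomenon should shrink the table a lot."""
    raw = {tuple(src) for src, _ in pairs}
    generalized = {tuple(generalize(w) for w in src) for src, _ in pairs}
    return len(raw) / len(generalized)

# Toy data: word bigrams mapping to a label, generalized by a tiny POS lexicon.
POS = {"the": "DET", "a": "DET", "cat": "N", "dog": "N", "sat": "V", "ran": "V"}
pairs = [(["the", "cat"], "NP"), (["a", "dog"], "NP"), (["the", "dog"], "NP"),
         (["cat", "sat"], "S"), (["dog", "ran"], "S")]
print(table_reduction_factor(pairs, lambda w: POS.get(w, w)))  # 5 entries -> 2 classes = 2.5
```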

  9. Annotation!
  1. Preparation: choose the corpus, build the interfaces (which corpus? interface design issues)
  2. Instantiating the theory: create the annotation choices, test-run them for stability (how to remain true to the theory?)
  3. Annotation: annotate, reconcile among annotators (how many annotators? which procedure?)
  4. Validation: measure inter-annotator agreement, possibly adjust the theory instantiation (which measures? see the sketch below)
  5. Delivery: wrap the result
  … 'annotation science'
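  For "which measures?", the standard starting point for two annotators is Cohen's kappa, agreement corrected for chance. A minimal sketch with invented labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators, corrected
    for the agreement expected by chance from each annotator's label frequencies."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n)
                   for l in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# Two annotators labeling ten items with POS-like tags.
a = ["N", "V", "N", "DET", "N", "V", "N", "DET", "V", "N"]
b = ["N", "V", "N", "DET", "V", "V", "N", "N", "V", "N"]
print(round(cohens_kappa(a, b), 3))  # observed 0.8, chance 0.39 -> kappa 0.672
```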

  10. The new NLP world
  • Fundamental methodological assumptions of NLP:
  – Old-style NLP: the process is deterministic; manually written rules will exactly generate the desired product
  – Statistical NLP: the process is (somewhat) nondeterministic; probabilities predict the likelihood of products
  – Underlying assumption: as long as annotator consistency can be achieved, there is systematicity, and systems will learn to find it
  • Theory creation (and testing!) through corpus annotation
  – But we (still) have to manually identify generalizations (= equivalence classes of individual instances of phenomena) to obtain expressive generality/power
  – This is the 'theory'
  – (and we need to understand how to do annotation properly)

  11. Who are the people with the 'theory'? Not us!
  • Our 'theory' of sentiment
  • Our 'theory' of entailment
  • Our 'theory' of MT
  • Our 'theory' of IR
  • Our 'theory' of QA
  • …

  12. A fruitful cycle
  (diagram: a cycle linking linguists, psycholinguists, and cognitive linguists, who do analysis, theorizing, and annotation, producing an annotated corpus; current NLP researchers, who do machine learning of transformations; NLP companies, who do storage in large tables and automated optimization/creation methods; and evaluation, where problems and low performance feed back into analysis)
  • Each one influences the others
  • Different people like different work

  13. Toward a theory of NLP?
  • Basic tenets:
  1. NLP is notation transformation
  2. There exists a natural and optimal set of transformation steps, each involving a dedicated and distinct representation
  • Problem: the syntax-semantics and semantics-pragmatics interfaces
  3. Each representation is based on a suitable (family of) theories in linguistics, philosophy, rhetoric, social interaction studies, etc.
  • Problem: which theory/theories? Why?
  4. Except for a few circumscribed phenomena (morphology, number expressions, etc.), the phenomena being represented are too complex and interrelated for human-built rules to handle them well
  • Puzzle: but they can (usually) be annotated in corpora: why?
  5. A set of machine learning algorithms and a set of features can be used to learn the transformations from suitably annotated corpora (a pipeline sketch follows)
  • Problem: which algorithms and features? Why?
  • Observation: we (almost) completely lack the theoretical framework to describe and measure the informational content and complexity of the representation levels we use: a challenge for the future
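  A minimal sketch of tenets 1 and 2, NLP as a composition of notation transformations: the stage names and toy transforms are my inventions; each stage maps one representation into the next, and on the view above each would be learned from an annotated corpus rather than hand-written.

```python
from typing import Callable, List

# Each stage transforms one notation into the next.
Stage = Callable[[object], object]

def pipeline(stages: List[Stage]):
    """Compose transformation steps into one notation-to-notation mapping."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

# Toy stages: raw text -> tokens -> (token, POS) pairs -> bracketed chunks.
POS = {"the": "DET", "cat": "N", "sat": "V"}
tokenize = lambda text: text.lower().split()
tag = lambda tokens: [(t, POS.get(t, "X")) for t in tokens]
chunk = lambda tagged: [f"[{t}/{p}]" for t, p in tagged]

analyze = pipeline([tokenize, tag, chunk])
print(analyze("The cat sat"))  # -> ['[the/DET]', '[cat/N]', '[sat/V]']
```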

  14. The face of NLP tomorrow. Three (and a half) trends in the near future of NLP:
  1. Machine learning of transformations
  2. Analysis and corpus construction
  3. Table construction and use
  4. Evaluation frameworks
  Who are you?

  15. Thank you!
