csci 562 empirical methods in natural language processing




  1. CSCI 562: EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING 25 Aug 2009

  2. WHAT WE WANT What we’ve got? A.L.I.C.E.

  3. WHAT WOULD YOU DO? Where is USC located?

  4. DATA
      • 1980s: if you wanted a computer to “know” something, you had to program it in.
      • Now: just about everything has been posted on the Web by someone, somewhere. But it is all in natural language.

  5. BAG OF WORDS
      Question: Where is USC located?
      Passage: USC’s two primary campuses, both located in the heart of Los Angeles, welcome thousands of guests and visitors each year. The 226-acre University Park campus, home to the College of Letters, Arts and Sciences, the Graduate School, and most of USC’s professional schools, is adjacent to Exposition Park with its world-class museums and recreational facilities. A few miles to the northeast is the 61-acre Health Sciences campus, home to the Keck School of Medicine of USC and the School of Pharmacy as well as three major teaching hospitals.
      Passage word counts: is 2, located 1, two 1, usc 3, where 0, …
      Question word counts: is 1, located 1, usc 1, where 1
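The counts on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming a simple tokenizer that lowercases the text and keeps only runs of letters (so "USC's" yields the token "usc"); the function name `bag_of_words` and the overlap score are illustrative choices, not something the slides specify.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

question = "Where is USC located?"
passage = (
    "USC's two primary campuses, both located in the heart of Los Angeles, "
    "welcome thousands of guests and visitors each year. The 226-acre "
    "University Park campus, home to the College of Letters, Arts and "
    "Sciences, the Graduate School, and most of USC's professional schools, "
    "is adjacent to Exposition Park with its world-class museums and "
    "recreational facilities. A few miles to the northeast is the 61-acre "
    "Health Sciences campus, home to the Keck School of Medicine of USC and "
    "the School of Pharmacy as well as three major teaching hospitals."
)

q_bag = bag_of_words(question)
p_bag = bag_of_words(passage)

# A crude relevance score: how many question tokens also occur in the passage.
overlap = sum(min(q_bag[w], p_bag[w]) for w in q_bag)

for word in ("is", "located", "two", "usc", "where"):
    print(word, p_bag[word], q_bag[word])
print("overlap:", overlap)
```

The representation discards word order entirely, which is why the later slides argue that language has more structure than a bag of words.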

  6. AMBIGUITY Where can a snow leopard be seen?

  7. GRAMMAR What animal does a frog eat?

  8. GRAMMAR
      (SBARQ (WHNP What animal)
             (SINV (VBZ does)
                   (NP a frog)
                   (VP (VB eat) (NP t))))

  9. MULTILINGUALITY Tell me about Tai Yen-Hui.

  10. STRUCTURE
      • All language has various levels of structure, more than just a bag of words.
      • Virtually all Natural Language Processing tasks involve inferring structure from text or transforming one kind of structure into another.

  11. CS 562 - Lecture 1, part II
      • More about ambiguities
      • Key problems to address in this class
          • grammar formalisms
          • search algorithms
          • learning methods
      • Learning Examples
      CS 562 - Intro (part 2)

  12. More about Ambiguities
      • to middle school kids: what does this sentence mean?
        I saw her duck.
      Aravind Joshi
      lexical ambiguity (word-sense)

  13. More about Ambiguities
      • to middle school kids: what does this sentence mean?
        I eat sushi with tuna.
      Aravind Joshi
      structural ambiguity (PP-attachment)

  14. More about Ambiguities
      • to middle school kids: what does this sentence mean?
        I eat sushi with tuna.
      Aravind Joshi
      lexical ambiguity (word-sense)

  15. More about Ambiguities
      • to middle school kids: what does this sentence mean?
        Everybody loves somebody.
      Aravind Joshi
      ???
      structural ambiguity (quantifier scope)
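The two readings of "Everybody loves somebody" can be written out in first-order logic. The predicate name loves(x, y), meaning "x loves y", is an assumption made here for illustration; the slide only names the ambiguity.

```latex
% Reading 1: each person loves some, possibly different, person.
\forall x\, \exists y\; \mathrm{loves}(x, y)

% Reading 2: there is one particular person whom everybody loves.
\exists y\, \forall x\; \mathrm{loves}(x, y)
```

Reading 2 entails reading 1 but not the other way around, so the two scopings are genuinely different claims even though the word string is identical.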

  16. More about Ambiguities
      • to middle school kids: what does this sentence mean?
        Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo
        Dogs dogs dog dog dogs.
        Police police police police police
      Aravind Joshi
      http://www.cse.buffalo.edu/~rapaport/BuffaloBuffalo/buffalobuffalo.html

  17. Ambiguities in Translation
      zi    zhu   zhong     duan
      自    助    终        端
      self  help  terminal  device

  18. Ambiguities in Translation

  19. or even... clear evidence that NLP is used in real life!

  20. Ambiguity Explosion
      I saw her duck. ...
      • how about...
          • I saw her duck with a telescope.
          • I saw her duck with a telescope in the garden...

  21. Ambiguity Explosion
      • exponential explosion of the search space
      • Q1: how to represent ambiguities (compactly)?
      • Q2: how to search over this space (efficiently)?
      • Q3: how to rank different hypotheses?
      (S (NP (PRP I))
         (VP (VBD saw)
             (NP (PRP$ her) (NN duck))
             (PP (IN with) (NP (DT a) (NN telescope)))))
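To get a feel for how fast the search space grows, one can count the distinct binary bracketings of an n-word sequence, which is the Catalan number C(n-1). This is only a rough proxy: the number of parses a real grammar licenses depends on its rules, and the function name `count_bracketings` is a hypothetical helper, not something from the course.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_bracketings(n):
    """Number of distinct binary trees over a sequence of n leaves."""
    if n == 1:
        return 1
    # Choose a split point k: left subtree covers k leaves, right covers n - k.
    return sum(count_bracketings(k) * count_bracketings(n - k)
               for k in range(1, n))

for n in range(1, 11):
    print(n, count_bracketings(n))
```

Memoization (`lru_cache`) already hints at the answer to Q2: the counts are computed by reusing results for shorter spans instead of enumerating every tree.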

  22. Answers...
      • Q1: how to represent ambiguities?
          • context-free grammar (unit 2)
          • finite-state automata (unit 1)
      • Q2: how to search in this space?
          • dynamic programming (units 1&2)
      • Q3: how to rank these hypotheses?
          • weighted grammar (units 1-3)
          • weights learned from data
          • (saw, with, telescope) seen more often in texts
      (S (NP (PRP I))
         (VP (VBD saw)
             (NP (PRP$ her) (NN duck))
             (PP (IN with) (NP (DT a) (NN telescope)))))
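The three answers can be combined in one small sketch: a weighted context-free grammar (Q1, Q3) parsed with CKY dynamic programming (Q2). Everything specific here is an assumption for illustration: the toy grammar, its weights (invented, not learned from data), and the binarized rule VP -> VP PP standing in for the flat VP -> VBD NP PP shown in the slide's tree. The grammar covers only the noun reading of "duck".

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form; all weights are invented.
LEXICON = {
    "I": [("NP", 1.0)],
    "saw": [("VBD", 1.0)],
    "her": [("PRP$", 1.0)],
    "duck": [("NN", 1.0)],
    "with": [("IN", 1.0)],
    "a": [("DT", 1.0)],
    "telescope": [("NN", 1.0)],
}
RULES = [  # (parent, left child, right child, weight)
    ("S", "NP", "VP", 1.0),
    ("VP", "VBD", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),   # PP modifies the verb phrase
    ("NP", "PRP$", "NN", 0.4),
    ("NP", "DT", "NN", 0.4),
    ("NP", "NP", "PP", 0.2),   # PP modifies the noun phrase
    ("PP", "IN", "NP", 1.0),
]

def cky(words):
    """Return (number of S parses, (best weight, best tree)) for the sentence."""
    n = len(words)
    count = defaultdict(int)  # (i, j, label) -> number of derivations
    best = {}                 # (i, j, label) -> (weight, bracketed tree)
    for i, word in enumerate(words):
        for label, weight in LEXICON[word]:
            count[i, i + 1, label] += 1
            best[i, i + 1, label] = (weight, f"({label} {word})")
    for width in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point
                for parent, left, right, weight in RULES:
                    l, r = (i, k, left), (k, j, right)
                    if l in best and r in best:
                        count[i, j, parent] += count[l] * count[r]
                        score = weight * best[l][0] * best[r][0]
                        if score > best.get((i, j, parent), (0.0, ""))[0]:
                            tree = f"({parent} {best[l][1]} {best[r][1]})"
                            best[i, j, parent] = (score, tree)
    return count[0, n, "S"], best.get((0, n, "S"))

n_parses, (weight, tree) = cky("I saw her duck with a telescope".split())
print(n_parses, weight)
print(tree)
```

With this grammar the sentence has two parses, one attaching "with a telescope" to the verb and one to "her duck". The invented weights favor verb attachment; in the course's framing, such weights would instead be learned from data in which (saw, with, telescope) co-occur often.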

  23. Why Learning?
      • learning is better than hand-written rules, because:
          • less work; easily adapts to new languages/domains
              • Powerset (now bing.com): 15 years for English grammar!
              • now they are writing their Chinese grammar...
              • and languages constantly change!
          • learning can work, and often works better!
              • machine translation: used to be dominated by rule-based systems
              • now statistical methods are better: Google vs. Systran
              • Google learns from the web, and translates 40+ languages
      [also CS 567, Machine Learning, Fall 2009]

  24. Example - Rosetta Stone
      • the most famous (tri-)parallel text
      • machines can do the same job! (if given parallel text)
      • UN/EU/Canadian proceedings, news, tech docs, ...

  25. A sci-fi example (Knight, 1997)
      Your assignment: translate this Centauri sentence into Arcturan:
      farok crrrok hihok yorok clok kantok ok-yurp

  26. farok crrrok hihok yorok clok kantok ok-yurp
      1c. ok-voon ororok sprok .
      1a. at-voon bichat dat .
      2c. ok-drubel ok-voon anok plok sprok .
      2a. at-drubel at-voon pippat rrat dat .
      3c. erok sprok izok hihok ghirok .
      3a. totat dat arrat vat hilat .
      4c. ok-voon anok drok brok jok .
      4a. at-voon krat pippat sat lat .
      5c. wiwok farok izok stok .
      5a. totat jjat quat cat .
      6c. lalok sprok izok jok stok .
      6a. wat dat krat quat cat .
      7c. lalok farok ororok lalok sprok izok enemok .
      7a. wat jjat bichat wat dat vat eneat .
      8c. lalok brok anok plok nok .
      8a. iat lat pippat rrat nnat .
      9c. wiwok nok izok kantok ok-yurp .
      9a. totat nnat quat oloat at-yurp .
      10c. lalok mok nok yorok ghirok clok .
      10a. wat nnat gat mat bat hilat .
      11c. lalok nok crrrok hihok yorok zanzanok .
      11a. wat nnat arrat mat zanzanat .
      12c. lalok rarok nok izok hihok mok .
      12a. wat nnat forat arrat vat gat .
      (Knight, 1997)
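The puzzle is meant to be solved by hand, but the same co-occurrence reasoning can be automated. The sketch below swaps in IBM Model 1 trained with EM, which is one standard way to learn word translation probabilities from sentence pairs; the slides do not say which method the course uses. The names `train_model1` and `best_translation` are hypothetical, there is no NULL word, the sentence-final periods are dropped, and word order is ignored, so the output is a word-by-word gloss rather than a finished translation.

```python
from collections import defaultdict

# The twelve Centauri/Arcturan sentence pairs from the slide (Knight, 1997).
PAIRS = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
    ("wiwok farok izok stok", "totat jjat quat cat"),
    ("lalok sprok izok jok stok", "wat dat krat quat cat"),
    ("lalok farok ororok lalok sprok izok enemok",
     "wat jjat bichat wat dat vat eneat"),
    ("lalok brok anok plok nok", "iat lat pippat rrat nnat"),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok", "wat nnat gat mat bat hilat"),
    ("lalok nok crrrok hihok yorok zanzanok", "wat nnat arrat mat zanzanat"),
    ("lalok rarok nok izok hihok mok", "wat nnat forat arrat vat gat"),
]

def train_model1(pairs, iterations=10):
    """Estimate t[c][a] = P(Arcturan word a | Centauri word c) with EM."""
    corpus = [(c.split(), a.split()) for c, a in pairs]
    a_vocab = {a for _, a_sent in corpus for a in a_sent}
    # Start from a uniform translation table.
    t = defaultdict(lambda: defaultdict(lambda: 1.0 / len(a_vocab)))
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        for c_sent, a_sent in corpus:
            for a in a_sent:
                # E-step: split one count for `a` among the Centauri words
                # of the same sentence pair, in proportion to t[c][a].
                norm = sum(t[c][a] for c in c_sent)
                for c in c_sent:
                    count[c][a] += t[c][a] / norm
        # M-step: renormalize the expected counts for each Centauri word.
        t = defaultdict(lambda: defaultdict(float))
        for c, row in count.items():
            total = sum(row.values())
            for a, value in row.items():
                t[c][a] = value / total
    return t

def best_translation(t, c_word):
    """Most probable Arcturan word for a Centauri word, or None if unseen."""
    row = t[c_word]
    return max(row, key=row.get) if row else None

table = train_model1(PAIRS)
sentence = "farok crrrok hihok yorok clok kantok ok-yurp"
gloss = [best_translation(table, w) for w in sentence.split()]
print(gloss)
```

Words that appear in only one sentence pair, such as clok and kantok, are the hard cases both for a human solver and for this model, because a single co-occurrence gives little evidence about which Arcturan word they correspond to.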

