NLP IR University of Maryland Wednesday, September 2, 2009 CLIP - PDF document

CMSC 723: Computational Linguistics I, Session #1
Introduction to NLP
Jimmy Lin
The iSchool, University of Maryland
Wednesday, September 2, 2009

About Me
[Diagram labels: NLP, IR, CLIP]
- Teaching Assistant: Melissa Egan

About You (pre-requisites)
- Must be interested in NLP
- Must have strong computational background
- Must be a competent programmer
- Do not need to have a background in linguistics

Administrivia
- Text:
  - Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics, second edition, Daniel Jurafsky and James H. Martin (2008)
- Course webpage:
  - http://www.umiacs.umd.edu/~jimmylin/CMSC723-2009-Fall/
- Class:
  - Wednesdays, 4 to 6:30pm (CSI 2107)
  - Two blocks, 5-10 min break in between

Course Grade
- Exams: 50%
- Class Assignments: 45%
  - Assignment 1 “warm up”: 5%
  - Assignments 2-5: 10% each
- Class participation: 5%
  - Showing up for class, demonstrating preparedness, and contributing to class discussions
- Policy for late and incomplete work, etc.

Out-of-Class Support
- Office hours: by appointment
- Course mailing list: umd-cmsc723-fall-2009@googlegroups.com

Let’s get started!

What is Computational Linguistics?
- Study of computer processing of natural languages
- Interdisciplinary field
  - Roots in linguistics and computer science (specifically, AI)
  - Influenced by electrical engineering, cognitive science, psychology, and other fields
- Dominated today by machine learning and statistics
- Goes by various names
  - Computational linguistics
  - Natural language processing
  - Speech/language/text processing
  - Human language technology/technologies

Where does NLP fit in CS?
[Diagram: Computer Science, with areas Algorithms/Theory, Programming Languages, Systems/Networks, Databases, Human-Computer Interaction, Artificial Intelligence, …; further nodes Machine Learning, NLP, Robotics, …]

Science vs. Engineering
- What is the goal of this endeavor?
  - Understanding the phenomenon of human language
  - Building better applications
- Goals (usually) in tension
- Analogy: flight

Rationalism vs. Empiricism
- Where does the source of knowledge reside?
- Chomsky’s poverty of stimulus argument
- It’s an endless pendulum?

Success Stories
- “If it works, it’s not AI”
- Speech recognition and synthesis
- Information extraction
- Automatic essay grading
- Grammar checking
- Machine translation

NLP “Layers”
[Diagram. Analysis side: Speech Recognition, Morphological Analysis, Parsing, Semantic Analysis, Reasoning/Planning. Generation side: Speech Synthesis, Morphological Realization, Syntactic Realization, Utterance Planning. Levels: Phonology, Morphology, Syntax, Semantics, Reasoning.]
Source: Adapted from NLTK book, chapter 1

Speech Recognition
- Conversion from raw waveforms into text
- Involves lots of signal processing
- “It’s hard to wreck a nice beach”

Optical Character Recognition
- Conversion from raw pixels into text
- Involves a lot of image processing
- What if the image is distorted, or the original text is in poor condition?

What’s a word?
- Break up by spaces, right?
    Ebay | Sells | Most | of | Skype | to | Private | Investors
    Swine | flu | isn’t | something | to | be | feared
- What about these?
    达赖喇嘛在高雄为灾民祈福
    ليبيا تحيي ذكرى وصول القذافي إلى السلطة
    百貨店、８月も不振大手５社の売り上げ８～１１％減
    टाटा ने कहा, घाटा पूरा करो

Morphological Analysis
- Morpheme = smallest linguistic unit that has meaning
- Inflectional
  - duck + s = [N duck] + [plural s]
  - duck + s = [V duck] + [3rd person singular s]
- Derivational
  - organize, organization
  - happy, happiness

Complex Morphology
- Turkish is an example of an agglutinative language
- From the root “uyu-” (sleep), the following can be derived…

    uyuyorum        I am sleeping
    uyuyorsun       you are sleeping
    uyuyor          he/she/it is sleeping
    uyuyoruz        we are sleeping
    uyuyorsunuz     you are sleeping
    uyuyorlar       they are sleeping
    uyuduk          we slept
    uyudukça        as long as (somebody) sleeps
    uyumalıyız      we must sleep
    uyumadan        without sleeping
    uyuman          your sleeping
    uyurken         while (somebody) is sleeping
    uyuyunca        when (somebody) sleeps
    uyutmak         to cause somebody to sleep
    uyutturmak      to cause (somebody) to cause (another) to sleep
    uyutturtturmak  to cause (somebody) to cause (some other) to cause (yet another) to sleep
    …

From Hakkani-Tür, Oflazer, Tür (2002)
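As a rough illustration of the “Break up by spaces, right?” question, the sketch below splits two of the slide's example sentences on whitespace. The function name `whitespace_tokenize` is made up for this example; it is not part of the course material, and real tokenizers do considerably more.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Naive tokenizer: split on runs of whitespace and nothing else."""
    return text.split()


# Two sentences taken from the slide.
english = "Swine flu isn't something to be feared"
chinese = "达赖喇嘛在高雄为灾民祈福"

english_tokens = whitespace_tokenize(english)
chinese_tokens = whitespace_tokenize(chinese)

# English comes out as one token per word.
print(len(english_tokens), english_tokens)
# The Chinese sentence contains no spaces, so it comes back as a single
# "token" even though it is made up of several words.
print(len(chinese_tokens), chinese_tokens)
```

Splitting on spaces is adequate for the English example but treats the entire Chinese sentence as one unit, which is the point the slide is making.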

What’s a phrase?
- Coherent group of words that serve some function
- Organized around a central “head”
- The head specifies the type of phrase
- Examples:
  - Noun phrase (NP): the happy camper
  - Verb phrase (VP): shot the bird
  - Prepositional phrase (PP): on the deck

Syntactic Analysis
- Parsing: the process of assigning syntactic structure
[Figure: parse tree for “I saw the man”]
    [S [NP I] [VP saw [NP the man]]]

Semantics
- Different structures, same* meaning:
  - I saw the man.
  - The man was seen by me.
  - The man was who I saw.
  - …
- Semantic representations attempt to abstract “meaning”
  - First-order predicate logic: ∃x, MAN(x) ∧ SEE(x, I) ∧ TENSE(past)
  - Semantic frames and roles: (PREDICATE = see, EXPERIENCER = I, PATIENT = man)

Semantics: More Complexities
- Scoping issues:
  - Everyone on the island speaks two languages.
  - Two languages are spoken by everyone on the island.
- Ultimately, what is meaning?
  - Simply pushing the problem onto different sets of SYMBOLS?

Lexical Semantics
- Any verb can add “able” to form an adjective.
  - I taught the class. The class is teachable.
  - I loved that bear. The bear is loveable.
  - I rejected the idea. The idea is rejectable.
- Association of words with specific semantic forms
  - John: noun, masculine, proper
  - the boys: noun, masculine, plural, human
  - load/smear verbs: specific restrictions on subjects and objects

Pragmatics and World Knowledge
- Interpretation of sentences requires context, world knowledge, speaker intention/goals, etc.
- Example 1:
  - Could you turn in your assignments now? (command)
  - Could you finish the assignment? (question, command)
- Example 2:
  - I couldn’t decide how to catch the crook. Then I decided to spy on the crook with binoculars.
  - To my surprise, I found out he had them too. Then I knew to just follow the crook with binoculars.
    [the crook [with binoculars]] vs. [the crook] [with binoculars]
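The bracketed notation used on the Syntactic Analysis slide, and again for the two readings of “the crook with binoculars”, can be read into a nested structure with a few lines of code. This is only a sketch: `parse_brackets` and `leaves` are hypothetical helper names, and the VP/NP/PP labels on the “crook” examples are added here for illustration, since the slide's brackets are unlabeled.

```python
def parse_brackets(text: str):
    """Read a labeled bracketing such as "[NP the man]" into (label, children)."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "["
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())       # nested phrase
            else:
                children.append(tokens[pos])  # plain word
                pos += 1
        pos += 1                      # skip "]"
        return (label, children)

    return node()


def leaves(tree) -> list[str]:
    """Collect the words of a tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


sentence = parse_brackets("[S [NP I] [VP saw [NP the man]]]")

# Two attachments of the same words (labels are illustrative assumptions).
crook_has_them = parse_brackets("[VP follow [NP the crook [PP with binoculars]]]")
follower_has_them = parse_brackets("[VP follow [NP the crook] [PP with binoculars]]")

print(sentence)
print(leaves(crook_has_them) == leaves(follower_has_them))  # same word string
print(crook_has_them == follower_has_them)                  # different structure
```

The two “crook” trees yield the same word sequence but differ in structure, which is one way to make the attachment ambiguity concrete.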

Discourse Analysis
- Discourse: how multiple sentences fit together
- Pronoun reference:
  - The professor told the student to finish the exam. He was pretty aggravated at how long it was taking him to complete it.
- Multiple reference to same entity:
  - George Bush, Clinton
- Inference and other relations between sentences:
  - The bomb exploded in front of the hotel. The fountain was destroyed, but the lobby was largely intact.

Why is NLP hard?

So easy…

Ambiguity

At the word level
- Part of speech
  - [V Duck]!
  - [N Duck] is delicious for dinner.
- Word sense
  - I went to the bank to deposit my check.
  - I went to the bank to look out at the river.
  - I went to the bank of windows and chose the one for “complaints”.

At the syntactic level
- PP Attachment ambiguity
  - I saw the man on the hill with the telescope
- Structural ambiguity
  - I cooked her duck.
  - Visiting relatives can be annoying.
  - Time flies like an arrow.

Difficult cases…
- Requires world knowledge:
  - The city council denied the demonstrators the permit because they advocated violence
  - The city council denied the demonstrators the permit because they feared violence
- Requires context:
  - John hit the man. He had stolen his bicycle.
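To give a feel for how quickly word-level ambiguity multiplies, here is a small sketch that enumerates every part-of-speech assignment for “Time flies like an arrow.” The lexicon `TOY_LEXICON` and its tag set are assumptions made up for this example, not taken from the slides or from any real tagger.

```python
from itertools import product

# Toy lexicon (an assumption for illustration): each word maps to the
# parts of speech it could plausibly take.
TOY_LEXICON = {
    "time": ["N", "V"],
    "flies": ["N", "V"],
    "like": ["V", "P"],
    "an": ["DET"],
    "arrow": ["N"],
}


def tag_sequences(sentence: str) -> list[list[tuple[str, str]]]:
    """Return every combination of tags the toy lexicon allows for the sentence."""
    words = sentence.lower().rstrip(".").split()
    options = [TOY_LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


sequences = tag_sequences("Time flies like an arrow.")
print(len(sequences))  # 2 * 2 * 2 * 1 * 1 combinations
for seq in sequences:
    print(" ".join(f"{word}/{tag}" for word, tag in seq))
```

Only a few of these combinations correspond to sensible readings; choosing among them is where world knowledge and context come in.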

So how do humans cope?

Okay, so how does NLP work?

Goals for Practical Applications
- Accurate; minimize errors (false positives/negatives)
- Maximize coverage
- Robust, degrades gracefully
- Fast, scalable

Rule-Based Approaches
- Prevalent through the 80’s
  - Rationalism as the dominant approach
- Manually-encoded rules for various aspects of NLP
  - E.g., swallow is a verb of ingestion, taking an animate subject and a physical object that is edible, …

What’s the problem?
- Rule engineering is time-consuming and error-prone
  - Natural language is full of exceptions
- Rule engineering requires knowledge
  - Is this a bad thing?
- Rule engineering is expensive
  - Experts cost a lot of money
- Coverage is limited
  - Knowledge often limited to specific domains

More problems…
- Systems became overly complex and difficult to debug
  - Unexpected interaction between rules
- Systems were brittle
  - Often broke on unexpected input (e.g., “The machine swallowed my change.” or “She swallowed my story.”)
- Systems were uninformed by prevalence of phenomena
  - Why WordNet thinks congress is a donkey…

Problem isn’t with rule-based approaches per se, it’s with manual knowledge engineering…
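The “swallow” rule and its brittleness can be sketched in a few lines. Everything here is a toy assumption: the word lists `ANIMATE` and `EDIBLE` and the function `swallow_rule_accepts` are invented for illustration and do not come from any actual rule-based system.

```python
# Hand-written knowledge (toy assumption): which nouns count as animate
# and which count as edible physical objects.
ANIMATE = {"she", "he", "child", "dog"}
EDIBLE = {"pill", "apple", "sandwich"}


def swallow_rule_accepts(subject: str, obj: str) -> bool:
    """Manually encoded rule: 'swallow' takes an animate subject and an edible object."""
    return subject in ANIMATE and obj in EDIBLE


examples = [
    ("child", "pill"),      # the case the rule was written for
    ("machine", "change"),  # "The machine swallowed my change."
    ("she", "story"),       # "She swallowed my story."
]

for subject, obj in examples:
    verdict = "accepted" if swallow_rule_accepts(subject, obj) else "rejected"
    print(f"{subject} swallowed {obj}: {verdict}")
```

The rule accepts the literal case and rejects both perfectly ordinary sentences from the slide. Patching it means adding more hand-written exceptions, which is the manual knowledge engineering cost the slides point to.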
