Introduction to the Course
- Prof. Sameer Singh
CS 295: STATISTICAL NLP WINTER 2017
January 10, 2017
Based on slides from Nathan Schneider, Mohit Bansal, Sebastian Riedel, Yejin Choi, and everyone else they copied from.
About Me
Academic Positions
Research Interests
- extraction, entity linking and disambiguation, joint modeling
- matrix/tensor factorization, probabilistic graphical models
http://sameersingh.org
sameer@uci.edu
- Introduction to NLP
- Course Information
- Upcoming deadlines
NLP
Unstructured, Ambiguous, Lots and lots of it!
Humans can read them, but …
- … very slowly
- … can’t remember all
- … can’t answer questions
“Knowledge”
Structured; Precise, Actionable; Specific to the task
Computers can use …
- … quickly answer questions
- … memory is not a problem
- … don’t get tired
- Question Answering (instead of search)
- Science, by reading papers for you
- News Summarization
- Law, by reading past cases for you
- Healthcare, by
- Assistive Technologies (dialog systems)
- Computational Social Sciences
- Digital Humanities (historical texts)
Human or Computer?
WHY ISN’T NLP SOLVED YET?
Ambiguity Sparsity Variation
One tries to be as informative as one possibly can, and gives as much information as is needed, and no more.
Corollary: The more you know, the less you need. Computers “know” very little.
Hershey’s Bars Protest
He knows you like your mother.
Stolen painting found by tree.
One morning I shot an elephant in my pajamas.
How he got into my pajamas I'll never know.
She saw the man with the telescope.
My girlfriend and I met my lawyer for a drink,
but she became ill and had to leave.
The city councilmen refused the demonstrators a permit because they feared violence.
The city councilmen refused the demonstrators a permit because they advocated violence.
“Context” is important
Winograd Schema: An Open Challenge for AI
Types
- Clinton, Adams
- Dow Jones, Thomas Cook
- Kingston
Identities
- Kevin Smith, Jamaica, Springfield
- President, Obama, Chief, Bambam,…
“Context” is important
Not easy even for humans
Ambiguity Sparsity Variation
[Figure: word frequency plot. Labels, from frequent to rare: the, to, and; cornflakes, mathematicians, fuzziness, jumbling; pseudo-rapporteur, lobby-ridden, perfunctorily, Lycketoft, UNCITRAL, H-0695, policyfor, Commissioneris; annotation: >1/3]
Zipf’s Law
Regardless of the size of the data, there will be many rare words.
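To make the claim concrete, here is a minimal Python sketch that ranks words by frequency and reports the share of word types seen only once. The toy token list and the function names (`rank_frequency`, `singleton_fraction`) are illustrative assumptions, not course material. On a real corpus one would expect frequency to fall off roughly in proportion to 1/rank.

```python
from collections import Counter

# Toy input for illustration only; swap in the tokens of any real corpus.
TOKENS = "the cat sat on the mat and the dog sat".split()


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


def singleton_fraction(tokens):
    """Fraction of distinct words (types) that occur exactly once."""
    counts = Counter(tokens)
    singletons = sum(1 for freq in counts.values() if freq == 1)
    return singletons / len(counts)


for rank, word, freq in rank_frequency(TOKENS):
    print(rank, word, freq)
print("share of types seen once:", singleton_fraction(TOKENS))
```

Even in this tiny example most word types appear only once, which is the sparsity problem in miniature.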
In a document in which each character is chosen uniformly at random from all letters (plus a space character), the resulting "words" follow the general trend of Zipf's law. (Try it at home!)
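One way to try this at home is sketched below. It is a rough experiment under stated assumptions: the document length (300,000 characters), the fixed seed, and the name `random_word_counts` are arbitrary choices made for this example.

```python
import random
import string
from collections import Counter


def random_word_counts(num_chars=300_000, seed=0):
    """Count the "words" in a document of uniformly random characters."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "  # 26 letters plus the space
    text = "".join(rng.choice(alphabet) for _ in range(num_chars))
    # split() breaks on runs of spaces, so no empty "words" are counted
    return Counter(text.split())


counts = random_word_counts()
ranked = counts.most_common()  # (word, frequency), most frequent first

# Print a few points of the rank-frequency curve
for rank in (1, 10, 100, 1000):
    if rank <= len(ranked):
        word, freq = ranked[rank - 1]
        print(rank, repr(word), freq)

singletons = sum(1 for freq in counts.values() if freq == 1)
print("share of types seen once:", singletons / len(counts))
```

Short "words" are the most frequent and the long tail is dominated by strings that occur once, so the curve falls off steeply with rank even though no language is involved.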
Ambiguity Sparsity Variation
She gave the book to Tom vs. She gave Tom the book
Some kids popped by vs. A few children visited
Is that window still open? vs. Please close the window
ikr smh he asked fir yo last name so he can add u on fb lolololtw
Its vanished trees, the trees that had made way for Gatsby’s house, had once pandered in whispers to the last and greatest of all human dreams; for a transitory enchanted moment man must have held his breath in the presence of this continent, compelled into an aesthetic contemplation he neither understood nor desired, face to face for the last time in history with something commensurate to his capacity for wonder.
HOW CAN WE GET COMPUTERS TO SOLVE THIS PROBLEM?
Sentence: Dependency Parsing, Part of speech tagging, Named entity recognition…
Sanders was born in Brooklyn, to Dorothy and Eli Sanders.
NNP VBD VBD IN NNP TO NNP CC NNP NNP
Person Location Person Person
Document: Discourse analysis, Coreference, Sentiment analysis...
Bernie.. Bernie Sanders... .. his mother .. his father Eli he the city
Corpus: Entity resolution, Entity linking, Relation extraction…
Bernie Sanders, Eli Sanders, Dorothy Sanders, Brooklyn
birthplace, childOf, childOf, spouse
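As a small illustration of the sentence-level annotations above, the following Python sketch aligns the example sentence with its part-of-speech tags and groups consecutive NNP tokens into candidate entity mentions. The grouping rule and the helper name `proper_noun_spans` are assumptions made for this example; it is a crude heuristic, not how a real named entity recognizer works.

```python
import re

SENTENCE = "Sanders was born in Brooklyn, to Dorothy and Eli Sanders."
# Tags and entity types copied from the slide, one tag per word
POS_TAGS = "NNP VBD VBD IN NNP TO NNP CC NNP NNP".split()
ENTITY_TYPES = ["Person", "Location", "Person", "Person"]

# Keep word tokens only; the slide's tag sequence has no punctuation tags
tokens = re.findall(r"\w+", SENTENCE)
tagged = list(zip(tokens, POS_TAGS))


def proper_noun_spans(tagged_tokens):
    """Group maximal runs of NNP-tagged tokens into candidate mentions."""
    spans, current = [], []
    for word, tag in tagged_tokens:
        if tag == "NNP":
            current.append(word)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:  # flush a run that ends the sentence
        spans.append(" ".join(current))
    return spans


mentions = proper_noun_spans(tagged)
for mention, entity_type in zip(mentions, ENTITY_TYPES):
    print(f"{mention}: {entity_type}")
```

The four spans line up with the four entity labels on the slide, but deciding that "Sanders" and "Eli Sanders" are different people already requires the document-level and corpus-level steps.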
DIRECTLY USE LINGUISTICS
- Expensive, time-consuming...
- … but also, incomplete!
MACHINE LEARNING!
- Automatically learn from data!
- … if the right data exists
“Every time I fire a linguist, my accuracy goes up.”
From https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa
Step 1: Break into Chunks
Step 2: Translations for each chunk
Step 3: Generate all possible sequences
- In same order
- In different order
Step 4: Find the most human sounding one
I want to go to the prettiest beach.
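The four steps can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the candidate phrases in `CANDIDATES`, the tiny `REFERENCE` corpus, and the bigram-overlap `score` that stands in for "most human sounding". The sketch only covers the same-order case; reordering chunks is left out to keep it short.

```python
from itertools import product

# Steps 1 and 2 (assumed done): each chunk with hypothetical candidate translations
CANDIDATES = [
    ["I want", "I try"],
    ["to go", "to leave"],
    ["to", "at"],
    ["the prettiest beach", "the beach more pretty"],
]

# Tiny made-up sample of fluent English used to judge how natural a sentence sounds
REFERENCE = [
    "i want to go home",
    "we go to the prettiest beach",
    "i want to see the beach",
]


def bigrams(words):
    """Adjacent word pairs of a token list."""
    return list(zip(words, words[1:]))


SEEN = {pair for sent in REFERENCE for pair in bigrams(sent.split())}


def score(sentence):
    """Count how many adjacent word pairs also occur in the reference text."""
    return sum(pair in SEEN for pair in bigrams(sentence.lower().split()))


def translate(candidates):
    # Step 3: every combination of one candidate per chunk, in the same order
    sequences = (" ".join(choice) for choice in product(*candidates))
    # Step 4: keep the sequence that looks most like the reference text
    return max(sequences, key=score)


best = translate(CANDIDATES)
print(best)
```

With these made-up candidates the highest-scoring sequence is the sentence shown on the slide. A real system would replace the bigram count with a language model trained on a large corpus.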
Language to Knowledge
It’s quite difficult
Machine Learning!
- Introduction to NLP
- Course Information
- Upcoming deadlines
Meetings
Reader
Office Hours
Course webpage: http://sameersingh.org/courses/statnlp/wi17/
Basics of NLP
Critical Analysis
Research Projects
Words and Representations
Language and Sequence Modeling
Sentence Structure Modeling
Applications and other topics
Programming Homework 40%
Course Project 30%
Paper Summaries 15%
Participation 15%
Late Submissions
Assignments
4 Programming Assignments
Source Code (Python)
Writing Up (PDF)
3 Paper Summaries
Summaries
Recent Conference Papers
Groups for the Project
Scope of Work
Submit Four Reports
Piazza participation
Class participation
- Introduction to NLP
- Course Information
- Upcoming deadlines
Misc.
Homework
Project