SLIDE 1

Lecture 1: Introduction

Kai-Wei Chang, CS @ UCLA, kw@kwchang.net
Course webpage: https://uclanlp.github.io/CS269-17/

SLIDE 2

Announcements

• Waiting list: Start attending the first few lectures as if you are registered. Given that some students will drop the class, some space will free up.
• We will use Piazza as an online discussion platform. Please sign up here: piazza.com/ucla/fall2017/cs269

SLIDE 3

Staff

• Instructor: Kai-Wei Chang
  • Email: ml17@kwchang.net
  • Office: BH 3732J
  • Office hour: 4:00 – 5:00 Tue (after class)
• TA: Md Rizwan Parvez
  • Email: wua4nw@virginia.edu
  • Office: BH 3809
  • Office hour: 12:00 – 2:00 Wed

SLIDE 4

This lecture

• Course Overview
  • What is NLP? Why is it important?
  • What types of ML methods are used in NLP?
  • What will you learn from this course?
• Course Information
• What are the challenges?
• Key NLP components
• Key ML ideas in NLP

SLIDE 5

What is NLP

• Wiki: Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages.

SLIDE 6

Go beyond keyword matching

• Identify the structure and meaning of words, sentences, texts, and conversations
• Deep understanding of broad language
• NLP is all around us

SLIDE 7

Machine translation


Facebook translation, image credit: Meedan.org

SLIDE 8

Statistical machine translation


Image credit: Julia Hockenmaier, Intro to NLP

SLIDE 9

Dialog Systems

SLIDE 10

Sentiment/Opinion Analysis

SLIDE 11

Text Classification

• Other applications?


www.wired.com

SLIDE 12

Question answering


credit: ifunny.com

'Watson' computer wins at 'Jeopardy'

SLIDE 13

Question answering

• Go beyond search

SLIDE 14

Natural language instruction


https://youtu.be/KkOCeAtKHIc?t=1m28s

SLIDE 15

Digital personal assistant

• Semantic parsing – understand tasks
• Entity linking – “my wife” = “Kellie” in the phone book


credit: techspot.com

More on natural language instruction

SLIDE 16

Information Extraction

• Unstructured text to database entries


Yoav Artzi: Natural language processing

SLIDE 17

Language Comprehension

• Q: Who wrote Winnie the Pooh?
• Q: Where did Chris live?


Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book.

SLIDE 18

What will you learn from this course

• The NLP pipeline
• Key components for understanding text
• NLP systems/applications
• Current techniques & limitations
• Build realistic NLP tools

SLIDE 19

What’s not covered by this course

• Speech recognition – no signal processing
• Natural language generation
• Details of ML algorithms / theory
• Text mining / information retrieval

SLIDE 20

This lecture

• Course Overview
  • What is NLP? Why is it important?
  • What will you learn from this course?
• Course Information
• What are the challenges?
• Key NLP components

SLIDE 21

Overview

• New course, first time being offered
  • Comments are welcome
  • Targeted at first- or second-year PhD students
• Lecture + seminar format
• No course prerequisites, but I assume
  • programming experience (for the final project)
  • basic ML/AI background
  • basics of probability, calculus, and linear algebra (HW0)

SLIDE 22

Grading

• Attendance & participation (10%)
  • Participate in discussion
• Paper summarization report (20%)
• Paper presentation (30%)
• Final project (40%)
  • Proposal (5%)
  • Final paper (25%)
  • Presentation (10%)

SLIDE 23

Paper summarization

• 1 page maximum
• Pick one paper from a recent ACL/NAACL/EMNLP/EACL
• Summarize the paper (use your own words)
• Write a blog post using Markdown or a Jupyter notebook, e.g.:
  • https://einstein.ai/research/learned-in-translation-contextualized-word-vectors
  • https://github.com/uclanlp/reducingbias/blob/master/src/fairCRF_gender_ratio.ipynb

SLIDE 26

Paper presentation

• Each group has 2–3 students
• Read and understand 2–3 related papers
  • Cannot be the same as your paper summary
  • Can be related to your final project
  • Register your choice next week
• 30 min presentation / Q&A
• Grading rubric: 40% technical understanding, 40% presentation, 20% interaction

SLIDE 27

Final Project

• Work in groups (3 students)
• Project proposal
  • 1 page maximum (template)
• Project report
  • Similar to the paper summary
  • Due before the final presentation
• Project presentation
  • In-class presentation (tentative)

SLIDE 28

Late Policy

• The submission site will close 1 hour after the deadline.
• No late submissions, except in emergencies.

SLIDE 29

Cheating/Plagiarism

• No. Ask if you have concerns.
• Rules of thumb:
  • Cite your references
  • Clearly state what your contributions are

SLIDE 30

Lectures and office hours

• Participation is highly appreciated!
  • Ask questions if you are still confused
  • Feedback is welcome
  • Lead the discussion in this class
  • Enroll in Piazza

SLIDE 31

Topics of this class

• Fundamental NLP problems
• Machine learning & statistical approaches for NLP
• NLP applications
• Recent trends in NLP

SLIDE 32

What to Read?

• Natural language processing: ACL, NAACL, EACL, EMNLP, CoNLL, Coling, TACL (aclweb.org/anthology)

• Machine learning: ICML, NIPS, ECML, AISTATS, ICLR, JMLR, MLJ

• Artificial intelligence: AAAI, IJCAI, UAI, JAIR

SLIDE 33

Questions?

SLIDE 34

This lecture

• Course Overview
  • What is NLP? Why is it important?
  • What will you learn from this course?
• Course Information
• What are the challenges?
• Key NLP components
• Key ML ideas in NLP

SLIDE 35

Challenges – ambiguity

• Word sense ambiguity

SLIDE 36

Challenges – ambiguity

• Word sense / meaning ambiguity


Credit: http://stuffsirisaid.com

SLIDE 37

Challenges – ambiguity

• PP attachment ambiguity


Credit: Mark Liberman, http://languagelog.ldc.upenn.edu/nll/?p=17711

SLIDE 38

Challenges – ambiguity

• Ambiguous headlines:
  • Include your children when baking cookies
  • Local High School Dropouts Cut in Half
  • Hospitals are Sued by 7 Foot Doctors
  • Iraqi Head Seeks Arms
  • Safety Experts Say School Bus Passengers Should Be Belted
  • Teacher Strikes Idle Kids

SLIDE 39

Challenges – ambiguity

• Pronoun reference ambiguity


Credit: http://www.printwand.com/blog/8-catastrophic-examples-of-word-choice-mistakes

SLIDE 40

Challenges – language is not static

• Language grows and changes
  • e.g., cyber lingo:

    LOL – Laugh out loud
    G2G – Got to go
    BFN – Bye for now
    B4N – Bye for now
    Idk – I don’t know
    FWIW – For what it’s worth
    LUWAMH – Love you with all my heart

SLIDE 41

Challenges – language is compositional

(Slide photo of a sign reading “Carefully Slide”.)

SLIDE 42

Challenges – language is compositional

Word-by-word translations from the sign:
• 小心 (“be careful”): Carefully / Careful / Take Care / Caution
• 地滑 (“slippery floor”): Slide / Landslip / Wet Floor / Smooth

SLIDE 43

Challenges – scale

• Examples (corpus sizes in words):
  • Bible (King James version): ~700K
  • Penn Treebank: ~1M (Wall Street Journal)
  • Newswire collections: 500M+
  • English Wikipedia: ~2.9 billion
  • Web: several billions

SLIDE 44

This lecture

• Course Overview
  • What is NLP? Why is it important?
  • What will you learn from this course?
• Course Information
• What are the challenges?
• Key NLP components
• Key ML ideas in NLP

SLIDE 45

Part of speech tagging
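The slide shows a tagged example sentence. As a quick hands-on illustration, an off-the-shelf tagger can be run as below; NLTK is my illustrative choice here, not the course's model:

```python
# Off-the-shelf POS tagging with NLTK (illustrative only).
import nltk
nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # tagger model

tokens = nltk.word_tokenize("can you can a can as a canner can can a can")
print(nltk.pos_tag(tokens))
# e.g., [('can', 'MD'), ('you', 'PRP'), ('can', 'VB'), ('a', 'DT'), ('can', 'NN'), ...]
```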

SLIDE 46

Syntactic (Constituency) parsing

SLIDE 47

Syntactic structure => meaning


Image credit: Julia Hockenmaier, Intro to NLP

SLIDE 48

Dependency Parsing
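For a concrete feel of dependency parses, here is a minimal sketch with spaCy (my illustrative choice; it assumes the `en_core_web_sm` model has been installed via `python -m spacy download en_core_web_sm`):

```python
# Off-the-shelf dependency parsing with spaCy (illustrative only).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("They operate ships and banks.")
for token in doc:
    # each word points to its syntactic head with a labeled relation
    print(token.text, token.dep_, "<-", token.head.text)
# e.g., They nsubj <- operate; operate ROOT <- operate; ships dobj <- operate; ...
```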

SLIDE 49

Semantic analysis

• Word sense disambiguation
• Semantic role labeling


Credit: Ivan Titov

SLIDE 50

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book


Q: [Chris] = [Mr. Robin] ?

Slide modified from Dan Roth

SLIDE 51

Co-reference Resolution

(Same Christopher Robin passage as Slide 50.)

SLIDE 52

This lecture

• Course Overview
  • What is NLP? Why is it important?
  • What will you learn from this course?
• Course Information
• What are the challenges?
• Key NLP components
• Key ML ideas in NLP

SLIDE 53

Machine learning 101

SLIDE 55

Classifiers: perceptron, decision tree, support vector machine, K-NN, Naïve Bayes, logistic regression, …

SLIDE 56

Classification is generally well-understood

• Theoretically: generalization bounds
  • How many examples are needed to train a good model
• Algorithmically:
  • Efficient algorithms for large data sets
    • E.g., it takes a few seconds to train a linear SVM on data with millions of instances and features (see the sketch below)
  • Algorithms for non-linear models
    • E.g., kernel methods

Is this enough to solve all real-world problems?
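As a rough illustration of the training-speed claim above, here is a minimal scikit-learn sketch; the library choice and the synthetic data are my assumptions, not the course's setup:

```python
# A minimal sketch of training a linear SVM quickly (illustrative only).
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# 50K synthetic instances with 200 features; real NLP data is larger but sparse.
X, y = make_classification(n_samples=50_000, n_features=200,
                           n_informative=20, random_state=0)
clf = LinearSVC(C=1.0)        # linear-kernel SVM with an efficient solver
clf.fit(X, y)                 # typically finishes in seconds on a laptop
print(clf.score(X, y))        # training accuracy
```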

SLIDE 57

Reading Comprehension

SLIDE 58

Challenges

• Modeling challenges
  • How to model a complex decision?
• Representation challenges
  • How to extract features?
• Algorithmic challenges
  • Large amounts of data and complex decision structures

(Slide examples, one per challenge: a coreference passage, “Bill Clinton, recently elected as the President of the USA, has been invited by the Russian President, Vladimir Putin, to visit Russia. President Clinton said that he looks forward to strengthening ties between USA and Russia”; excerpts from ML paper abstracts (Berg-Kirkpatrick et al., ACL 2010, and a learning-to-search paper introducing LOLS); and the Christopher Robin passage. The corresponding themes: structured prediction models, deep learning models, and inference / learning algorithms.)

SLIDE 59

Modeling Challenges

• How to model a complex decision?
• Why is this important?

(Background: the Christopher Robin passage.)

SLIDE 60

Language is structural

SLIDE 61

Handwritten recognition

• What is this letter?

SLIDE 62

Handwritten recognition

• What is this letter?

SLIDE 63

Visual recognition

SLIDE 64

Human body recognition

SLIDE 65

Bridge the gap

• Simple classifiers are not designed to handle complex outputs
• Need to make multiple decisions jointly
• Example: POS tagging

  “can you can a can as a canner can can a can”

(Example from Vivek Srikumar)

SLIDE 66

Make multiple decisions jointly

• Example: POS tagging

  “can you can a can as a canner can can a can”

• Each part needs a label
  • Assign a tag (V., N., A., …) to each word in the sentence
• The decisions are mutually dependent
  • E.g., you cannot have a verb followed by a verb
• Results are evaluated jointly (see the Viterbi sketch below)
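As referenced above, here is a minimal Viterbi sketch showing how the tag decisions are made jointly; the tag set and random scores are toy values of my own, not the course's actual model:

```python
# A minimal Viterbi decoder for joint sequence tagging (illustrative only).
import numpy as np

tags = ["V", "N", "Md", "Dt", "Pro"]          # toy tag set

def viterbi(emission, transition):
    """emission: (n_words, n_tags) scores; transition: (n_tags, n_tags)."""
    n, T = emission.shape
    score = np.zeros((n, T))
    back = np.zeros((n, T), dtype=int)
    score[0] = emission[0]
    for i in range(1, n):
        for t in range(T):
            prev = score[i - 1] + transition[:, t]  # each tag depends on the previous one
            back[i, t] = prev.argmax()
            score[i, t] = prev.max() + emission[i, t]
    # follow backpointers from the best final tag
    path = [int(score[-1].argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return [tags[t] for t in reversed(path)]

# toy run on a 3-word sentence with random scores
rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(3, 5)), rng.normal(size=(5, 5))))
```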

SLIDE 67

Structured prediction problems

• Problems that
  • have multiple interdependent output variables
  • and whose output assignments are evaluated jointly
• A joint assignment is needed for all the output variables
• We call this joint inference, global inference, or simply inference

SLIDE 68

A general learning setting

• Input: x ∈ X
• Truth: y* ∈ Y(x)
• Predicted: h(x) ∈ Y(x)
• Loss: loss(y, y*)

Example: candidate tag sequences for “I can can a can”:
Pro Md Vb Dt Nn / Pro Md Nn Dt Vb / Pro Md Nn Dt Md / Pro Md Md Dt Nn / Pro Md Md Dt Vb

Goal: make a joint prediction that minimizes a joint loss: find h ∈ H such that h(x) ∈ Y(x), minimizing E_{(x,y)∼D}[loss(y, h(x))], based on N samples (x_i, y_i) ∼ D.
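The garbled symbols on the original slide decode to the standard structured-prediction objective; a cleaned-up LaTeX rendering follows (reconstructed from context, so treat the exact notation as an assumption):

```latex
\[
h^{*} \;=\; \operatorname*{arg\,min}_{h \in H}\;
\mathbb{E}_{(x,\,y)\sim D}\!\left[\operatorname{loss}\bigl(y,\, h(x)\bigr)\right],
\qquad h(x) \in Y(x),
\]
% estimated in practice from N i.i.d. samples (x_i, y_i) drawn from D.
```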

SLIDE 69

Combinatorial output space

• Input: x ∈ X
• Truth: y* ∈ Y(x)
• Predicted: h(x) ∈ Y(x)
• Loss: loss(y, y*)

Example: candidate tag sequences for “I can can a can”:
Pro Md Vb Dt Nn / Pro Md Nn Dt Vb / Pro Md Nn Dt Md / Pro Md Md Dt Nn / Pro Md Md Dt Vb

# POS tags: 45. How many possible outputs for a sentence with 10 words? 45^10 ≈ 3.4 × 10^16.

Observation: not all sequences are valid, and we don’t need to consider all of them.

SLIDE 70

Representation of interdependent output variables

• A compact way to represent output combinations
  • Abstract away unnecessary complexities
• We know how to process them
  • Graph algorithms for linear chains, trees, etc.

(Slide figure: the sentence “Root They operate ships and banks .” with tags Pronoun Verb Noun And Noun.)

SLIDE 71

Algorithms/models for structured prediction

• Many learning algorithms can be generalized to the structured case (a structured-perceptron sketch follows below)
  • Perceptron → structured perceptron
  • SVM → structured SVM
  • Logistic regression → conditional random field (a.k.a. log-linear models)
• Can be solved by a reduction stack
  • Structured prediction → multi-class → binary
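A compact sketch of the structured perceptron named above; this is toy code under my own naming, where `phi` (the joint feature map) and `decode` (an argmax solver such as the Viterbi sketch earlier) are assumed helpers:

```python
# A minimal structured perceptron (illustrative only).
import numpy as np

def structured_perceptron(data, phi, decode, n_feats, epochs=10):
    """data: list of (x, y_gold); phi(x, y) -> np.ndarray feature vector;
    decode(x, w) -> argmax_y of w . phi(x, y) over valid structures."""
    w = np.zeros(n_feats)
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = decode(x, w)                  # joint inference step
            if y_hat != y_gold:                   # mistake-driven update
                w += phi(x, y_gold) - phi(x, y_hat)
    return w
```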

SLIDE 72

Representation Challenges

• How to obtain features?

(Background: the Christopher Robin passage.)

SLIDE 73

Representation Challenges

• How to obtain features?
  1. Design features based on domain knowledge
     • E.g., by patterns in parse trees
     • By nicknames
  • Needs human experts/knowledge

(Background: the Christopher Robin passage.)

SLIDE 74

Representation Challenges

• How to obtain features?
  1. Design features based on domain knowledge
  2. Design feature templates and then let the machine find the right ones
     • E.g., use all words, pairs of words, …

(Background: the Christopher Robin passage.)

SLIDE 75

Representation Challenges

• How to obtain features?
  1. Design features based on domain knowledge
  2. Design feature templates and then let the machine find the right ones (see the sketch below)
• Challenges:
  • # features can be very large
    • # English words: 171K (Oxford)
    • # bigrams: 171K² ≈ 3 × 10^10; # trigrams?
  • For some domains, it is hard to design features
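A toy sketch of item 2 above: feature templates that expand into sparse indicator features. The template names and helper are my own illustration, not the course's code:

```python
# Toy feature templates for a tagging model (illustrative only).
def word_features(sentence, i):
    """Templates: current word, previous word, and their bigram."""
    w = sentence[i]
    prev = sentence[i - 1] if i > 0 else "<s>"
    return {f"w={w}": 1, f"prev={prev}": 1, f"bigram={prev}_{w}": 1}

print(word_features(["I", "can", "can", "a", "can"], 1))
# {'w=can': 1, 'prev=I': 1, 'bigram=I_can': 1}
```

Instantiating such templates over a large vocabulary is exactly what makes the feature count blow up to the bigram-scale numbers above.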

SLIDE 76

Representation learning

• Learn compact representations of features
  • Combinatorial (continuous representation)

SLIDE 77

Representation learning

• Learn compact representations of features
  • Combinatorial (continuous representation)
  • Hierarchical/compositional (a minimal embedding example follows below)
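One common way to learn such continuous representations is word2vec; here is a minimal gensim sketch (gensim and the tiny corpus are my illustrative choices; the course may cover other methods):

```python
# Learning dense word vectors with word2vec via gensim (illustrative only).
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["cat"][:5])                 # a dense 50-dim vector
print(model.wv.similarity("cat", "dog"))   # words in similar contexts get similar vectors
```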

SLIDE 78

What you will learn from this course

• Structured prediction
  • Models / inference / learning
• Representation (deep) learning
  • Input/output representations
• Combining structured models and deep learning