What is NLP? CMSC 473/673 (PowerPoint PPT Presentation)



SLIDE 1

http://www.qwantz.com/index.php?comic=170

What is NLP? CMSC 473/673

SLIDE 2

SLIDE 3

Today’s Learning Goals

  • NLP vs. CL
  • Terminology:
    – NLP: vocabulary, token, type, one-hot encoding, dense embedding, parameter/weight, corpus/corpora
    – Linguistics: lexeme, morphology, syntax, semantics, “discourse”
  • NLP Tasks (high-level):
    – Part-of-speech tagging
    – Syntactic parsing
    – Entity id/coreference
  • Universal Dependencies
SLIDE 4

http://www.qwantz.com/index.php?comic=170

SLIDE 5

Natural Language Processing ≈ Computational Linguistics

SLIDE 6

Natural Language Processing ≈ Computational Linguistics

science focus: computational bio, computational chemistry, computational X

SLIDE 7

Natural Language Processing ≈ Computational Linguistics

science focus: computational bio, computational chemistry, computational X
engineering focus: build a system to translate, create a QA system

SLIDE 8

Natural Language Processing ≈ Computational Linguistics

Both have impact in / contribute to / draw from: machine learning, information theory, data science, systems engineering, logic, theory of computation, linguistics, cognitive science, psychology, political science, digital humanities, education

SLIDE 9

Natural Language Processing ≈ Computational Linguistics

science focus: computational bio, computational chemistry, computational X
engineering focus: build a system to translate, create a QA system

these views can co-exist peacefully

SLIDE 10

What Are Words?

  • Linguists don’t agree
  • (Human) language-dependent
  • White-space separation is sometimes okay (for written English longform)
  • But: social media? Spoken vs. written? Other languages?

SLIDE 11

What Are Words? Tokens vs. Types

The film got a great opening and the film went on to become a hit .

Vocabulary: the set of words (items) you know
Type: an element of the vocabulary
Token: an instance of that type in running text

How many of each are in the sentence above?

SLIDE 12

Terminology: Tokens vs. Types

The film got a great opening and the film went on to become a hit .

Tokens

  • The
  • film
  • got
  • a
  • great
  • opening
  • and
  • the
  • film
  • went
  • on
  • to
  • become
  • a
  • hit
  • .

Types

  • The
  • film
  • got
  • a
  • great
  • opening
  • and
  • the
  • went
  • on
  • to
  • become
  • hit
  • .
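The token/type distinction above can be checked with a few lines of Python. This is a minimal sketch assuming naive whitespace tokenization, so “The” and “the” count as distinct types, matching the lists above:

```python
# Count tokens (instances) and types (distinct vocabulary items) for the
# example sentence, using naive whitespace tokenization.
sentence = "The film got a great opening and the film went on to become a hit ."

tokens = sentence.split()   # every instance, in order
types = set(tokens)         # distinct items; "The" and "the" differ here

print(len(tokens))  # 16
print(len(types))   # 14
```
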
SLIDE 13

(same token and type lists as above, now with the counts: 16 tokens, 14 types)
SLIDE 14

Representing a Linguistic “Blob”

  • 1. An array of sub-blobs

word → array of characters
sentence → array of words

How do you represent these?

SLIDE 15

Representing a Linguistic “Blob”

  • 1. An array of sub-blobs

word → array of characters
sentence → array of words

  • 2. Integer representation/one-hot encoding
  • 3. Dense embedding

How do you represent these?

SLIDE 16

Representing a Linguistic “Blob”

1. An array of sub-blobs

word → array of characters
sentence → array of words

2. Integer representation/one-hot encoding
3. Dense embedding

Let V = vocab size (# types)
1. Represent each word type with a unique integer i, where 0 ≤ i < V

SLIDE 17

Representing a Linguistic “Blob”

1. An array of sub-blobs

word → array of characters
sentence → array of words

2. Integer representation/one-hot encoding
3. Dense embedding

Let V = vocab size (# types)
1. Represent each word type with a unique integer i, where 0 ≤ i < V
2. Or equivalently, …
   – Assign each word to some index i, where 0 ≤ i < V
   – Represent each word w with a V-dimensional binary vector f_w, with f_w[i] = 1 and all other entries 0
SLIDE 18

One-Hot Encoding Example

  • Let our vocab be {a, cat, saw, mouse, happy}
  • Q: What is V (# types)?
  • A: V = # types = 5

SLIDE 19

One-Hot Encoding Example

  • Let our vocab be {a, cat, saw, mouse, happy}
  • V = # types = 5
  • Assign:

a → 4   cat → 2   saw → 3   mouse → 0   happy → 1

How do we represent “cat”?

SLIDE 20

One-Hot Encoding Example

  • Let our vocab be {a, cat, saw, mouse, happy}
  • V = # types = 5
  • Assign:

a → 4   cat → 2   saw → 3   mouse → 0   happy → 1

f_cat = [0, 0, 1, 0, 0]

How do we represent “cat”? How do we represent “happy”?

SLIDE 21

One-Hot Encoding Example

  • Let our vocab be {a, cat, saw, mouse, happy}
  • V = # types = 5
  • Assign:

a → 4   cat → 2   saw → 3   mouse → 0   happy → 1

f_cat = [0, 0, 1, 0, 0]

How do we represent “cat”?

f_happy = [0, 1, 0, 0, 0]

How do we represent “happy”?
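A minimal Python sketch of this one-hot scheme. The index for “mouse” was garbled in the slide text, so 0 (the only index in 0..V-1 not assigned to another word) is assumed here:

```python
# One-hot vectors for the toy vocabulary {a, cat, saw, mouse, happy}.
# Index assignment follows the slide: a→4, cat→2, saw→3, happy→1;
# mouse→0 is assumed (the only remaining index).
vocab_index = {"a": 4, "cat": 2, "saw": 3, "mouse": 0, "happy": 1}
V = len(vocab_index)  # vocab size = 5 types

def one_hot(word):
    vec = [0] * V                 # V-dimensional all-zero vector
    vec[vocab_index[word]] = 1    # flip on the word's own position
    return vec

print(one_hot("cat"))    # [0, 0, 1, 0, 0]
print(one_hot("happy"))  # [0, 1, 0, 0, 0]
```
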

SLIDE 22

Representing a Linguistic “Blob”

1. An array of sub-blobs

word → array of characters
sentence → array of words

2. Integer representation/one-hot encoding
3. Dense embedding

Let E be some embedding size (often 100, 200, 300, etc.)
Represent each word w with an E-dimensional real-valued vector f_w

SLIDE 23

A Dense Representation (E=2)
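In contrast to one-hot vectors, an E=2 dense representation might look like the following sketch. The embedding values are made up for illustration (real embeddings are learned), but they show the key payoff: dense vectors can encode similarity, while all one-hot vectors are equally far apart:

```python
import math

# Hypothetical E=2 dense embeddings: each word is a short real-valued vector.
embeddings = {
    "cat":   [0.70, -0.30],
    "mouse": [0.60, -0.10],
    "happy": [-0.20, 0.90],
}

def cosine(u, v):
    # Cosine similarity: higher means the vectors point in similar directions.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# "cat" and "mouse" end up closer to each other than either is to "happy".
print(cosine(embeddings["cat"], embeddings["mouse"]) >
      cosine(embeddings["cat"], embeddings["happy"]))  # True
```
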

SLIDE 24

Where Do We Observe Language?

  • All around us
  • NLP/CL: from a corpus (pl: corpora)

– Literally a “body” of text

  • In real life:

– Through curators (e.g., the LDC)
– From the web (scrape Wikipedia, Reddit, etc.)
– Via careful human elicitation (lab studies, crowdsourcing)
– From previous efforts

  • In this class: the Universal Dependencies
SLIDE 25

http://universaldependencies.org/

part-of-speech & syntax for > 120 languages
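Universal Dependencies treebanks ship as plain-text CoNLL-U files, where each token is one line of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). A minimal reading sketch; the fragment below is a made-up annotation of one of this lecture's example headlines, not taken from an actual treebank:

```python
# Parse a tiny CoNLL-U fragment (annotation values are illustrative only).
conllu = (
    "1\tKids\tkid\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tmake\tmake\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tsnacks\tsnack\tNOUN\t_\t_\t2\tobj\t_\t_\n"
)

for line in conllu.strip().splitlines():
    cols = line.split("\t")
    form, lemma, upos = cols[1], cols[2], cols[3]   # word, lemma, POS tag
    head, deprel = cols[6], cols[7]                 # dependency head + label
    print(f"{form}\t{upos}\thead={head}\t{deprel}")
```
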

SLIDE 26

http://www.qwantz.com/index.php?comic=170

“Language is Productive”

SLIDE 27

Adapted from Jason Eisner, Noah Smith

SLIDE 28
  • orthography

Adapted from Jason Eisner, Noah Smith

SLIDE 29
  • orthography

morphology: study of how words change

Adapted from Jason Eisner, Noah Smith

SLIDE 30

SLIDE 31

Watergate

SLIDE 32

Watergate ➔ Troopergate, Bridgegate, Deflategate

SLIDE 33
  • orthography

morphology

Adapted from Jason Eisner, Noah Smith

lexemes: a basic “unit” of language

SLIDE 34

Ambiguity

Kids Make Nutritious Snacks

SLIDE 35

Ambiguity

Kids Make Nutritious Snacks
Kids Prepare Nutritious Snacks
Kids Are Nutritious Snacks

sense ambiguity

SLIDE 36
  • orthography

morphology

Adapted from Jason Eisner, Noah Smith

lexemes syntax: study of structure in language

SLIDE 37

Ambiguity

British Left Waffles on Falkland Islands

SLIDE 38

Lexical Ambiguity…

British Left Waffles on Falkland Islands
British Left Waffles on Falkland Islands
British Left Waffles on Falkland Islands

SLIDE 39

… yields the “Part of Speech Tagging” task

British Left Waffles on Falkland Islands
→ British/Adjective Left/Noun Waffles/Verb (the political Left vacillates)
→ British/Noun Left/Verb Waffles/Noun (the British abandoned waffles)

SLIDE 40

Parts of Speech

  • Classes of words that behave like one another in “similar” contexts
  • Pronunciation (stress) can differ: object (noun: OB-ject) vs. object (verb: ob-JECT)
  • It can help improve the inputs to other systems (text-to-speech, syntactic parsing)

SLIDE 41

Syntactic Ambiguity…

Pat saw Chris with the telescope on the hill. I ate the meal with friends.

SLIDE 42

… yields the “Syntactic Parsing” task

Pat saw Chris with the telescope on the hill. I ate the meal with friends.

(dependency arc labels from the slide’s parse diagrams: dobj, ncomp, dobj)

SLIDE 43

Syntactic Parsing

I ate the meal with friends

(constituency tree over the sentence, with nodes S, NP, VP, VP, NP, PP)

Syntactic parsing: perform a “meaningful” structural analysis according to grammatical rules
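One common plain-text way to write such a structural analysis is bracketed (nested) notation. The sketch below encodes the two readings of the example sentence as nested tuples; the node labels follow the standard S/NP/VP/PP conventions used on the slide:

```python
# Two parses of "I ate the meal with friends" as nested tuples
# (label, child, child, ...), standing in for the slide's trees.
vp_attach = ("S",
             ("NP", "I"),
             ("VP",
              ("VP", "ate", ("NP", "the meal")),
              ("PP", "with friends")))          # eating done with friends

np_attach = ("S",
             ("NP", "I"),
             ("VP", "ate",
              ("NP", ("NP", "the meal"),
                     ("PP", "with friends"))))  # a meal that comes with friends

def leaves(tree):
    # Read the words back off the fringe of a (label, children...) tree.
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    return [w for c in children for w in leaves(c)]

# Same word string, two different structures: that is syntactic ambiguity.
print(" ".join(leaves(vp_attach)))  # I ate the meal with friends
print(" ".join(leaves(np_attach)))  # I ate the meal with friends
```
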

SLIDE 44

Syntactic Parsing Can Help Disambiguate

I ate the meal with friends

(constituency tree over the sentence, with nodes S, NP, VP, VP, NP, PP)

SLIDE 45

Syntactic Parsing Can Help Disambiguate

I ate the meal with friends

(two constituency trees: the PP attached to the VP vs. to the NP)

SLIDE 46

Clearly Show Ambiguity… But Not Necessarily All Ambiguity

I ate the meal with friends

(one constituency tree, with nodes S, NP, VP, VP, NP, PP)

I ate the meal with gusto
I ate the meal with a fork

SLIDE 47
  • orthography

morphology

Adapted from Jason Eisner, Noah Smith

lexemes syntax semantics: study of (literal?) meaning

SLIDE 48
  • orthography

morphology

Adapted from Jason Eisner, Noah Smith

lexemes syntax semantics pragmatics: study of (implied?) meaning

SLIDE 49
  • orthography

morphology

Adapted from Jason Eisner, Noah Smith

lexemes syntax semantics pragmatics discourse: study of how we communicate

SLIDE 50

Semantics → Discourse Processing

John stopped at the donut store.

Courtesy Jason Eisner

SLIDE 51

SLIDE 52

Semantics → Discourse Processing

John stopped at the donut store before work.

Courtesy Jason Eisner

SLIDE 53

Semantics → Discourse Processing

John stopped at the donut store on his way home.

Courtesy Jason Eisner

SLIDE 54

Semantics → Discourse Processing

John stopped at the donut shop.
John stopped at the trucker shop.
John stopped at the mom & pop shop.
John stopped at the red shop.

Courtesy Jason Eisner

SLIDE 55

Discourse Processing through Coreference

I spread the cloth on the table to protect it. I spread the cloth on the table to display it.

Courtesy Jason Eisner

SLIDE 56

SLIDE 57

SLIDE 58

Adapted from Jason Eisner, Noah Smith

NLP + Latent Modeling

explain what you see/annotate with things “of importance” you don’t see

observed text: orthography
latent: morphology, lexemes, syntax, semantics, pragmatics, discourse
SLIDE 59
orthography, morphology, lexemes, syntax, semantics, pragmatics, discourse

SLIDE 60
orthography, morphology, lexemes, syntax, semantics, pragmatics, discourse

VISION (e.g., color) and AUDIO (e.g., prosody, intonation)

SLIDE 61

http://www.qwantz.com/index.php?comic=170

SLIDE 62

NLP ↔ Machine Learning

Goal: Learn parameters (weights) θ to develop a scoring function that says how “good” some provided text is

SLIDE 63

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

Goal: Learn parameters (weights) θ to develop a scoring function that says how “good” some provided text is

SLIDE 64

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

pθ(text)

Goal: Learn parameters (weights) θ to develop a scoring function that says how “good” some provided text is

SLIDE 65

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

pθ(text)

Goal: Learn parameters (weights) θ to develop a scoring function that says how “good” some provided text is

Terminology: parameters are the primary “knobs” of the model, set by a learning algorithm

SLIDE 66

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

s = pθ(text)

Goal: Learn parameters (weights) θ to develop a scoring function that says how “good” some provided text is

SLIDE 67

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

s = pθ(text)

Goal: Learn parameters (weights) θ to develop a scoring function that says how “good” some provided text is

Q: If we make p a probability distribution, what are the minimum and maximum values of s?

SLIDE 68

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today.

s = pθ(text)

Goal: Learn parameters (weights) θ to develop a scoring function that says how “good” some provided text is

Q: If we make p a probability distribution, what are the minimum and maximum values of s?

A: 0 ≤ s ≤ 1
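One way to make the score s = pθ(text) concrete is a toy unigram model, where θ is just a table of per-word probabilities estimated from a tiny corpus. This is a sketch for illustration, not the course's actual model:

```python
from collections import Counter

# Toy "training corpus" and maximum-likelihood unigram parameters θ.
corpus = "the film got a great opening and the film went on".split()
counts = Counter(corpus)
total = sum(counts.values())
theta = {w: c / total for w, c in counts.items()}   # the learned weights

def p_theta(text):
    # Score a text as the product of its words' unigram probabilities.
    s = 1.0
    for w in text.split():
        s *= theta.get(w, 0.0)   # unseen words get probability 0 here
    return s

s = p_theta("the film")
print(0.0 <= s <= 1.0)  # True: a probability score lies in [0, 1]
```
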

SLIDE 69

pθ(X)

probabilistic model

objective: F(θ)

Use ML Techniques to Learn the Weights

SLIDE 70

Gradient Ascent

(contour plot; axes θ1, θ2)

SLIDE 71

SLIDE 72

SLIDE 73

Gradient Ascent

“gradient of F with respect to θ”

(contour plot; axes θ1, θ2)

SLIDE 74

Gradient Ascent

“gradient of F with respect to θ”

gradient: a vector of partial derivatives, one per parameter θk, each taken while holding all other parameters constant

(contour plot; axes θ1, θ2)
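The idea in the contour plots can be sketched numerically. The objective F(θ) = -(θ1 - 1)² - (θ2 + 2)² and the step size below are assumptions chosen for illustration; F is concave with its maximum at (1, -2), so repeatedly stepping up the gradient converges there:

```python
# Gradient ascent sketch on a made-up concave objective
# F(theta) = -(theta1 - 1)^2 - (theta2 + 2)^2, maximized at (1, -2).
def grad_F(t1, t2):
    # Gradient: the vector of partial derivatives of F w.r.t. each theta_k.
    return (-2 * (t1 - 1), -2 * (t2 + 2))

t1, t2 = 0.0, 0.0      # initial parameters
lr = 0.1               # step size
for _ in range(200):
    g1, g2 = grad_F(t1, t2)
    t1 += lr * g1      # step *up* the gradient (ascent, not descent)
    t2 += lr * g2

print(round(t1, 3), round(t2, 3))  # close to the maximizer (1, -2)
```
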

SLIDE 75

http://www.qwantz.com/index.php?comic=170

SLIDE 76

Today’s Learning Goals

  • NLP vs. CL
  • Terminology:
    – NLP: vocabulary, token, type, one-hot encoding, dense embedding, parameter/weight, corpus/corpora
    – Linguistics: lexeme, morphology, syntax, semantics, “discourse”
  • NLP Tasks (high-level):
    – Part-of-speech tagging
    – Syntactic parsing
    – Entity id/coreference
  • Universal Dependencies