Deep learning for NLP: Introduction
CS 6956: Deep Learning for NLP



slide-1
SLIDE 1

CS 6956: Deep Learning for NLP

Deep learning for NLP: Introduction

slide-2
SLIDE 2

Words are a very fantastical banquet, just so many strange dishes

And yet, we seem to do fine. We can understand and generate language effortlessly. Almost.

1

slide-3
SLIDE 3

Wouldn’t it be great if computers could understand language?

2

slide-4
SLIDE 4

Wanted

Programs that can learn to understand and reason about the world via language

3

slide-5
SLIDE 5

Processing Natural Language

Or: An attempt to replicate (in computers) a phenomenon that is exhibited only by humans.
(Image: https://flic.kr/p/6fnqdv)

4

slide-6
SLIDE 6

Our goal today

Why study deep learning for natural language processing?

  • What makes language different from other applications?
  • Why deep learning?

5

slide-7
SLIDE 7

Language is fun!

6

slide-8
SLIDE 8

Language is ambiguous

7

slide-9
SLIDE 9

Language is ambiguous

8

slide-10
SLIDE 10

Language is ambiguous

I ate sushi with tuna.
I ate sushi with chopsticks.
I ate sushi with a friend.
I saw a man with a telescope.
Stolen painting found by tree.

Ambiguity can take many forms: Lexical, syntactic, semantic

9

slide-11
SLIDE 11

Language has complex structure

Mary saw a ring through the window and asked John for it. Why on earth did Mary ask for a window?

“My parents are stuck at Waterloo Station. There’s been a bomb scare.” “Are they safe?” “No, bombs are really dangerous.”

Anaphora resolution: Which entity/entities do pronouns refer to?

10

slide-12
SLIDE 12

Language has complex structure

Jan the children saw swim

Parsing: Identifying the syntactic structure of sentences

11

slide-13
SLIDE 13

Language has complex structure

Jan the children saw swim   [subject]

Parsing: Identifying the syntactic structure of sentences

12

slide-14
SLIDE 14

Language has complex structure

Jan the children saw swim   [subject, object]

Parsing: Identifying the syntactic structure of sentences

13

slide-15
SLIDE 15

Language has complex structure

Jan the children saw swim   [subject, subject, object]

Parsing: Identifying the syntactic structure of sentences

14

slide-16
SLIDE 16

Language has complex structure

Jan de kinderen zag zwemmen
Jan the children saw swim

15


slide-21
SLIDE 21

Language has complex structure

Jan de kinderen zag zwemmen Piet helpen
Jan the children saw swim Piet help

20


slide-25
SLIDE 25

Language has complex structure

Jan de kinderen zag zwemmen Piet helpen Marie leren
Jan the children saw swim Piet help Marie teach

24


slide-29
SLIDE 29

Language has complex structure

Jan de kinderen zag zwemmen Piet helpen Marie leren
Jan the children saw swim Piet help Marie teach

(Crossing dependencies: Jan saw, the children swim, Piet help, Marie teach)

Natural language is not a context-free language!

28

slide-30
SLIDE 30

Many, many linguistic phenomena

Metaphor

– makes my blood boil, apple of my eye, etc.

Metonymy

– The White House said today that …

A very long list…

29

slide-31
SLIDE 31

And, we make up things all the time

If not actually disgruntled, he was far from being gruntled.
The colors … only seem really real when you viddy them on the screen.
’Twas brillig, and the slithy toves / Did gyre and gimble in the wabe: / All mimsy were the borogoves, / And the mome raths outgrabe.

30

slide-32
SLIDE 32

Language can be problematic

31

slide-33
SLIDE 33

Ambiguity and variability

Language is ambiguous and can have variable meaning

– But machine learning methods can excel in these situations

There are other issues that present difficulties:

  • 1. Inputs are discrete, but numerous (words)
  • 2. Both inputs and outputs are compositional

32

slide-34
SLIDE 34
  • 1. Inputs are discrete

What do words mean? How do we represent meaning in a computationally convenient way?

bunny and sunny are only one letter apart, but very far in meaning. bunny and rabbit are very close in meaning, but look very different.

And can we learn their meaning from data?

33
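To make the bunny/sunny point concrete, here is a small illustrative sketch (not from the slides): string edit distance says bunny and sunny are nearly identical while bunny and rabbit are far apart, and one-hot vectors, the most direct encoding of discrete symbols, score every pair of distinct words as equally unrelated.

```python
# Illustrative sketch: surface form vs. meaning for discrete word symbols.
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

print(edit_distance("bunny", "sunny"))   # 1: spellings nearly identical, meanings far apart
print(edit_distance("bunny", "rabbit"))  # 6: spellings far apart, meanings nearly identical

# One-hot encodings are no better: every pair of distinct words is orthogonal,
# so the "similarity" between any two different words is exactly zero.
vocab = ["bunny", "sunny", "rabbit"]
one_hot = {w: [int(k == i) for k in range(len(vocab))] for i, w in enumerate(vocab)}

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

print(dot(one_hot["bunny"], one_hot["sunny"]))   # 0
print(dot(one_hot["bunny"], one_hot["rabbit"]))  # 0
```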

slide-35
SLIDE 35
  • 2. Compositionality

We piece meaning together from parts

  • Inputs are compositional
– characters form words, which form phrases, clauses, sentences, and entire documents
  • Outputs are also compositional
– Several NLP tasks produce structures: outputs are trees or graphs (e.g., parse trees)
– Or they produce language, e.g., translation and generation
  • In both cases, inputs/outputs are compositional

34

slide-36
SLIDE 36

Discrete + compositional = sparse

  • Compositionality allows us to construct infinite combinations of symbols
– Think of linguistic creativity: how many words, phrases, and sentences have you encountered that you had never seen before?
  • No dataset has all possible inputs/outputs (see the sketch below)
  • NLP has to generalize to novel inputs and also generate novel outputs

35
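A tiny, made-up illustration of this sparsity: even when a new sentence uses only familiar words, many of its word combinations (here, bigrams) never occur in the training text.

```python
# Made-up training and test text, for illustration only.
train_text = "the children saw the dog . the dog saw the children ."
test_text = "the children saw the cat swim ."

def bigram_set(text):
    """Set of adjacent word pairs in a whitespace-tokenized string."""
    tokens = text.split()
    return set(zip(tokens, tokens[1:]))

train_bigrams = bigram_set(train_text)
test_bigrams = bigram_set(test_text)

unseen = test_bigrams - train_bigrams
print(f"{len(unseen)} of {len(test_bigrams)} test bigrams never occur in training")
print(sorted(unseen))  # [('cat', 'swim'), ('swim', '.'), ('the', 'cat')]
```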

slide-37
SLIDE 37

Machine learning to the rescue

36

slide-38
SLIDE 38

Modeling language: Power to the data

  • Understanding and generating language are challenging computational problems
  • Supervised machine learning offers perhaps the best known methods
– Essentially teases apart patterns from labeled data

37


slide-40
SLIDE 40

Example: The company words keep

I would like to eat a _______ of cake. (peace or piece?)

An idea:

  • Train a binary classifier to make this decision
  • Use indicators for neighboring words as features

Works surprisingly well!

Data + features + learning algorithm = Profit!

39

What features?
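Here is a minimal sketch of this idea, with a made-up four-sentence training set; it assumes scikit-learn's DictVectorizer and LogisticRegression as one concrete way to realize "indicators for neighboring words as features".

```python
# Sketch: classify peace vs. piece from the words around the blank.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def context_features(tokens, blank_index, window=2):
    """Indicator features for the words within `window` positions of the blank."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = blank_index + offset
        if 0 <= j < len(tokens):
            feats[f"word_at_{offset}={tokens[j]}"] = 1.0
    return feats

# Made-up training examples: (tokens, index of the blank, correct word).
train = [
    ("i would like to eat a ___ of cake".split(), 6, "piece"),
    ("she kept a ___ of the puzzle".split(), 3, "piece"),
    ("they signed a ___ treaty last year".split(), 3, "peace"),
    ("the talks aim for lasting ___ in the region".split(), 5, "peace"),
]

vec = DictVectorizer()
X = vec.fit_transform([context_features(t, i) for t, i, _ in train])
y = [label for _, _, label in train]

clf = LogisticRegression()
clf.fit(X, y)

# A context the classifier has never seen:
test_tokens = "i ate a ___ of pie".split()
features = vec.transform([context_features(test_tokens, 3)])
print(clf.predict(features))  # likely ['piece'], driven largely by the "of" feature
```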

slide-41
SLIDE 41

The problem of representations

  • “Traditional NLP”
– Hand-designed features: words, parts of speech, etc.
– Linear models
  • Manually designed features could be incomplete or overcomplete
  • Deep learning
– Promises the ability to learn good representations (i.e., features) for the task at hand
– Typically vectors, also called distributed representations

40
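As a contrast to hand-designed features, the following sketch (assuming PyTorch, which the slide does not specify) shows what a learned distributed representation is mechanically: a trainable lookup table of vectors that receives gradients like any other model parameter.

```python
# Sketch assuming PyTorch: word "features" as a trainable embedding table
# rather than hand-designed indicators.
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "peace": 1, "piece": 2, "of": 3, "cake": 4}   # toy vocabulary
embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)   # 8 is arbitrary

ids = torch.tensor([vocab["piece"], vocab["of"], vocab["cake"]])
vectors = embed(ids)                 # dense feature vectors, shape (3, 8)
print(vectors.shape)

# The representations themselves are parameters: a (stand-in) loss sends
# gradients back into the table, so the features are learned for the task.
loss = vectors.sum()
loss.backward()
print(embed.weight.grad.shape)       # torch.Size([5, 8])
```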

slide-42
SLIDE 42

Several successes of deep learning

  • Word embeddings

– A general purpose feature representation layer for words

  • Syntactic parsing

– [Chen and Manning, 2014; Durrett and Klein, 2015; Weiss et al., 2015]

  • Language modeling

– Starting with Bengio, 2003, several advances since then

41

slide-43
SLIDE 43

More successes

  • Machine translation

– Neural machine translation is the de facto standard now
– Sequence-to-sequence networks [e.g., Sutskever et al., 2014]
  • Sentences in one language are converted to a vector using a neural network
  • That vector is converted to a sentence in another language

  • Text understanding tasks
– Natural language inference [e.g., Parikh et al., 2016]
– Reading comprehension [e.g., Seo et al., 2016]

42
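A deliberately simplified sketch of that sequence-to-sequence recipe, again assuming PyTorch: an encoder turns the source sentence into a vector (its final hidden state), and a decoder generates target-language words conditioned on that vector. Real NMT systems add attention, beam search, subword vocabularies, and much more.

```python
# Minimal encoder-decoder sketch (assumed PyTorch), illustrating the two steps
# on the slide: encode the source sentence into a vector, then decode from it.
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # 1. Encode: the final hidden state summarizes the source sentence.
        _, state = self.encoder(self.src_embed(src_ids))
        # 2. Decode: predict target words conditioned on that summary
        #    (teacher forcing: the gold target prefix is fed as input).
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_out)     # scores over the target vocabulary

model = TinySeq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 1200, (2, 5))   # corresponding target prefixes, length 5
print(model(src, tgt).shape)           # torch.Size([2, 5, 1200])
```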

slide-44
SLIDE 44

Deep learning for NLP

Techniques that integrate:

  • 1. Neural networks for NLP, trained end-to-end
  • 2. Learned features providing distributed representations
  • 3. Ability to handle varying input/output sizes

43

Note: Some ideas that are advertised as deep learning only involve shallow neural networks. For example, training word embeddings. But we will use the umbrella term anyway with this caveat.

slide-45
SLIDE 45

What we will see in this semester

44

slide-46
SLIDE 46

What we will see

  • A general overview of underlying concepts that pervade deep learning for NLP tasks
  • A collection of successful design ideas to handle sparse, compositional, varying-sized inputs and outputs

45

slide-47
SLIDE 47

Semester overview

Part 1: Introduction

– Review of key concepts in supervised learning
– Review of neural networks
– The computation graph abstraction and gradient-based learning

46

slide-48
SLIDE 48

Semester overview

Part 2: Representing words

– Distributed representations of words, i.e., word embeddings
– Training word embeddings using the distributional hypothesis and feed-forward networks
– Evaluating word embeddings

47

slide-49
SLIDE 49

Semester overview

Part 3: Recurrent neural networks

– Sequence prediction using neural networks
– LSTMs and their variants
– Applications
– Word embeddings revisited

48

slide-50
SLIDE 50

Semester overview

Part 4: Composing word embeddings into sentence/phrase features

  • Convolutional Neural Networks for NLP
  • Recurrent neural networks revisited
  • (Recursive neural networks)

49

slide-51
SLIDE 51

Semester overview

Part 5: Advanced topics

– The encoder-decoder architecture
– Attention
– The transformer architecture
– Neural networks and structures

50

slide-52
SLIDE 52

Class objective

At the end of the course, you should be able to:

  • 1. Define deep neural networks for new NLP problems,
  • 2. Implement and train such models using off-the-shelf libraries, and
  • 3. Critically read, evaluate, and perhaps replicate current literature in the field.

51