Deep Learning Research for NLP (Graham Neubig)



SLIDE 1

LTI Orientation

Deep Learning Research
 for NLP

Graham Neubig

SLIDE 2

Language Processing

John passes the ball upfield to Peter, who shoots for the goal. The shot is deflected by Mary and the ball goes out of bounds.

Mary prevents Peter from scoring a goal.

SLIDE 3

Structured Prediction

(Diagram: input X mapped to a structured output Y.)

SLIDE 4

Supervised Learning

(Diagram: supervised learning from paired examples of inputs X and outputs Y.)

SLIDE 5

Supervised Learning w/ Neural Nets

(Diagram: learning parameters θ from (X, Y) pairs, then using θ to predict Y from a new X.)

SLIDE 6

Structured Prediction w/ Neural Nets

SLIDE 7

Neural Structured Prediction

(Diagram: a model with parameters w maps input X to output Y, and a loss is computed on the prediction.)

SLIDE 8

Neural Structured Prediction

(Diagram: a model with parameters w maps input X to output Y and a loss is computed, but prediction additionally requires search over the structured output space.)

SLIDE 9

The Problem of Discrete Decisions

(Diagram: a loss surface g(w) over parameters w; the predicted word ŷᵢ(w) jumps discretely between 'dog', 'the', and 'cat' as w changes.)

SLIDE 10

Soft Search [Goyal+18]


(Faculty: Neubig)

(Diagram: at step i, instead of taking a hard argmax ŷᵢ = 'the' over scores sᵢ(dog), sᵢ(the), sᵢ(cat) and feeding the chosen word's embedding into the next state hᵢ₊₁, take a peaked softmax over the candidate embeddings e(dog), e(the), e(cat).)

The α-soft argmax feeds in an expected embedding instead of a discrete choice:

ẽᵢ = Σ_y e(y) · exp[α · sᵢ(y)] / Z

where Z normalizes the weights and α controls how peaked the softmax is; as α grows, ẽᵢ approaches the embedding of the argmax word.
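A minimal NumPy sketch of this peaked softmax; the toy vocabulary, embeddings, and scores below are invented for illustration, not values from the paper:

```python
import numpy as np

def alpha_soft_argmax(embeddings, scores, alpha):
    """Peaked softmax over candidate words: as alpha grows, the
    weighted embedding approaches the argmax word's embedding."""
    logits = alpha * scores
    weights = np.exp(logits - logits.max())  # subtract max for stability
    weights /= weights.sum()                 # normalize by Z
    return weights @ embeddings              # sum_y e(y) * softmax weight

# toy vocabulary {dog, the, cat} with 2-d embeddings
E = np.array([[1.0, 0.0],   # e(dog)
              [0.0, 1.0],   # e(the)
              [0.5, 0.5]])  # e(cat)
s = np.array([0.1, 2.0, 0.3])  # scores s_i(y); 'the' scores highest

soft = alpha_soft_argmax(E, s, alpha=1.0)   # smooth mixture of embeddings
hard = alpha_soft_argmax(E, s, alpha=50.0)  # numerically close to e(the)
```

With a small α the result is a genuine mixture, which keeps the loss surface differentiable in the scores; with a large α it collapses toward the hard argmax.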

SLIDE 11

Smoothed Surface

(Diagram: the smoothed surface g(w) at α = 1 versus α = 10; larger α tracks the original discrete decisions ŷᵢ(w) ∈ {'dog', 'the', 'cat'} more closely, while smaller α gives a smoother surface.)

SLIDE 12

Prediction over Word Embedding Space [Kumar+18]


(Faculty: Tsvetkov)

SLIDE 13

Structured Modeling w/ Neural Nets

SLIDE 14

Why Structured Neural Nets

  • In pre-neural NLP we did feature engineering to capture the salient features of text
  • Now, neural nets capture features for us
  • But given too much freedom they will not learn, or will overfit
  • So we do architecture engineering to add bias
SLIDE 15

Structure in Language

Structure exists at every level: words, phrases, sentences, and documents.

(Diagram: a parse of "Alice gave a message to Bob" with constituents NP, VP, and PP under S.)

Document-level example: "This film was completely unbelievable. The characters were wooden and the plot was absurd. That being said, I liked it."

SLIDE 16

BiLSTM Conditional Random Fields [Ma+15]


(Faculty: Hovy)

  • Add an additional layer that ensures consistency between tags

(Diagram: "I hate this movie" tagged PRP VBP DT NN, with <s> boundary symbols and transition connections between adjacent tags.)

  • Training and prediction use dynamic programming
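As an illustration of the dynamic programming used at prediction time, here is a minimal Viterbi decoder for a linear-chain CRF; the emission and transition scores below are toy values, not outputs of a trained BiLSTM:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Highest-scoring tag sequence under a linear-chain CRF.
    emissions: (T, K) per-position tag scores (e.g. from a BiLSTM)
    transitions: (K, K) score of moving from tag i to tag j"""
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = score of ending at tag j having come from tag i
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow backpointers from the best final tag
    tags = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        tags.append(int(backptr[t, tags[-1]]))
    return tags[::-1]

TAGS = ["PRP", "VBP", "DT", "NN"]
emissions = np.array([[2., 0., 0., 0.],    # I     -> PRP
                      [0., 2., 0., 0.],    # hate  -> VBP
                      [0., 0., 2., 0.],    # this  -> DT
                      [0., 0., 0., 2.]])   # movie -> NN
transitions = np.zeros((4, 4))             # no tag-pair preference here
best = [TAGS[i] for i in viterbi(emissions, transitions)]
```

Training uses the analogous forward algorithm (sum instead of max) to compute the partition function.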
SLIDE 17

Neural Factor Graph Models [Malaviya+18]

(Faculty: Gormley, Neubig)

  • Problem: Neural CRFs can only handle a single tag per word
  • Idea: Expand to multiple tags using graphical models
SLIDE 18

Stack LSTM [Dyer+15]

(Faculty: Dyer (now DeepMind))

(Diagram: transition-based parsing of "an overhasty decision was made": the stack S, buffer B, and action history A are each encoded with an LSTM, and actions such as SHIFT and REDUCE-LEFT(amod) build a compositional representation of partial trees, e.g. attaching "overhasty" to "decision" with an amod arc.)
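To make the transition system concrete, here is a sketch of the SHIFT / REDUCE mechanics alone; the Stack LSTM itself would additionally encode the stack, buffer, and action history with LSTMs, and the action naming and attachment convention below are this sketch's, not necessarily the paper's:

```python
def parse(words, actions):
    """Replay SHIFT / REDUCE_L / REDUCE_R transitions over a sentence.
    REDUCE_L(label): top of stack becomes head of the item below it.
    REDUCE_R(label): item below top of stack becomes head of the top."""
    buffer = list(words)   # B: words not yet processed
    stack = []             # S: partially built subtrees
    arcs = []              # (head, dependent, label) triples
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        else:
            kind, label = act.split("(")
            label = label.rstrip(")")
            right = stack.pop()
            left = stack.pop()
            if kind == "REDUCE_L":
                arcs.append((right, left, label))  # right heads left
                stack.append(right)
            else:
                arcs.append((left, right, label))  # left heads right
                stack.append(left)
    return stack, arcs

stack, arcs = parse(
    ["an", "overhasty", "decision"],
    ["SHIFT", "SHIFT", "SHIFT", "REDUCE_L(amod)", "REDUCE_L(det)"])
```

After these transitions a single subtree headed by "decision" remains on the stack, with "overhasty" and "an" attached as its dependents.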

SLIDE 19

Morphological Language Models [Matthews+18]


(Faculty: Neubig, Dyer)

  • Problem: Language modeling for morphologically rich languages is hard
  • Idea: Specifically decompose input and output using morphological structure

SLIDE 20

Neural-Symbolic Integration

SLIDE 21

Neural-Symbolic Hybrids

  • Neural and symbolic models are better at different things
  • Neural: smoothing over differences using similarity
  • Symbolic: remembering individual single-shot events
  • How can we combine the two?
SLIDE 22

Discrete Lexicons in Neural Seq2seq [Arthur+15]

(Faculty: Neubig)

SLIDE 23

NNs + Logic Rules [Hu+16]

(Faculty: Hovy, Xing)

  • Problem: It is difficult to explicitly incorporate knowledge into neural-net-based models
  • Idea: Use logical rules to constrain the space of predicted probabilities
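One way to constrain predicted probabilities with a rule, in the spirit of Hu+16's rule-regularized teacher distribution: reweight the model's prediction so that labels violating the rule are exponentially penalized. The sentiment example, rule, and penalty strength below are invented for illustration:

```python
import numpy as np

def rule_constrained(p, rule_satisfied, strength=6.0):
    """Project a predicted distribution p toward satisfying a rule:
    q(y) ∝ p(y) * exp(-strength * (1 - r(y))),
    where r(y) = 1 if label y satisfies the rule, 0 otherwise."""
    penalty = np.exp(-strength * (1.0 - rule_satisfied))
    q = p * penalty
    return q / q.sum()

# toy sentiment example: suppose a rule (e.g. about a clause after
# "but") says only the positive label (index 1) is consistent
p = np.array([0.6, 0.4])   # model's original prediction
r = np.array([0.0, 1.0])   # rule satisfaction per label
q = rule_constrained(p, r)
```

The constrained distribution q can then be used as a teacher to distill the rule back into the network's parameters.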
SLIDE 24

Latent Variable Models

SLIDE 25

Latent Variable Models

(Diagram: observed input X and output Y.)

SLIDE 26

Latent Variable Models

(Diagram: a latent variable Z mediates between X and Y; the question marks indicate that Z is never observed, in either the X → Y or Y → X direction.)

SLIDE 27

Neural Latent Variable Models

(Diagram: a neural network connecting the observed variable X and the latent variable Z.)

SLIDE 28

Generating Text from Latent Space

(Diagram: encode text X into a latent space Z, then decode from Z back to text X.)
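A common way to make such a model trainable is the variational autoencoder's reparameterization trick, paired with a KL regularizer on the latent space. A minimal NumPy sketch, with illustrative shapes and values:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, so the
    sampling step stays differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)), the regularizer in the VAE objective."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# encoder output for one sentence: mean and log-variance of q(z|x)
mu = np.zeros(4)
log_var = np.zeros(4)
z = reparameterize(mu, log_var)          # latent code fed to the decoder
kl = kl_to_standard_normal(mu, log_var)  # 0 when q is exactly N(0, I)
```

Because the latent space is continuous, sampling different z and decoding yields varied text, which is what "generating text from latent space" exploits.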

SLIDE 29

Example: Discourse Level Modeling with VAE [Zhao+17]

(Faculty: Eskenazi)

  • Use a latent variable to represent the entire discourse in a dialog

SLIDE 30

Handling Discrete Latent Variables [Zhou+17]

(Faculty: Neubig)

SLIDE 31

Structured Latent Variables [Yin+18]

(Faculty: Neubig)

  • Problem: Paucity of training data for structured prediction problems
  • Idea: Treat the structure as a latent variable in a VAE model
SLIDE 32

Better Learning Algorithms for Latent Variable Models [He+2019]

(Faculty: Neubig)

  • Problem: When learning latent variable models, predicting the latent variables can be difficult
  • Solution: Perform aggressive updates of the part of the model that predicts these variables

SLIDE 33

Any Questions?