
NLP with recurrent networks

Chapter 9 in Martin/Jurafsky

Feed-forward networks for text processing

  • Consider a fixed window of previous words
  • Lump all words in a sentence/document into a bag-of-words representation.

[Figure: a feed-forward neural language model. The embeddings of the three context words w_{t-3}, w_{t-2}, w_{t-1} (here "ground", "there", "lived", word ids 35, 9925, 45180) are concatenated into a 1×3d projection layer, passed through a hidden layer h_1 … h_{d_h} with weights W (d_h×3d), then through output weights U (|V|×d_h) and a softmax to give a 1×|V| distribution y_1 … y_{|V|}, i.e. P(w_t = V_42 | w_{t-3}, w_{t-2}, w_{t-1}).]
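To make the figure concrete, here is a minimal numpy sketch of this model (all sizes, weight values, and word ids are illustrative assumptions, not taken from the slides):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: vocabulary |V|, embedding dim d, hidden dim d_h.
V, d, d_h = 10_000, 50, 100

E = rng.normal(size=(V, d))          # embedding matrix, one row per word
W = rng.normal(size=(d_h, 3 * d))    # hidden-layer weights, d_h x 3d
U = rng.normal(size=(V, d_h))        # output-layer weights, |V| x d_h

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def predict_next(w3, w2, w1):
    """P(w_t | w_{t-3}, w_{t-2}, w_{t-1}) for context word ids w3, w2, w1."""
    x = np.concatenate([E[w3], E[w2], E[w1]])   # projection layer, 1 x 3d
    h = np.tanh(W @ x)                          # hidden layer, 1 x d_h
    return softmax(U @ h)                       # distribution over |V| words

p = predict_next(35, 9925, 45180)    # the word ids from the figure
print(p.shape, p.sum())              # (10000,) 1.0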


Limitations of feed-forward networks for text processing

  • In general, these are insufficient models of language

– because language has long-distance dependencies: “The computer(s) which I had just put into the machine room on the fourth floor is (are) crashing.”

  • Alternative: recurrent networks

Simple recurrent networks

  • Recurrent networks deal with sequences of inputs / outputs
  • x_t, y_t, h_t – input, output, and hidden state at step t

[Figure: a simple recurrent unit; the input x_t and the previous hidden state feed into h_t, which produces the output y_t.]


Recurrent networks

  • A recurrent network illustrated as a feed-forward network

[Figure: the recurrent unit drawn feed-forward style: the input x_t enters through weights W, the previous state h_{t-1} re-enters through U, and the state h_t produces the output y_t through V.]

h_t = g(U h_{t-1} + W x_t)
y_t = f(V h_t)

Generating output from a recurrent network

[Figure: the network unrolled over three time steps; starting from h_0, each step applies the same weights U, V, W to compute h_1, h_2, h_3 from inputs x_1, x_2, x_3 and emit outputs y_1, y_2, y_3.]

function FORWARDRNN(x, network) returns output sequence y
  h_0 ← 0
  for i ← 1 to LENGTH(x) do
    h_i ← g(U h_{i-1} + W x_i)
    y_i ← f(V h_i)
  return y
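The same forward pass as a minimal numpy sketch (the dimensions, weights, and the choices g = tanh, f = identity are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

d_in, d_h, d_out = 8, 16, 4          # illustrative dimensions
U = rng.normal(size=(d_h, d_h))      # state-to-state weights
W = rng.normal(size=(d_h, d_in))     # input-to-state weights
V = rng.normal(size=(d_out, d_h))    # state-to-output weights

def forward_rnn(xs):
    """h_t = g(U h_{t-1} + W x_t), y_t = f(V h_t), with g = tanh, f = identity."""
    h = np.zeros(d_h)                # h_0 = 0
    ys = []
    for x in xs:
        h = np.tanh(U @ h + W @ x)
        ys.append(V @ h)
    return ys

xs = rng.normal(size=(3, d_in))      # a length-3 input sequence
print([y.shape for y in forward_rnn(xs)])   # [(4,), (4,), (4,)]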


Training recurrent networks

If we "unroll" a recurrent network, we can treat it as a feed-forward network and train it with standard backpropagation. This approach is called backpropagation through time.

[Figure: the unrolled network again, now with target outputs t_1, t_2, t_3 paired with y_1, y_2, y_3; the per-step losses are backpropagated through the shared weights U, V, W.]

Applications of RNNs

Part-of-speech tagging

[Figure: an RNN assigns one part-of-speech tag to each word of "Janet will back the bill".]
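A minimal numpy sketch of RNN tagging (the toy vocabulary, tag set, and weights are assumptions): the same recurrence as before, but with a softmax over tags at every step:

import numpy as np

rng = np.random.default_rng(0)
n_words, n_tags, d, d_h = 1000, 17, 32, 64   # toy vocabulary and tag set

E  = rng.normal(size=(n_words, d)) * 0.1     # word embeddings
U  = rng.normal(size=(d_h, d_h)) * 0.1
W  = rng.normal(size=(d_h, d)) * 0.1
Vt = rng.normal(size=(n_tags, d_h)) * 0.1    # output weights over tags

def tag(word_ids):
    """Return one tag id per input token (argmax of a per-step softmax)."""
    h, tags = np.zeros(d_h), []
    for w in word_ids:
        h = np.tanh(U @ h + W @ E[w])
        tags.append(int(np.argmax(Vt @ h)))  # argmax of logits == argmax of softmax
    return tags

print(tag([1, 2, 3, 4, 5]))   # e.g. the five words of "Janet will back the bill"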


Labeling sequences with RNNs

You can classify an entire sequence using RNNs, which is useful for sentiment analysis:

[Figure: the inputs x_1 … x_n are fed through the RNN and only the final hidden state h_n is passed to a softmax classifier.]
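A minimal sketch (illustrative sizes; the classifier weight name C is an assumption): run the recurrence over the whole input, then classify from the final state alone:

import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, n_classes = 32, 64, 2     # e.g. positive vs. negative sentiment

U = rng.normal(size=(d_h, d_h)) * 0.1
W = rng.normal(size=(d_h, d_in)) * 0.1
C = rng.normal(size=(n_classes, d_h)) * 0.1  # classifier weights (assumed name)

def classify(xs):
    h = np.zeros(d_h)
    for x in xs:                      # consume the whole sequence ...
        h = np.tanh(U @ h + W @ x)
    z = C @ h                         # ... then classify from h_n alone
    e = np.exp(z - z.max())
    return e / e.sum()               # softmax over classes

print(classify(rng.normal(size=(10, d_in))))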

Multi-layer RNNs

You can stack multiple recurrent layers:

[Figure: three stacked layers (RNN 1, RNN 2, RNN 3); the output sequence of each layer is the input sequence of the next, from inputs x_1 … x_n up to outputs y_1 … y_n.]
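A sketch of stacking (illustrative; all layers share the same width d here for simplicity): each layer's full hidden-state sequence becomes the next layer's input sequence:

import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 32, 3                   # shared width, three layers as in the figure
layers = [(rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1)
          for _ in range(n_layers)]   # one (U, W) pair per layer

def run_layer(U, W, xs):
    h, hs = np.zeros(d), []
    for x in xs:
        h = np.tanh(U @ h + W @ x)
        hs.append(h)
    return hs                         # the full hidden-state sequence

seq = list(rng.normal(size=(5, d)))
for U, W in layers:                   # each layer feeds the next
    seq = run_layer(U, W, seq)
print(len(seq), seq[0].shape)         # 5 (32,)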


Bi-directional RNNs

A sequence is processed in both directions using a bi-directional RNN:

[Figure: RNN 1 processes x_1 … x_n left to right and RNN 2 right to left; at each position the two states are combined (+) to produce y_1 … y_n.]

Bi-directional RNNs

A bi-directional RNN for sequence classification:

[Figure: for classification, the forward RNN's final state h_n_forw and the backward RNN's final state h_1_back are combined (+) and fed to a softmax.]
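A sketch covering both figures (illustrative weights and sizes): run one pass left to right and one right to left, then concatenate per position for tagging, or concatenate the two end states for classification:

import numpy as np

rng = np.random.default_rng(0)
d = 32
Uf, Wf = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1  # left-to-right
Ub, Wb = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1  # right-to-left

def run(U, W, xs):
    h, hs = np.zeros(d), []
    for x in xs:
        h = np.tanh(U @ h + W @ x)
        hs.append(h)
    return hs

xs  = list(rng.normal(size=(6, d)))
fwd = run(Uf, Wf, xs)                  # forward states h_1 ... h_n
bwd = run(Ub, Wb, xs[::-1])[::-1]      # backward states, re-aligned to positions

# Tagging: combine both directions at every position.
per_step = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Classification: combine the forward final state with the backward final state.
seq_repr = np.concatenate([fwd[-1], bwd[0]])   # h_n_forw and h_1_back
print(per_step[0].shape, seq_repr.shape)       # (64,) (64,)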


Long-term memory in RNNs

Although in principle RNNs can retain memory of all the past inputs they have seen, in practice they tend to focus on the most recent history. Several RNN architectures have been proposed to address this issue and maintain contextual information over time (LSTMs, GRUs).
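For orientation, these gated architectures are available off the shelf; a minimal PyTorch sketch with assumed toy dimensions:

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=32, hidden_size=64)  # gated cell with separate memory
gru  = nn.GRU(input_size=32, hidden_size=64)   # a lighter gated alternative

x = torch.randn(10, 1, 32)           # (seq_len, batch, input_size)
out, (h_n, c_n) = lstm(x)            # c_n is the LSTM's long-term cell state
out2, h_n2 = gru(x)
print(out.shape, h_n.shape)          # torch.Size([10, 1, 64]) torch.Size([1, 1, 64])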

Other applications of RNNs

Machine translation

Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014.

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems 27, 3104–3112.

Voice recognition

Analysis of DNA sequences


Machine translation using encoder/decoder networks

Composed of two phases:

  • Encoder: learns a representation of the meaning of the whole sentence
  • Decoder: translates the encoded representation of the sentence into individual words.

https://arxiv.org/pdf/1406.1078.pdf
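A minimal numpy sketch of the two phases (toy dimensions, greedy decoding, no attention; all names and the start/end symbol id are assumptions):

import numpy as np

rng = np.random.default_rng(0)
d, n_tgt = 64, 500                    # hidden size, target vocabulary (toy)
Ue, We = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Ud, Wd = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Out = rng.normal(size=(n_tgt, d)) * 0.1      # output weights over target words
Emb = rng.normal(size=(n_tgt, d)) * 0.1      # target-side word embeddings

def encode(src):
    """Encoder: compress the whole source sentence into one state vector."""
    h = np.zeros(d)
    for x in src:
        h = np.tanh(Ue @ h + We @ x)
    return h

def decode(h, max_len=10, eos=0):
    """Decoder: emit target words one at a time, seeded by the encoding."""
    word, out = eos, []               # start from an assumed start/end symbol
    for _ in range(max_len):
        h = np.tanh(Ud @ h + Wd @ Emb[word])
        word = int(np.argmax(Out @ h))        # greedy choice of the next word
        if word == eos:
            break
        out.append(word)
    return out

src = rng.normal(size=(7, d))         # a pre-embedded source sentence
print(decode(encode(src)))            # a (meaningless, untrained) output sequence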

Deep learning for DNA sequences

RNNs and convolutional networks have been shown to be useful for predicting various properties of DNA sequences and for discovering biological signals that are relevant to a phenomenon of interest.

Ameni Trabelsi, Mohamed Chaabane, Asa Ben-Hur. Comprehensive evaluation of deep learning architectures for prediction of DNA/RNA sequence binding specificities. Bioinformatics, 35(14), i269–i277, 2019 (ISMB 2019 special issue).


Deep learning for DNA sequences

Using word2vec helps improve accuracy!