Discriminative Language Models, Prof. Sameer Singh, CS 295

SLIDE 1

Discriminative Language Models

  • Prof. Sameer Singh

CS 295: STATISTICAL NLP WINTER 2017

January 26, 2017

Based on slides from Noah Smith, Richard Socher, and everyone else they copied from.

SLIDE 2

Language Models

Probability of a Sentence

  • Is a given sentence something you would expect to see?
  • Syntactically (grammar) and Semantically (meaning)

Probability of the Next Word

  • Predict what comes next for a given sequence of words.
  • Think of it as V‐way classification, where V is the vocabulary size
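
The two views above can be sketched in a few lines of Python. This is only an illustration: the toy vocabulary, the scores, and the helper names are assumptions, not taken from the slides. A softmax turns one score per vocabulary word into a next-word distribution (the V-way classification), and the chain rule sums per-word log-probabilities into a sentence log-probability.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and made-up scores for some context.
vocab = ["food", "you", "the", "<eos>"]
scores = [2.0, 1.0, 0.5, -1.0]

# V-way classification: one probability per vocabulary word.
probs = softmax(scores)
next_word = vocab[probs.index(max(probs))]

def sentence_log_prob(step_probs):
    """Chain rule: log P(sentence) is the sum of per-word log-probabilities."""
    return sum(math.log(p) for p in step_probs)
```

Here `step_probs` stands for the probability the model assigned to each word that actually occurred, one entry per position in the sentence.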
SLIDE 3

Outline

  • Discriminative Language Models
  • Feed‐forward Neural Networks
  • Recurrent Neural Networks
  • Upcoming…

SLIDE 4

Outline

  • Discriminative Language Models
  • Feed‐forward Neural Networks
  • Recurrent Neural Networks
  • Upcoming…

SLIDE 5

Logistic Regression Model

SLIDE 6

N‐Grams as Logistic Reg.

SLIDE 7

Other features…

SLIDE 8

Outline

  • Discriminative Language Models
  • Feed‐forward Neural Networks
  • Recurrent Neural Networks
  • Upcoming…

SLIDE 9

Logistic Reg. w/ Embeddings

SLIDE 10

Neural Networks

SLIDE 11

Activation Functions

  • sigmoid
  • tanh
  • softmax

And many others: ReLUs, PReLUs, ELU, step, max, and so on.
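
For reference, the named activation functions can be written out directly. The sketch below uses their standard textbook definitions in plain Python; the function names and the scalar/list interfaces are choices made here for illustration, not anything specified on the slide.

```python
import math

def sigmoid(x):
    """Logistic function, squashes any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)

def tanh(x):
    """Hyperbolic tangent, squashes any real number into (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def softmax(xs):
    """Maps a list of scores to a probability distribution over its entries."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Unlike the others, softmax acts on a whole vector rather than element by element, which is why it usually appears only at the output layer.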

SLIDE 12

Why do they work?

https://colah.github.io

SLIDE 13

Why do they work?

[Figure: diagram with nodes labeled x1, x2, y, z]

SLIDE 14

Simulated Example

https://github.com/clab/cnn/blob/master/examples/xor.cc
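
As its file name suggests, the linked example concerns XOR. The sketch below does not reproduce that code. It is a hand-wired illustration, with weights chosen by hand rather than learned and a step activation picked for readability, of why one hidden layer is enough for a function that no single linear unit can compute.

```python
def step(x):
    """Step activation: fires (1) when the input is positive, else 0."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """Two hidden units plus one output unit, all with hand-chosen weights."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # "OR but not AND" is XOR

# Truth table over all four binary inputs.
table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden units re-represent the inputs so that the output unit faces a linearly separable problem. A trained network would have to find comparable weights by gradient descent.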

SLIDE 15

Simple Feedforward NN LM

Bigram Model

SLIDE 16

Simple Feedforward NN LM

N‐gram Model
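
A minimal sketch of the forward pass of a feedforward n-gram language model of the kind these slide titles point to: embed each context word, concatenate the embeddings, apply one tanh hidden layer, and take a softmax over the vocabulary. The toy vocabulary, the layer sizes, the random untrained weights, and all helper names are assumptions made for illustration. Biases and training are omitted.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy setup: 5-word vocabulary, 2 context words (a trigram model).
vocab = ["<s>", "I", "love", "food", "<eos>"]
word_id = {w: i for i, w in enumerate(vocab)}
V, d, hidden_size, n_context = len(vocab), 4, 8, 2

rng = random.Random(0)
E = rand_matrix(V, d, rng)                          # one embedding row per word
W_h = rand_matrix(hidden_size, n_context * d, rng)  # concatenated context -> hidden
W_o = rand_matrix(V, hidden_size, rng)              # hidden -> vocabulary scores

def next_word_probs(context):
    """P(next word | context) for a list of n_context previous words."""
    x = [value for w in context for value in E[word_id[w]]]  # concatenate embeddings
    hidden = [math.tanh(a) for a in matvec(W_h, x)]
    return softmax(matvec(W_o, hidden))

probs = next_word_probs(["I", "love"])
```

Setting `n_context = 1` gives the bigram case. Larger contexts only widen the input to the hidden layer, instead of multiplying the number of count-table entries as in a classical n-gram model.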

SLIDE 17

Deep Feedforward NN LM

Bengio et al. 2003

SLIDE 18

Outline

  • Discriminative Language Models
  • Feed‐forward Neural Networks
  • Recurrent Neural Networks
  • Upcoming…

SLIDE 19

Sequence View of Simple NNs

SLIDE 20

Recurrent Neural Networks

SLIDE 21

Example: “I love food”

Input:  I love food
Output: love food <eos>
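
The example can be traced in code. The sketch below assumes a simple (Elman-style) recurrent update, h_t = tanh(W_x e(x_t) + W_h h_{t-1}), followed by a softmax over the vocabulary. The sizes, the random untrained weights, and the helper names are assumptions for illustration. Biases and training are omitted.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Each input word is trained to predict the word that follows it.
tokens = ["I", "love", "food", "<eos>"]
inputs, targets = tokens[:-1], tokens[1:]

vocab = sorted(set(tokens))
word_id = {w: i for i, w in enumerate(vocab)}
V, d, hidden_size = len(vocab), 4, 5

rng = random.Random(0)
E = rand_matrix(V, d, rng)                      # word embeddings
W_x = rand_matrix(hidden_size, d, rng)          # input -> hidden
W_h = rand_matrix(hidden_size, hidden_size, rng)  # previous hidden -> hidden
W_o = rand_matrix(V, hidden_size, rng)          # hidden -> vocabulary scores

h = [0.0] * hidden_size  # initial hidden state
loss = 0.0               # summed negative log-likelihood of the targets
for x, y in zip(inputs, targets):
    pre = [a + b for a, b in zip(matvec(W_x, E[word_id[x]]), matvec(W_h, h))]
    h = [math.tanh(v) for v in pre]        # same weights reused at every step
    probs = softmax(matvec(W_o, h))
    loss -= math.log(probs[word_id[y]])
```

Because the hidden state `h` is carried from one step to the next, the prediction at "food" can in principle depend on both "I" and "love", without a fixed context window.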

SLIDE 22

Power of RNNs: Characters!

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

SLIDE 23

Char‐RNNs: Shakespeare!

SLIDE 24

Char‐RNNs: Wikipedia!

SLIDE 25

Char‐RNNs: Linux Code!

SLIDE 26

Extension: Stacking

SLIDE 27

Extension: Bidirectional RNNs

SLIDE 28

Deep Bidirectional RNNs

SLIDE 29

Extension: GRUs

Gated Recurrent Units

SLIDE 30

Extension: GRUs

Gated Recurrent Units
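
The slide equations did not survive in this transcript, so the sketch below uses one commonly cited formulation of the GRU and should be read as an assumption rather than the exact form shown in lecture. It has an update gate z, a reset gate r, a candidate state, and the interpolation h' = (1 - z) * h + z * candidate. Some references swap the roles of z and 1 - z. Biases are omitted, and the parameter names are hypothetical.

```python
import math

def sigmoid(a):
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)  # avoids overflow for large negative a
    return e / (1.0 + e)

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def gru_step(x, h, p):
    """One GRU step; p maps hypothetical names to weight matrices."""
    z = [sigmoid(a) for a in add(matvec(p["Wz"], x), matvec(p["Uz"], h))]  # update gate
    r = [sigmoid(a) for a in add(matvec(p["Wr"], x), matvec(p["Ur"], h))]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
    cand = [math.tanh(a) for a in add(matvec(p["W"], x), matvec(p["U"], gated_h))]
    # Interpolate between keeping the old state and writing the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Sanity check with all-zero weights: both gates are 0.5 and the candidate
# is tanh(0) = 0, so the new state is half the old state.
d_in, d_h = 3, 2
params = {"Wz": zeros(d_h, d_in), "Uz": zeros(d_h, d_h),
          "Wr": zeros(d_h, d_in), "Ur": zeros(d_h, d_h),
          "W": zeros(d_h, d_in), "U": zeros(d_h, d_h)}
h_next = gru_step([1.0, -2.0, 0.5], [0.8, -0.4], params)
```

When z is close to 0 the old state is copied forward almost unchanged, which is what lets a GRU carry information across many time steps.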

SLIDE 31

Estimating Parameters

Beyond the scope of the course

  • Lots of tricks, heuristics, “domain knowledge”
  • Lots of engineering for efficiency, e.g. GPUs
  • New training algorithms are proposed every year, sometimes architecture‐specific
  • Lots of available tools you can use: TensorFlow, Torch, Keras, MXNet, etc.
SLIDE 32

Outline

  • Discriminative Language Models
  • Feed‐forward Neural Networks
  • Recurrent Neural Networks
  • Upcoming…

SLIDE 33

Homework 1 so far…

[Figure: two panels, titled Public and Private]

SLIDE 34

Ruslan Salakhutdinov

Professor at Carnegie Mellon University
Director of Artificial Intelligence, Apple Inc.

Learning Deep Unsupervised and Multimodal Models

Location: DBH 6011
Time: 11am ‐ 12pm
Date: January 27, 2017

Meeting with PhD students, will post on Piazza

SLIDE 35

Upcoming…

Homework

  • Homework 1 is due tonight: January 26, 2017
  • Write‐up, data, and code for Homework 2 are up
  • Homework 2 is due: February 9, 2017

Project

  • Proposal is due: February 7, 2017 (~2 weeks)
  • Only 2 pages