Mikolov's Language Models: Distributed Representations of Sentences and Documents



SLIDE 1

Mikolov’s Language Models:
Distributed Representations of Sentences and Documents

Recurrent Neural Language Model

Tomas Mikolov (Google Inc.)
May 16, 2014

SLIDE 2

Table of contents

1 Motivation
2 Introduction and Background
3 Paragraph Embeddings
4 Performance
5 Linguistic Regularities in Continuous Space Word Representations

SLIDE 3

Motivation

Quoth Tomas Mikolov (http://www.fit.vutbr.cz/~imikolov/rnnlm/google.pdf):

  • Statistical language models assign probabilities to word sequences.
  • Meaningful sentences should be more likely than ambiguous ones.

Language modeling is an artificial intelligence problem.

SLIDE 4

Classical N-gram Models

Figure: Text Modeling using Markov Chains, Claude Shannon (1948)

  max P(w_i | w_{i−1}, ...)    (1)

where each w_i is represented as a 1-of-N (one-hot) encoding.
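For concreteness, a maximum-likelihood bigram instance of equation (1) can be estimated by counting; this toy sketch (corpus and names invented here, not from the slides) shows the idea:

```python
from collections import Counter

# Toy corpus; a real model is estimated from a large text collection.
corpus = "the cat sat on the mat the cat ran".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of (w_{i-1}, w_i) pairs
contexts = Counter(corpus[:-1])              # counts of each context word w_{i-1}

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(w_i | w_{i-1})."""
    return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0

print(bigram_prob("the", "cat"))  # 2/3: "the" is followed by "cat" twice and "mat" once
```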

SLIDE 5

Neural Representation of Words

Neural Language Model, Bengio et al., 2006.

Figure: Word2Vec, Tomas Mikolov

SLIDE 6

Beyond Word Embeddings

Recursive Deep Tensor Models, Socher et al.

Figure: Recursive Tree Structure, Richard Socher 2013

SLIDE 7

Beyond Word Embeddings

Recurrent Neural Network Language Model, Mikolov et al.

Figure: Recurrent NN, Tomas Mikolov 2010

SLIDE 8

Beyond Word Embeddings

Character-Level Recognition

Figure: Text Understanding from Scratch, Zhang and LeCun 2015

SLIDE 9

Algorithm Overview

Figure: Paragraph Embedding Learning Model, Le and Mikolov 2014

SLIDE 10

Algorithmic Overview

Part 1. Word embeddings. Given a sentence w_1, w_2, w_3, ..., maximize the average log probability

  max (1/T) Σ_{t=k}^{T−k} log p(w_t | w_{t−k}, ..., w_{t+k})    (2)

where the prediction is a softmax over the vocabulary:

  p(w_t | w_{t−k}, ..., w_{t+k}) = e^{y_{w_t}} / Σ_i e^{y_i}    (3)

SLIDE 11

Algorithmic Overview

Parameters for Step 1: U, b, where

  y = b + U·h(w_{t−k}, ..., w_{t+k}; W)    (4)
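A minimal numpy sketch of equations (2)–(4), assuming toy dimensions and a concatenating h (the model can also average the context vectors); all names and sizes here are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, k = 1000, 50, 2                 # vocabulary size, embedding size, half-window (toy values)
W = rng.normal(size=(p, N))           # word matrix: column W[:, w] is the embedding of word w
U = rng.normal(size=(N, 2 * k * p))   # softmax weights over the concatenated context
b = np.zeros(N)                       # softmax bias

def h(context_ids):
    """Concatenate the embeddings of the 2k context words."""
    return np.concatenate([W[:, w] for w in context_ids])

def predict(context_ids):
    """Equations (3)-(4): y = b + U h(...), then softmax over the vocabulary."""
    y = b + U @ h(context_ids)
    e = np.exp(y - y.max())           # shift by max for numerical stability
    return e / e.sum()

# One term of objective (2): log p(w_t | w_{t-k}, ..., w_{t+k})
context = [3, 14, 159, 265]           # ids of w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}
w_t = 42
log_p = float(np.log(predict(context)[w_t]))
```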

SLIDE 12

Algorithmic Overview

Part II. Joint word and paragraph embeddings:

  y = b + U·h(w_{t−k}, ..., w_{t+k}; W, D)    (5)

where W ∈ R^{p×N} is the word matrix and D ∈ R^{p×M} is the paragraph matrix, for p × (M + N) embedding parameters in total.
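Extending the sketch above to equation (5), the paragraph vector from D enters h as one extra context input, matching the distributed memory model shown on the next slide; again a hedged illustration with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, p, k = 1000, 200, 50, 2              # vocab size, paragraph count, embedding size, half-window
W = rng.normal(size=(p, N))                # word matrix, p x N
D = rng.normal(size=(p, M))                # paragraph matrix, p x M -> p * (M + N) embedding parameters
U = rng.normal(size=(N, (2 * k + 1) * p))  # softmax weights: 2k words plus one paragraph vector
b = np.zeros(N)

def h(paragraph_id, context_ids):
    """PV-DM: the paragraph vector acts like an extra word shared by all windows of that paragraph."""
    return np.concatenate([D[:, paragraph_id]] + [W[:, w] for w in context_ids])

def predict(paragraph_id, context_ids):
    y = b + U @ h(paragraph_id, context_ids)  # equation (5)
    e = np.exp(y - y.max())
    return e / e.sum()
```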

SLIDE 13

Algorithm Overview

Figure: Distributed Memory Model

SLIDE 14

Algorithm Overview

Figure: Distributed Bag of Words Model

SLIDE 15

Sentiment Analysis

Figure: Stanford Sentiment Treebank Dataset

SLIDE 16

Sentiment Analysis

Figure: IMDB Dataset

SLIDE 17

Model

Figure: Recurrent NN, Tomas Mikolov 2010

SLIDE 18

Components:

  • input:  x(t) = w(t) + s(t−1)
  • hidden: s_j(t) = f( Σ_i x_i(t) u_{ji} )
  • output: y_k(t) = g( Σ_j s_j(t) v_{kj} )

where f is the sigmoid and g is the softmax.
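A hedged numpy sketch of one recurrent step, reading x(t) = w(t) + s(t−1) as concatenation of the 1-of-N word vector with the previous hidden state, as described in Mikolov's 2010 paper; sizes are toy values:

```python
import numpy as np

rng = np.random.default_rng(2)
V, H = 1000, 100                         # vocabulary size and hidden size (toy values)
U = rng.normal(size=(H, V + H)) * 0.01   # input-to-hidden weights u_ji
Vw = rng.normal(size=(V, H)) * 0.01      # hidden-to-output weights v_kj

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(word_id, s_prev):
    """One step: x(t) = [w(t); s(t-1)], s(t) = f(U x(t)), y(t) = g(Vw s(t))."""
    w = np.zeros(V); w[word_id] = 1.0    # 1-of-V encoding of the current word
    x = np.concatenate([w, s_prev])      # input layer
    s = sigmoid(U @ x)                   # hidden (context) layer
    y = softmax(Vw @ s)                  # distribution over the next word
    return s, y

s = np.zeros(H)
for wid in [5, 17, 42]:                  # a toy word-id sequence
    s, y = rnn_step(wid, s)
```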

SLIDE 19

Spatial Meaning:

Vector offset method for linguistic analogy questions ("a is to b as c is to ?"):

  y = x_b − x_a + x_c

  w* = argmax_w (x_w · y) / (‖x_w‖ ‖y‖)
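A small sketch of this method; the embedding table here is a random stand-in, so only the mechanics (offset plus cosine argmax) carry over to real trained vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
words = ["king", "man", "woman", "queen", "apple"]
emb = {w: rng.normal(size=50) for w in words}    # stand-in vectors; real runs use trained embeddings

def analogy(a, b, c):
    """Solve a : b :: c : ? via y = x_b - x_a + x_c, then argmax over cosine similarity."""
    y = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for w, x in emb.items():
        if w in (a, b, c):                       # exclude the query words, as is standard
            continue
        sim = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# With trained word2vec vectors this famously returns "queen".
print(analogy("man", "king", "woman"))
```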

SLIDE 20

Results
