Human Language Technology: Application to Information Access
Lesson 4 Deep learning for NLP: Word Representation Learning
October 20, 2016
EPFL Doctoral Course EE-724
Nikolaos Pappas, Idiap Research Institute, Martigny
Nikolaos Pappas
/59
2
➡ Machine Learning: optimizes the weights of a function to increase task performance
➡ Representation Learning: attempts to learn good features or representations automatically
➡ Deep Learning: machine learning algorithms based on multiple levels of representation or abstraction
Hand-designed features are time-consuming to build and often incomplete. Deep learning instead offers a flexible, learnable framework that can handle a variety of input, such as vision, speech, and language.
Breakthrough result on ImageNet by Krizhevsky et al. 2012: large error reduction at top 1 and 5.
levels (phonology, morphology, syntax, semantics) and applications in NLP
Still a lot of work to be done… e.g. metrics (beyond "basic" recognition - attention, reasoning, planning)
Attention assigns a weight to each input position: essentially parametric pooling
2014, Jean et al. 2014, Gulcehre et al. 2015
temporal / sequential
Neural networks are loosely inspired by how the human brain works, not an exact model of its learning algorithm: neurons fire in response to inputs and in turn excite other neurons.
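The analogy can be made concrete with a single artificial neuron: a weighted sum of inputs passed through a squashing activation. A minimal sketch (the sigmoid activation and the hand-set weights below are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # Weighted sum of inputs plus a bias, passed through the activation.
    return sigmoid(np.dot(w, x) + b)

x = np.array([1.0, 0.5])   # inputs (hypothetical)
w = np.array([0.4, -0.6])  # weights (hand-set here; normally learned)
b = 0.1                    # bias
y = neuron(x, w, b)        # the neuron's "excitation", a value in (0, 1)
```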
The values of the hidden units are initially unknown: we do not know ahead of time what values the intermediate logistic regressions are trying to predict.
The hidden representations are learned directly based on how good a job they do at predicting the target for the next layer. This is how the network captures non-linearities in the data!
Training finds the parameter setting that minimizes the loss with respect to these parameters: repeatedly take a small step in the direction of the negative gradient.
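The update rule can be sketched with plain gradient descent on a toy one-parameter loss L(θ) = (θ - 3)², whose gradient is known in closed form (the loss, starting point, and learning rate are illustrative assumptions):

```python
def grad(theta):
    # Gradient of the toy loss L(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter value
lr = 0.1      # learning rate (step size), chosen by hand
for _ in range(100):
    theta -= lr * grad(theta)  # small step towards the negative gradient
```

After enough steps, θ approaches the minimizer θ = 3.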
Stochastic gradient descent estimates the gradient on a small batch of examples instead of the entire training set.
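A minimal minibatch SGD sketch on synthetic linear-regression data (the data, batch size, and learning rate are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))        # synthetic inputs
true_w = np.array([1.0, -2.0, 0.5])   # ground-truth weights
y = X @ true_w                        # noise-free targets

w = np.zeros(3)
lr, batch = 0.1, 32
for _ in range(500):
    idx = rng.integers(0, len(X), size=batch)   # sample a minibatch
    Xb, yb = X[idx], y[idx]
    g = 2.0 / batch * Xb.T @ (Xb @ w - yb)      # gradient on the batch only
    w -= lr * g
```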
There are several strategies for decaying the learning rate of an optimizer, e.g. according to validation set performance.
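One such strategy can be sketched as "reduce on plateau": halve the learning rate whenever the validation loss stops improving (the function name, decay factor, and patience below are illustrative assumptions):

```python
def decay_on_plateau(lr, val_losses, factor=0.5, patience=2):
    # Decay the learning rate when the validation loss has not improved
    # over the last `patience` evaluations compared to the earlier best.
    if len(val_losses) > patience and \
            min(val_losses[-patience:]) >= min(val_losses[:-patience]):
        return lr * factor
    return lr

lr = 0.1
history = [1.0, 0.8, 0.81, 0.82]  # hypothetical validation losses (a plateau)
lr = decay_on_plateau(lr, history)
```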
We "backpropagate" the errors to all the hidden layers via the chain rule. Typically, the backprop computation is already implemented in popular libraries: Theano, Torch, TensorFlow.
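For intuition, the chain rule can be written out by hand for a tiny one-hidden-layer network with a squared loss (all sizes, data, and the learning rate are illustrative; real code would rely on a library's automatic differentiation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=3)                 # a single input vector
t = 0.7                                # its scalar target
W1 = rng.normal(size=(4, 3))           # input -> hidden weights
W2 = rng.normal(size=4)                # hidden -> output weights

for _ in range(200):
    # Forward pass.
    h = sigmoid(W1 @ x)                # hidden activations
    y = W2 @ h                         # linear output
    # Backward pass: propagate the error layer by layer (chain rule).
    d_y = 2.0 * (y - t)                # dL/dy for L = (y - t)^2
    d_W2 = d_y * h                     # dL/dW2
    d_h = d_y * W2                     # dL/dh
    d_W1 = np.outer(d_h * h * (1 - h), x)  # uses sigmoid'(z) = h * (1 - h)
    # Gradient step.
    W2 -= 0.1 * d_W2
    W1 -= 0.1 * d_W1

loss = (W2 @ sigmoid(W1 @ x) - t) ** 2
```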
➡ We now have the building blocks we need to build deep neural networks
➡ Advanced neural networks are able to deal with different arrangements of the input
An RNN applies the same cell at each time step, and each one passes on information to its successor.
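That recurrence can be sketched as a single tanh cell applied step by step, with the hidden state carrying information forward (the sizes and random weights are illustrative):

```python
import numpy as np

def rnn_step(h, x, Wh, Wx, b):
    # The new hidden state mixes the previous state and the current input.
    return np.tanh(Wh @ h + Wx @ x + b)

rng = np.random.default_rng(0)
Wh = rng.normal(scale=0.1, size=(5, 5))  # state -> state weights
Wx = rng.normal(scale=0.1, size=(5, 3))  # input -> state weights
b = np.zeros(5)

sequence = rng.normal(size=(4, 3))       # 4 time steps, 3 features each
h = np.zeros(5)                          # initial hidden state
for x in sequence:
    h = rnn_step(h, x, Wh, Wx, b)        # pass information to the successor
```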
* Diagram from Christopher Olah’s blog.
* Diagram from Christopher Olah’s blog.
LSTMs were designed to capture long-term dependencies: Hochreiter and Schmidhuber 1997. The GRU variant merges the forget and input gates into a single "update gate."
* Diagram from Christopher Olah’s blog.
memory bank: Graves et al. 2014, Weston et al. 2014
RNN input/output configurations: image captioning (one-to-many), sentiment classification (many-to-one), machine translation (many-to-many).
* Diagram from Karpathy's Stanford CS231n course.
temporal / sequential
* Image from Lebret's thesis (2016).
screwdriver → wrench: very similar
screwdriver → hammer: little similar
screwdriver → technician: related
screwdriver → fruit: unrelated
The boss fired the worker → The supervisor let the employee go: very similar
The boss fired the worker → The boss reprimanded the worker: little similar
The boss fired the worker → The boss promoted the worker: related
The boss fired the worker → The boss went for jogging today: unrelated
paragraphs, documents
Related: heart vs. surgeon, wheel vs. bike
Similar: doctor vs. surgeon, bike vs. bicycle
*Image from D. Jurgens’ NAACL 2016 tutorial.
➡ Semantic similarity is not the end-task
Massive text corpora
Semantic resources and knowledge bases
specific linguistic items
tied to explicit concepts
represented linguistic items
discrete or "k-hot" vector representations:
france = [0, 0, 0, 1, 0, 0]
england = [0, 1, 0, 0, 0, 0]
france is near spain = [1, 0, 0, 1, 1, 1]
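These vectors can be reproduced directly; the exact vocabulary ordering below is an assumption chosen to match the indices shown. The sketch also exposes the key weakness: any two distinct one-hot vectors are orthogonal, so they encode no similarity.

```python
import numpy as np

# Hypothetical vocabulary, ordered to match the slide's indices.
vocab = ["is", "england", "on", "france", "near", "spain"]

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

france, england = one_hot("france"), one_hot("england")
# A sentence as a "k-hot" bag of words: the sum of its word vectors.
sentence = sum(one_hot(w) for w in "france is near spain".split())
similarity = france @ england   # dot product of distinct one-hot vectors
```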
[Figure: two vectors A and B separated by angle θ]
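The angle θ between two vectors A and B is the usual way to compare continuous word vectors. A minimal cosine-similarity sketch (the example vectors are made up):

```python
import numpy as np

def cosine(a, b):
    # cos(theta): 1 for the same direction, 0 for orthogonal vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([1.0, 2.0])
B = np.array([2.0, 4.0])   # same direction as A, so cosine is 1
```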
Follow the distribu7onal hypothesis:
“You shall know a word by the company it keeps”, Firth 1957
The value of the central bank increased by 10%. She often goes to the bank to withdraw cash. She went to the river bank to have a picnic with her child.
financial institution / geographical term
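The distributional hypothesis can be sketched directly on these three sentences: collect the words that occur near each target word, and the two senses of "bank" surface as different context words (the window size and lower-cased tokenization are assumptions):

```python
from collections import Counter

sentences = [
    "the value of the central bank increased by 10%".split(),
    "she often goes to the bank to withdraw cash".split(),
    "she went to the river bank to have a picnic".split(),
]

def contexts(word, window=2):
    # Count the words appearing within `window` positions of `word`.
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == word:
                counts.update(sent[max(0, i - window):i] + sent[i + 1:i + 1 + window])
    return counts

bank_contexts = contexts("bank")   # mixes financial and river contexts
```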
subsequent models less robust)
words context document
co-occurrence matrix that we saw previously
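Count-based vectors can be sketched in a few lines: build a toy word-word co-occurrence matrix (the counts below are invented) and reduce it with a truncated SVD to get dense low-dimensional word vectors:

```python
import numpy as np

words = ["bank", "money", "river", "water"]
# Invented symmetric co-occurrence counts between the four words.
C = np.array([[0., 8., 3., 1.],
              [8., 0., 0., 1.],
              [3., 0., 0., 7.],
              [1., 1., 7., 0.]])

U, s, Vt = np.linalg.svd(C)
k = 2                           # keep the top-k singular directions
vectors = U[:, :k] * s[:k]      # one dense 2-d vector per word
```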
*Image from D. Jurgens’ NAACL 2016 tutorial.
*Plots from Rohde et al. 2005
https://github.com/rlebret/hpca
the co-occurrence matrix:
incorporate new words
claimed to be equivalent to word2vec, but with proper tuning it is not the case: Levy and Goldberg 2015
http://nlp.stanford.edu/projects/glove/
representations from data
predicts the surrounding words in a window of words (maximize log likelihood)
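The training signal can be sketched as the (target, context) pairs extracted from a sliding window; the model is then trained to maximize the log likelihood of each context word given its target (the toy sentence and window size are illustrative):

```python
def skipgram_pairs(tokens, window=2):
    # One (target, context) pair for every context word inside the window.
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox".split(), window=1)
```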
binary logistic regression of word w and history h:
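With negative sampling, that binary classifier is trained to push the score of a true (word, context) pair towards 1 and the scores of a few sampled negatives towards 0. A sketch of the per-pair loss (the toy 2-d vectors are invented):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_sampling_loss(w_vec, ctx_vec, neg_vecs):
    # True pair scored towards 1; each sampled negative scored towards 0.
    pos = -np.log(sigmoid(w_vec @ ctx_vec))
    neg = -sum(np.log(sigmoid(-w_vec @ n)) for n in neg_vecs)
    return pos + neg

w = np.array([1.0, 0.0])        # target word vector
ctx = np.array([0.9, 0.1])      # true context, similar to w
negs = [np.array([-1.0, 0.2])]  # one sampled negative, dissimilar to w
loss = neg_sampling_loss(w, ctx, negs)
```

The loss is small when the true context aligns with the word vector and the negatives do not, which is exactly what drives similar words towards similar vectors.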
Skip-gram implicitly factorizes a shifted PMI matrix: Levy and Goldberg 2014
Connections to count-based methods (new decomposition)
Evaluations on relatedness, categorization and analogy: Baroni et al. 2014, Schnabel et al. 2015
does not influence the projection
Word vectors capture linear relationships between words: present-past tense, singular-plural, male-female, capital-country. These regularities support arithmetic operations between vectors (+, -): king - man + woman ≈ queen
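The king - man + woman ≈ queen pattern can be sketched with hand-built toy vectors whose two dimensions loosely encode "royalty" and "gender" (the embedding values are invented for illustration; real analogies use learned vectors):

```python
import numpy as np

# Invented 2-d embeddings: [royalty, gender].
emb = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(vec, exclude):
    # Closest vocabulary word by cosine similarity, skipping the query words.
    return max((w for w in emb if w not in exclude),
               key=lambda w: cosine(vec, emb[w]))

query = emb["king"] - emb["man"] + emb["woman"]
answer = nearest(query, exclude={"king", "man", "woman"})
```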
word2vec has the edge over alternatives
➡ Several extensions
can i watch 4od bbc iplayer etc with 10GB useage allowence?
We need to sort out the problem | We need to sort the problem out
Man bites dog | Dog bites man
A woman: without her, man is nothing.
Prius: A fuel-efficient hybrid car. An automobile powered by both an internal combustion (…)
The boss fired his worker.
This was a good day. | This was a bad day.
phrases, sentences and documents
What does each dimension mean? It depends. We may need to compromise.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J. "Distributed representations of words and phrases and their compositionality." In NIPS, 2013.
Pennington, J., Socher, R., Manning, C. "GloVe: Global Vectors for Word Representation." In EMNLP, 2014.
Faruqui, M., et al. "Retrofitting word vectors to semantic lexicons." In ACL, 2014.
Schnabel, T., et al. "Evaluation methods for unsupervised word embeddings." In EMNLP, 2015.
Levy, O., Goldberg, Y., Dagan, I. "Improving distributional similarity with lessons learned from word embeddings." TACL, 2015.
Faruqui, M., et al. "Problems with evaluation of word embeddings using word similarity tasks." In RepEval, 2016.
"… representations." ACL, 2015.
Goldberg, Y. "A primer on neural network models for natural language processing." arXiv:1510.00726, 2015.
➡ Online courses
https://www.coursera.org/learn/neural-networks
https://www.coursera.org/learn/machine-learning
http://cs224d.stanford.edu/
➡ Conference tutorials
"Deep Learning for Natural Language Processing (without Magic)", NAACL 2013 tutorial. http://nlp.stanford.edu/courses/NAACL2013/
"Semantic Similarity Frontiers: From Concepts to Documents", EMNLP 2015 tutorial. http://www.emnlp2015.org/tutorials.html#t1
Processing", NAACL 2016 tutorial. http://naacl.org/naacl-hlt-2016/t2.html
➡ Deep learning toolkits
➡ Pre-trained word vectors and codes
https://code.google.com/p/word2vec/
http://nlp.stanford.edu/projects/glove/
https://github.com/rlebret/hpca
http://wordvectors.org/