CS365 Course Project
Billion Word Imputation
Guide: Prof. Amitabha Mukherjee Group 20: Aayush Mudgal [12008] Shruti Bhargava [13671]
Problem Statement
Problem Description: https://www.kaggle.com/c/billion-word-imputation
Examples:
1. “Michael described Sarah to a at the shelter .”
2. “He added that people should not mess with mother nature , and let sharks be .”
Task: from each sentence, one word has been removed. Find the location of the missing word and predict the word itself.
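As a concrete illustration of the task, here is a minimal sketch (not the project's actual method) that scores every candidate (position, word) insertion with simple bigram counts from a toy corpus; the corpus, candidate list, and function names are all hypothetical:

```python
from collections import defaultdict

def train_bigrams(corpus):
    """Count bigram frequencies over an iterable of tokenized sentences."""
    counts = defaultdict(int)
    for sent in corpus:
        for a, b in zip(sent, sent[1:]):
            counts[(a, b)] += 1
    return counts

def best_insertion(tokens, candidates, bigrams):
    """Try every (position, word) pair; keep the one whose two newly
    created bigrams are most frequent in the training corpus."""
    best = (1, candidates[0], -1)
    for i in range(1, len(tokens)):  # insert between tokens[i-1] and tokens[i]
        for w in candidates:
            score = bigrams[(tokens[i - 1], w)] + bigrams[(w, tokens[i])]
            if score > best[2]:
                best = (i, w, score)
    return best[:2]

# Toy data, for illustration only.
corpus = [["the", "dog", "sat", "at", "the", "shelter"],
          ["a", "dog", "at", "the", "shelter"]]
bigrams = train_bigrams(corpus)
pos, word = best_insertion(["the", "sat", "at", "the", "shelter"],
                           ["dog", "cat"], bigrams)
```

A real system would use a much stronger language model than raw bigram counts, but the search over (position, word) pairs is the same.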
Words as Atomic Units vs. Distributed Representations
Two architectures (CBOW and Skip-gram) learn distributed representations by predicting a word from its surrounding words in a sentence or a document.
Given a sequence of training words $w_1, w_2, w_3, \ldots, w_T$, the objective of the Skip-gram model is to maximize the average log probability:
$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)$$
where $c$ is the size of the training context (which can be a function of the center word $w_t$).
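The objective above sums log probabilities over every (center, context) pair within a window of size c around each position. A minimal sketch enumerating exactly those pairs (toy tokens, illustration only):

```python
def skipgram_pairs(words, c):
    """All (center, context) pairs the Skip-gram objective sums over:
    for each position t, every w_{t+j} with -c <= j <= c and j != 0."""
    pairs = []
    T = len(words)
    for t in range(T):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < T:
                pairs.append((words[t], words[t + j]))
    return pairs

pairs = skipgram_pairs(["w1", "w2", "w3", "w4"], c=1)
```

With c = 1, each interior word contributes two pairs and each boundary word one, so a 4-word sequence yields 6 training pairs.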
The basic Skip-gram formulation defines $p(w_{t+j} \mid w_t)$ using the softmax function:
$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_w}^{\top} v_{w_I}\right)}$$
where $v_w$ and $v'_w$ are the “input” and “output” vector representations of $w$, and $W$ is the number of words in the vocabulary. This is IMPRACTICAL because the cost of computing $\nabla \log p(w_O \mid w_I)$ is proportional to $W$, which is often large ($10^5$–$10^7$ terms).
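To make the O(W) cost concrete, here is a minimal sketch of the full softmax with randomly initialized input/output vectors (all sizes and names are hypothetical): the normalizing denominator touches every word in the vocabulary, which is exactly the term that negative sampling or a hierarchical softmax avoids.

```python
import math
import random

W, d = 1000, 16  # vocabulary size and embedding dimension (toy values)
random.seed(0)
v_in  = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(W)]  # "input" vectors
v_out = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(W)]  # "output" vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def p_softmax(o, i):
    """Full-softmax p(w_O | w_I). The denominator sums over all W words,
    which is what makes the exact gradient cost proportional to W."""
    scores = [dot(v_out[w], v_in[i]) for w in range(W)]  # O(W * d) work
    m = max(scores)                                      # for numerical stability
    exp = [math.exp(s - m) for s in scores]
    return exp[o] / sum(exp)

probs = [p_softmax(o, 0) for o in range(W)]  # a valid distribution over the vocabulary
```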
Subsampling of frequent words: the most frequent words (e.g. “in”, “the”, and “a”) occur in millions of examples but carry little information, so they are discarded with high probability during training.
Learning phrases: many phrases have a meaning that is not a simple composition of the individual words.
Therefore, using vectors to represent whole phrases makes the Skip-gram model considerably more expressive.
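Phrases can be found automatically with the bigram score used by Mikolov et al.: score(a, b) = (count(ab) − δ) / (count(a) · count(b)), where the discount δ suppresses bigrams built from very rare words. A minimal sketch over a toy corpus (sentences and δ value are illustrative):

```python
from collections import Counter

def phrase_scores(sentences, delta=1.0):
    """Bigram phrase score from Mikolov et al. (2013): bigrams that occur
    much more often than their unigram counts predict get a high score and
    can be merged into a single token such as "new_york"."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        uni.update(s)
        bi.update(zip(s, s[1:]))
    return {p: (c - delta) / (uni[p[0]] * uni[p[1]])
            for p, c in bi.items()}

sents = [["new", "york", "times"],
         ["new", "york", "city"],
         ["brand", "new", "car"]]
scores = phrase_scores(sents, delta=0.5)
```

Bigrams whose score exceeds a chosen threshold are merged, and the pass can be repeated to form longer phrases.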
References:
1. Mikolov, Tomas, et al. “Distributed representations of words and phrases and their compositionality.” Advances in Neural Information Processing Systems. 2013.
2. Mnih, Andriy, and Koray Kavukcuoglu. “Learning word embeddings efficiently with noise-contrastive estimation.” Advances in Neural Information Processing Systems. 2013.
3. and Yorick Wilks. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC-2006), Genoa, Italy, 2006.
4. Mikolov, Tomas, et al. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).
Challenge Description and Data: https://www.kaggle.com/c/billion-word-imputation
Model architecture (one-hot input):
The input word is a one-hot vector $\{y_1, y_2, \ldots, y_V\}$: exactly one component $y_k$ is 1 and the others are 0.
The input layer is connected to the hidden layer by a $V \times N$ matrix $W$; multiplying the one-hot vector by $W$ simply selects the $k$-th row of $W$ as the hidden vector $h$.
The hidden layer is connected to the output layer by an $N \times V$ matrix $W'$; the score for the $j$-th output word is $u_j = {v'_{w_j}}^{\top} h$, where $v'_{w_j}$ is the $j$-th column of $W'$.
A softmax over $u_1, \ldots, u_V$ gives the posterior distribution over the vocabulary.
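The forward pass above can be sketched in a few lines; the sizes, seed, and variable names are illustrative, and the one-hot multiplication is written as the row selection it reduces to:

```python
import math
import random

V, N = 5, 3  # vocabulary size and hidden (embedding) size (toy values)
random.seed(0)
W1 = [[random.gauss(0, 0.1) for _ in range(N)] for _ in range(V)]  # V x N input weights
W2 = [[random.gauss(0, 0.1) for _ in range(V)] for _ in range(N)]  # N x V output weights

def forward(k):
    """Forward pass for a one-hot input whose 1 sits at index k."""
    # Multiplying the one-hot vector by W just selects its k-th row: h = W[k].
    h = W1[k]
    # u_j = v'_{w_j}^T h, where v'_{w_j} is the j-th column of W'.
    u = [sum(W2[n][j] * h[n] for n in range(N)) for j in range(V)]
    # Softmax over the scores gives the posterior over the vocabulary.
    m = max(u)
    e = [math.exp(x - m) for x in u]
    z = sum(e)
    return [x / z for x in e]

p = forward(2)  # distribution over all V words for input word 2
```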