
Machine Learning for Computational Linguistics: Distributed representations. Çağrı Çöltekin, Seminar für Sprachwissenschaft, University of Tübingen, June 14, 2016.


  1. Machine Learning for Computational Linguistics: Distributed representations. Çağrı Çöltekin, University of Tübingen, Seminar für Sprachwissenschaft, June 14, 2016

  2. Representations of linguistic units
  ▶ Most ML methods we use depend on how we represent the objects of interest, such as
    ▶ words, morphemes
    ▶ sentences, phrases
    ▶ letters, phonemes
    ▶ documents
    ▶ speakers, authors
    ▶ …
  ▶ The way we represent these objects interacts with the ML methods used
  ▶ They also affect what can be learned

  3. Symbolic representations
  ▶ A common way to represent words (and other units) is to treat them as individual symbols: w1 = ‘cat’, w2 = ‘dog’, w3 = ‘book’
  ▶ The symbols do not include any information about the meaning of the words or their relation to each other
  ▶ They are useful in many NLP tasks, but distinctions between units and their relationships are categorical
    ▶ ‘cat’ is as different from ‘dog’ as it is from ‘book’
    ▶ The relationship between ‘cat’ and ‘dog’ is not different from the one between ‘story’ and ‘tale’
  ▶ Some of these distinctions can be extracted from conventional lexicons or WordNets, but they will still be categorical/hard distinctions
  ▶ The similarity/difference decisions are typically made based on hand-annotated data

  4. Vector representations
  ▶ The idea is to represent the linguistic objects as vectors
      cat  = (0.1, 0.3, 0.5, …, 0.4)
      dog  = (0.2, 0.3, 0.4, …, 0.3)
      book = (0.9, 0.1, 0.8, …, 0.3)
  ▶ The (syntactic/semantic) differences between the words correspond to distances in the high-dimensional vector space where the word vectors live
  ▶ Symbolic representations are equivalent to 1-of-K or one-hot vectors
      cat  = (0, …, 1, 0, 0, …, 0)
      dog  = (0, …, 0, 1, 0, …, 0)
      book = (0, …, 0, 0, 1, …, 0)
  ▶ The distances in the symbolic/one-hot representation are not useful
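A minimal sketch of the contrast above, assuming Python with numpy (not part of the slides); the dense values are the illustrative numbers from the slide, truncated to four dimensions, not learned vectors.

import numpy as np

# One-hot vectors: every pair of distinct words is equally (and maximally) far apart.
cat_1hot, dog_1hot, book_1hot = np.eye(3)

# Dense vectors (illustrative numbers from the slide).
cat  = np.array([0.1, 0.3, 0.5, 0.4])
dog  = np.array([0.2, 0.3, 0.4, 0.3])
book = np.array([0.9, 0.1, 0.8, 0.3])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(cat_1hot, dog_1hot), cosine(cat_1hot, book_1hot))  # 0.0 and 0.0: no similarity structure
print(cosine(cat, dog), cosine(cat, book))                      # 'cat' comes out closer to 'dog' than to 'book'

With one-hot vectors all pairwise similarities are identical, which is exactly why those distances are not useful.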

  5. Where do the vector representations come from?
  ▶ The vectors are (almost certainly) learned from the data
  ▶ The idea goes back to:
      “You shall know a word by the company it keeps.” —Firth (1957)
  ▶ In practice, we make use of the contexts where the words appear to determine their representations
  ▶ The words that appear in similar contexts are mapped to similar representations
  ▶ Context varies from a small window of words around the target word to a complete document

  6. How to calculate word vectors
  ▶ Typically we use unsupervised (or self-supervised) methods
  ▶ Common approaches:
    ▶ Obtain global counts of words in each context, and use techniques like SVD to assign vectors: the words with high covariances are assigned to similar vectors (LSA/LSI)
    ▶ Predict the words from their context (or the context from the target words), and update the vectors to minimize the prediction error (word2vec, GloVe, …; see the sketch after this slide)
    ▶ Model each word as a mixture of latent variables (LDA)
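A minimal sketch of the prediction-based approach, assuming the gensim library (not used in the slides; parameter names follow gensim 4.x) and the toy corpus of the next slide.

from gensim.models import Word2Vec  # assumption: gensim 4.x is installed

# The toy corpus from the next slide, tokenized and lowercased.
sentences = [
    "she likes cats and dogs".split(),
    "he likes dogs and cats".split(),
    "she likes books".split(),
    "he reads books".split(),
]

# Skip-gram (sg=1): predict the context words from the target word and
# update the vectors to reduce the prediction error.
model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, sg=1)
print(model.wv["cats"])                     # the learned 10-dimensional vector
print(model.wv.similarity("cats", "dogs"))  # cosine similarity between two word vectors

With a corpus this small the learned vectors are essentially noise; the point is only the workflow.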

  7. A toy example
  A four-sentence corpus with the bag of words (BOW) model. The corpus:
    S1: She likes cats and dogs
    S2: He likes dogs and cats
    S3: She likes books
    S4: He reads books
  Term-document (sentence) matrix:
             S1  S2  S3  S4
    she       1   0   1   0
    he        0   1   0   1
    likes     1   1   1   0
    reads     0   0   0   1
    cats      1   1   0   0
    dogs      1   1   0   0
    books     0   0   1   1
    and       1   1   0   0
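A minimal sketch that builds this term-document matrix with plain Python and numpy (an assumption; any counting approach works):

import numpy as np

sentences = [
    "she likes cats and dogs",   # S1
    "he likes dogs and cats",    # S2
    "she likes books",           # S3
    "he reads books",            # S4
]

vocab = sorted({w for s in sentences for w in s.split()})
row = {w: i for i, w in enumerate(vocab)}

# X[i, j] = how often term i occurs in sentence j (bag of words: order is ignored).
X = np.zeros((len(vocab), len(sentences)), dtype=int)
for j, sent in enumerate(sentences):
    for w in sent.split():
        X[row[w], j] += 1

for w in vocab:
    print(f"{w:>6}", X[row[w]])   # rows come out in alphabetical order rather than the slide's order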

  8. A toy example
  A four-sentence corpus with the bag of words (BOW) model. The corpus:
    S1: She likes cats and dogs
    S2: He likes dogs and cats
    S3: She likes books
    S4: He reads books
  Term-term (left-context) matrix, with ‘#’ marking the sentence-initial position:
              #  she  he  likes  reads  cats  dogs  books  and
    she       2    0   0      0      0     0     0      0    0
    he        2    0   0      0      0     0     0      0    0
    likes     0    2   1      0      0     0     0      0    0
    reads     0    0   1      0      0     0     0      0    0
    cats      0    0   0      1      0     0     0      0    1
    dogs      0    0   0      1      0     0     0      0    1
    books     0    0   0      1      1     0     0      0    0
    and       0    0   0      0      0     1     1      0    0
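A minimal sketch for the left-context counts, again assuming plain Python and numpy; the ‘#’ start-of-sentence marker follows the slide.

import numpy as np

sentences = [
    "she likes cats and dogs",
    "he likes dogs and cats",
    "she likes books",
    "he reads books",
]

words = sorted({w for s in sentences for w in s.split()})
cols = ["#"] + words                       # '#' marks the sentence start
row = {w: i for i, w in enumerate(words)}
col = {w: i for i, w in enumerate(cols)}

# C[i, j] = how often column word j occurs immediately to the left of row word i.
C = np.zeros((len(words), len(cols)), dtype=int)
for sent in sentences:
    tokens = ["#"] + sent.split()
    for left, word in zip(tokens, tokens[1:]):
        C[row[word], col[left]] += 1

print(cols)
for w in words:
    print(f"{w:>6}", C[row[w]])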

  9. Term-document matrices
  Term-document (sentence) matrix:
             S1  S2  S3  S4
    she       1   0   1   0
    he        0   1   0   1
    likes     1   1   1   0
    reads     0   0   0   1
    cats      1   1   0   0
    dogs      1   1   0   0
    books     0   0   1   1
    and       1   1   0   0
  ▶ The rows are about the terms: similar terms appear in similar contexts
  ▶ The columns are about the contexts: similar contexts contain similar words
  ▶ The term-context matrices are typically sparse and large

  10. SVD (again)
  ▶ Singular value decomposition is a well-known method in linear algebra
  ▶ An n × m (n terms, m documents) term-document matrix X can be decomposed as
      X = U Σ V^T
    where
    ▶ U is an n × r unitary matrix, where r is the rank of X (r ⩽ min(n, m)); the columns of U are the eigenvectors of X X^T
    ▶ Σ is an r × r diagonal matrix of singular values (the square roots of the eigenvalues of X X^T and X^T X)
    ▶ V^T is an r × m unitary matrix; the columns of V are the eigenvectors of X^T X
  ▶ One can consider U and V as PCA performed for reducing the dimensionality of rows (terms) and columns (documents)
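A minimal sketch of the decomposition with numpy (an assumption; full_matrices=False returns the "thin" factors):

import numpy as np

X = np.random.rand(8, 4)                          # a small 8x4 "term-document" matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)

print(U.shape, s.shape, Vt.shape)                 # (8, 4) (4,) (4, 4)
print(np.allclose(X, U @ np.diag(s) @ Vt))        # X = U Σ V^T
print(np.allclose(s**2,                           # σ_i² are the eigenvalues of X^T X
                  np.linalg.eigvalsh(X.T @ X)[::-1]))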

  11. Truncated SVD
      X = U Σ V^T
  ▶ Using the eigenvectors (from U and V) that correspond to the k largest singular values (k < r) allows reducing the dimensionality of the data with minimum loss
  ▶ The approximation X̂ = U_k Σ_k V_k^T results in the best rank-k approximation of X, such that ∥X̂ − X∥_F is minimum

  12. Truncated SVD
      X = U Σ V^T
  ▶ Using the eigenvectors (from U and V) that correspond to the k largest singular values (k < r) allows reducing the dimensionality of the data with minimum loss
  ▶ The approximation X̂ = U_k Σ_k V_k^T results in the best rank-k approximation of X, such that ∥X̂ − X∥_F is minimum
  ▶ Note that r may easily be millions (of words or contexts), while we choose k much smaller (at most a few hundred)
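A minimal sketch of the truncation with numpy (an assumption), checking the Frobenius error of the rank-k approximation:

import numpy as np

X = np.random.rand(100, 20)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 5
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k approximation U_k Σ_k V_k^T

# The Frobenius error of the best rank-k approximation equals the
# square root of the sum of the discarded squared singular values.
print(np.linalg.norm(X - X_hat, "fro"))
print(np.sqrt(np.sum(s[k:] ** 2)))               # the two numbers agree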

  13. Truncated SVD (2)
  The n × m matrix X is approximated by the product of three much smaller matrices:
      X̂ (n × m) = U_k (n × k) × Σ_k (k × k diagonal with σ_1, …, σ_k) × V_k^T (k × m)

  14. Truncated SVD (2)
  The n × m matrix X is approximated by the product of three much smaller matrices:
      X̂ (n × m) = U_k (n × k) × Σ_k (k × k diagonal with σ_1, …, σ_k) × V_k^T (k × m)
  Term 1 (the first row of X) can be represented using the first row of U_k

  15. Truncated SVD (2)
  The n × m matrix X is approximated by the product of three much smaller matrices:
      X̂ (n × m) = U_k (n × k) × Σ_k (k × k diagonal with σ_1, …, σ_k) × V_k^T (k × m)
  Document 1 (the first column of X) can be represented using the first column of V_k^T
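Putting the pieces together, a minimal sketch that extracts 2-dimensional term and document vectors from the toy term-document matrix above (numpy assumed; whether Σ_k is folded into the term side, the document side, or split between them is a modeling choice):

import numpy as np

terms = ["she", "he", "likes", "reads", "cats", "dogs", "books", "and"]
X = np.array([             # the term-document matrix of the toy corpus
    [1, 0, 1, 0],          # she
    [0, 1, 0, 1],          # he
    [1, 1, 1, 0],          # likes
    [0, 0, 0, 1],          # reads
    [1, 1, 0, 0],          # cats
    [1, 1, 0, 0],          # dogs
    [0, 0, 1, 1],          # books
    [1, 1, 0, 0],          # and
])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vectors = U[:, :k]                        # row i is the k-dimensional vector for term i
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T   # row j is the k-dimensional vector for document j

for w, v in zip(terms, term_vectors):
    print(f"{w:>6}", np.round(v, 2))           # 'cats', 'dogs' and 'and' get identical vectors
print(np.round(doc_vectors, 2))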
