Machine Learning for Computational Linguistics: Distributed Representations


slide-1
SLIDE 1

Machine Learning for Computational Linguistics

Distributed representations

Çağrı Çöltekin

University of Tübingen, Seminar für Sprachwissenschaft

June 14, 2016

slide-2
SLIDE 2

Introduction SVD Embeddings Summary

Representations of linguistic units

▶ Most ML methods we use depend on how we represent the objects of interest, such as
▶ words, morphemes
▶ sentences, phrases
▶ letters, phonemes
▶ documents
▶ speakers, authors
▶ …
▶ The way we represent these objects interacts with the ML methods used
▶ They also affect what can be learned

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 1 / 24

slide-3
SLIDE 3

Introduction SVD Embeddings Summary

Symbolic representations

▶ A common way to represent words (and other units) is to treat them as individual symbols: w1 = ‘cat’, w2 = ‘dog’, w3 = ‘book’
▶ The symbols do not include any information about the use or meaning of the words, or their relation to each other
▶ They are useful in many NLP tasks, but distinctions between units and their relationships are categorical
▶ ‘cat’ is as different from ‘dog’ as it is from ‘book’
▶ The relationship between ‘cat’ and ‘dog’ is no different from that between ‘story’ and ‘tale’
▶ Some of these relations can be extracted from conventional lexicons or WordNets, but they will still be categorical/hard distinctions
▶ The similarity/difference decisions are typically made based on hand-annotated data

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 2 / 24

slide-4
SLIDE 4

Introduction SVD Embeddings Summary

Vector representations

▶ The idea is to represent the linguistic objects as vectors:
cat = (0.1, 0.3, 0.5, . . . , 0.4)
dog = (0.2, 0.3, 0.4, . . . , 0.3)
book = (0.9, 0.1, 0.8, . . . , 0.3)
▶ The (syntactic/semantic) differences between the words correspond to distances in the high-dimensional vector space in which the word vectors live
▶ Symbolic representations are equivalent to 1-of-K or one-hot vectors:
cat = (0, . . . , 1, 0, 0, . . . , 0)
dog = (0, . . . , 0, 1, 0, . . . , 0)
book = (0, . . . , 0, 0, 1, . . . , 0)
The distances in the symbolic/one-hot representation are not useful.

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 3 / 24
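To make the contrast concrete, here is a small NumPy sketch (an editorial addition; the dense vector values are made up for illustration): under one-hot representations every pair of distinct words is equally dissimilar, while dense vectors can encode graded similarity.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot vectors over a 3-word vocabulary: all pairs are equally (dis)similar.
cat_1hot, dog_1hot, book_1hot = np.eye(3)
print(cosine(cat_1hot, dog_1hot), cosine(cat_1hot, book_1hot))   # 0.0 0.0

# Dense vectors (values invented for illustration): 'cat' and 'dog' can be
# made closer to each other than either is to 'book'.
cat  = np.array([0.1, 0.3, 0.5, 0.4])
dog  = np.array([0.2, 0.3, 0.4, 0.3])
book = np.array([0.9, 0.1, 0.8, 0.3])
print(cosine(cat, dog), cosine(cat, book))   # ~0.98 vs ~0.72
```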

slide-5
SLIDE 5

Introduction SVD Embeddings Summary

Where do the vector representations come from?

▶ The vectors are (almost certainly) learned from the data
▶ The idea goes back to:
You shall know a word by the company it keeps. —Firth (1957)
▶ In practice, we make use of the contexts where the words appear to determine their representations
▶ The words that appear in similar contexts are mapped to similar representations
▶ Context varies from a small window of words around the target word to a complete document

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 4 / 24

slide-6
SLIDE 6

Introduction SVD Embeddings Summary

How to calculate word vectors

▶ Typically we use unsupervised (or self-supervised) methods
▶ Common approaches:
▶ Obtain global counts of words in each context, and use techniques like SVD to assign vectors: words with high covariances are assigned similar vectors (LSA/LSI)
▶ Predict the words from their context (or the context from the target words), and update the vectors to minimize the prediction error (word2vec, GloVe, …)
▶ Model each word as a mixture of latent variables (LDA)

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 5 / 24

slide-7
SLIDE 7

Introduction SVD Embeddings Summary

A toy example

A four-sentence corpus with a bag-of-words (BOW) model. The corpus:

S1: She likes cats and dogs
S2: He likes dogs and cats
S3: She likes books
S4: He reads books

Term-document (sentence) matrix:

        S1  S2  S3  S4
she      1   0   1   0
he       0   1   0   1
likes    1   1   1   0
reads    0   0   0   1
cats     1   1   0   0
dogs     1   1   0   0
books    0   0   1   1
and      1   1   0   0

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 6 / 24
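The same term-document count matrix can be built programmatically. A sketch using scikit-learn's CountVectorizer (an editorial addition; the row/column ordering depends on the vectorizer's vocabulary sorting):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "She likes cats and dogs",   # S1
    "He likes dogs and cats",    # S2
    "She likes books",           # S3
    "He reads books",            # S4
]

# token_pattern is overridden so that all word tokens, including short ones,
# are kept; stop words are not removed, matching the slide's matrix.
vec = CountVectorizer(lowercase=True, token_pattern=r"(?u)\b\w+\b")
X = vec.fit_transform(corpus)            # sparse document-term matrix (4 x 8)

# Transpose to get the term-document view shown on the slide.
for term, row in zip(vec.get_feature_names_out(), X.T.toarray()):
    print(f"{term:>6}", row)
```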

slide-8
SLIDE 8

Introduction SVD Embeddings Summary

A toy example

A four-sentence corpus with a bag-of-words (BOW) model. The corpus:

S1: She likes cats and dogs
S2: He likes dogs and cats
S3: She likes books
S4: He reads books

Term-term (left-context) matrix (‘#’ marks the sentence start):

         #  she  he  likes  reads  cats  dogs  books  and
she      2   0    0    0      0     0     0     0      0
he       2   0    0    0      0     0     0     0      0
likes    0   2    1    0      0     0     0     0      0
reads    0   0    1    0      0     0     0     0      0
cats     0   0    0    1      0     0     0     0      1
dogs     0   0    0    1      0     0     0     0      1
books    0   0    0    1      1     0     0     0      0
and      0   0    0    0      0     1     1     0      0

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 6 / 24
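A corresponding sketch (editorial addition) for the left-context counts, using only the Python standard library; ‘#’ marks the sentence boundary as in the table above.

```python
from collections import Counter, defaultdict

corpus = [
    "She likes cats and dogs",
    "He likes dogs and cats",
    "She likes books",
    "He reads books",
]

# Count, for each word, which word immediately precedes it
# ('#' stands for the sentence boundary).
left_context = defaultdict(Counter)
for sentence in corpus:
    tokens = ["#"] + sentence.lower().split()
    for left, word in zip(tokens, tokens[1:]):
        left_context[word][left] += 1

print(left_context["likes"])   # Counter({'she': 2, 'he': 1})
print(left_context["cats"])    # Counter({'likes': 1, 'and': 1})
```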

slide-9
SLIDE 9

Introduction SVD Embeddings Summary

Term-document matrices

▶ The rows are about the terms: similar terms appear in similar contexts
▶ The columns are about the context: similar contexts contain similar words
▶ The term-context matrices are typically sparse and large

Term-document (sentence) matrix:

        S1  S2  S3  S4
she      1   0   1   0
he       0   1   0   1
likes    1   1   1   0
reads    0   0   0   1
cats     1   1   0   0
dogs     1   1   0   0
books    0   0   1   1
and      1   1   0   0

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 7 / 24

slide-10
SLIDE 10

Introduction SVD Embeddings Summary

SVD (again)

▶ Singular value decomposition is a well-known method in linear algebra
▶ An n × m (n terms, m documents) term-document matrix X can be decomposed as X = UΣV^T
U is an n × r unitary matrix, where r is the rank of X (r ⩽ min(n, m)); the columns of U are the eigenvectors of XX^T
Σ is an r × r diagonal matrix of singular values (the square roots of the eigenvalues of XX^T and X^TX)
V^T is an r × m unitary matrix; the columns of V are the eigenvectors of X^TX
▶ One can view U and V as PCA performed to reduce the dimensionality of the rows (terms) and of the columns (documents)

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 8 / 24
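A quick NumPy check (editorial addition) of the decomposition on the toy term-document matrix from the earlier slides:

```python
import numpy as np

# Term-document matrix from the toy corpus (rows: she, he, likes, reads,
# cats, dogs, books, and; columns: S1..S4).
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U @ diag(s) @ Vt
print(s)                                          # singular values, largest first
print(np.allclose(X, U @ np.diag(s) @ Vt))        # True: the product recovers X
```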

slide-11
SLIDE 11

Introduction SVD Embeddings Summary

Truncated SVD

X = UΣV^T

▶ Using only the eigenvectors (from U and V) that correspond to the k largest singular values (k < r) allows reducing the dimensionality of the data with minimum loss
▶ The approximation X̂ = U_k Σ_k V_k^T is the best rank-k approximation of X, in the sense that ‖X̂ − X‖_F is minimized

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 9 / 24

slide-12
SLIDE 12

Introduction SVD Embeddings Summary

Truncated SVD

X = UΣV^T

▶ Using only the eigenvectors (from U and V) that correspond to the k largest singular values (k < r) allows reducing the dimensionality of the data with minimum loss
▶ The approximation X̂ = U_k Σ_k V_k^T is the best rank-k approximation of X, in the sense that ‖X̂ − X‖_F is minimized
▶ Note that r may easily be in the millions (of words or contexts), while we choose k much smaller (at most a few hundred)

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 10 / 24
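For matrices of that size the truncation is usually computed directly for the k leading singular values of a sparse matrix, rather than via a full SVD. A sketch using SciPy's sparse solver (editorial addition; the matrix here is random stand-in data, and scaling U by the singular values is just one common convention for term vectors):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Stand-in for a large, sparse term-context count matrix
# (here 10,000 terms x 5,000 contexts with ~0.1% non-zeros).
X = sparse_random(10_000, 5_000, density=0.001, format="csr", random_state=0)

k = 100                      # keep only the k largest singular values
U, s, Vt = svds(X, k=k)      # note: svds returns singular values in ascending order
order = np.argsort(-s)       # reorder to largest-first for convenience
U, s, Vt = U[:, order], s[order], Vt[order, :]

term_vectors = U * s         # one k-dimensional row vector per term
print(term_vectors.shape)    # (10000, 100)
```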

slide-13
SLIDE 13

Introduction SVD Embeddings Summary

Truncated SVD (2)

X ≈ U_k Σ_k V_k^T, written out entry by entry:

$$
\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,m}\\
x_{2,1} & x_{2,2} & \cdots & x_{2,m}\\
\vdots  & \vdots  & \ddots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,m}
\end{pmatrix}
\approx
\begin{pmatrix}
u_{1,1} & \cdots & u_{1,k}\\
u_{2,1} & \cdots & u_{2,k}\\
\vdots  & \ddots & \vdots \\
u_{n,1} & \cdots & u_{n,k}
\end{pmatrix}
\begin{pmatrix}
\sigma_1 &        &          \\
         & \ddots &          \\
         &        & \sigma_k
\end{pmatrix}
\begin{pmatrix}
v_{1,1} & v_{1,2} & \cdots & v_{1,m}\\
\vdots  &         & \ddots & \vdots \\
v_{k,1} & v_{k,2} & \cdots & v_{k,m}
\end{pmatrix}
$$

Term 1 can be represented using the first row of U_k; document 1 can be represented using the first column of V_k^T.

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 11 / 24


slide-16
SLIDE 16

Introduction SVD Embeddings Summary

Truncated SVD example

The corpus:

(S1) She likes cats and dogs
(S2) He likes dogs and cats
(S3) She likes books
(S4) He reads books

Term-document (sentence) matrix:

        S1  S2  S3  S4
she      1   0   1   0
he       0   1   0   1
likes    1   1   1   0
reads    0   0   0   1
cats     1   1   0   0
dogs     1   1   0   0
books    0   0   1   1
and      1   1   0   0

Truncated SVD (k = 2):

U =
  she    −0.30   0.28
  he     −0.24  −0.63
  likes  −0.52   0.15
  reads  −0.03  −0.49
  cats   −0.43   0.01
  dogs   −0.43   0.01
  books  −0.03  −0.49
  and    −0.43   0.01

Σ = diag(3.11, 1.81)

V^T =
         S1     S2     S3     S4
       −0.68   0.26  −0.11  −0.66
       −0.66  −0.23   0.48   0.50

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 12 / 24
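The decomposition above can be reproduced with NumPy (editorial sketch); note that the signs of singular vectors are arbitrary, so the values may differ from the slide in sign and small numerical detail.

```python
import numpy as np

terms = ["she", "he", "likes", "reads", "cats", "dogs", "books", "and"]
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]      # keep the 2 largest singular values

print(np.round(sk, 2))                         # roughly [3.11 1.81], as on the slide
for term, row in zip(terms, np.round(Uk, 2)):
    print(f"{term:>6}", row)                   # 2-dimensional term vectors
```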

slide-17
SLIDE 17

Introduction SVD Embeddings Summary

Truncated SVD (with BOW sentence context)

[Figure: the eight term vectors (she, he, likes, reads, cats, dogs, books, and) from the k = 2 truncated SVD of the term-document (BOW sentence context) matrix, plotted in two dimensions]

The corpus:
(S1) She likes cats and dogs
(S2) He likes dogs and cats
(S3) She likes books
(S4) He reads books

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 13 / 24
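A sketch (editorial addition) of how such a two-dimensional plot can be drawn from the truncated SVD, assuming Uk, sk, and terms from the previous snippet; the resulting layout will not match the original figure exactly.

```python
import matplotlib.pyplot as plt

# 2D coordinates for each term: rows of U_k, here scaled by the singular
# values (one common convention). Assumes Uk, sk, terms from the snippet above.
coords = Uk * sk
fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for term, (x, y) in zip(terms, coords):
    ax.annotate(term, (x, y))
plt.show()
```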

slide-18
SLIDE 18

Introduction SVD Embeddings Summary

Truncated SVD (with single word context)

[Figure: the eight term vectors (she, he, likes, reads, cats, dogs, books, and) from the truncated SVD of the single-word (left-context) matrix, plotted in two dimensions]

The corpus:
(S1) She likes cats and dogs
(S2) He likes dogs and cats
(S3) She likes books
(S4) He reads books

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 14 / 24

slide-19
SLIDE 19

Introduction SVD Embeddings Summary

SVD: LSI/LSA

▶ SVD applied to term-document matrices is called
▶ Latent semantic analysis (LSA) if the aim is constructing term vectors
▶ Latent semantic indexing (LSI) if the aim is constructing document vectors
▶ The well-known Google PageRank algorithm is a variation of the SVD

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 15 / 24

slide-20
SLIDE 20

Introduction SVD Embeddings Summary

SVD based vectors: practical concerns

▶ In practice, instead of raw counts of terms within contexts, the term-document matrices typically contain
▶ pointwise mutual information
▶ tf-idf
values.
▶ If the aim is finding latent (semantic) topics, frequent/syntactic words (stopwords) are often removed
▶ Depending on the measure used, it may also be important to normalize for the document length

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 16 / 24
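A sketch of the tf-idf variant using scikit-learn (editorial addition; PPMI weighting would be computed analogously from the raw counts, and the parameter choices here are merely illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "She likes cats and dogs",
    "He likes dogs and cats",
    "She likes books",
    "He reads books",
]

# tf-idf weighted document-term matrix; sublinear_tf and l2 normalization
# are common (but not obligatory) choices.
vec = TfidfVectorizer(sublinear_tf=True, norm="l2")
X = vec.fit_transform(corpus)
print(dict(zip(vec.get_feature_names_out(), X.toarray()[2])))  # weights for S3
```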

slide-21
SLIDE 21

Introduction SVD Embeddings Summary

SVD-based vectors: applications

▶ The SVD-based methods are commonly used in information retrieval
▶ The system builds document vectors using SVD
▶ The search terms are also treated as a ‘document’
▶ The system retrieves the documents whose vectors are similar to the search terms
▶ SVD-based methods for semantic similarity are also common
▶ It has been shown that vector space models outperform humans on TOEFL synonym questions and SAT analogy questions

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 17 / 24
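A sketch of the retrieval idea (editorial addition), using scikit-learn's TruncatedSVD as a stand-in for the LSI computation: the query is folded into the reduced space as a pseudo-document and documents are ranked by cosine similarity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "She likes cats and dogs",
    "He likes dogs and cats",
    "She likes books",
    "He reads books",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)             # document-term counts (4 x vocabulary)

lsa = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsa.fit_transform(X)      # 2-dimensional document vectors

# The query is folded into the same space as a pseudo-document.
query = ["he likes cats"]
query_vector = lsa.transform(vec.transform(query))

scores = cosine_similarity(query_vector, doc_vectors)[0]
print(sorted(zip(scores, docs), reverse=True)[0])   # best-matching document
```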

slide-22
SLIDE 22

Introduction SVD Embeddings Summary

Predictive models

▶ Instead of dimensionality reduction through SVD, we try to predict
▶ either the target word from the context
▶ or the context given the target word
▶ We assign each word a fixed-size random vector
▶ We use a standard ML model and try to reduce the prediction error with a method like gradient descent
▶ During learning, the algorithm optimizes the vectors as well as the model parameters
▶ In this context, the word vectors are called embeddings
▶ These types of models have been very popular during the last few years

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 18 / 24
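A deliberately minimal sketch of this idea (editorial addition, not the actual word2vec code): embeddings start random and are nudged by gradient descent so that observed (target, context) pairs score higher than randomly sampled negative pairs under a logistic loss.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "she likes cats and dogs he likes dogs and cats she likes books he reads books".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, dim, lr = len(vocab), 10, 0.1

# Two randomly initialised embedding tables: one for targets, one for contexts.
W_target = rng.normal(scale=0.1, size=(V, dim))
W_context = rng.normal(scale=0.1, size=(V, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(50):
    for pos, word in enumerate(corpus):
        # context = up to two words on either side of the target word
        window = corpus[max(0, pos - 2):pos] + corpus[pos + 1:pos + 3]
        for ctx in window:
            t, c = idx[word], idx[ctx]
            neg = rng.integers(V)                    # one sampled 'negative' context
            for c_i, label in ((c, 1.0), (neg, 0.0)):
                score = sigmoid(W_target[t] @ W_context[c_i])
                grad = score - label                 # gradient of the logistic loss
                t_vec = W_target[t].copy()
                W_target[t]    -= lr * grad * W_context[c_i]
                W_context[c_i] -= lr * grad * t_vec

# After training, the rows of W_target are the learned word embeddings.
print(W_target[idx["cats"]][:5])
```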

slide-23
SLIDE 23

Introduction SVD Embeddings Summary

word2vec

▶ word2vec is a popular algorithm and open-source application for training word vectors (Mikolov et al. 2013)
▶ It has two modes of operation:
CBOW (continuous bag of words) predicts the word from a window around it
Skip-gram does the reverse: it predicts the words in the context of the target word, using the target word as the predictor
▶ The algorithm learns two sets of embeddings (one for context, one for target)
▶ The learning method is simply logistic regression, where the word vectors are also updated (besides the model parameters)
▶ Negative examples are sampled from the larger corpus
▶ It performs well, and it is much faster than earlier (more complex) ANN architectures developed for this task

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 19 / 24
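A usage sketch with the gensim library (editorial addition; parameter names such as vector_size follow recent gensim versions and may differ in older releases):

```python
from gensim.models import Word2Vec

sentences = [
    ["she", "likes", "cats", "and", "dogs"],
    ["he", "likes", "dogs", "and", "cats"],
    ["she", "likes", "books"],
    ["he", "reads", "books"],
]

# sg=1 selects the skip-gram objective (sg=0 would be CBOW);
# negative=5 draws five negative samples per positive pair.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                 sg=1, negative=5, epochs=100, seed=1)

print(model.wv["cats"][:5])             # first few dimensions of a word vector
print(model.wv.most_similar("cats"))    # nearest neighbours by cosine similarity
```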

slide-24
SLIDE 24

Introduction SVD Embeddings Summary

GloVe

▶ GloVe is another popular method for obtaining word vectors (Pennington, Socher, and Manning 2014)
▶ It tries to combine intuitions from both SVD-like ‘counting’ methods and prediction-based methods
▶ It also typically performs better on smaller data sets

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 20 / 24

slide-25
SLIDE 25

Introduction SVD Embeddings Summary

Word vectors and syntactic/semantic relations

Word vectors map some syntactic/semantic relations to vector operations

▶ Paris − France + Italy = Rome
▶ king − man + woman = queen
▶ duck − ducks + mouse = mice

[Figure: the offset between the ‘Paris’ and ‘France’ vectors is roughly parallel to the offset between ‘Rome’ and ‘Italy’]

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 21 / 24
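The analogy computation itself is a nearest-neighbour search around the result of the vector arithmetic. A sketch with gensim's KeyedVectors interface (editorial addition; wv is a placeholder for word vectors trained on a corpus large enough to contain these words, which the toy model above is not):

```python
# Vector-offset analogy: 'Paris - France + Italy ≈ ?'
# `wv` stands for any set of trained word vectors (e.g. model.wv from above,
# or pretrained vectors loaded with gensim) that contains these words.
result = wv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1)
print(result)   # with good embeddings the top answer should be 'Rome'
```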


slide-29
SLIDE 29

Introduction SVD Embeddings Summary

Using vector representations

▶ Dense vector representations are useful for many ML methods
▶ They are particularly suitable for neural network models
▶ ‘General purpose’ vectors can be trained on unlabeled data
▶ They can also be trained for a particular purpose, resulting in ‘task-specific’ vectors
▶ Dense vector representations are not specific to words; they can be obtained and used for any (linguistic) object of interest

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 22 / 24

slide-30
SLIDE 30

Introduction SVD Embeddings Summary

Evaluating vector representations

▶ Like other unsupervised methods, there are no ‘correct’ labels
▶ Evaluation can be based on:
▶ Intrinsic evaluation, based on success in finding analogies/synonyms
▶ Extrinsic evaluation, based on whether they improve a particular task (e.g., parsing, sentiment analysis) or not
▶ Correlation with human judgments

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 23 / 24

slide-31
SLIDE 31

Introduction SVD Embeddings Summary

Summary

▶ Dense vector representations of linguistic units (as opposed to symbolic representations) allow calculating similarities/differences between the units
▶ They can be based either on counting (SVD) or on predicting (word2vec, GloVe)
▶ They are particularly suitable for ANNs and deep learning architectures

Next: practical exercises with word vectors. Make sure you have word2vec and/or GloVe installed by Thursday.

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 24 / 24

slide-32
SLIDE 32

References

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). “Efficient Estimation of Word Representations in Vector Space”. In: CoRR abs/1301.3781. URL: http://arxiv.org/abs/1301.3781.

Pennington, Jeffrey, Richard Socher, and Christopher D. Manning (2014). “GloVe: Global Vectors for Word Representation”. In: EMNLP. Vol. 14, pp. 1532–1543.

Ç. Çöltekin, SfS / University of Tübingen June 14, 2016 A.1