Vector Semantics and Embeddings
CSE354 - Spring 2020 - Natural Language Processing
Tasks
- Vectors which represent words or sequences -- and how?
- Dimensionality Reduction
- Recurrent Neural Networks and Sequence Models
Objective
To embed: convert a token (or sequence) to a vector that represents meaning, or is useful for performing a downstream NLP application.
Objective
Example: embed the word "port" → ( … 1 … )   (a one-hot style sparse vector)
Prefer dense vectors
- Fewer parameters (weights) for the machine learning model.
- May generalize better implicitly.
- May capture synonyms.
- One-hot is a sparse vector.
For deep learning, in practice, dense vectors work better. Why? Roughly, having fewer parameters becomes increasingly important when you are learning multiple layers of weights rather than just a single layer.
(Jurafsky, 2012)
Objective
To embed: convert a token (or sequence) to a vector that represents meaning.
Wittgenstein, 1945: "The meaning of a word is its use in the language"
Distributional hypothesis -- A word's meaning is defined by all the different contexts it appears in (i.e., how it is "distributed" in natural language).
Firth, 1957: "You shall know a word by the company it keeps"
Distributional Hypothesis
The nail hit the beam behind the wall.
Objective
Example: embed the word "port" → ( 0.53, 1.5, 3.21, -2.3, .76 )   (a dense vector)
Senses of "port":
- port.n.1 (a place (seaport or airport) where people and merchandise can enter or leave a country)
- port.n.2, port wine (sweet dark-red dessert wine originally from Portugal)
- port.n.3, embrasure, porthole (an opening (in a wall or ship or armored vehicle) for firing through)
- larboard, port.n.4 (the left side of a ship or aircraft to someone who is aboard and facing the bow or nose)
- interface, port.n.5 ((computer science) computer circuit consisting of the hardware and associated circuitry that links one device with another (especially a computer and a hard disk drive or other peripherals))
How?
1. One-hot representation
2. Selectors (represent context by a "multi-hot" representation)
3. From PCA / Singular Value Decomposition (known as "Latent Semantic Analysis" in some circumstances)
   (TF-IDF: term frequency, inverse document frequency; PMI: pointwise mutual information; etc.)
"Neural Embeddings":
4. Word2vec
5. fastText
6. GloVe
7. BERT
Example: …, word1, word2, bill, word3, word4, ...  →  ( … 1 1 … )
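To make representations 1 and 2 concrete, here is a minimal sketch (the toy vocabulary and code are my own illustration, not from the slides) of a one-hot vector for a single word and a "multi-hot" selector vector marking a set of context words:

```python
import numpy as np

# Toy vocabulary for illustration only
vocab = ["the", "nail", "hit", "beam", "behind", "wall", "bill"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Sparse |V|-dimensional vector with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

def multi_hot(context_words):
    """Selector representation: 1 for every word that appears in the context."""
    v = np.zeros(len(vocab))
    for w in context_words:
        v[word_to_idx[w]] = 1.0
    return v

print(one_hot("beam"))                      # [0. 0. 0. 1. 0. 0. 0.]
print(multi_hot(["the", "hit", "behind"]))  # 1s at the context words' positions
```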
SVD-Based Embeddings
Singular Value Decomposition...
Concept, in matrix form:
[Matrix X: columns are the p features (f1, f2, f3, f4, … fp); rows are the n observations (1 … n).]
SVD-Based Embeddings
[Matrix X (n × p), columns f1 … fp]  →  [reduced matrix (n × p′), columns c1 … cp′]
Dimensionality reduction -- try to represent with only p′ dimensions.
Concept: Dimensionality Reduction in 3-D, 2-D, and 1-D
Data (or, at least, what we want from the data) may be accurately represented with fewer dimensions.
[Figures: p = 2 reduced to p′ = 1; p = 3 reduced to p′ = 2.]
Concept: Dimensionality Reduction
Rank: the number of linearly independent columns of A (i.e., columns that can't be derived from the other columns through addition).
Q: What is the rank of this matrix?

    1  -2   3
    2  -3   5
    1   1   0

A: 2. The 1st column is just the sum of the second two columns, so we can represent the matrix as a linear combination of 2 vectors (the second and third columns).
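A small sketch checking this with numpy; the matrix follows the slide's description (the first column equals the sum of the other two), though the exact entries are reconstructed:

```python
import numpy as np

A = np.array([[1., -2., 3.],
              [2., -3., 5.],
              [1.,  1., 0.]])

print(np.linalg.matrix_rank(A))                  # 2
print(np.allclose(A[:, 0], A[:, 1] + A[:, 2]))   # True: column 1 = column 2 + column 3
```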
SVD-Based Embeddings
[Matrix X (n × p): the columns f1 … fp are context words (features); the rows 1 … n are target words (observations); co-occurrence counts are the cells.]
→ [reduced matrix (n × p′), columns c1 … cp′]
Dimensionality reduction -- try to represent with only p′ dimensions.
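A hedged sketch (the corpus, window size, and code are my own illustration) of building this kind of target-by-context co-occurrence count matrix, which the SVD is then applied to:

```python
import numpy as np

corpus = [["the", "nail", "hit", "the", "beam", "behind", "the", "wall"]]
window = 2   # count co-occurrences within +/- 2 words

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Rows: target words (observations); columns: context words (features)
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, target in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                X[idx[target], idx[sent[j]]] += 1

print(vocab)
print(X[idx["beam"]])   # co-occurrence counts of "beam" with every context word
```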
Dimensionality Reduction - PCA
Linear approximation of the data in r dimensions. Found via Singular Value Decomposition:

X[nxp] = U[nxr] D[rxr] V[pxr]^T

X: original matrix, U: "left singular vectors", D: "singular values" (diagonal), V: "right singular vectors"

[Figure: X (n × p) approximated by the product U D V^T.]
Dimensionality Reduction - PCA - Example
X[nxp] ≅ U[nxr] D[rxr] V[pxr]^T
Word co-occurrence counts: each observation (target word) is plotted by its co-occurrence count with "hit" (horizontal dimension) and with "nail" (vertical dimension).
Observation: "beam."
count(beam, hit) = 100 -- horizontal dimension
count(beam, nail) = 80 -- vertical dimension
Dimensionality Reduction - PCA
Linear approximation of the data in r dimensions. Found via Singular Value Decomposition:

X[nxp] ≅ U[nxr] D[rxr] V[pxr]^T

X: original matrix, U: "left singular vectors", D: "singular values" (diagonal), V: "right singular vectors"

Projection (dimensionality-reduced space) in 3 dimensions: U[nx3] D[3x3] V[px3]^T

To check how well the original matrix can be reproduced: Z[nxp] = U D V^T. How does Z compare to the original X? This is the objective that SVD solves.

To reduce features in a new dataset, A: A[mxp] V D = Asmall[mx3]

U, D, and V are unique; D is always positive.
(TechnoWiki)
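Putting the formulas above together, here is a minimal sketch (toy data; the code is mine, following the slide's notation) of a truncated SVD, the reconstruction check, and feature reduction of a new dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(8, 6)).astype(float)    # toy n x p co-occurrence matrix

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) V^T
r = 3
U_r, D_r, V_r = U[:, :r], np.diag(d[:r]), Vt[:r, :].T

# Reconstruction check: Z = U D V^T -- how does Z compare to the original X?
Z = U_r @ D_r @ V_r.T
print(np.linalg.norm(X - Z))        # the error SVD minimizes at this rank

# Dense embeddings: one r-dimensional row vector per target word
embeddings = U_r @ D_r
print(embeddings.shape)             # (8, 3)

# Reducing features of a new m x p dataset A, as written on the slide (A V D):
A = rng.poisson(2.0, size=(4, 6)).astype(float)
A_small = A @ V_r @ D_r
print(A_small.shape)                # (4, 3)
```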
Word2Vec
Principle: Predict the missing word. Similar to language modeling, but predicting context rather than the next word.

p(context | word)

To learn, maximize p(context | word). In practice, minimize J = 1 - p(context | word).
Word2Vec: Context
p(context | word)
2 Versions of Context:
1. Continuous bag of words (CBOW): Predict word from context
2. Skip-Grams (SG): predict context words from target
(Jurafsky, 2017)
Word2Vec: Context
(Jurafsky, 2017)
The nail hit the beam behind the wall.
c1 c2 c3 c4

Positive examples (target word "beam" paired with its context words):
x = (hit, beam), y = 1
x = (the, beam), y = 1
x = (behind, beam), y = 1
…
Negative examples:
x = (happy, beam), y = 0
x = (think, beam), y = 0
…
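A minimal sketch (the window size and code are my own illustration) of generating the positive skip-gram pairs shown above from the example sentence:

```python
sentence = "the nail hit the beam behind the wall".split()
window = 2   # context words within +/- 2 positions of the target

positive_pairs = []   # (context, target) pairs with label y = 1
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            positive_pairs.append((sentence[j], target))

# Pairs whose target is "beam": (hit, beam), (the, beam), (behind, beam), (the, beam)
print([p for p in positive_pairs if p[1] == "beam"])
```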
Word2Vec: Context
(Jurafsky, 2017)
k negative examples (y = 0) for every positive one. How? Randomly draw from the unigram distribution, adjusted with α = 0.75.
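A hedged sketch (the tiny corpus and the value of k are my own choices) of drawing the k negative examples from the unigram distribution raised to the power α = 0.75:

```python
import numpy as np
from collections import Counter

tokens = "the nail hit the beam behind the wall".split()
counts = Counter(tokens)
vocab = list(counts)

alpha = 0.75
probs = np.array([counts[w] for w in vocab], dtype=float) ** alpha
probs /= probs.sum()                 # adjusted unigram distribution

rng = np.random.default_rng(0)
k = 2                                # k negative examples per positive example

def sample_negatives(target, k=2):
    """Return k (context, target) pairs labeled y = 0, with contexts drawn at random."""
    contexts = rng.choice(vocab, size=k, p=probs)
    return [(c, target) for c in contexts]

print(sample_negatives("beam", k))   # e.g. [('the', 'beam'), ('wall', 'beam')]
```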
Word2Vec: Context
(Jurafsky, 2017)
The nail hit the beam behind the wall.
c1 c2 c3 c4

Single context: P(y=1 | c, t) = 𝜏(t · c)
All contexts:  P(y=1 | c1 … ck, t) = ∏i 𝜏(t · ci)

Intuition: t · c is a measure of similarity. But it is not a probability! To make it one, apply the logistic activation: 𝜏(z) = 1 / (1 + e^-z)
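A small sketch (the vectors are made-up placeholders) of turning the dot-product similarity into a probability with the logistic function, for a single context and for several contexts:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

t = np.array([0.2, -0.5, 1.1])          # hypothetical target vector, e.g. "beam"
c = np.array([0.4, -0.3, 0.9])          # hypothetical context vector, e.g. "hit"

p_single = logistic(t @ c)              # single context: P(y=1 | c, t)
print(p_single)

contexts = np.array([[0.4, -0.3, 0.9],
                     [0.1,  0.2, -0.4],
                     [-0.6, 0.5,  0.3]])
p_all = np.prod(logistic(contexts @ t)) # all contexts: product of per-context probabilities
print(p_all)
```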
Word2Vec: How to Learn?
(Jurafsky, 2017)
The nail hit the beam behind the wall.
c1 c2 c3 c4

P(y=1 | c, t)
Assume 300 * |vocab| weights (parameters) for each of c and t.
Start with random vectors (or all 0s).
Goal: Maximize similarity of (c, t) in positive data (y = 1); minimize similarity of (c, t) in negative data (y = 0).
Optimized using gradient-descent-type methods.
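A hedged sketch (dimensions, learning rate, and indices are my own choices) of one stochastic-gradient update per (context, target) pair, pushing positive pairs together and negative pairs apart as described above:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size, lr = 300, 10_000, 0.05
T = rng.normal(scale=0.1, size=(vocab_size, dim))   # target-word vectors
C = rng.normal(scale=0.1, size=(vocab_size, dim))   # context-word vectors

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def update(t_id, c_id, y):
    """One gradient step on the logistic loss for a (context, target) pair labeled y."""
    t, c = T[t_id], C[c_id]
    err = logistic(t @ c) - y      # derivative of the loss with respect to t . c
    T[t_id] = t - lr * err * c
    C[c_id] = c - lr * err * t

update(t_id=42, c_id=7, y=1)       # positive pair: increases similarity
update(t_id=42, c_id=99, y=0)      # negative pair: decreases similarity
```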
Word2Vec
(Jurafsky, 2017)

Word2Vec captures analogies (kind of)
(Jurafsky, 2017)
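As a hedged illustration of the analogy property (vector arithmetic such as king - man + woman ≈ queen), here is a sketch assuming the gensim library and its downloadable pretrained word2vec vectors are available; the package, model name, and output are assumptions, not part of the slides:

```python
import gensim.downloader as api

# Downloads the pretrained Google News word2vec vectors (large, ~1.6 GB)
wv = api.load("word2vec-google-news-300")

# king - man + woman ~= ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" typically appears at or near the top of the returned list.
```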
Word2Vec: Quantitative Evaluations
- Compare to manually annotated pairs of words: WordSim-353 (Finkelstein et al., 2002)
- Compare to words in context (Huang et al., 2012)
- Answer TOEFL synonym questions.
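A minimal sketch (the word pairs, ratings, and random embeddings are placeholders, not real WordSim-353 data) of the word-similarity style of evaluation: correlate the model's cosine similarities with human ratings:

```python
import numpy as np
from scipy.stats import spearmanr

# (word1, word2, human similarity rating) -- illustrative placeholder rows
pairs = [("tiger", "cat", 7.0), ("book", "paper", 6.5), ("king", "cabbage", 0.5)]

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for p in pairs for w in p[:2]}   # stand-in embeddings

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

model_scores = [cosine(emb[w1], emb[w2]) for w1, w2, _ in pairs]
human_scores = [r for _, _, r in pairs]

# Higher rank correlation = better agreement with human judgments
print(spearmanr(model_scores, human_scores).correlation)
```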
Current Trends in Embeddings
1. Contextual word embeddings (a different embedding depending on context):
   The nail hit the beam behind the wall.
   They reflected a beam off the moon.
2. Embeddings can capture changes in word meaning. (Kulkarni et al., 2015)
3. Embeddings capture demographic biases in data. (Garg et al., 2018)
   a. Efforts to debias
   b. Useful for tracking bias over time.
Vector Semantics and Embeddings
Take-Aways
- Dense representation of meaning is desirable.
- Approach 1: Dimensionality reduction techniques
- Approach 2: Learning representations by trying to predict held-out words.
- The Word2Vec skip-gram model attempts to solve this by predicting context words from the target word: maximize similarity between true (context, target) pairs; minimize similarity between random pairs.
- Embeddings do in fact seem to capture meaning in applications.
- Dimensionality reduction techniques are just as good by some evaluations.
- Current Trends: Integrating context, tracking changes in meaning.