Distributional Semantics
LING 571: Deep Processing Methods in NLP


SLIDE 1

Distributional Semantics

LING 571: Deep Processing Methods in NLP
November 4, 2019
Shane Steinert-Threlkeld

1

SLIDE 2

Walking the Walk

Chomp Ski = Chomsky!

2

SLIDE 3

Punny Department

3

SLIDE 4

Recap: What is a word?

  • Acoustically or orthographically similar → can have different meanings!
  • Acoustically or orthographically different → can have similar meanings!

4

SLIDE 5

Recap: What is a word?

  • Words can also have relationships that cover:
  • Different shades of meaning
  • Part-Whole relationships

5

SLIDE 6

Recap: What is a word?

  • For now, we will set aside homonyms
  • (Specifically, homographs)
  • Investigate word meaning insofar as we can model it as (dis-)similarity

6

SLIDE 7

Distributional Similarity

7

SLIDE 8

Distributional Similarity

  • “You shall know a word by the company it keeps!” (Firth, 1957)
  • A bottle of tezgüino is on the table.
  • Everybody likes tezgüino.
  • Tezgüino makes you drunk.
  • We make tezgüino from corn.
  • Tezgüino: a corn-based alcoholic beverage. (From Lin, 1998a)

8

SLIDE 9

Distributional Similarity

  • How can we represent the “company” of a word?
  • How can we make similar words have similar representations?

9

SLIDE 10

Vectors: A Refresher

  • A vector is a list of numbers
  • Each number can be thought of as representing a “dimension”
  • a⃗ = ⟨2, 4⟩
  • b⃗ = ⟨−4, 3⟩
  • What if we thought of each dimension as a “quantity” of a word, rather than an arbitrary dimension?

10

[Figure: vectors a⃗ and b⃗ plotted against the x-axis and y-axis]

SLIDE 11

Vectors: A Refresher

  • A vector is a list of numbers
  • Each number can be thought of as representing a “dimension”
  • a⃗ = ⟨2, 4⟩
  • b⃗ = ⟨−4, 3⟩
  • What if we thought of each dimension as a “quantity” of a word, rather than an arbitrary dimension?

11

[Figure: the same plot, with the axes relabeled as “long”-ness (x) and “up”-ness (y)]

SLIDE 12

Vectors: A Refresher

  • A vector is a list of numbers
  • Each number can be thought of as representing a “dimension”
  • a⃗ = ⟨2, 4⟩
  • b⃗ = ⟨−4, 3⟩
  • What if we thought of each dimension as a “quantity” of a word, rather than an arbitrary dimension?

12

[Figure: Skyscraper, Highway, and Bridge plotted as points in the “long”-ness / “up”-ness space]

SLIDE 13

Vectors: A Refresher

  • A vector is a list of numbers
  • Each number can be thought of as representing a “dimension”
  • a⃗ = ⟨2, 4⟩
  • b⃗ = ⟨−4, 3⟩
  • What if we thought of each dimension as a “quantity” of a word, rather than an arbitrary dimension?

13

[Figure: xkcd.com/388]

SLIDE 14

Vectors: A Refresher

  • A vector is a list of numbers
  • Each number can be thought of as representing a “dimension”
  • a⃗ = ⟨2, 4⟩
  • b⃗ = ⟨−4, 3⟩
  • What if we thought of each dimension as a “quantity” of a word, rather than an arbitrary dimension?

14

[Figure: xkcd.com/388, “WTF, Grapefruit?”]

SLIDE 15

Vector Space: Documents

  • We can represent documents as vectors, with each dimension being a count of a particular word

15

Shakespeare Plays × Counts of Words:

              As You Like It   Twelfth Night   Julius Caesar   Henry V
  battle             1               1               8            15
  soldier            2               2              12            36
  fool              37              58               1             5
  clown              5             117               0             0
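The table above can be turned directly into document vectors; a minimal sketch (the zero counts for clown in Julius Caesar and Henry V are assumed from the blank cells):

```python
# Sketch: a document as a vector of word counts, one dimension per word.
counts = {  # word -> [As You Like It, Twelfth Night, Julius Caesar, Henry V]
    "battle":  [1, 1, 8, 15],
    "soldier": [2, 2, 12, 36],
    "fool":    [37, 58, 1, 5],
    "clown":   [5, 117, 0, 0],   # blank cells assumed to be 0
}

plays = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]

def doc_vector(play):
    """One column of the term-document matrix: the play's count vector."""
    j = plays.index(play)
    return [counts[w][j] for w in counts]

print(doc_vector("Henry V"))  # [15, 36, 5, 0]
```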

SLIDE 16

Vector Space: Documents

  • We can represent documents as vectors, with each dimension being a count of a particular word

16

Shakespeare Plays × Counts of Words:

              As You Like It   Twelfth Night   Julius Caesar   Henry V
  battle             1               1               8            15
  soldier            2               2              12            36
  fool              37              58               1             5
  clown              5             117               0             0

SLIDE 17

Vector Space: Documents

  • We can represent documents as vectors, with each dimension being a count of a particular word

17

[Figure: the four plays plotted in two dimensions, fool (x-axis) vs. battle (y-axis): As You Like It [37,1] and Twelfth Night [58,1] cluster on the comedic side; Julius Caesar [1,8] and Henry V [5,15] cluster on the dramatic side]

J&M 3rd ed, 6.3.1 [link]

SLIDE 18

Vector Space: Words

  • Find thematic clusters for words based on words that occur around them.

18

SLIDE 19

Distributional Similarity

  • Represent the ‘company’ of a word such that similar words will have similar representations
  • ‘Company’ = context
  • Word represented by a context feature vector
  • Many alternatives for the vector
  • Initial representation:
  • ‘Bag of words’ feature vector
  • Feature vector length N, where N is the size of the vocabulary
  • f_i += 1 if word_i is within window size w of the target word

19
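The bag-of-words feature vector described above can be sketched in a few lines of Python (the tokenized example sentence is illustrative, not from the slides):

```python
from collections import Counter

def context_vector(tokens, target, w):
    """Bag-of-words context features: count each word that appears
    within +/- w tokens of every occurrence of `target`."""
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - w), min(len(tokens), i + w + 1)
            for j in range(lo, hi):
                if j != i:          # don't count the target itself
                    feats[tokens[j]] += 1
    return feats

toks = "everybody likes tezgüino and everybody likes corn".split()
print(context_vector(toks, "likes", 1))
# Counter({'everybody': 2, 'tezgüino': 1, 'corn': 1})
```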

SLIDE 20

Label the First Use of “Plant”

20

Biological Example:
“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

Industrial Example:
“The Paulus company was founded in 1938. Since those days the product range has been the subject of constant expansions and is brought up continuously to correspond with the state of the art. We’re engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our Product Range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the…”

SLIDE 21

21

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

Window ±1: plant: (and: 1, of: 1)

SLIDE 22

22

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

Window ±2: plant: (and: 1, animal: 1, kind: 1, of: 1)

SLIDE 23

23

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

Window ±3: plant: (and: 1, animal: 1, in: 1, kind: 1, more: 1, of: 1)

SLIDE 24

24

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

Window ±4: plant: (and: 1, animal: 1, are: 1, in: 1, kind: 1, more: 1, of: 1, the: 1)

SLIDE 25

25

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

Window ±5: plant: (and: 1, animal: 1, are: 1, in: 1, kind: 1, more: 1, of: 1, rainforest: 1, the: 1, there: 1)
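The ±5 window count above can be reproduced programmatically; a sketch, assuming the small hand-written lemma map below stands in for a real lemmatizer:

```python
from collections import Counter

# Minimal lemma map so counts match the slides (kinds -> kind, etc.);
# a real pipeline would use a proper lemmatizer.
LEMMA = {"kinds": "kind", "plants": "plant", "animals": "animal",
         "rainforests": "rainforest"}

def lemmas(text):
    return [LEMMA.get(t, t) for t in text.lower().split()]

def window_counts(tokens, target, w):
    """Count every lemma within +/- w tokens of each occurrence of target."""
    feats = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            for j in range(max(0, i - w), min(len(tokens), i + w + 1)):
                if j != i:
                    feats[tokens[j]] += 1
    return feats

sent = "There are more kinds of plants and animals in the rainforests"
print(window_counts(lemmas(sent), "plant", 5))
# each of: there, are, more, kind, of, and, animal, in, the, rainforest -> 1
```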

SLIDE 26

26

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

plant: (and: 1, animal: 2, are: 1, in: 1, kind: 1, more: 1, of: 1, rainforest: 1, the: 1, there: 1, species: 1)

SLIDE 27

27

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

plant: (and: 1, animal: 3, are: 2, in: 1, kind: 1, more: 1, of: 1, rainforest: 1, the: 1, there: 1, species: 1)

SLIDE 28

28

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

plant: (and: 1, animal: 3, are: 2, in: 1, kind: 1, more: 1, of: 1, rainforest: 2, the: 1, there: 1, species: 1, nowhere: 1)

SLIDE 29

29

“There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.”

plant: (and: 1, animal: 3, are: 2, in: 1, kind: 1, more: 1, of: 1, rainforest: 2, the: 1, there: 1, species: 1, nowhere: 1)

SLIDE 30

Context Feature Vector

30

                aardvark  …  computer  data  pinch  result  sugar
  apricot           0     …      0       0     1      0       1
  pineapple         0     …      0       0     1      0       1
  digital           0     …      2       1     0      1       0
  information       0     …      1       6     0      4       0

SLIDE 31

Distributional Similarity Questions

  • What is the right neighborhood?
  • How should we weight the features?
  • How can we compute the similarity between vectors?

31

SLIDE 32

Similarity “Neighborhood”

  • 1. Fixed window
  • How many words in the neighborhood?
  • +/– 500 words: ‘topical context’
  • +/– 1 or 2 words: collocations, predicate-argument
  • 2. Only words in some grammatical relation (Hindle, 1990)
  • Parse text (dependency)
  • Include subj-verb; verb-obj; adj-mod
  • N×R vector: word × relation

32
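The grammatical-relation neighborhood can be sketched without a parser by starting from ready-made dependency triples; the triples below are hand-written illustrations (not parser output), and the "-inv" suffix for inverse relations is an assumption, not the slides' notation:

```python
from collections import Counter

# Hand-written (w1, dep_rel, w2) triples, standing in for parser output.
triples = [
    ("cell", "subj-of", "absorb"),
    ("cell", "subj-of", "adapt"),
    ("cell", "obj-of", "attack"),
    ("bacteria", "nmod-of", "cell"),
]

def relation_features(word, triples):
    """Each feature is a (relation, other-word) pair: an N x R space
    of word-by-relation dimensions."""
    feats = Counter()
    for w1, rel, w2 in triples:
        if w1 == word:
            feats[(rel, w2)] += 1
        elif w2 == word:
            feats[(rel + "-inv", w1)] += 1   # mark the inverted direction
    return feats

print(relation_features("cell", triples))
```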

SLIDE 33

Similarity “Neighborhood”: Fixed Window

  • Same corpus, different windows
  • British National Corpus (BNC)
  • Nearest neighbors of “dog”
  • 2-word window:
  • Cat, horse, fox, pet, rabbit, pig, animal, mongrel, sheep, pigeon
  • 30-word window:
  • Kennel, puppy, pet, terrier, Rottweiler, canine, cat, to bark, Alsatian

33

SLIDE 34

Similarity “Neighborhood”: Grammatical Relations

  • Build a vector from dependency triples: (Lin, 1998)
  • (w1 dep_rel w2)

34

[Figure: dependency vector for “cell,” counts from a 64M-word corpus; features include subj-of absorb, subj-of adapt, subj-of behave; pobj-of inside, pobj-of into; nmod-of abnormality, nmod-of architecture, nmod-of anemia; obj-of attack, obj-of call, obj-of come from, obj-of decorate; nmod bacteria, nmod body, nmod bone marrow]

SLIDE 35

“Neighborhood”: Window vs. Grammatical Relations

  • Grammatical relations:
  • Richer representation
  • Much more POS information
  • Window:
  • Only need text!
  • Scales very, very well. (Maybe too well.)
  • Adding explicit supervision from parsers often doesn’t help dramatically

35

SLIDE 36

Distributional Similarity Questions

  • What is the right neighborhood?
  • How should we weight the features?
  • How can we compute the similarity between vectors?

36

SLIDE 37

Weighting Features: Binary vs. Non-binary?

  • Binary?
  • Minimally informative
  • Can’t capture the intuition that frequent features are more indicative of a relationship
  • Frequency
  • Or rather, probability:
  • …but how do we know which words are informative?
  • the, it, they: not likely to help differentiate the target word

37

SLIDE 38

Weighting Features: Pointwise Mutual Information

  • PMI is a measure of how often two events x and y occur together, compared with how often we would expect them to if they were independent (Fano, 1961)

38

PMI(x, y) = log2( P(x, y) / (P(x) · P(y)) )

SLIDE 39

Weighting Features: Pointwise Mutual Information

  • We can formulate this for word/feature co-occurrence:
  • Generally only use positive values
  • Negative values are inaccurate unless the corpus is huge
  • Can also rescale/smooth context values

39

assoc_PMI(w, f) = log2( P(w, f) / (P(w) · P(f)) )

SLIDE 40

Weighting Features: (Positive) Pointwise Mutual Information

40

assoc_PMI(w, f) = log2( P(w, f) / (P(w) · P(f)) )

  • Numerator P(w, f): probability of feature f relating i to j
  • P(w): probability of feature f relating i to anything
  • P(f): probability of feature f relating anything to j
  • Get a (non-negative) ratio

SLIDE 41

Weighting Features: (Positive) Pointwise Mutual Information

  • For pure word co-occurrence, the feature f is simply the collocated word.

41

SLIDE 42

Weighting Features: Pointwise Mutual Information

  • Total words (sum of whole table) = 19

42

                aardvark  computer  data  pinch  result  sugar
  apricot           0         0       0     1      0       1
  pineapple         0         0       0     1      0       1
  digital           0         2       1     0      1       0
  information       0         1       6     0      4       0

SLIDE 43

Weighting Features: Pointwise Mutual Information

  • Total words (sum of whole table) = 19
  • P(w), where w is information = 11/19 = .579

43

                aardvark  computer  data  pinch  result  sugar
  apricot           0         0       0     1      0       1
  pineapple         0         0       0     1      0       1
  digital           0         2       1     0      1       0
  information       0         1       6     0      4       0

SLIDE 44

Weighting Features: Pointwise Mutual Information

  • Total words (sum of whole table) = 19
  • P(w), where w is information = 11/19 = .579
  • P(f), where f is data = 7/19 = .368

44

                aardvark  computer  data  pinch  result  sugar
  apricot           0         0       0     1      0       1
  pineapple         0         0       0     1      0       1
  digital           0         2       1     0      1       0
  information       0         1       6     0      4       0

SLIDE 45

Weighting Features: Pointwise Mutual Information

  • Total words (sum of whole table) = 19
  • P(w), where w is information = 11/19 = .579
  • P(f), where f is data = 7/19 = .368
  • P(w,f), where (w,f) is (information, data) = 6/19 = .316

45

                aardvark  computer  data  pinch  result  sugar
  apricot           0         0       0     1      0       1
  pineapple         0         0       0     1      0       1
  digital           0         2       1     0      1       0
  information       0         1       6     0      4       0

SLIDE 46

Weighting Features: Pointwise Mutual Information

  • Total words (sum of whole table) = 19
  • P(w), where w is information = 11/19 = .579
  • P(f), where f is data = 7/19 = .368
  • P(w,f), where (w,f) is (information, data) = 6/19 = .316

46

assoc_PPMI = log2( P(w, f) / (P(w) · P(f)) ) = log2( 0.316 / (0.579 · 0.368) ) = 0.568

                aardvark  computer  data  pinch  result  sugar
  apricot           0         0       0     1      0       1
  pineapple         0         0       0     1      0       1
  digital           0         2       1     0      1       0
  information       0         1       6     0      4       0
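The computation above can be checked directly; note that with exact fractions (rather than the rounded decimals on the slide, which yield 0.568) the value comes out slightly lower, about 0.566:

```python
import math

# PPMI for (information, data), using exact fractions from the table.
p_w  = 11 / 19   # P(information): row sum 1 + 6 + 4 = 11
p_f  = 7 / 19    # P(data): column sum 0 + 0 + 1 + 6 = 7
p_wf = 6 / 19    # P(information, data): the (information, data) cell

pmi = math.log2(p_wf / (p_w * p_f))
print(round(pmi, 3))  # 0.566
```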

SLIDE 47

PPMI re-scaling

47

J&M 3rd ed. sec. 6.7

SLIDE 48

PPMI re-scaling

48

J&M 3rd ed. sec. 6.7

SLIDE 49

Weighting Features: Pointwise Mutual Information

  • Downside:
  • PPMI favors rare events
  • Solutions:
  • Raise P(f) to the power of α (α < 1)
  • Increases the probability assigned to rare contexts
  • Laplace smoothing (add-n)

49
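The α-weighting solution above can be sketched as context-distribution smoothing; α = 0.75 is the commonly used value (e.g. in J&M), and the toy counts are illustrative:

```python
from collections import Counter

def smoothed_context_probs(context_counts, alpha=0.75):
    """P_alpha(f) = count(f)^alpha / sum_f' count(f')^alpha.
    Raising counts to a power < 1 shifts probability mass toward rarer
    contexts, which damps PPMI's bias in favor of rare events."""
    total = sum(c ** alpha for c in context_counts.values())
    return {f: (c ** alpha) / total for f, c in context_counts.items()}

counts = Counter({"the": 1000, "sugar": 2})   # illustrative context counts
plain = {f: c / sum(counts.values()) for f, c in counts.items()}
smooth = smoothed_context_probs(counts)

# The rare context gets a larger share of the probability after smoothing.
print(plain["sugar"], smooth["sugar"])
```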

SLIDE 50

Distributional Similarity Questions

  • What is the right neighborhood?
  • How should we weight the features?
  • How can we compute the similarity between vectors?

50

SLIDE 51

Vector Distances: Manhattan & Euclidean

  • Manhattan Distance
  • (Distance as cumulative horizontal + vertical moves)
  • Euclidean Distance
  • Too sensitive to extreme values

51

[Figure: Manhattan and Euclidean paths between vectors a⃗ and b⃗]
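Both distances in a few lines (the example vectors are illustrative):

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences: city-block moves."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance; the squaring step is what makes it
    sensitive to extreme values in any one dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [2, 4], [5, 0]
print(manhattan(a, b), euclidean(a, b))  # 7 5.0
```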

SLIDE 52

Vector Similarity: Dot Product

  • Produces a real-number scalar from the product of the vectors’ components
  • Biased toward longer (larger magnitude) vectors
  • In our case, vectors with fewer zero counts

52

SLIDE 53

Vector Similarity: Cosine

  • If you normalize the dot product for vector magnitude…
  • …the result is the same as the cosine of the angle between the vectors.

53
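A sketch of the dot product and its magnitude-normalized form; the example vectors are illustrative count vectors, not taken from the slides:

```python
import math

def dot(a, b):
    """Sum of component-wise products: grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Dot product divided by both magnitudes = cos of the angle,
    removing the bias toward longer vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1, 6, 4], [1, 1, 0]   # illustrative co-occurrence counts
print(dot(a, b))              # 7
print(round(cosine(a, b), 3))
```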

SLIDE 54

Sample Results

  • Based on Lin dependency model
  • Hope (N): optimism, chance, expectation, prospect, dream, desire, fear
  • Hope (V): would like, wish, plan, say, believe, think
  • Brief (N): legal brief, affidavit, filing, petition, document, argument, letter
  • Brief (A): lengthy, hour-long, short, extended, frequent, recent, short-lived,

prolonged, week-long

54

SLIDE 55

Recap

  • We can build feature vectors to represent the context of a word
  • These features could be:
  • 1. Occurs before drunk
  • 2. Occurs after bottle
  • 3. Is direct object of likes
  • 4. Is direct object of make

55

  • A. A bottle of tezgüino is on the table.
  • B. Everybody likes tezgüino.
  • C. Tezgüino makes you drunk.
  • D. We make tezgüino from corn.

Feature counts (features 1–4):
  tezgüino: 1 1 1 1
  tequila: 1 1 1 1
  apricots: 1
  pizza: 1 1

SLIDE 56

Recap

  • These feature vectors can be as simple as co-occurrence
  • …for vocabulary V
  • …for each element i
  • is word v_i within window w of the target?

56

  • A. A bottle of tezgüino is on the table.
  • B. Everybody likes tezgüino.
  • C. Tezgüino makes you drunk.
  • D. We make tezgüino from corn.

Context matrix for tezgüino with w = 3:

              bottle  drunk  matrix  table
  tezgüino       1      1      0       0

SLIDE 57

Recap

  • Intuition:
  • These co-occurrence vectors should be able to tell us something about words’ similarities

57

Context counts (dimensions: arts, boil, data, function, large, sugar, summarized, water; each word has 1s in four of them):
  Apricot: 1 1 1 1
  Pineapple: 1 1 1 1
  Digital: 1 1 1 1
  Information: 1 1 1 1

SLIDE 58

Problem: Sparse Vectors!

  • Big problem:
  • The vast majority of word pairs will be zero!
  • This leads to very sparse vectors.
  • In the exercise:
  • (election, primary) is 2
  • (election, midterm) is 0
  • …how can we generalize better?

58
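A minimal illustration of the problem, with hypothetical context counts for the two words (the feature names are made up for the example):

```python
# Two related words whose sparse context vectors happen to share no
# nonzero dimensions: their dot product, and hence cosine, is 0.
election = {"primary": 2, "vote": 1}      # hypothetical counts
midterm  = {"congress": 1, "turnout": 1}  # hypothetical counts

overlap = sum(election[f] * midterm.get(f, 0) for f in election)
print(overlap)  # 0 -> similarity 0, despite the related meanings
```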

SLIDE 59

Problem: Sparse Vectors!

  • Term × document:

59

              c1  c2  c3  c4  c5  m1  m2  m3  m4
  human        1   0   0   1   0   0   0   0   0
  interface    1   0   1   0   0   0   0   0   0
  computer     1   1   0   0   0   0   0   0   0
  user         0   1   1   0   1   0   0   0   0
  system       0   1   1   2   0   0   0   0   0
  response     0   1   0   0   1   0   0   0   0
  time         0   1   0   0   1   0   0   0   0
  EPS          0   0   1   1   0   0   0   0   0
  survey       0   1   0   0   0   0   0   0   1
  trees        0   0   0   0   0   1   1   1   0
  graph        0   0   0   0   0   0   1   1   1
  minors       0   0   0   0   0   0   0   1   1
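The term × document matrix above can be queried directly for word similarity; a sketch using the slide's counts (rows as word vectors over documents):

```python
import math

# The term x document matrix from the slide.
terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]
M = [
    [1,0,0,1,0, 0,0,0,0],  # human
    [1,0,1,0,0, 0,0,0,0],  # interface
    [1,1,0,0,0, 0,0,0,0],  # computer
    [0,1,1,0,1, 0,0,0,0],  # user
    [0,1,1,2,0, 0,0,0,0],  # system
    [0,1,0,0,1, 0,0,0,0],  # response
    [0,1,0,0,1, 0,0,0,0],  # time
    [0,0,1,1,0, 0,0,0,0],  # EPS
    [0,1,0,0,0, 0,0,0,1],  # survey
    [0,0,0,0,0, 1,1,1,0],  # trees
    [0,0,0,0,0, 0,1,1,1],  # graph
    [0,0,0,0,0, 0,0,1,1],  # minors
]

def word_vector(term):
    """A word's row: its distribution over the nine documents."""
    return M[terms.index(term)]

def cosine(a, b):
    d = sum(x * y for x, y in zip(a, b))
    return d / (math.sqrt(sum(x*x for x in a)) * math.sqrt(sum(x*x for x in b)))

# "trees" and "graph" co-occur in the m-documents; "trees" and "human"
# never share a document, so their similarity is exactly 0.
print(round(cosine(word_vector("trees"), word_vector("graph")), 3))  # 0.667
print(cosine(word_vector("trees"), word_vector("human")))            # 0.0
```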