Modelling fine-grained Change in Word Meaning over Centuries from Large Collections of Unstructured Text (PowerPoint PPT Presentation)


SLIDE 1

Modelling fine-grained Change in Word Meaning over centuries from Large Collections of Unstructured Text

Lea Frermann Keynote at Drift-a-LOD Workshop, November 20, 2016

Institute for Language, Cognition, and Computation The University of Edinburgh

lea@frermann.de www.frermann.de

1 / 23


SLIDE 3

The Dynamic Nature of Meaning I

Language is inherently ambiguous. Words have different meanings (or senses), e.g. mouse: an animal, a shy person, a computing device ... and the relevant sense depends on the context or situation.

2 / 23


SLIDE 7

The Dynamic Nature of Meaning II

Language is a dynamic system. The meaning of words is constantly shaped by users and their environment. Meaning changes smoothly (in written language, across societies).

3 / 23

SLIDE 8

The distributional Hypothesis

“You shall know a word by the company it keeps.” (John R. Firth, 1957)
“The meaning of a word is its use in the language.” (Ludwig Wittgenstein, 1953)

4 / 23


SLIDE 11

The distributional Hypothesis

Distributional Semantics: Take large collections of texts and look at the contexts in which a target word occurs

left context | target | right context
finance director used the | mouse | and expanded a window
nose twitching like a | mouse | ’s , but Douggie ’s
There ’s been a | mouse | in the pantry , ” she said
using the | mouse | , and learning how to type
She can see the | mouse | rolling that pearl to its hole
She was quiet as a | mouse | most of the time
· · ·

Characteristic context words per sense: nose, tail, roll, cheese, cat, hole (animal); keyboard, expand, file, pen, click, computer (device); quiet, shy, still, timid (shy person)

→ characterize senses and their prevalence

5 / 23

SLIDE 12

The distributional Hypothesis

Distributional Semantics: Take large collections of texts and look at the contexts in which a target word occurs

left context | target | right context | year
What teaches the little | mouse | to hide , with its glimmering | 1823
you couldn’t hide a | mouse | here without its being | 1849
Laura thinks she sees a | mouse | , an’ she trembles an’ she | 1915
caused cancer in a | mouse | or a hamster | 1972
she ’s such a quiet little | mouse | and everyone ’s in love | 1982
finance director used the | mouse | and expanded a window | 2000
nose twitching like a | mouse | ’s , but Douggie ’s | 2000
She was quiet as a | mouse | most of the time | 2000
using the | mouse | , and learning how to type | 2000
she clicked the | mouse | until her fingers tingled | 2008

→ characterize senses and their prevalence over time
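Collecting such time-stamped concordance rows is straightforward. A minimal sketch (the corpus format, a list of (text, year) pairs, and the function name are assumptions for illustration):

```python
import re

def contexts(documents, target, window=5):
    """Collect (left, right, year) windows of ±window words around a target."""
    rows = []
    for text, year in documents:
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                rows.append((left, right, year))
    return rows

docs = [("She was quiet as a mouse most of the time", 2000),
        ("using the mouse , and learning how to type", 2000)]
for left, right, year in contexts(docs, "mouse"):
    print(year, left, "MOUSE", right)
```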

6 / 23


SLIDE 14

Motivation

We want to understand, model, and predict word meaning change at scale Why is this an important problem?

  • aid historical sociolinguistic research
  • improve historical text mining and information retrieval
  • aid ontology construction / updating

Can we build task-agnostic models?

  • learn time-specific meaning representations which
  • are interpretable and
  • are useful across tasks

7 / 23

SLIDE 15

Data


SLIDE 19

DATE – A DiAchronic TExt Corpus

We use three historical corpora. Why not Google Books? → it only provides n-grams up to length 5.

8 / 23

SLIDE 20

DATE – A DiAchronic TExt Corpus

Data Preprocessing

  • 1. Text Processing

original → she clicked the mouse, until her fingers tickled.
tokenize → she clicked the mouse , until her fingers tickled .
lemmatize → she click the mouse , until she finger tickle .
remove stopwords → click mouse finger tickle
POS-tag → clickV mouseN fingerN tickleV

  • 2. Cluster texts from 3 corpora by year of publication

→ Create target word-specific training corpora
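The pipeline above can be sketched as follows. Everything below is a toy stand-in: the talk does not name the tokenizer, lemmatizer, or POS tagger, so a regex tokenizer, a tiny lemma table, and a lookup tagger merely illustrate the shape of the pipeline:

```python
import re

# Toy stand-ins for the real tokenizer/lemmatizer/POS tagger
# (the talk does not name specific tools, so these are assumptions).
STOPWORDS = {"she", "the", "until", "her", ",", "."}
LEMMAS = {"clicked": "click", "fingers": "finger", "tickled": "tickle", "her": "she"}
POS = {"click": "V", "mouse": "N", "finger": "N", "tickle": "V"}

def preprocess(text):
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())      # tokenize
    lemmas = [LEMMAS.get(t, t) for t in tokens]            # lemmatize
    content = [t for t in lemmas if t not in STOPWORDS]    # remove stopwords
    return [t + POS.get(t, "N") for t in content]          # POS-tag

print(preprocess("She clicked the mouse, until her fingers tickled."))
```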

9 / 23

SLIDE 21

DATE – A DiAchronic TExt Corpus

Target word-specific training corpora: all mentions of the target word with a context of ±5 surrounding words, tagged with year of origin

text snippet | year
fortitude time woman shrieks mouse rat capable poisoning husband | 1749
rabbit lived hole small grey mouse made nest pocket coat | 1915
ralph nervous hand twitch computer mouse keyboard pull image file online | 1998
scooted chair clicking button wireless mouse hibernate computer stealthy exit | 2009
· · ·

10 / 23

SLIDE 22

Scan: A Dynamic Model of Sense Change

SLIDE 23

Model Input and Assumptions

  • target word-specific corpus

text snippet | year
fortitude time woman shrieks mouse rat capable poisoning husband | 1749
rabbit lived hole small grey mouse made nest pocket coat | 1915
ralph nervous hand twitch computer mouse keyboard pull image file online | 1998
scooted chair clicking button wireless mouse hibernate computer stealthy exit | 2009
· · ·

  • number of word senses (K)
  • granularity of temporal intervals (∆T)

(e.g., a year, decade, or century)

11 / 23


SLIDE 27

Model Overview

A Bayesian and knowledge-lean model of meaning change of individual words (e.g., “mouse”)

12 / 23


SLIDE 33

Model Description: Generative Story

(Plate diagram: for each time slice t−1, t, t+1, sense distributions φt and sense-word distributions ψt generate, for each of Dt snippets, a sense z and I context words w; κφ and κψ control temporal flexibility; K senses; a, b are Gamma hyperparameters.)

  • 1. Extent of meaning change

Generate temporal sense flexibility parameter κφ ∼ Gamma(a, b)

  • 2. Time-specific representations

Generate sense distributions φt
Generate sense-word distributions ψk,t

  • 3. Text generation given time t

Generate sense z ∼ Mult(φt)
Generate context words wi ∼ Mult(ψt,k=z)
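Steps 1–3 can be sketched in a few lines of NumPy. This is a toy illustration: in Scan, φ and ψ are coupled across time by the iGMRF prior, whereas here they are drawn independently per time slice, just to show the sampling steps:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, T, I = 3, 8, 4, 5          # senses, vocabulary size, time slices, context words

# 1. Extent of meaning change (drawn but unused in this toy version; in Scan
#    it sets the variance of the random walk coupling the phi_t across time)
kappa_phi = rng.gamma(shape=1.0, scale=1.0)

# 2. Time-specific representations (toy: independent per time slice)
phi = rng.dirichlet(np.ones(K), size=T)        # phi[t]    : sense distribution
psi = rng.dirichlet(np.ones(V), size=(T, K))   # psi[t, k] : sense-word distribution

# 3. Text generation for a snippet at time t
def generate_snippet(t):
    z = rng.choice(K, p=phi[t])                 # draw a sense
    words = rng.choice(V, size=I, p=psi[t, z])  # draw I context words
    return z, words

print(generate_snippet(t=2))
```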

13 / 23

SLIDE 34

Scan: The Prior

First-order random walk model: intrinsic Gaussian Markov Random Field (Rue, 2005; Mimno, 2009)

(Chain: φ1 → · · · → φt−1 → φt → φt+1 → · · · → φT)

Draw local changes from a normal distribution:
mean: temporally neighboring parameters
variance: meaning flexibility parameter κφ
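In symbols, this is the standard first-order iGMRF sketch (the exact parameterization used by Scan is given in Frermann and Lapata (2016)):

```latex
p(\phi_{1:T}) \;\propto\; \exp\Big(-\frac{\kappa_\phi}{2}\sum_{t=2}^{T}\lVert \phi_t-\phi_{t-1}\rVert^2\Big),
\qquad
\phi_t \mid \phi_{t-1},\phi_{t+1} \;\sim\; \mathcal{N}\Big(\frac{\phi_{t-1}+\phi_{t+1}}{2},\ \frac{1}{2\kappa_\phi}\Big)
```

so each φt is centered on its temporal neighbours, with local variance controlled by κφ.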

14 / 23

SLIDE 35

Learning

Blocked Gibbs sampling

Details in Frermann and Lapata (2016)

15 / 23

SLIDE 36

Related Work


SLIDE 38

Related work

Word meaning change

Gulordava (2011), Popescu (2013), Kim (2014), Kulkarni (2015)
✗ word-level meaning
✗ two time intervals
✗ representations are independent
✓ knowledge-lean

Graph-based tracking of word sense change

Mitra (2014, 2015)
✓ sense-level meaning
✓ multiple time intervals
✗ representations are independent
✗ knowledge-heavy

16 / 23

SLIDE 39

Evaluation

SLIDE 40

Evaluation: Overview

✗ no gold standard test set or benchmark corpora
✗ small-scale evaluation with hand-picked test examples

DATE: DiAchronic TExt Corpus (years 1710–2010)

  • 1. COHA Corpus (Davies, 2010)
  • 2. SemEval DTE Task Training Data (Popescu, 2015)
  • 3. parts of the CLMET3.0 corpus (Diller, 2011)

17 / 23

SLIDE 41

Evaluation: Overview

✗ no gold standard test set or benchmark corpora
✗ small-scale evaluation with hand-picked test examples

We evaluate on various previously proposed tasks and metrics

  • 1. qualitative evaluation
  • 2. perceived word novelty (Gulordava, 2011)
  • 3. temporal text classification SemEval DTE (Popescu, 2015)

17 / 23

SLIDE 42
  • 1. Qualitative Evaluation

(Sense evolution of “battery”, 1700–1980. Two senses are visible: an electric sense with top words wire, battery, current, electric, plate, power, cell, electricity, light, run, charge, hour, life; and a military sense with top words gun, artillery, infantry, regiment, cavalry, fire, enemy, fort, shore, time, position, shell.)

18 / 23

SLIDE 43
  • 1. Qualitative Evaluation

(Sense evolution of “power”, 1700–1980. Four senses are visible: an industry/electricity sense (time, company, water, force, line, electric, plant, day, run); a political sense (country, government, nation, war, increase, world, political, people, europe); a mental sense (mind, time, life, friend, woman, nature, love, world, reason); and a personal sense (love, life, time, woman, heart, god, tell, little, day).)

18 / 23

SLIDE 44
  • 1. Qualitative Evaluation

(Top words by p(w|k, t) for one sense of “power” across intervals from 1780 to 2010: early intervals are dominated by water, company, force, line, steam; later intervals by nuclear, plant, electric, utility, energy, cost.)

18 / 23


SLIDE 46
  • 2. Human-perceived Word Meaning Change (Gulordava, 2011)

Task: rank 100 target words by meaning change. How much did {baseball, network, ...} change between the 1960s and the 1990s? 4-point scale, 0: no change ... 3: significant change

(Bar chart: Spearman’s ρ, axis range 0.2–0.4, comparing Scan, Scan-not, and the Gulordava baseline.)
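A model-based change score for this ranking can be computed by comparing a word's time-specific distributions. The metric below is an assumption (the slide does not fix one); symmetric KL divergence serves as a simple stand-in:

```python
import numpy as np

def change_score(p_old, p_new, eps=1e-12):
    """Symmetric KL divergence between a word's time-specific distributions
    (an illustrative stand-in; the talk does not specify the metric)."""
    p, q = np.asarray(p_old, dtype=float) + eps, np.asarray(p_new, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Sense prevalence for two toy target words in the 1960s vs. the 1990s
score_mouse = change_score([0.7, 0.3, 0.0], [0.2, 0.2, 0.6])  # large shift
score_cat   = change_score([0.8, 0.2, 0.0], [0.8, 0.2, 0.0])  # no shift
print(score_mouse > score_cat)
```

Ranking all 100 targets by such a score and correlating the ranking with the human ratings yields a Spearman's ρ.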

19 / 23

SLIDE 47
  • 2. Human-perceived Word Meaning Change (Gulordava, 2011)


Most changed target words according to Scan, with top context words:
environmental: supra, note, law, protection, id, agency, impact, policy, factor
virtual: reality, virtual, computer, center, experience, week, community
disk: hard, disk, drive, program, computer, file, store, ram, business, users
computer: window, information, software, system, wireless, drive, web, building, available

19 / 23

SLIDE 48
  • 3. Diachronic Text Evaluation (DTE) (SemEval, 2015)

Task: predict the time frame of origin of a given text snippet.
Example: “President de Gaulle favors an independent European nuclear striking force [...]” (1962)

Prediction granularity:

fine: 2-year intervals {1700–1702, ..., 1961–1963, ..., 2012–2014}
medium: 6-year intervals {1699–1706, ..., 1959–1965, ..., 2008–2014}
coarse: 12-year intervals {1696–1708, ..., 1956–1968, ..., 2008–2020}

20 / 23

SLIDE 49
  • 3. Diachronic Text Evaluation (DTE) (SemEval, 2015)

Scan temporal word representations

  • 883 nouns and verbs from the DTE development dataset
  • ∆T = 5 years
  • K = 8 senses

→ predict time of a test snippet using Scan representations
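The prediction step can be sketched as follows. The scoring rule mirrors the generative story (a mixture over senses at each time slice); the aggregation over the 883 target words in the real system is omitted, and the array shapes are assumptions:

```python
import numpy as np

def predict_time(word_ids, phi, psi):
    """Pick the time slice maximizing p(words | t) under a Scan-style model.
    phi: (T, K) sense distributions; psi: (T, K, V) sense-word distributions."""
    scores = []
    for t in range(phi.shape[0]):
        # p(words | t) = sum_k phi[t, k] * prod_i psi[t, k, w_i]
        per_sense = phi[t] * np.prod(psi[t][:, word_ids], axis=1)
        scores.append(np.log(per_sense.sum()))
    return int(np.argmax(scores))

# Toy check: time slice 1 strongly favors word 2
phi = np.array([[0.5, 0.5], [0.5, 0.5]])
psi = np.array([[[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
                [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]])
print(predict_time([2, 2], phi, psi))  # → 1
```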

21 / 23

SLIDE 50
  • 3. Diachronic Text Evaluation (DTE) (SemEval, 2015)

(Bar chart: accuracy at fine, medium, and coarse granularity, axis range 0.2–0.8, comparing a random baseline, Scan-not, Scan, and the SemEval systems IXA and AMBRA.)
Accuracy: a precision measure discounted by distance from the true time.

22 / 23


SLIDE 52

Conclusions

A dynamic Bayesian model of diachronic meaning change
✓ sense-level meaning change
✓ arbitrary time spans and intervals
✓ knowledge-lean
✓ explicit model of smooth temporal dynamics

Our work opens up avenues for a variety of applications

  • aiding historical text mining or QA
  • building and updating ontologies
  • modeling short-term opinion change from Twitter data

23 / 23

SLIDE 53

Thank you!

lea@frermann.de www.frermann.de 23 / 23

SLIDE 54

References I

David M. Blei and John D. Lafferty. 2006a. Correlated Topic Models. In Advances in Neural Information Processing Systems 18, pages 147–154. Vancouver, BC, Canada.

David M. Blei and John D. Lafferty. 2006b. Dynamic Topic Models. In Proceedings of the 23rd International Conference on Machine Learning, pages 113–120. Pittsburgh, PA, USA.

23 / 23

SLIDE 55

References II

Paul Cook, Jey Han Lau, Diana McCarthy, and Timothy Baldwin. 2014. Novel Word-sense Identification. In Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers, pages 1624–1635. Dublin, Ireland. URL http://www.aclweb.org/anthology/C14-1154.

Mark Davies. 2010. The Corpus of Historical American English: 400 million words, 1810–2009. Available online at http://corpus.byu.edu/coha/.

Hans-Jürgen Diller, Hendrik de Smet, and Jukka Tyrkkö. 2011. A European database of descriptors of English electronic texts. The European English Messenger, 19(2):29–35.

Pieter C. N. Groenewald and Lucky Mokgatlhe. 2005. Bayesian Computation for Logistic Regression. Computational Statistics & Data Analysis, 48(4):857–868.

24 / 23

SLIDE 56

References III

Kristina Gulordava and Marco Baroni. 2011. A Distributional Similarity Approach to the Detection of Semantic Change in the Google Books Ngram Corpus. In Proceedings of the Workshop on GEometrical Models of Natural Language Semantics, pages 67–71. Edinburgh, Scotland.

Adam Kilgarriff. 2009. Simple maths for keywords. In Proceedings of the Corpus Linguistics Conference.

Yoon Kim, Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, and Slav Petrov. 2014. Temporal Analysis of Language through Neural Language Models. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, pages 61–65. Baltimore, MD, USA.

25 / 23

SLIDE 57

References IV

Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2015. Statistically Significant Detection of Linguistic Change. In Proceedings of the 24th International Conference on World Wide Web, pages 625–635. Geneva, Switzerland.

Jey Han Lau, Paul Cook, Diana McCarthy, Spandana Gella, and Timothy Baldwin. 2014. Learning Word Sense Distributions, Detecting Unattested Senses and Identifying Novel Senses using Topic Models. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 259–270. Baltimore, MD, USA.

26 / 23

SLIDE 58

References V

Jey Han Lau, Paul Cook, Diana McCarthy, David Newman, and Timothy Baldwin. 2012. Word Sense Induction for Novel Sense Detection. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 591–601. Avignon, France.

David Mimno, Hanna Wallach, and Andrew McCallum. 2008. Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors. In NIPS Workshop on Analyzing Graphs. Vancouver, Canada. URL http://www.cs.umass.edu/~mimno/papers/sampledlgstnorm.pdf.

Sunny Mitra, Ritwik Mitra, Suman Kalyan Maity, Martin Riedl, Chris Biemann, Pawan Goyal, and Animesh Mukherjee. 2015. An automatic approach to identify word sense changes in text media across timescales. Natural Language Engineering, 21:773–798.

27 / 23

SLIDE 59

References VI

Sunny Mitra, Ritwik Mitra, Martin Riedl, Chris Biemann, Animesh Mukherjee, and Pawan Goyal. 2014. That’s sick dude!: Automatic identification of word sense change across different timescales. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1020–1029. Baltimore, MD, USA.

Octavian Popescu and Carlo Strapparava. 2013. Behind the Times: Detecting Epoch Changes using Large Corpora. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 347–355. Nagoya, Japan.

28 / 23

SLIDE 60

References VII

Octavian Popescu and Carlo Strapparava. 2015. SemEval 2015, Task 7: Diachronic Text Evaluation. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 869–877. Denver, CO, USA. URL http://www.aclweb.org/anthology/S15-2147.

29 / 23