Taylor’s law for Human Linguistic Sequences
Tatsuru Kobayashi Kumiko Tanaka-Ishii
Research Center for Advanced Science and Technology, The University of Tokyo
Power laws of natural language
1. Vocabulary population: Zipf's law
Words occur in clusters; occurrences of words fluctuate (illustrated for 'Moby Dick').
These properties can be analyzed through power laws.
Today's talk is about quantifying the degree of fluctuation; how this can be useful is presented at the end.
Any word, or any set of words, occurs in clusters.
[Figure: occurrences of rare words (ranked below 3162nd, e.g. the 2000th and 2500th) in 'Moby Dick'.]
The degree of clustering can be quantified in two ways: as variance w.r.t. the window length Δ𝑢, or as variance w.r.t. the mean. The latter is the basis of our contribution.
Variance is larger when events are clustered than when they occur at random.
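The claim above can be illustrated with a toy simulation (not from the paper; all parameters are invented): at the same mean rate, clustered events show a much larger count variance across windows than randomly placed events.

```python
# Toy simulation: clustered vs. random events at the same mean rate.
# All parameters below are invented for illustration.
import random
import statistics

rng = random.Random(42)
n_windows, window = 200, 100

# Random (i.i.d.) events: each position is an event with probability 0.05.
random_counts = [
    sum(rng.random() < 0.05 for _ in range(window)) for _ in range(n_windows)
]

# Clustered events with the same overall rate (0.25 * 0.20 = 0.05):
# a window either contains a burst of events or none at all.
clustered_counts = []
for _ in range(n_windows):
    if rng.random() < 0.25:
        clustered_counts.append(sum(rng.random() < 0.20 for _ in range(window)))
    else:
        clustered_counts.append(0)

# Means are comparable, but the clustered variance is far larger.
print("random:    mean=%.2f var=%.2f"
      % (statistics.mean(random_counts), statistics.pvariance(random_counts)))
print("clustered: mean=%.2f var=%.2f"
      % (statistics.mean(clustered_counts), statistics.pvariance(clustered_counts)))
```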
Taylor's law: a power law between the standard deviation τ and the mean ν of event occurrences within a window of (space or) time Δ𝑢: τ ∝ ν^β.
Empirically 0.5 ≤ β ≤ 1.0 (though β < 0.5 is of course possible, too).
The law is empirically known to hold in a vast range of fields (Eisler, 2007): ecology, life science, physics, finance, human dynamics, …
The only previous application to language is Gerlach & Altmann (2014), which is not really a Taylor analysis.
We devised a new method based on the original concept of Taylor's law.
The exponent is fitted by least squares in log scale:

    (d̂, β̂) = argmin_{d, β} ϑ(d, β),   where   ϑ(d, β) = (1/|X|) Σ_{i=1}^{|X|} ( log τ_i − log(d ν_i^β) )²
1. For every word kind x_i ∈ X, count its number of occurrences within segments of a given length Δ𝑢.
2. Obtain the mean ν_i and standard deviation τ_i of the counts of x_i.
3. Plot ν_i against τ_i for all words.
4. Estimate β by the least squares method in log scale.
Taylor's law in log scale for 'Moby Dick' (English, 250k words, vocabulary size 20k words).
The exponent β = 0.57 corresponds to the gradient of the log ν vs. log τ plot.
Taylor's law for 'Moby Dick' (English) in log scale.
Empirically 0.5 ≤ β ≤ 1.0.
β = 0.5 if all words are independent and identically distributed (i.i.d.).
A shuffled text is equivalent to an i.i.d. process, so its Taylor exponent is β = 0.5.
Example: shuffled 'Moby Dick', Δ𝑢 ≈ 5000.
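The β = 0.5 baseline for i.i.d. (shuffled) text follows from a short calculation: within a window of length Δ𝑢, the count of a word with occurrence probability p_i is binomial, so

```latex
c_i \sim \mathrm{Binomial}(\Delta u,\, p_i), \qquad
\nu_i = \Delta u\, p_i, \qquad
\tau_i = \sqrt{\Delta u\, p_i (1 - p_i)} \approx \sqrt{\nu_i}
\quad (p_i \ll 1),
```

hence log τ_i ≈ (1/2) log ν_i, i.e. β = 1/2.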
β = 1.0 if words always co-occur in the same proportion.
Example: suppose X = {x0, x1}, and x1 always occurs twice as often as x0.
⟹ ν1 = 2ν0 and τ1 = 2τ0 ⟹ τ ∝ ν, i.e. a gradient of β = 1 in the log ν vs. log τ plot (both coordinates shift by exactly log 2).
E.g. one segment may contain x0: 3, x1: 6 and another x0: 17, x1: 34.
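This proportional co-occurrence can be checked numerically. A minimal sketch (the per-segment counts 3 and 17 come from the example; the other counts are invented):

```python
# Numerical check of the beta = 1 example: if x1 always occurs exactly
# twice as often as x0 in every segment, the log-log gradient is 1.
import math

counts_x0 = [3, 17, 8, 5]               # occurrences of x0 per segment (toy data)
counts_x1 = [2 * c for c in counts_x0]  # x1 co-occurs in fixed proportion 2:1

def mean_std(counts):
    m = sum(counts) / len(counts)
    s = math.sqrt(sum((c - m) ** 2 for c in counts) / len(counts))
    return m, s

nu0, tau0 = mean_std(counts_x0)
nu1, tau1 = mean_std(counts_x1)

# nu1 = 2*nu0 and tau1 = 2*tau0, so the gradient of log tau vs. log nu is 1.
beta = (math.log(tau1) - math.log(tau0)) / (math.log(nu1) - math.log(nu0))
print(beta)  # 1.0 up to floating point
```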
[Figure: example word sequence: "up things hand this dear truck xload insert platform let unless and"]
Data used:

Kind                                     | Languages      | Number | Average size | Example
Gutenberg & Aozora (long, single author) | 14 (En, Fr, …) | 1142   | 311,483      | 'Moby Dick', 'Les Misérables'
Newspapers                               | 3 (En, Zh, Ja) | 4      | 580,488,956  | WSJ
Tagged Wiki                              | 1 (En + tag)   | 1      | 14,637,848   | enwiki8
CHILDES                                  | 10 (En, Fr, …) | 10     | 193,434      | Thomas (English)
Music                                    |                |        | 135,993      | Matthäus (Bach)
Program codes                            | 4              | 4      | 34,161,018   | C++, Lisp, Haskell, Python
Results for the Taylor exponent:
Written texts (single author): mean β = 0.58
Random (shuffled) texts: β = 0.50
Other data: β ≥ 0.63
None of the real texts showed the exponent 0.5.
[Figure: Taylor exponents per dataset, values ranging from 0.50 to 0.80.]
(Zipf's law and Heaps' law show no such distinction between real and shuffled text.)
Experiment with a character-based neural language model:
Learning: Shakespeare (2 million characters), naive setting.
Model: stacked LSTM (3 LSTM layers, 256 nodes), fed the 128 preceding characters and outputting the distribution of the following character.
Generation: probabilistic generation.
State-of-the-art models present different results (reported in another paper).
The fluctuation that derives from context is provided by the source text.
(No such quality holds for Zipf's law or Heaps' law.)