Zipf’s Law: Robert Fernholz, INTECH, joint research with Ricardo Fernholz (PowerPoint presentation)


SLIDE 1

Zipf’s Law

Robert Fernholz INTECH Joint research with Ricardo Fernholz Thera Stochastics Santorini, Greece May 31 – June 2, 2017

1 / 39

SLIDE 2

This talk is dedicated to Ioannis Karatzas on the occasion of his 65th birthday.

2 / 39

SLIDE 3

Introduction

“Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. The law is named after the American linguist George Kingsley Zipf (1902–1950), who popularized it and sought to explain it (Zipf (1935, 1949)), though he did not claim to have originated it.” (From Wikipedia (2017).)
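The definition quoted above can be made concrete in a few lines: count the words, sort them by frequency, and read off each word's rank. The Python sketch below is an illustration added here, not part of the talk; the tokenizer (lower-cased runs of letters and apostrophes) and the helper name `rank_frequency` are assumptions, and a toy sentence is far too small to exhibit Zipf's law.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first.

    Words are lower-cased runs of letters and apostrophes (a simplifying
    assumption); ties in count are broken alphabetically for determinism.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    ordered = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


sample = "the cat and the dog and the bird"
for rank, word, count in rank_frequency(sample):
    # On a large corpus, Zipf's law predicts rank * count to be roughly constant.
    print(rank, word, count, rank * count)
```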

3 / 39

SLIDE 5

Word count from Wikipedia

4 / 39

SLIDE 6

Power laws and the Pareto distribution

Data follow a power law or Pareto distribution if a log-log plot of the data versus rank is approximately a straight line. Pareto distributions can result from self-organized criticality or from time-dependent systems. A Pareto distribution follows Zipf’s law if the slope of the log-log plot is −1. Zipf’s law is a form of universality, since many classes of data seem to follow this distribution. Specifically, certain time-dependent, rank-based systems seem to follow Zipf’s law, and we shall try to characterize these systems.
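A straight-line fit on the log-log rank plot gives a quick estimate of the slope discussed above. This sketch assumes ordinary least squares via `numpy.polyfit`; the helper name `loglog_slope` is hypothetical, and least squares on rank plots is a crude estimator for noisy tails. On exact power-law data it recovers the exponent.

```python
import numpy as np


def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), with rank 1 = largest value."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    if np.any(v <= 0):
        raise ValueError("values must be positive")
    ranks = np.arange(1, len(v) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(v), 1)
    return slope


ranks = np.arange(1, 501)
print(loglog_slope(1000.0 / ranks))           # Zipf data: slope -1 up to rounding
print(loglog_slope(1000.0 / np.sqrt(ranks)))  # flatter Pareto data: slope -0.5
```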

5 / 39

SLIDE 7

Examples of Pareto distributions

Log-log slopes in blue (From Newman (2006)): −0.83, −0.49, −0.71, −0.40, −0.82, −0.49.

6 / 39

SLIDE 8

Examples of Pareto distributions

Log-log slopes in blue (From Newman (2006)): −0.47, −1.20, −1.25, −0.92, −1.06, −0.77.

7 / 39

SLIDE 9

Members and families

We wish to model systems of positive-valued, time-dependent data {Ξ1(t), Ξ2(t), . . .} of indefinite size. These data represent two classes of objects, members and families. The members are contained within the families, and Ξi(t) indicates the number of members contained within the ith family at time t. Examples of members within families are:

◮ people within cities;
◮ occurrences within words;
◮ dollars within family fortunes;
◮ individuals within surnames;
◮ dollars within company capitalizations;
◮ birds within species.

8 / 39

SLIDE 10

Trends and sampling

The data we consider {Ξ1(t), Ξ2(t), . . .} might have a common global trend of the form G(t)dt, e.g., population growth, Wikipedia growth, GDP growth, etc. We shall study log-differences, and since a common trend shifts every log Ξi(t) by the same amount, it does not affect us; it is therefore convenient to assume it to be zero. Alternatively, we can sample the total population with a constant number of people, words, dollars, etc., in our sample over time. This could introduce sampling error but should not materially affect the shape of the distribution curve. In any case, to simplify the exposition, we shall assume henceforth that the total population we observe is free of trends.
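To see why a common trend drops out, note that multiplying every Ξi by the same factor shifts every log Ξi by the same constant, so the log-differences between consecutive ranks are unchanged. A minimal Python check, with hypothetical family sizes and a hypothetical growth factor:

```python
import numpy as np


def log_gaps(x):
    """Log-differences log x_(k) - log x_(k+1) between consecutive ranks."""
    ranked = np.sort(np.asarray(x, dtype=float))[::-1]
    return np.log(ranked[:-1]) - np.log(ranked[1:])


sizes = np.array([120.0, 45.0, 300.0, 7.0])  # hypothetical family sizes
trended = sizes * np.exp(0.35)               # common trend: exp of the integral of G(t) dt
print(np.allclose(log_gaps(sizes), log_gaps(trended)))  # True
```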

9 / 39

SLIDE 11

Continuous semimartingales

To model the data {Ξ1(t), Ξ2(t), . . .} we shall use continuous semimartingales X1, X2, . . . of the form

$$d\log X_i(t) = \gamma_i(t)\,dt + \sigma_i(t)\,dW_i(t),$$

where W is a Brownian motion and the processes γi and σi are measurable and adapted to the Brownian filtration. A model of this form might be reasonable if, e.g.,

1. the changes dΞi(t) are proportional to the values Ξi(t);
2. the log-changes d log Ξi(t) are composed of many small, independent perturbations;
3. the changes in the different Ξi are independent.
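For constant coefficients and independent Brownian components (both assumptions; the model above allows adapted processes γi(t) and σi(t)), the model can be simulated directly on the log scale, which keeps every Xi positive. With constant coefficients the log-increments are exactly Gaussian, so this Euler sketch has no discretization error at the grid points. The function name and its arguments are illustrative.

```python
import numpy as np


def simulate_log_paths(x0, gamma, sigma, T=1.0, dt=0.01, seed=0):
    """Euler scheme for d log X_i = gamma_i dt + sigma_i dW_i with constant coefficients.

    Returns an array of shape (steps + 1, n) of X values; X stays positive because
    the simulation runs on log X and exponentiates at the end.
    """
    rng = np.random.default_rng(seed)
    steps = int(round(T / dt))
    gamma = np.asarray(gamma, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    dW = rng.standard_normal((steps, len(gamma))) * np.sqrt(dt)
    increments = gamma * dt + sigma * dW
    log_x = np.log(np.asarray(x0, dtype=float)) + np.vstack(
        [np.zeros(len(gamma)), np.cumsum(increments, axis=0)]
    )
    return np.exp(log_x)


paths = simulate_log_paths([1.0, 2.0], [0.1, -0.2], [0.3, 0.3], T=5.0)
print(paths.shape, paths[-1])
```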

10 / 39

SLIDE 12

Rank processes

For a system of positive continuous semimartingales $X_1, \dots, X_n$ we define the rank function to be the random permutation $r_t \in \Sigma_n$ such that $r_t(i) < r_t(j)$ if $X_i(t) > X_j(t)$, or if $X_i(t) = X_j(t)$ and $i < j$. The rank processes $X_{(1)} \ge \cdots \ge X_{(n)}$ are defined by $X_{(r_t(i))}(t) = X_i(t)$. If the $X_i$ satisfy certain regularity conditions, e.g., they spend no local time at triple points, then the rank processes satisfy

$$d\log X_{(k)}(t) = \sum_{i=1}^{n} \mathbb{1}_{\{r_t(i)=k\}}\,d\log X_i(t) + \tfrac12\,d\Lambda^X_{k,k+1}(t) - \tfrac12\,d\Lambda^X_{k-1,k}(t),$$

a.s., where $\Lambda^X_{k,k+1}$ is the local time at the origin for $\log\big(X_{(k)}/X_{(k+1)}\big)$, with $\Lambda^X_{0,1} = \Lambda^X_{n,n+1} \equiv 0$ (Fernholz (2002)).
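The rank function and its tie-breaking rule translate directly into code. This Python sketch uses 0-based indices for the processes and 1-based ranks as in the definition above; the helper names are hypothetical.

```python
def rank_function(x):
    """r(i) for each index i: larger values get smaller ranks, ties go to the smaller index."""
    order = sorted(range(len(x)), key=lambda i: (-x[i], i))
    r = [0] * len(x)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r


def rank_processes(x):
    """X_(1) >= ... >= X_(n), built from the defining relation X_(r(i)) = X_i."""
    r = rank_function(x)
    ranked = [None] * len(x)
    for i, k in enumerate(r):
        ranked[k - 1] = x[i]
    return ranked


x = [2.0, 5.0, 2.0, 1.0]
print(rank_function(x))   # [2, 1, 3, 4]
print(rank_processes(x))  # [5.0, 2.0, 2.0, 1.0]
```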

11 / 39

SLIDE 13

Asymptotic stability

A system of positive continuous semimartingales X1, . . . , Xn is asymptotically stable if

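1. $\lim_{t\to\infty} t^{-1}\big(\log X_{(1)}(t) - \log X_{(n)}(t)\big) = 0$, a.s. (coherence);
2. $\lim_{t\to\infty} t^{-1}\Lambda^X_{k,k+1}(t) = \lambda_{k,k+1} > 0$, a.s.;
3. $\lim_{t\to\infty} t^{-1}\big\langle \log X_{(k)} - \log X_{(k+1)}\big\rangle_t = \sigma^2_{k,k+1} > 0$, a.s.;

for $k = 1, \dots, n-1$, where $\lambda_{k,k+1}$ and $\sigma^2_{k,k+1}$ are constants and $\langle\cdot\rangle_t$ denotes quadratic variation.

The systems of continuous semimartingales we consider will be asymptotically stable and will also satisfy

$$(\ast)\quad \lim_{T\to\infty}\frac{1}{T}\int_0^T \big(\log X_{(k)}(t) - \log X_{(k+1)}(t)\big)\,dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}}, \quad \text{a.s., for } k = 1, \dots, n-1.$$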
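Condition 2 and the limit (∗) have a simple one-gap analogue: a Brownian motion with drift −μ and variance s², reflected at zero. This toy setting is an assumption made for illustration; in a ranked system each gap also feels the neighboring local times. Its long-run local-time rate is μ and its long-run average is s²/(2μ), the same pattern as (∗) with λ = μ. The sketch uses the Skorokhod construction, where the push L_t = max(0, −min_{u≤t} Z_u) keeps the path nonnegative; taking the minimum on a time grid slightly underestimates the average, so only approximate agreement should be expected.

```python
import numpy as np


def reflected_bm_averages(mu=1.0, s2=2.0, T=10_000.0, dt=0.005, seed=1):
    """Reflect Z_t = -mu t + sqrt(s2) W_t at zero via the Skorokhod map.

    L_t = max(0, -min_{u <= t} Z_u) is the minimal push keeping Y = Z + L >= 0;
    in this one-gap toy it plays the role of the local time.
    Returns (time average of Y, L_T / T), with the minimum taken on the time grid.
    """
    rng = np.random.default_rng(seed)
    steps = int(round(T / dt))
    dZ = -mu * dt + np.sqrt(s2 * dt) * rng.standard_normal(steps)
    Z = np.cumsum(dZ)
    L = np.maximum(0.0, -np.minimum.accumulate(Z))
    Y = Z + L
    return Y.mean(), L[-1] / T


avg_gap, local_time_rate = reflected_bm_averages()
# Targets: s2 / (2 mu) = 1 for the average and mu = 1 for the local-time rate.
print(avg_gap, local_time_rate)
```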

12 / 39

SLIDE 14

U.S. Capital Distribution, 1929 to 1999

[Figure: market weight versus rank, log-log axes.]

Market weight curves (From Fernholz (2002)).

13 / 39

SLIDE 15

Conservation of ‘mass’

Suppose that for the data {Ξ1(t), Ξ2(t), . . .} the “total mass” Ξ(1)(t) + Ξ(2)(t) + · · · remains constant. The mass of the top n ranks Ξ(1), . . . , Ξ(n) is defined by

$$\Xi_{[n]}(t) = \Xi_{(1)}(t) + \cdots + \Xi_{(n)}(t),$$

and since the sample has constant total mass, for large enough n the mass of the top n ranks should also be approximately constant. Hence, we impose the condition on the model X1, . . . , Xn that

$$\text{(A)}\quad \lim_{n\to\infty} E\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = 0.$$

14 / 39

SLIDE 16

Behavior of ranked systems

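Let us suppose for the moment that the data processes $\Xi_i$ are continuous semimartingales that spend no local time at triple points. In this case, the rank processes $\Xi_{(k)}$ will satisfy

$$d\log \Xi_{(k)}(t) = \sum_{i\ge 1} \mathbb{1}_{\{r_t(i)=k\}}\,d\log \Xi_i(t) + \tfrac12\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac12\,d\Lambda^\Xi_{k-1,k}(t),$$

a.s., for all $k$. By Itô’s rule, for all $k$, a.s.,

$$\begin{aligned}
\frac{d\Xi_{(k)}(t)}{\Xi_{(k)}(t)} &= \sum_{i\ge 1} \mathbb{1}_{\{r_t(i)=k\}}\,\frac{d\Xi_i(t)}{\Xi_i(t)} + \tfrac12\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac12\,d\Lambda^\Xi_{k-1,k}(t)\\
&= \sum_{i\ge 1} \mathbb{1}_{\{r_t(i)=k\}}\,\frac{d\Xi_i(t)}{\Xi_{(k)}(t)} + \tfrac12\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac12\,d\Lambda^\Xi_{k-1,k}(t).
\end{aligned}$$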

15 / 39

SLIDE 17

Behavior of ranked systems

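Hence,

$$\begin{aligned}
d\Xi_{(k)}(t) &= \sum_{i\ge 1} \mathbb{1}_{\{r_t(i)=k\}}\,d\Xi_i(t) + \tfrac12\,\Xi_{(k)}(t)\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac12\,\Xi_{(k)}(t)\,d\Lambda^\Xi_{k-1,k}(t)\\
&= \sum_{i\ge 1} \mathbb{1}_{\{r_t(i)=k\}}\,d\Xi_i(t) + \tfrac12\,\Xi_{(k)}(t)\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac12\,\Xi_{(k-1)}(t)\,d\Lambda^\Xi_{k-1,k}(t),
\end{aligned}$$

a.s., where the second equality holds because $\Lambda^\Xi_{k-1,k}$ increases only when $\Xi_{(k-1)}(t) = \Xi_{(k)}(t)$. The local-time terms now telescope (with $\Lambda^\Xi_{0,1} \equiv 0$), so we can add up the $d\Xi_{(k)}(t)$ for $k = 1, \dots, n$ to obtain

$$d\Xi_{[n]}(t) = \sum_{i\ge 1} \mathbb{1}_{\{r_t(i)\le n\}}\,d\Xi_i(t) + \tfrac12\,\Xi_{(n)}(t)\,d\Lambda^\Xi_{n,n+1}(t),$$

a.s. This serves to define the local time $\Lambda^\Xi_{n,n+1}(t)$ for the data.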
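The representation above suggests a discrete-time estimate of the leakage term from two snapshots of the data: the change in the top-n mass minus the change in the families that held the top n ranks at the start. This is a simple sketch under that discretization, not necessarily the procedure behind the figure that follows; the function names are hypothetical. The estimated leakage is never negative, because the n largest values at the end of the period always carry at least as much mass as the n incumbents.

```python
import numpy as np


def top_n_leakage(x_before, x_after, n):
    """Discrete analogue of (1/2) Xi_(n) dLambda_{n,n+1} over one period.

    Change in top-n mass minus the change in the families that held the top n
    ranks at the start (the incumbents); equivalently, end-of-period top-n mass
    minus the end-of-period mass of the incumbents. Always >= 0.
    """
    x0 = np.asarray(x_before, dtype=float)
    x1 = np.asarray(x_after, dtype=float)
    incumbents = np.argsort(-x0, kind="stable")[:n]  # ties: smaller index ranks higher
    top_mass_after = np.sort(x1)[::-1][:n].sum()
    return top_mass_after - x1[incumbents].sum()


def local_time_increment(x_before, x_after, n):
    """Implied increment of Lambda_{n,n+1}: 2 * leakage / Xi_(n), with Xi_(n) taken at the start."""
    xn = np.sort(np.asarray(x_before, dtype=float))[::-1][n - 1]
    return 2.0 * top_n_leakage(x_before, x_after, n) / xn


x0 = [10.0, 6.0, 5.0, 1.0]
x1 = [10.0, 4.0, 5.5, 1.0]  # the third family overtakes the second
print(top_n_leakage(x0, x1, 2), local_time_increment(x0, x1, 2))  # 1.5 0.5
```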

16 / 39

SLIDE 18

$\Lambda^{\Xi}_{k,k+1}(t)$ for U.S. capital distribution

k = 10, 20, 40, . . . , 5120 (From Fernholz (2002)).

17 / 39

SLIDE 19

Leakage

For the data $\{\Xi_1(t), \Xi_2(t), \dots\}$ we have the representation

$$d\Xi_{[n]}(t) = \sum_{i\ge 1} \mathbb{1}_{\{r_t(i)\le n\}}\,d\Xi_i(t) + \tfrac12\,\Xi_{(n)}(t)\,d\Lambda^\Xi_{n,n+1}(t).$$

The final term compensates for the “leakage” from $\Xi_{[n]}$. In order that the system not depend on mass replenished from outside, we impose the condition that the (relative) leakage tends to zero:

$$\text{(B)}\quad \lim_{n\to\infty} E\Big[\frac{X_{(n)}(t)}{X_{[n]}(t)}\,d\Lambda^X_{n,n+1}(t)\Big] = 0.$$

18 / 39

SLIDE 20

A conservation law

Conditions (A) and (B) together are a form of conservation law that ensures that the total mass of the system is autonomously maintained:

$$\text{(A)}\quad \lim_{n\to\infty} E\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = 0,$$

and

$$\text{(B)}\quad \lim_{n\to\infty} E\Big[\frac{X_{(n)}(t)}{X_{[n]}(t)}\,d\Lambda^X_{n,n+1}(t)\Big] = 0.$$

We shall now study the effects of conditions (A) and (B) on our continuous semimartingale model X1, . . . , Xn.

19 / 39

SLIDE 21

Atlas models

Perhaps the simplest model for the systems we consider is an Atlas model, a system of positive continuous semimartingales $X_1, \dots, X_n$ defined by

$$d\log X_i(t) = \big(-g + ng\,\mathbb{1}_{\{r_t(i)=n\}}\big)\,dt + \sigma\,dW_i(t),$$

where $g$ and $\sigma$ are positive constants, and $(W_1, \dots, W_n)$ is a Brownian motion. Atlas models are asymptotically stable, and since the processes $X_i$ are exchangeable, they asymptotically spend equal time in each rank. Hence, each of the $X_i$ has asymptotic log-drift $-g + ng/n = 0$, so the entire system has zero asymptotic log-drift (Fernholz (2002), Banner et al. (2005)). We shall assume that Atlas models are in their steady-state distributions.
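A crude Euler simulation makes the Atlas dynamics concrete. In this sketch the parameter values, step size, burn-in, and the name `atlas_mean_gaps` are all assumptions; the lowest-ranked process receives the extra drift ng, and the time-averaged log-gaps can be compared with σ²_{k,k+1}/(2λ_{k,k+1}) = σ²/(2kg), using the Atlas parameters λ_{k,k+1} = 2kg and σ²_{k,k+1} = 2σ² given on the next slide. With σ = 1 and g = 0.5 (so σ²/2 = g) the target is 1/k; discretization and the finite horizon add some error.

```python
import numpy as np


def atlas_mean_gaps(n=5, g=0.5, sigma=1.0, T=1000.0, dt=0.01, seed=0):
    """Euler sketch of d log X_i = (-g + n g 1{r_t(i) = n}) dt + sigma dW_i.

    Returns the time-averaged log-gaps log X_(k) - log X_(k+1), k = 1, ..., n-1,
    after discarding the first 10% of the path as burn-in.
    """
    rng = np.random.default_rng(seed)
    steps = int(round(T / dt))
    noise = sigma * np.sqrt(dt) * rng.standard_normal((steps, n))
    y = np.zeros(n)  # y_i = log X_i(t); all processes start equal
    path = np.empty((steps, n))
    for s in range(steps):
        lowest = np.argmin(y)  # rank n at the start of the step (ties: first index)
        y += -g * dt + noise[s]
        y[lowest] += n * g * dt
        path[s] = y
    ranked = -np.sort(-path, axis=1)  # column k-1 holds log X_(k)
    gaps = ranked[:, :-1] - ranked[:, 1:]
    return gaps[steps // 10:].mean(axis=0)


gaps = atlas_mean_gaps()
print(gaps)  # compare with 1/k = 1, 0.5, 0.333..., 0.25
```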

20 / 39

SLIDE 22

The asymptotic distribution of Atlas models

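The asymptotic parameters for Atlas models are $\lambda_{k,k+1} = 2kg$ and $\sigma^2_{k,k+1} = 2\sigma^2$, a.s., and these models satisfy

$$(\ast)\quad \lim_{T\to\infty}\frac{1}{T}\int_0^T \big(\log X_{(k)}(t) - \log X_{(k+1)}(t)\big)\,dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}} = \frac{\sigma^2}{2kg}, \quad \text{a.s., for } k = 1, \dots, n-1.$$

Since $\log(k) - \log(k+1) \cong -1/k$ for large $k$, it follows that for large enough $k$,

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T \frac{\log X_{(k)}(t) - \log X_{(k+1)}(t)}{\log(k) - \log(k+1)}\,dt \cong -\frac{\sigma^2}{2g}, \quad \text{a.s.},$$

so Atlas models follow Pareto distributions, and Zipf’s law is equivalent to $\sigma^2/2 = g$. We wish to characterize this in terms of conditions (A) and (B).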
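The large-k approximation above is easy to check numerically: divide the mean gap σ²/(2kg) by log(k) − log(k+1) and watch the ratio approach −σ²/(2g) as k grows. The helper name is hypothetical.

```python
import math


def atlas_tangent_slope(k, g, sigma):
    """Mean gap sigma^2 / (2 k g) of an Atlas model divided by the log-rank step log(k) - log(k + 1)."""
    mean_gap = sigma**2 / (2.0 * k * g)
    return mean_gap / (math.log(k) - math.log(k + 1))


for k in (1, 10, 100, 1000):
    print(k, atlas_tangent_slope(k, g=0.5, sigma=1.0))  # tends to -sigma^2/(2g) = -1
```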

21 / 39

SLIDE 23

The behavior of Atlas models

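For an Atlas model, Itô’s rule implies that, a.s.,

$$dX_i(t) = \Big(\frac{\sigma^2}{2} - g + ng\,\mathbb{1}_{\{r_t(i)=n\}}\Big)X_i(t)\,dt + \sigma X_i(t)\,dW_i(t).$$

For the total mass $X_{[n]} = X_1 + \cdots + X_n$ we have

$$dX_{[n]}(t) = \Big(\frac{\sigma^2}{2} - g\Big)X_{[n]}(t)\,dt + X_{[n]}(t)\,dM(t) + ngX_{(n)}(t)\,dt,$$

a.s., where $M$ is a martingale incorporating all the $\sigma W_i$, so

$$\frac{dX_{[n]}(t)}{X_{[n]}(t)} = \Big(\frac{\sigma^2}{2} - g\Big)dt + dM(t) + \frac{ngX_{(n)}(t)}{X_{[n]}(t)}\,dt,$$

a.s., where the last term plays the same role as leakage in the system $\Xi$. Hence,

$$E\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\frac{\sigma^2}{2} - g\Big)dt + E\Big[\frac{ngX_{(n)}(t)}{X_{[n]}(t)}\Big]dt.$$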

22 / 39

SLIDE 24

Zipf’s law for Atlas models

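For an Atlas model we have

$$E\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \Big(\frac{\sigma^2}{2} - g\Big)dt + E\Big[\frac{ngX_{(n)}(t)}{X_{[n]}(t)}\Big]dt,$$

and we can calculate

$$E\Big[\frac{ngX_{(n)}(t)}{X_{[n]}(t)}\Big] = \begin{cases} O(1) & \text{for } \sigma^2/2 < g,\\ O(1/\log n) & \text{for } \sigma^2/2 = g,\\ O\big(n^{1-\sigma^2/2g}\big) & \text{for } \sigma^2/2 > g. \end{cases}$$

Hence,

$$\text{(A)}\quad \lim_{n\to\infty} E\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = 0 \qquad \text{plus} \qquad \text{(B)}\quad \lim_{n\to\infty} E\Big[\frac{ngX_{(n)}(t)}{X_{[n]}(t)}\Big] = 0$$

is equivalent to $\sigma^2/2 = g$, and this is equivalent to Zipf. Indeed, under (A) the identity above forces the leakage term to converge to $g - \sigma^2/2$, so (B) can hold only if $\sigma^2/2 = g$; conversely, if $\sigma^2/2 = g$ the leakage is $O(1/\log n)$, so (B) holds and then (A) holds as well.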
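The three regimes can be illustrated with a heuristic: replace each random log-gap of the Atlas model by its long-run mean σ²/(2jg), build the resulting deterministic rank profile, and evaluate ngX_(n)/X_[n]. This mean-field shortcut is an assumption of the sketch, not the calculation behind the slide, but it reproduces the three behaviors: a constant (here close to g − σ²/2), a 1/log n decay, and a power decay.

```python
import numpy as np


def mean_field_leakage(n, g, sigma2):
    """Heuristic stand-in for E[n g X_(n) / X_[n]] in an Atlas model.

    Each log-gap log X_(j) - log X_(j+1) is replaced by its long-run mean
    sigma2 / (2 j g), giving a deterministic profile with log X_(1) = 0.
    """
    j = np.arange(1, n)
    log_x = np.concatenate(([0.0], -np.cumsum(sigma2 / (2.0 * j * g))))
    x = np.exp(log_x)
    return n * g * x[-1] / x.sum()


g = 1.0
for sigma2 in (1.0, 2.0, 4.0):  # sigma^2/2 below, equal to, and above g
    print(sigma2, [round(mean_field_leakage(n, g, sigma2), 4) for n in (10, 100, 10_000)])
```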

23 / 39

SLIDE 25

Examples of Pareto distributions

Log-log slopes in blue (From Newman (2006)): −0.83, −0.49, −0.71, −0.40, −0.82, −0.49.

24 / 39

SLIDE 26

Examples of Pareto distributions

Log-log slopes in blue (From Newman (2006)): −0.47, −1.20, −1.25, −0.92, −1.06, −0.77.

25 / 39

SLIDE 27

First-order models

A first-order model is a system of continuous semimartingales $X_1, \dots, X_n$ with

$$d\log X_i(t) = g_{r_t(i)}\,dt + \sigma_{r_t(i)}\,dW_i(t),$$

where the $g_k$ and $\sigma_k$ are constants such that $\sigma^2_k > 0$, with $g_1 + \cdots + g_n = 0$ and $g_1 + \cdots + g_k < 0$ for $k < n$ (Fernholz (2002), Banner et al. (2005)). As usual, $(W_1, \dots, W_n)$ is a Brownian motion. First-order models are asymptotically stable with

$$\lambda_{k,k+1} = -2\big(g_1 + \cdots + g_k\big), \qquad \sigma^2_{k,k+1} = \sigma^2_k + \sigma^2_{k+1},$$

a.s.
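The two formulas for λ_{k,k+1} and σ²_{k,k+1} are straightforward to compute. As a check, the Atlas model is a first-order model with g_k = −g for k < n and g_n = −g + ng = (n − 1)g, and the sketch recovers λ_{k,k+1} = 2kg and σ²_{k,k+1} = 2σ². The function name is hypothetical.

```python
import numpy as np


def first_order_asymptotics(g, sigma2):
    """lambda_{k,k+1} = -2 (g_1 + ... + g_k) and sigma^2_{k,k+1} = sigma^2_k + sigma^2_{k+1},
    k = 1..n-1, after checking the conditions on the g_k and sigma^2_k."""
    g = np.asarray(g, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    partial = np.cumsum(g)
    if not np.isclose(partial[-1], 0.0) or np.any(partial[:-1] >= 0) or np.any(sigma2 <= 0):
        raise ValueError("need g_1 + ... + g_n = 0, partial sums < 0 for k < n, sigma^2_k > 0")
    return -2.0 * partial[:-1], sigma2[:-1] + sigma2[1:]


# Atlas model (n = 5, g = 0.5, sigma = 1) written as a first-order model.
n, g_atlas, sigma = 5, 0.5, 1.0
lam, var = first_order_asymptotics([-g_atlas] * (n - 1) + [(n - 1) * g_atlas], [sigma**2] * n)
print(lam, var)  # lambda_{k,k+1} = 2 k g = k here; sigma^2_{k,k+1} = 2 sigma^2 = 2
```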

26 / 39

SLIDE 28

First-order approximation

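Suppose that $\{\Xi_1(t), \Xi_2(t), \dots\}$ is an asymptotically stable system of time-dependent data of indefinite size with parameters $\lambda_{k,k+1}$ and $\sigma^2_{k,k+1}$. Then the first-order approximation for the top $n$ ranks of this system is the first-order model $X_1, \dots, X_n$ with parameters

$$g_k = \tfrac12\lambda_{k-1,k} - \tfrac12\lambda_{k,k+1}, \quad k = 1, \dots, n-1, \qquad g_n = \tfrac12\lambda_{n-1,n},$$

$$\sigma^2_1 = \tfrac12\sigma^2_{1,2}, \qquad \sigma^2_k = \tfrac14\big(\sigma^2_{k-1,k} + \sigma^2_{k,k+1}\big), \quad k = 2, \dots, n,$$

where $\lambda_{0,1} = 0$. The $g_k$ telescope, so that $g_1 + \cdots + g_k = -\tfrac12\lambda_{k,k+1}$ for $k < n$ and $g_1 + \cdots + g_n = 0$; the model therefore reproduces the data’s $\lambda_{k,k+1}$. In this manner, we can construct a first-order approximation for any asymptotically stable system.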
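A sketch of the first-order approximation as a function of the data parameters. One assumption deserves emphasis: it takes g_n = ½λ_{n−1,n}, the value forced by the first-order constraint g_1 + ⋯ + g_n = 0, since the g_k for k < n telescope to −½λ_{n−1,n}. Feeding in the Atlas parameters λ_{k,k+1} = 2kg and σ²_{k,k+1} = 2σ² returns the Atlas model itself. The function name is hypothetical.

```python
import numpy as np


def first_order_approximation(lam, var):
    """Parameters (g_k, sigma^2_k), k = 1..n, of the first-order approximation.

    lam[k-1] = lambda_{k,k+1} for k = 1..n-1 (lambda_{0,1} = 0 is prepended);
    var[k-1] = sigma^2_{k,k+1} for k = 1..n (sigma^2_{n,n+1} enters sigma^2_n).
    Assumption: g_n = lambda_{n-1,n} / 2, so that g_1 + ... + g_n = 0.
    """
    lam = np.asarray(lam, dtype=float)
    var = np.asarray(var, dtype=float)
    n = len(var)
    if len(lam) != n - 1:
        raise ValueError("expected n - 1 values of lambda and n values of sigma^2")
    lam_ext = np.concatenate(([0.0], lam))          # lambda_{0,1}, ..., lambda_{n-1,n}
    g = np.empty(n)
    g[:-1] = 0.5 * lam_ext[:-1] - 0.5 * lam_ext[1:]  # k = 1, ..., n-1
    g[-1] = 0.5 * lam_ext[-1]
    s2 = np.empty(n)
    s2[0] = 0.5 * var[0]
    s2[1:] = 0.25 * (var[:-1] + var[1:])             # k = 2, ..., n
    return g, s2


# Atlas parameters with n = 5, g = 0.5, sigma = 1: lambda_{k,k+1} = 2kg, sigma^2_{k,k+1} = 2.
g, s2 = first_order_approximation([1.0, 2.0, 3.0, 4.0], [2.0] * 5)
print(g, s2)  # g_k = -0.5 for k < 5, g_5 = (n - 1) g = 2.0, and sigma^2_k = 1
```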

27 / 39

SLIDE 29

First-order approximation

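The first-order approximation $X_1, \dots, X_n$ satisfies

$$(\ast)\quad \lim_{T\to\infty}\frac{1}{T}\int_0^T \big(\log X_{(k)}(t) - \log X_{(k+1)}(t)\big)\,dt = \frac{\sigma^2_k + \sigma^2_{k+1}}{2\lambda_{k,k+1}}, \quad \text{a.s.},$$

with the same parameters $\lambda_{k,k+1}$ as the data, and with $\sigma^2_1 = \tfrac12\sigma^2_{1,2}$ and $\sigma^2_k = \tfrac14\big(\sigma^2_{k-1,k} + \sigma^2_{k,k+1}\big)$. Let us suppose that the data $\{\Xi_1(t), \Xi_2(t), \dots\}$ satisfy

$$(\ast)\quad \lim_{T\to\infty}\frac{1}{T}\int_0^T \big(\log \Xi_{(k)}(t) - \log \Xi_{(k+1)}(t)\big)\,dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}}.$$

For $k \ge 2$, $\sigma^2_k + \sigma^2_{k+1} = \tfrac14\sigma^2_{k-1,k} + \tfrac12\sigma^2_{k,k+1} + \tfrac14\sigma^2_{k+1,k+2}$ is a moving average of the data variances, so the $X$ distribution is a smoothed version of the $\Xi$ distribution.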
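The smoothing can be seen directly by comparing the two mean-gap profiles given by (∗). The parameters below are hypothetical, with a spike in σ²_{3,4}; in the first-order profile the spike is spread over the neighboring ranks. The function name is illustrative.

```python
import numpy as np


def compare_mean_gaps(lam, var):
    """Mean log-gaps from (*) for the data and for its first-order approximation.

    lam[k-1] = lambda_{k,k+1} for k = 1..n-1; var[k-1] = sigma^2_{k,k+1} for k = 1..n.
    """
    lam = np.asarray(lam, dtype=float)
    var = np.asarray(var, dtype=float)
    # First-order variances: sigma^2_1 = var_1 / 2, sigma^2_k = (var_{k-1} + var_k) / 4.
    s2 = np.concatenate(([0.5 * var[0]], 0.25 * (var[:-1] + var[1:])))
    data_gaps = var[:-1] / (2.0 * lam)
    model_gaps = (s2[:-1] + s2[1:]) / (2.0 * lam)
    return data_gaps, model_gaps


lam = np.array([1.0, 2.0, 3.0, 4.0])
var = np.array([2.0, 2.0, 5.0, 2.0, 2.0])  # hypothetical spike in sigma^2_{3,4}
data_gaps, model_gaps = compare_mean_gaps(lam, var)
print(data_gaps)
print(model_gaps)
```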

28 / 39

SLIDE 30

Parameters $g_k$ for U.S. capital distribution

[Figure: growth rate $g_k$ versus rank.]

29 / 39

SLIDE 31

Parameters $\sigma^2_k$ for U.S. capital distribution

[Figure: variance rate $\sigma^2_k$ versus rank.]

30 / 39

SLIDE 32

U.S. capital distribution, 1990 to 1999

[Figure: weight versus rank, log-log axes.]

Actual (black). First-order (red).

31 / 39

SLIDE 33

First-order approximation

Perhaps the simplest first-order model is of the form

$$d\log X_i(t) = \big(-g + ng\,\mathbb{1}_{\{r_t(i)=n\}}\big)\,dt + \sigma_{r_t(i)}\,dW_i(t),$$

where the $\sigma^2_k$ increase with rank, $\sigma^2_1 \le \cdots \le \sigma^2_n$. Indeed, this increasing variance would probably have occurred with the original Brownian motion, where the small pollen particles would vibrate more vigorously than the big ones. In this case, the slope of the tangent

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T \frac{\log X_{(k)}(t) - \log X_{(k+1)}(t)}{\log(k) - \log(k+1)}\,dt \cong -\frac{\sigma^2_k + \sigma^2_{k+1}}{4g}, \quad \text{a.s.},$$

will be increasingly negative, so the distribution curve will be concave.
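Numerically, increasing variances make the approximate tangent slopes −(σ²_k + σ²_{k+1})/(4g) strictly decreasing, which is what concavity of the log-log curve means here. The variance profile and the value g = 1 below are hypothetical, and the slope formula is the large-k approximation from the slide.

```python
import numpy as np


def tangent_slopes(sigma2, g):
    """Approximate tangent slopes -(sigma^2_k + sigma^2_{k+1}) / (4 g), k = 1..n-1."""
    sigma2 = np.asarray(sigma2, dtype=float)
    return -(sigma2[:-1] + sigma2[1:]) / (4.0 * g)


sigma2 = np.linspace(0.5, 3.5, 7)  # hypothetical variances increasing with rank
slopes = tangent_slopes(sigma2, g=1.0)
print(slopes)
print(bool(np.all(np.diff(slopes) < 0)))  # True: slopes keep falling, so the curve is concave
crossing = int(np.argmax(slopes <= -1.0)) + 1  # first k with slope at or below -1
print(crossing)  # 4
```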

32 / 39

SLIDE 34

Weakly Zipfian systems

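Suppose our model is of the form

$$d\log X_i(t) = \big(-g + ng\,\mathbb{1}_{\{r_t(i)=n\}}\big)\,dt + \sigma_{r_t(i)}\,dW_i(t),$$

with $\sigma^2_1 \le \cdots \le \sigma^2_n$. Then

$$E\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = \sum_{k=1}^{n} E\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\Big(\frac{\sigma^2_k}{2} - g\Big)dt + E\Big[\frac{ngX_{(n)}(t)}{X_{[n]}(t)}\Big]dt.$$

Hence, if

$$\text{(A)}\quad \lim_{n\to\infty} E\Big[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\Big] = 0 \qquad \text{and} \qquad \text{(B)}\quad \lim_{n\to\infty} E\Big[\frac{ngX_{(n)}(t)}{X_{[n]}(t)}\Big] = 0,$$

then

$$\lim_{n\to\infty} \sum_{k=1}^{n} E\Big[\frac{X_{(k)}(t)}{X_{[n]}(t)}\Big]\frac{\sigma^2_k}{2} = g.$$

Since the weights $E[X_{(k)}(t)/X_{[n]}(t)]$ sum to 1, $g$ is a weighted average of the increasing $\sigma^2_k/2$, so $\sigma^2_k/2 \approx g$, and hence the tangent slope $-(\sigma^2_k + \sigma^2_{k+1})/4g \approx -1$, at some intermediate rank. This system will be weakly Zipfian, having a distribution that is concave with tangent slope −1 somewhere in the middle ranks.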
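The weighted-average argument can be illustrated with a mean-field heuristic, an assumption of this sketch that ignores leakage at finite n: take weights proportional to exp(−Σ_{j<k}(σ²_j + σ²_{j+1})/(4jg)), bisect for the g that equals the weighted average of σ²_k/2, and locate the rank where the tangent slope first reaches −1. The variance profile and the helper names are hypothetical.

```python
import numpy as np


def mean_field_weights(sigma2, g):
    """Relative masses w_k ~ X_(k)/X_[n] when each log-gap is replaced by its mean
    (sigma^2_k + sigma^2_{k+1}) / (4 k g); a heuristic that ignores leakage."""
    sigma2 = np.asarray(sigma2, dtype=float)
    k = np.arange(1, len(sigma2))
    gaps = (sigma2[:-1] + sigma2[1:]) / (4.0 * k * g)
    x = np.exp(np.concatenate(([0.0], -np.cumsum(gaps))))
    return x / x.sum()


def balancing_g(sigma2, iters=100):
    """Bisection for g with sum_k w_k(g) sigma^2_k / 2 = g, for increasing sigma^2_k.

    A root lies between sigma^2_1 / 2 and sigma^2_n / 2 because the left side
    is a weighted average of the sigma^2_k / 2.
    """
    sigma2 = np.asarray(sigma2, dtype=float)

    def excess(g):
        return np.dot(mean_field_weights(sigma2, g), sigma2) / 2.0 - g

    lo, hi = sigma2[0] / 2.0, sigma2[-1] / 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma2 = np.linspace(0.5, 3.5, 200)  # hypothetical variances increasing with rank
g = balancing_g(sigma2)
slopes = -(sigma2[:-1] + sigma2[1:]) / (4.0 * g)
k_star = int(np.argmax(slopes <= -1.0)) + 1  # first rank whose tangent slope reaches -1
print(g, k_star)
```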

33 / 39

SLIDE 35

U.S. capital distribution, 1990 to 1999

[Figure: weight versus rank, log-log axes.]

Actual (black). First-order (red).

34 / 39

SLIDE 36

Birds

North American Bird Survey 2003 (From Newman (2006)).

35 / 39

SLIDE 37

Birds

North American Bird Survey 2003 (From Newman (2006)).

36 / 39

SLIDE 38

Word count from Wikipedia

37 / 39

SLIDE 39

References

◮ Banner, A., R. Fernholz, and I. Karatzas (2005). Atlas models of equity markets. Annals of Applied Probability 15(4), 2296–2330.
◮ Fernholz, R. (2002). Stochastic Portfolio Theory. Springer.
◮ Newman, M. E. J. (2006). Power laws, Pareto distributions and Zipf’s law. arXiv cond-mat/0412004v3, 1–28.
◮ Zipf, G. K. (1935). The Psychobiology of Language. Houghton-Mifflin.
◮ Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley.

38 / 39

SLIDE 40

Thera Stochastics

Thank you!

39 / 39