Zipf’s Law, Robert Fernholz, INTECH, joint research with Ricardo Fernholz (PowerPoint presentation)

Zipf’s Law
Robert Fernholz, INTECH
Joint research with Ricardo Fernholz
Thera Stochastics, Santorini, Greece, May 31 – June 2, 2017
1 / 39

This talk is dedicated to Ioannis Karatzas on the occasion of his 65th birthday. 2 / 39

Introduction
“Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. The law is named after the American linguist George Kingsley Zipf (1902–1950), who popularized it and sought to explain it (Zipf (1935, 1949)), though he did not claim to have originated it.” (From Wikipedia (2017).)
3 / 39
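As a toy illustration of the frequency table mentioned in the quotation, the Python sketch below ranks the words of a text by count. The tokenization (lower-casing and splitting on whitespace), the alphabetical tie-breaking, and the name `rank_frequency` are assumptions made for this example; it is not the procedure behind the Wikipedia word count.

```python
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first.

    Words are lower-cased and split on whitespace (an assumption for this
    sketch); ties in count are broken alphabetically so ranks are deterministic.
    """
    counts = Counter(text.lower().split())
    ordered = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for rank, word, count in rank_frequency(sample):
        # Under Zipf's law, rank * count would be roughly constant on a large corpus.
        print(rank, word, count, rank * count)
```

On a sample this small the product `rank * count` is not expected to be constant; the law is a statement about large corpora.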

Word count from Wikipedia 4 / 39

Power laws and the Pareto distribution
Data follow a power law, or Pareto distribution, if a log-log plot of the data versus rank is approximately a straight line. Pareto distributions can result from self-organized criticality or from time-dependent systems. A Pareto distribution follows Zipf’s law if the slope of the log-log plot is $-1$. Zipf’s law is a form of universality, since many classes of data seem to follow this distribution. Specifically, certain time-dependent, rank-based systems seem to follow Zipf’s law, and we shall try to characterize these systems.
5 / 39
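Since the definition is stated in terms of the slope of a log-log plot of data versus rank, here is a minimal Python sketch that estimates that slope by ordinary least squares. The fitting method and the name `loglog_slope` are assumptions for illustration; the slides do not say how the slopes in the examples were obtained.

```python
import math


def loglog_slope(values: list[float]) -> float:
    """Ordinary least-squares slope of log(value) against log(rank).

    Values must be positive; they are sorted in decreasing order so that
    the largest value has rank 1. At least two values are required.
    """
    ys = [math.log(v) for v in sorted(values, reverse=True)]
    xs = [math.log(rank) for rank in range(1, len(ys) + 1)]
    if len(xs) < 2:
        raise ValueError("need at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # Exact Zipf data (value proportional to 1/rank) give a slope of -1.
    zipf = [1000.0 / k for k in range(1, 501)]
    print(loglog_slope(zipf))
```

A slope near $-1$ indicates Zipf’s law; the examples that follow show that many Pareto-like data sets have flatter or steeper slopes.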

Examples of Pareto distributions
[Figure: six log-log plots, with log-log slopes in blue: −0.83, −0.49, −0.71, −0.40, −0.82, −0.49.] (From Newman (2006).)
6 / 39

Examples of Pareto distributions
[Figure: six log-log plots, with log-log slopes in blue: −0.47, −1.20, −1.25, −0.92, −1.06, −0.77.] (From Newman (2006).)
7 / 39

Members and families
We wish to model systems of positive-valued, time-dependent data $\{\Xi_1(t), \Xi_2(t), \ldots\}$ of indefinite size. These data represent two classes of objects, members and families. The members are contained within the families, and $\Xi_i(t)$ indicates the number of members contained within the $i$th family at time $t$. Examples of members within families are:
◮ people within cities;
◮ occurrences within words;
◮ dollars within family fortunes;
◮ individuals within surnames;
◮ dollars within company capitalizations;
◮ birds within species.
8 / 39

Trends and sampling
The data we consider $\{\Xi_1(t), \Xi_2(t), \ldots\}$ might have a common global trend of the form $G(t)\,dt$, e.g., population growth, Wikipedia growth, GDP growth, etc. We shall study log-differences, so a global trend does not affect us, and it is convenient to assume it to be zero. Alternatively, we can sample the total population with a constant number of people, words, dollars, etc., in our sample over time. This could introduce sampling error but should not materially affect the shape of the distribution curve. In any case, to simplify the exposition, we shall assume henceforth that the total population we observe is free of trends.
9 / 39
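A small numerical check of the claim that a common trend does not affect log-differences: multiplying every $\Xi_i$ by the same factor $e^{\int G(t)\,dt}$ shifts every $\log \Xi_i$ by the same amount, so the gaps between ranked logs are unchanged. The sketch also shows the alternative of rescaling to a constant total. The sample values and helper names are hypothetical.

```python
import math


def log_gaps(xi: list[float]) -> list[float]:
    """Log-differences log Xi_(k) - log Xi_(k+1) between consecutive ranks."""
    ranked = sorted(xi, reverse=True)
    return [math.log(a) - math.log(b) for a, b in zip(ranked, ranked[1:])]


def with_global_trend(xi: list[float], integrated_trend: float) -> list[float]:
    """Apply a common trend: every family is multiplied by exp(integral of G dt)."""
    factor = math.exp(integrated_trend)
    return [x * factor for x in xi]


def to_constant_total(xi: list[float]) -> list[float]:
    """Rescale the data so that the total mass is 1, removing any common trend."""
    total = sum(xi)
    return [x / total for x in xi]


if __name__ == "__main__":
    data = [120.0, 45.0, 30.0, 8.0]  # hypothetical family sizes
    trended = with_global_trend(data, 0.7)
    print(log_gaps(data))
    print(log_gaps(trended))  # same values, up to floating-point rounding
```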

Continuous semimartingales
To model the data $\{\Xi_1(t), \Xi_2(t), \ldots\}$ we shall use continuous semimartingales $X_1, X_2, \ldots$ of the form
$$d\log X_i(t) = \gamma_i(t)\,dt + \sigma_i(t)\,dW_i(t),$$
where $W$ is a Brownian motion and the processes $\gamma_i$ and $\sigma_i$ are measurable and adapted to the Brownian filtration. A model of this form might be reasonable if, e.g.,
1. the changes $d\Xi_i(t)$ are proportional to the values $\Xi_i(t)$;
2. the log-changes $d\log \Xi_i(t)$ are composed of many small, independent perturbations;
3. the changes in the different $\Xi_i$ are independent.
10 / 39
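For intuition about the form $d\log X_i(t) = \gamma_i(t)\,dt + \sigma_i(t)\,dW_i(t)$, the Python sketch below samples one such process on a time grid in the special case of constant $\gamma_i$ and $\sigma_i$, which is an assumption made only for this illustration. Because the model is specified in logarithms, proportional changes (item 1 above) and positivity come for free. The function name is hypothetical.

```python
import math
import random


def simulate_path(x0, gamma, sigma, dt, steps, seed=None):
    """Sample X on the grid 0, dt, 2dt, ... when d log X = gamma dt + sigma dW.

    Assumes constant gamma and sigma (the slides allow adapted processes).
    With constant coefficients each log-increment is exactly
    Normal(gamma * dt, sigma**2 * dt), so the grid values have the exact
    distribution; the path stays positive because X = exp(log X).
    """
    rng = random.Random(seed)
    log_x = math.log(x0)
    path = [x0]
    for _ in range(steps):
        log_x += gamma * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(math.exp(log_x))
    return path


if __name__ == "__main__":
    path = simulate_path(x0=100.0, gamma=0.0, sigma=0.2, dt=0.01, steps=100, seed=1)
    print(len(path), path[-1])
```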

Rank processes
For a system of positive continuous semimartingales $X_1, \ldots, X_n$ we define the rank function to be the random permutation $r_t \in \Sigma_n$ such that $r_t(i) < r_t(j)$ if $X_i(t) > X_j(t)$, or if $X_i(t) = X_j(t)$ and $i < j$. The rank processes $X_{(1)} \ge \cdots \ge X_{(n)}$ are defined by $X_{(r_t(i))}(t) = X_i(t)$. If the $X_i$ satisfy certain regularity conditions, e.g., they spend no local time at triple points, then the rank processes satisfy
$$d\log X_{(k)}(t) = \sum_{i=1}^{n} \mathbb{1}_{\{r_t(i)=k\}}\,d\log X_i(t) + \tfrac{1}{2}\,d\Lambda^X_{k,k+1}(t) - \tfrac{1}{2}\,d\Lambda^X_{k-1,k}(t), \quad \text{a.s.},$$
where $\Lambda^X_{k,k+1}$ is the local time at the origin for $\log(X_{(k)}/X_{(k+1)})$, with $\Lambda^X_{0,1} = \Lambda^X_{n,n+1} \equiv 0$ (Fernholz (2002)).
11 / 39
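The rank function $r_t$ and the rank processes at a single time $t$ can be computed directly from the definition, including its tie-breaking rule. A minimal Python sketch follows; the function names are hypothetical, and Python's 0-based indices are mapped to 1-based ranks.

```python
def rank_function(x: list[float]) -> list[int]:
    """Return r with r[i] = rank of x[i], where rank 1 is the largest value.

    Ties are broken by index: r[i] < r[j] if x[i] > x[j], or if
    x[i] == x[j] and i < j. Python indices are 0-based, ranks are 1-based.
    """
    order = sorted(range(len(x)), key=lambda i: (-x[i], i))
    r = [0] * len(x)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r


def rank_processes(x: list[float]) -> list[float]:
    """Return [X_(1), ..., X_(n)] via X_(r(i)) = X_i."""
    r = rank_function(x)
    ranked = [0.0] * len(x)
    for i, k in enumerate(r):
        ranked[k - 1] = x[i]
    return ranked


if __name__ == "__main__":
    x = [2.0, 5.0, 2.0, 1.0]
    print(rank_function(x))   # [2, 1, 3, 4]
    print(rank_processes(x))  # [5.0, 2.0, 2.0, 1.0]
```

The two tied values $2.0$ receive ranks 2 and 3 in index order, exactly as the definition of $r_t$ requires.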

Asymptotic stability
A system of positive continuous semimartingales $X_1, \ldots, X_n$ is asymptotically stable if
1. $\lim_{t\to\infty} t^{-1}\big(\log X_{(1)}(t) - \log X_{(n)}(t)\big) = 0$, a.s. (coherence);
2. $\lim_{t\to\infty} t^{-1}\Lambda^X_{k,k+1}(t) = \lambda_{k,k+1} > 0$, a.s.;
3. $\lim_{t\to\infty} t^{-1}\langle \log X_{(k)} - \log X_{(k+1)} \rangle_t = \sigma^2_{k,k+1} > 0$, a.s.;
for $k = 1, \ldots, n-1$, where $\langle \cdot \rangle_t$ denotes quadratic variation and $\lambda_{k,k+1}$ and $\sigma^2_{k,k+1}$ are constants. The systems of continuous semimartingales we consider will be asymptotically stable and will also satisfy
$$\lim_{T\to\infty} \frac{1}{T}\int_0^T \big(\log X_{(k)}(t) - \log X_{(k+1)}(t)\big)\,dt = \frac{\sigma^2_{k,k+1}}{2\lambda_{k,k+1}}, \quad \text{a.s.}, \qquad (*)$$
for $k = 1, \ldots, n-1$.
12 / 39
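Relation $(*)$ links three asymptotic quantities, so from a long sampled record of a single log-gap one could estimate any one of them from the other two. The Python sketch below rearranges $(*)$ as $\lambda_{k,k+1} = \sigma^2_{k,k+1} / (2 \cdot \text{time-average gap})$. Replacing the limits by finite-sample averages and the quadratic variation by the sum of squared increments are assumptions of this sketch, not an estimation procedure taken from the talk; the function name is hypothetical.

```python
def gap_statistics(gaps: list[float], dt: float) -> tuple[float, float, float]:
    """Finite-sample estimates for one log-gap log X_(k) - log X_(k+1).

    gaps[j] is the gap observed at time j * dt. Returns:
      sigma2:   realized quadratic variation divided by elapsed time T
                (a proxy for sigma^2_{k,k+1} in condition 3);
      mean_gap: left-endpoint approximation of (1/T) * integral of the gap;
      lam:      the rate lambda_{k,k+1} implied by (*), sigma2 / (2 * mean_gap).
    mean_gap must be positive for lam to be defined.
    """
    if len(gaps) < 2:
        raise ValueError("need at least two observations")
    T = dt * (len(gaps) - 1)
    realized_qv = sum((b - a) ** 2 for a, b in zip(gaps, gaps[1:]))
    sigma2 = realized_qv / T
    mean_gap = sum(gaps[:-1]) * dt / T
    lam = sigma2 / (2.0 * mean_gap)
    return sigma2, mean_gap, lam


if __name__ == "__main__":
    print(gap_statistics([1.0, 2.0, 1.0, 2.0], dt=1.0))
```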

U.S. Capital Distribution, 1929 to 1999
[Figure: market weight curves on log-log axes, WEIGHT (from $10^{-7}$ to $10^{-1}$) versus RANK (from 1 to 5000).] (From Fernholz (2002).)
13 / 39

Conservation of ‘mass’
Suppose that for the data $\{\Xi_1(t), \Xi_2(t), \ldots\}$ the “total mass” $\Xi_{(1)}(t) + \Xi_{(2)}(t) + \cdots$ remains constant. The mass of the top $n$ ranks $\Xi_{(1)}, \ldots, \Xi_{(n)}$ is defined by
$$\Xi_{[n]}(t) \triangleq \Xi_{(1)}(t) + \cdots + \Xi_{(n)}(t),$$
and since the sample has constant total mass, for large enough $n$ the mass of the top $n$ ranks should also be approximately constant. Hence, we impose the condition on the model $X_1, \ldots, X_n$ that
$$\lim_{n\to\infty} E\left[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\right] = 0. \qquad \text{(A)}$$
14 / 39
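On observed data, $\Xi_{[n]}$ and a discrete counterpart of the ratio in condition (A) can be computed from two snapshots. In the toy example below (hypothetical numbers and hypothetical helper names), the total mass is constant, so the relative change vanishes when $n$ covers the whole system but not for smaller $n$.

```python
def top_mass(xi: list[float], n: int) -> float:
    """Xi_[n](t) = Xi_(1)(t) + ... + Xi_(n)(t), the mass of the top n ranks."""
    return sum(sorted(xi, reverse=True)[:n])


def relative_top_mass_change(before: list[float], after: list[float], n: int) -> float:
    """Discrete analogue of dX_[n]/X_[n] between two observation dates."""
    m0 = top_mass(before, n)
    return (top_mass(after, n) - m0) / m0


if __name__ == "__main__":
    before = [5.0, 3.0, 1.0]  # hypothetical family sizes, total mass 9
    after = [2.0, 3.0, 4.0]   # same total mass, different ranking
    for n in (1, 2, 3):
        print(n, relative_top_mass_change(before, after, n))
```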

Behavior of ranked systems
Let us suppose for the moment that the data processes $\Xi_i$ are continuous semimartingales that spend no local time at triple points. In this case, the rank processes $\Xi_{(k)}$ will satisfy
$$d\log \Xi_{(k)}(t) = \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)=k\}}\,d\log \Xi_i(t) + \tfrac{1}{2}\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac{1}{2}\,d\Lambda^\Xi_{k-1,k}(t), \quad \text{a.s.},$$
for all $k$. By Itô’s rule, for all $k$, a.s.,
$$\begin{aligned}
\frac{d\Xi_{(k)}(t)}{\Xi_{(k)}(t)} &= \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)=k\}}\,\frac{d\Xi_i(t)}{\Xi_i(t)} + \tfrac{1}{2}\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac{1}{2}\,d\Lambda^\Xi_{k-1,k}(t)\\
&= \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)=k\}}\,\frac{d\Xi_i(t)}{\Xi_{(k)}(t)} + \tfrac{1}{2}\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac{1}{2}\,d\Lambda^\Xi_{k-1,k}(t).
\end{aligned}$$
15 / 39

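Behavior of ranked systems
Hence,
$$\begin{aligned}
d\Xi_{(k)}(t) &= \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)=k\}}\,d\Xi_i(t) + \tfrac{1}{2}\,\Xi_{(k)}(t)\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac{1}{2}\,\Xi_{(k)}(t)\,d\Lambda^\Xi_{k-1,k}(t)\\
&= \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)=k\}}\,d\Xi_i(t) + \tfrac{1}{2}\,\Xi_{(k)}(t)\,d\Lambda^\Xi_{k,k+1}(t) - \tfrac{1}{2}\,\Xi_{(k-1)}(t)\,d\Lambda^\Xi_{k-1,k}(t), \quad \text{a.s.},
\end{aligned}$$
where the second equality holds because $\Lambda^\Xi_{k-1,k}$ increases only when $\Xi_{(k-1)}(t) = \Xi_{(k)}(t)$. We can therefore add up the $d\Xi_{(k)}(t)$ for $k = 1, \ldots, n$ to obtain
$$d\Xi_{[n]}(t) = \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)\le n\}}\,d\Xi_i(t) + \tfrac{1}{2}\,\Xi_{(n)}(t)\,d\Lambda^\Xi_{n,n+1}(t), \quad \text{a.s.}$$
This serves to define the local time $\Lambda^\Xi_{n,n+1}(t)$ for the data.
16 / 39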
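The summation step can be spelled out. Summing the second form of $d\Xi_{(k)}(t)$ over $k = 1, \ldots, n$, the indicator terms combine into $\mathbb{1}_{\{r_t(i)\le n\}}$ and the local-time terms telescope, leaving only the $k = n$ term and the $k = 1$ boundary term. The last step assumes the convention $\Lambda^\Xi_{0,1} \equiv 0$, by analogy with $\Lambda^X_{0,1} \equiv 0$ for the finite system, so the boundary term vanishes.

```latex
\begin{aligned}
d\Xi_{[n]}(t) = \sum_{k=1}^{n} d\Xi_{(k)}(t)
 &= \sum_{k=1}^{n}\sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)=k\}}\,d\Xi_i(t)
  + \sum_{k=1}^{n}\Big(\tfrac{1}{2}\,\Xi_{(k)}(t)\,d\Lambda^\Xi_{k,k+1}(t)
  - \tfrac{1}{2}\,\Xi_{(k-1)}(t)\,d\Lambda^\Xi_{k-1,k}(t)\Big)\\
 &= \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)\le n\}}\,d\Xi_i(t)
  + \tfrac{1}{2}\,\Xi_{(n)}(t)\,d\Lambda^\Xi_{n,n+1}(t)
  - \tfrac{1}{2}\,\Xi_{(0)}(t)\,d\Lambda^\Xi_{0,1}(t)\\
 &= \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)\le n\}}\,d\Xi_i(t)
  + \tfrac{1}{2}\,\Xi_{(n)}(t)\,d\Lambda^\Xi_{n,n+1}(t).
\end{aligned}
```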

$\Lambda^\Xi_{k,k+1}(t)$ for the U.S. capital distribution, $k = 10, 20, 40, \ldots, 5120$ (From Fernholz (2002)).
17 / 39

Leakage
For the data $\{\Xi_1(t), \Xi_2(t), \ldots\}$ we have the representation
$$d\Xi_{[n]}(t) = \sum_{i=1}^{\infty} \mathbb{1}_{\{r_t(i)\le n\}}\,d\Xi_i(t) + \tfrac{1}{2}\,\Xi_{(n)}(t)\,d\Lambda^\Xi_{n,n+1}(t).$$
The final term compensates for the “leakage” from $\Xi_{[n]}$. In order that the system not depend on mass replenished from outside, we impose the condition that the (relative) leakage tends to zero:
$$\lim_{n\to\infty} E\left[\frac{X_{(n)}(t)}{X_{[n]}(t)}\,d\Lambda^X_{n,n+1}(t)\right] = 0. \qquad \text{(B)}$$
18 / 39
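In discrete data the leakage can be read off from the representation of $d\Xi_{[n]}(t)$: over one period, it is the change in $\Xi_{[n]}$ minus the change in the families that were ranked in the top $n$ at the start of the period. The Python sketch below computes this and its size relative to $\Xi_{[n]}$, a discrete counterpart of the quantity in condition (B). Approximating the local-time term by this one-period difference is an assumption of the sketch, and the helper name is hypothetical.

```python
def leakage(before: list[float], after: list[float], n: int) -> tuple[float, float]:
    """Discrete analogue of the leakage term (1/2) Xi_(n) dLambda_{n,n+1}.

    before[i] and after[i] are the sizes of family i at two consecutive dates.
    The change in the top-n mass Xi_[n] is compared with the change in the
    families that were in the top n at the first date; the difference is the
    mass that crossed the boundary between ranks n and n + 1. Returns the
    leakage and the leakage relative to Xi_[n] at the first date.
    """
    top_before = sorted(range(len(before)), key=lambda i: (-before[i], i))[:n]
    mass_before = sum(before[i] for i in top_before)
    mass_after = sum(sorted(after, reverse=True)[:n])
    own_change = sum(after[i] - before[i] for i in top_before)
    leak = (mass_after - mass_before) - own_change
    # In exact arithmetic leak equals mass_after minus the later mass of the
    # earlier top-n families, so it is never negative, as a local time is
    # nondecreasing.
    return leak, leak / mass_before


if __name__ == "__main__":
    before = [5.0, 3.0, 1.0]  # hypothetical family sizes
    after = [2.0, 3.0, 4.0]
    print(leakage(before, after, 1))  # (2.0, 0.4)
```

Here the family that was largest lost 3 units, but the top-1 mass fell by only 1, because a family from below moved up: 2 units of mass “leaked” into rank 1.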

A conservation law
Conditions (A) and (B) together are a form of conservation law that ensures that the total mass of the system is autonomously maintained:
$$\lim_{n\to\infty} E\left[\frac{dX_{[n]}(t)}{X_{[n]}(t)}\right] = 0, \qquad \text{(A)}$$
and
$$\lim_{n\to\infty} E\left[\frac{X_{(n)}(t)}{X_{[n]}(t)}\,d\Lambda^X_{n,n+1}(t)\right] = 0. \qquad \text{(B)}$$
We shall now study the effects of conditions (A) and (B) on our continuous semimartingale model $X_1, \ldots, X_n$.
19 / 39

Atlas models
Perhaps the simplest model for the systems we consider is an Atlas model, a system of positive continuous semimartingales $X_1, \ldots, X_n$ defined by
$$d\log X_i(t) = \big(-g + ng\,\mathbb{1}_{\{r_t(i)=n\}}\big)\,dt + \sigma\,dW_i(t),$$
where $g$ and $\sigma$ are positive constants, and $(W_1, \ldots, W_n)$ is a Brownian motion. Atlas models are asymptotically stable, and since the processes $X_i$ are exchangeable, they asymptotically spend equal time in each rank. Hence, each of the $X_i$ has asymptotic log-drift $-g + ng \cdot \tfrac{1}{n} = 0$, so the entire system has zero asymptotic log-drift (Fernholz (2002), Banner et al. (2005)). We shall assume that Atlas models are in their steady-state distributions.
20 / 39
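The fraction of time each $X_i$ spends in rank $n$, which drives the zero-drift argument, can be checked by simulation. The Python sketch below uses a simple Euler scheme, which is an assumption of this sketch and not the method of the cited papers; it starts from an arbitrary initial state rather than the steady-state distribution, so early steps would need to be discarded as burn-in in any serious use. The function name is hypothetical.

```python
import math
import random


def simulate_atlas(log_x0, g, sigma, dt, steps, seed=None):
    """Euler scheme for the Atlas model
        d log X_i = (-g + n g 1{r_t(i) = n}) dt + sigma dW_i.

    Returns the final log values and, for each i, the fraction of steps
    in which X_i occupied the bottom rank n.
    """
    rng = random.Random(seed)
    log_x = list(log_x0)
    n = len(log_x)
    steps_at_bottom = [0] * n
    for _ in range(steps):
        # Bottom rank: smallest value; among ties the larger index ranks lower,
        # following the convention r_t(i) < r_t(j) if X_i = X_j and i < j.
        bottom = max(range(n), key=lambda i: (-log_x[i], i))
        steps_at_bottom[bottom] += 1
        for i in range(n):
            drift = -g + (n * g if i == bottom else 0.0)
            log_x[i] += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return log_x, [c / steps for c in steps_at_bottom]


if __name__ == "__main__":
    final, frac = simulate_atlas([0.0] * 5, g=0.1, sigma=0.2, dt=0.01, steps=50_000, seed=7)
    # With exchangeable X_i, each fraction should approach 1/n = 0.2 over a long run.
    print([round(f, 3) for f in frac])
```

Because the drifts $-g$ (for $n-1$ processes) and $(n-1)g$ (for the bottom one) sum to zero at every step, the sum of the $\log X_i$ changes only through the Brownian terms.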
