Aaron Clauset @aaronclauset Assistant Professor, Computer Science and BioFrontiers Institute, University of Colorado Boulder External Faculty, Santa Fe Institute
Challenges of forecasting with fat tailed data
15 October 2013
[Figure: P(x) versus x on log-log axes, panels (a) to (x), one per empirical data set. Recovered panel labels: metabolic, Internet, calls, words, proteins, wars, terrorism, species, HTTP, birds, blackouts, book sales, cities, email, fires, flares, religions, surnames, citations, authors, web hits, web links, quakes, wealth.]
[Figure: earthquake magnitude M versus earthquake number; inset: proportion ≥ M, log scale.]
[Figure: battle deaths S (severity) versus interstate war number (1816 − 2007), log scale, with WWI and WWII marked; inset: proportion ≥ S.]
[Figure: severity S (deaths) versus attack number (Jan 1998−2008), log scale, with 9-11 marked; inset: proportion ≥ S.]
12280 957 36 1
RAND-MIPT event database
[Figure: Pr(X ≥ x) versus severity x (deaths), log-log axes; 9/11 marked.]
Terrorism event data from RAND-MIPT Terrorism Knowledge Base (2008).
40 years of data (1968-2007)
Worldwide (~200 countries)
13,274 deadly events
Each event is localized in time and space, and MIPT records its severity (deaths).
9/11 recorded as three events; the NYC event records 2749 deaths.
Choose $x_{\min} = y$ such that $d\left[ S(x \ge y),\, F(x \ge y \mid \hat{\theta}) \right]$ is minimized. Here, we let $d[\cdot,\cdot]$ be the KS-statistic.
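The threshold selection described on this slide can be sketched as a scan over candidate values of xmin. This is a minimal illustration, not the speaker's code: the helper names (`fit_alpha`, `ks_distance`, `choose_xmin`), the `min_tail` cutoff, and the use of the continuous power-law MLE are all assumptions made here, and the data are synthetic rather than the RAND-MIPT events.

```python
import numpy as np

def fit_alpha(tail, xmin):
    """MLE exponent for a continuous power law on x >= xmin (assumed form)."""
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical and fitted tail CDFs."""
    tail = np.sort(tail)
    n = len(tail)
    model = 1.0 - (tail / xmin) ** (1.0 - alpha)
    upper = np.arange(1, n + 1) / n - model   # empirical step top minus model
    lower = model - np.arange(0, n) / n       # model minus empirical step bottom
    return max(upper.max(), lower.max())

def choose_xmin(x, min_tail=50):
    """Scan candidate thresholds y; keep the one with the smallest KS distance."""
    x = np.asarray(x, dtype=float)
    best = None
    for y in np.unique(x):
        tail = x[x >= y]
        if len(tail) < min_tail or tail.max() == y:
            continue  # too few (or degenerate) points to fit a tail
        alpha = fit_alpha(tail, y)
        d = ks_distance(tail, y, alpha)
        if best is None or d < best[2]:
            best = (y, alpha, d)
    return best

# Synthetic check with made-up data: a uniform body below 10 and a
# power-law tail (alpha = 2.5) above 10.
rng = np.random.default_rng(0)
body = rng.uniform(1.0, 10.0, size=1500)
tail = 10.0 * (1.0 - rng.random(1500)) ** (-1.0 / 1.5)
xmin_hat, alpha_hat, d_hat = choose_xmin(np.concatenate([body, tail]))
print(xmin_hat, alpha_hat, d_hat)
```

Scanning every distinct observed value is the simplest choice; for large data sets a coarser grid of candidates would reduce the cost.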
Let $\Pr(x) \propto x^{-\alpha}$ for values $x \ge x_{\min}$. For the empirical data, we estimate $\hat{\alpha} = 2.4 \pm 0.1$, with $x_{\min} = 10$. This yields 994 tail events (7.5%). A Monte Carlo hypothesis test finds $p = 0.68 \pm 0.03$, meaning the power law cannot be rejected as a model of these data. A likelihood ratio test finds the stretched exponential and log-normal distributions also plausible.
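For reference, a sketch of the standard maximum-likelihood estimate for the exponent of a continuous power-law tail, together with its asymptotic standard error. The slides do not state which estimator was used, and the severities are integer counts, so the continuous form is an assumption made here for brevity; $n_{\text{tail}}$ denotes the number of observations with $x_i \ge x_{\min}$.

```latex
\hat{\alpha} = 1 + n_{\text{tail}} \left[ \sum_{i=1}^{n_{\text{tail}}} \ln \frac{x_i}{x_{\min}} \right]^{-1},
\qquad
\hat{\sigma} \approx \frac{\hat{\alpha} - 1}{\sqrt{n_{\text{tail}}}}
```

With $n_{\text{tail}} = 994$ and $\hat{\alpha} = 2.4$, this asymptotic standard error is about $1.4 / \sqrt{994} \approx 0.044$, smaller than the ±0.1 quoted on the slide; the slide does not say how its interval was computed.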
[Figure: Pr(X ≥ x) versus severity x (deaths), log-log axes, 9/11 marked; inset: distribution of the fitted exponent, axis from 2.2 to 2.6.]
Given $n$ observed event sizes $X = \{x_i\}$, generate $Y$ by drawing $y_j$, $j = 1, \ldots, n$, uniformly at random, with replacement, from the observed events. For each tail model $\Pr(x \mid \theta, x_{\min})$, the MLE parameter choice $\theta(Y, x_{\min})$ is deterministic. This produces a bootstrap distribution $\Pr(\theta, x_{\min})$ that captures the statistical uncertainty within this model.
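The resampling step can be sketched as follows. This is a simplified illustration under stated assumptions: xmin is held fixed across resamples (the slide's notation suggests it is estimated jointly with the parameters), the tail model is a continuous power law, the function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_alpha(tail, xmin):
    """MLE exponent for a continuous power law on x >= xmin (assumed form)."""
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def bootstrap_alpha(x, xmin, n_boot=500, seed=0):
    """Bootstrap distribution of the fitted exponent, with xmin held fixed."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    alphas = np.empty(n_boot)
    for b in range(n_boot):
        y = rng.choice(x, size=n, replace=True)     # Y: n draws with replacement from X
        alphas[b] = fit_alpha(y[y >= xmin], xmin)   # refit on the resampled tail
    return alphas

# Synthetic data (made up): uniform body below 10, power-law tail with alpha = 2.4.
rng = np.random.default_rng(1)
x = np.concatenate([
    rng.uniform(1.0, 10.0, size=4000),
    10.0 * (1.0 - rng.random(1000)) ** (-1.0 / 1.4),
])
alphas = bootstrap_alpha(x, xmin=10.0)
lo, hi = np.percentile(alphas, [5, 95])
print(alphas.mean(), lo, hi)
```

Because each resample draws from all n events, the number of tail events varies from one resample to the next, so that source of variability is included in the spread of the estimates.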
Repeat the above steps, but with additional tail models. Here, we choose:
Stretched exponential: $\Pr(x) \propto x^{\beta - 1} e^{-\lambda x^{\beta}}$
Log-normal: $\Pr(x) \propto \frac{1}{x} \exp\left[ -\frac{(\ln x - \mu)^2}{2\sigma^2} \right]$
Neither can be rejected, under a LRT, as a model of events $x \ge x_{\min} = 10$. Multiple tail models better represent model uncertainty.
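A likelihood ratio comparison between tail models can be sketched as below. The densities are the three forms named in the talk, each normalized on x ≥ xmin. The function names, the parameter values for the alternatives (chosen arbitrarily rather than fitted by MLE), the two-sided normal p-value, and the synthetic data are all assumptions of this sketch.

```python
import math
import numpy as np

def loglik_power_law(x, alpha, xmin):
    """Pointwise log-density of a continuous power law on x >= xmin."""
    return np.log((alpha - 1.0) / xmin) - alpha * np.log(x / xmin)

def loglik_stretched_exp(x, beta, lam, xmin):
    """Pointwise log-density of a stretched exponential truncated to x >= xmin."""
    return (np.log(beta * lam) + (beta - 1.0) * np.log(x)
            - lam * (x ** beta - xmin ** beta))

def loglik_lognormal(x, mu, sigma, xmin):
    """Pointwise log-density of a log-normal truncated to x >= xmin."""
    tail_mass = 0.5 * math.erfc((math.log(xmin) - mu) / (sigma * math.sqrt(2.0)))
    return (-np.log(x * sigma * math.sqrt(2.0 * math.pi))
            - (np.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
            - math.log(tail_mass))

def likelihood_ratio(ll_a, ll_b):
    """Return (R, p): summed log-likelihood ratio and a two-sided normal p-value."""
    diff = ll_a - ll_b
    n = len(diff)
    R = diff.sum()
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * diff.std()))
    return R, p

# Synthetic tail sample (made up): power law with alpha = 2.4 above xmin = 10.
rng = np.random.default_rng(0)
xmin, alpha_true = 10.0, 2.4
tail = xmin * (1.0 - rng.random(2000)) ** (-1.0 / (alpha_true - 1.0))

ll_pl = loglik_power_law(tail, alpha_true, xmin)
ll_se = loglik_stretched_exp(tail, beta=0.5, lam=1.0, xmin=xmin)   # arbitrary parameters
ll_ln = loglik_lognormal(tail, mu=1.0, sigma=1.5, xmin=xmin)       # arbitrary parameters
R_se, p_se = likelihood_ratio(ll_pl, ll_se)
R_ln, p_ln = likelihood_ratio(ll_pl, ll_ln)
print(R_se, p_se, R_ln, p_ln)
```

A positive R favors the first model. In a real analysis each alternative would be fitted by maximum likelihood before the comparison, and a large p would mean the data cannot distinguish the two models.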
[Figure: density of Probability(1+ catastrophic event, 1968−2007) for each tail model: power law (1), power law (2), stretched exp., log−normal.]
In each sample, the probability of a tail event is $p_{\text{tail}} = \#\{x_i \ge x_{\min}\} / n$.
And the number of such events will be $n_{\text{tail}} \sim \mathrm{Binomial}(n, p_{\text{tail}})$.
The probability that at least one is “catastrophic” (size $\ge x$) is thus $\rho = 1 - F(x \mid \theta(Y, x_{\min}))^{n_{\text{tail}}}$, where $F(x \mid \theta)$ is the fitted cdf.
Via Monte Carlo, we may construct a distribution $\Pr(\rho)$ over these estimates for each tail model.
If $\langle \rho \rangle < 0.01$, call the event an outlier.
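As a worked illustration of the formula for ρ, the sketch below plugs in the point estimates quoted earlier in the talk (α = 2.4, xmin = 10, n = 13,274 events, 994 of them in the tail) and the 2749 deaths recorded for the NYC event. It assumes a continuous power-law cdf and holds the parameters fixed instead of bootstrapping them, so it is not the talk's calculation; the function names are hypothetical.

```python
import numpy as np

def powerlaw_cdf(x, alpha, xmin):
    """CDF of a continuous power law on x >= xmin (assumed tail model)."""
    return 1.0 - (x / xmin) ** (1.0 - alpha)

def prob_at_least_one(x_cat, alpha, xmin, n, p_tail, n_draws=10000, seed=0):
    """Monte Carlo draws of rho = 1 - F(x_cat)^n_tail, with n_tail ~ Binomial(n, p_tail)."""
    rng = np.random.default_rng(seed)
    n_tail = rng.binomial(n, p_tail, size=n_draws)
    F = powerlaw_cdf(x_cat, alpha, xmin)
    return 1.0 - F ** n_tail

rho = prob_at_least_one(2749, alpha=2.4, xmin=10, n=13274, p_tail=994 / 13274)
rho_mean = rho.mean()
print(rho_mean)
```

Under these simplifying assumptions the mean comes out near 0.3, well above the 0.01 outlier threshold. The spread across draws here reflects only the binomial variation in the number of tail events, not parameter or model uncertainty.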
[Figure: deadly events per year (log scale), 1999 to 2009, for World and World − {Iraq, Afghan.}; lower panel: an exceedance probability by year from 1999 to 2021, split into past and future, with status quo and pessimistic scenarios.]
[Figure: Pr(X ≥ x) versus return size / xmin, log-log axes; inset: density of the scaling exponent.]
For each sample, estimate power-law tail parameters $x_{\min}$ and $\alpha$.
Plot shows normalized return exceedance distribution $\Pr(X \ge x / x_{\min})$.
Inset shows distribution $\Pr(\alpha)$ of estimated power-law exponents, whose mean is $\langle \alpha \rangle = 3.77$.
But, only about half of these are plausible power laws, with $p \ge 0.1$.
AL, AZN, BAA, BLT, BOC, BOOT, BSY, CPI, GUS, HAS, HG, III, ISYS, LLOY, PRU, PSON, RB, REED, RIO, RTO, RTR, SBRY, SHEL, SSE, TSCO, UU, VOD
http://arxiv.org/abs/1103.0949 http://arxiv.org/abs/0706.1062
with R. Woodard Annals of Applied Statistics (2013) arxiv:1209.0089
with C.R. Shalizi, A.Z. Jacobs & K.L. Klinkner arxiv:1103.0949 (2011)
with M. Young & K.S. Gleditsch Peace Economics, Peace Science and Public Policy (2010)
with C.R. Shalizi & M.E.J. Newman SIAM Review (2009)
with M. Young & K.S. Gleditsch Journal of Conflict Resolution (2007)
[Figure: Pr(X ≥ x) versus severity x (deaths), log-log axes, Global Terrorism Database with power-law models, 9/11 marked; inset: Pr(p).]
[Figure: Pr(X ≥ x) versus severity x (deaths), log-log axes, with power-law models, one panel per weapon type: chemical/biological, explosives, fire, firearms, knives, and one panel whose title was not recovered; each inset: Pr(p).]
weapon type           historical p̂   90% CI
chemical/biological   0.023          [0.000, 0.085]
explosives            0.374          [0.167, 0.766]
fire                  0.137          [0.012, 0.339]
firearms              0.118          [0.015, 0.320]
knives                0.009          [0.001, 0.021]
(unlabeled)           0.055          [0.000, 0.236]
any                   0.564          [0.338, 0.839]
[Figure: Pr(X ≥ x) versus severity x (deaths), log-log axes, with power-law models and 9/11 marked, one panel for international events and one for OECD events; each inset: Pr(p).]
Deaths per million people, 1998-2007
[Figure: wars with x ≥ 1000 and wars with x ≥ 7061 by year, 1823 to 2003; Pr(X ≥ 7061) by year.]
[Figure: Pr(X ≥ x) versus x, log-log axes, war severities with power-law models; inset: distribution of the fitted parameter, axis from 1.2 to 2.]