SLIDE 1

Practical Issues in Applications of Multivariate Extreme Values

Jonathan Tawn, with Caroline Keef and Mark Latham

Lancaster, UK

SLIDE 3

Two Applications

Sea-surge data

Modelling of surge process over space for joint flood risk assessment for coastal sites and for offshore sites needed for insurance industry

River flow data

Modelling of river flow for network for joint flood risk assessment for planning purposes and insurance

SLIDE 4

Surge Data

Hindcast output from the CSX model, a 2D numerical surge model for the European Continental Shelf, forced by DNMI pressure data for the period 1955-2000

Data are: hourly maxima over 5-day blocks for 46 years at 259 sites

SLIDE 5

River Flow Data

Daily river flows for a network of sites in the River Thames catchment in the UK

[Figure: map of the catchment showing river flow gauges and rain gauges, with altitude below and above 100 m distinguished; axes Easting (km) and Northing (km); inset of Great Britain]

SLIDE 6

Marginal Standardisation and Notation

X: univariate variable of most interest

Y: d-dimensional variable

Transform marginals to Gumbel distributions:

$\Pr(X > x) = \Pr(Y_i > x) \sim \exp(-x)$ as $x \to \infty$, for $i = 1, \ldots, d$

Lack of Memory Property:

$\Pr(X > t + x) \sim \exp(-t) \Pr(X > x)$ for large $x$

Allows focus on dependence structure
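The marginal standardisation can be sketched with a rank transform. This is a minimal illustration only: it assumes the empirical distribution function is adequate for the whole margin (in practice the upper tail would usually be modelled separately), and the function name `to_gumbel` and the `n + 1` plotting position are choices made here, not details from the slides.

```python
import numpy as np

def to_gumbel(values):
    """Rank-transform a 1-D sample to approximately standard Gumbel margins."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Ranks 1..n; dividing by n + 1 keeps the empirical
    # probabilities strictly inside (0, 1).
    ranks = np.argsort(np.argsort(values)) + 1
    u = ranks / (n + 1)
    # Standard Gumbel distribution function is exp(-exp(-x)),
    # so its quantile function is -log(-log(u)).
    return -np.log(-np.log(u))

rng = np.random.default_rng(0)
raw = rng.lognormal(size=2000)   # arbitrary illustrative margin
g = to_gumbel(raw)

# On the Gumbel scale Pr(X > x) is close to exp(-x) for large x.
tail_prob = np.mean(g > 3.0)
```

The transform is monotone, so the ordering of the observations (and hence the dependence structure) is unchanged; only the marginal scale is altered.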

SLIDE 7

Standardisation for Surge Data

A large surge event on the Danish coast in original and transformed margins

[Figure: two maps (axes East, North), each with a + marker; colour scales from −0.8 to 5.15 and from −2.2 to 8.2]

SLIDE 9

What is the Aim of Analysis?

Sea-surge data

Simulation of surge events large at a given location

Estimation of spatial risk measure E(#{Y > x} | X > x)

Dimension reduction for physical understanding

River flow data

Estimation of Pr(Y > x | X > x)
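Both target quantities have simple empirical counterparts, sketched below. The sketch assumes that all variables have already been put on common margins, so that a single level x applies to every site; the function names and the toy numbers are hypothetical and purely illustrative.

```python
import numpy as np

def conditional_exceedance(x_cond, y_sites, level):
    """Empirical Pr(Y_i > level | X > level) for each column of y_sites."""
    x_cond = np.asarray(x_cond, dtype=float)
    y_sites = np.asarray(y_sites, dtype=float)
    extreme = x_cond > level
    if not extreme.any():
        raise ValueError("no observations of X above the level")
    # Proportion of conditioning events in which each site also exceeds.
    return (y_sites[extreme] > level).mean(axis=0)

def spatial_risk(x_cond, y_sites, level):
    """Empirical E(#{Y > level} | X > level): expected number of exceeding sites."""
    return conditional_exceedance(x_cond, y_sites, level).sum()

# Made-up toy data: 5 events, 2 other sites.
x = np.array([0.1, 2.5, 3.0, 0.4, 2.2])
y = np.array([[0.0, 2.4],
              [2.1, 0.3],
              [2.6, 2.9],
              [0.2, 0.1],
              [0.5, 2.3]])
probs = conditional_exceedance(x, y, level=2.0)
risk = spatial_risk(x, y, level=2.0)
```

Empirical estimates of this kind are only usable at levels where X exceedances have been observed, which is the motivation for the model-based extrapolation in the later slides.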

SLIDE 10

Schematic of Threshold Approach

Under the assumption of asymptotic dependence,

$\lim_{x \to \infty} \Pr(Y > x \mid X > x) > 0$,

the following homogeneity property holds for all sets A extreme in at least one variable:

$\Pr((X, Y) \in t + A) \approx \exp(-t) \Pr((X, Y) \in A)$

[Figure: scatter plot of Y against X with threshold u marked on both axes, showing a set A and its translation t + A]

SLIDE 11

Is Surge Process Asymptotically Dependent?

X: Danish Site

[Figure: three maps (axes East, North), each with a + marker; colour scale from −2.2 to 8.2]

SLIDE 12

Is Surge Process Asymptotically Dependent?

X: UK Site

[Figure: three maps (axes East, North), each with a + marker; colour scale from −2.2 to 8.2]

SLIDE 13

Sites Significant on Testing for Asymptotic Dependence

X: Danish Site

[Figure: map (axes East, North) with each site marked * or O]

SLIDE 14

Sites Significant on Testing for Asymptotic Dependence

X: UK Site

[Figure: map (axes East, North) with each site marked * or O]

SLIDE 15

Problems for River Flow Application

Plot of data availability for Thames catchment sites

[Figure: data availability by year (axis ticks 1960, 1969, 1980, 1990, 2000) for sites dmf39001, dmf39008, dmf39016, dmf39019, dmf39020, dmf39025, dmf39046, dmf39081, dmf39130]

SLIDE 16

Regression Interpretation of Threshold Method

For X > u: $Y = X + Z$, where Z is independent of X

$\widehat{\Pr}((X, Y) \in t + A) = \exp(-v) \int_v^{\infty} \frac{1}{m} \sum_{i=1}^{m} 1\{(x, x + z_i) \in t + A\} \exp(-x) \, dx$

[Figure: scatter plot of Y against X with threshold u marked on both axes, showing a set A and its translation t + A]

SLIDE 17

Extension of Regression/Conditional Method

Heffernan and Tawn (2004, JRSS B)

For X > u: $Y = aX + X^b Z$, where Z is independent of X

d-dimensional parameters $0 \le a \le 1$ and $b$

Nonparametric model for Z

[Figure: scatter plot of Y against X with threshold u marked on both axes, with Z indicated]
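One way to estimate a and b for a single component of Y is sketched below. It uses a Gaussian working likelihood for Z, $Z \sim N(\mu, \sigma^2)$, purely as an estimation device, and profiles over a grid of b values; the working likelihood, the grid, the requirement u > 0 and the function name `fit_conditional_model` are all assumptions of this sketch rather than details given on the slide.

```python
import numpy as np

def fit_conditional_model(x, y, u, b_grid=None):
    """Fit Y = a*X + X**b * Z for X > u (u > 0), one component of Y at a time."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = x > u
    xe, ye = x[keep], y[keep]
    if b_grid is None:
        b_grid = np.linspace(-1.0, 0.95, 40)
    best = None
    for b in b_grid:
        w = xe ** (-b)
        # Dividing the model by X**b gives Y/X**b = a*X**(1-b) + mu + sigma*eps,
        # an ordinary least-squares problem in (a, mu).
        design = np.column_stack([xe ** (1.0 - b), np.ones_like(xe)])
        coef, *_ = np.linalg.lstsq(design, ye * w, rcond=None)
        a = float(np.clip(coef[0], 0.0, 1.0))   # enforce 0 <= a <= 1
        z = (ye - a * xe) * w                   # residuals for this (a, b)
        sigma = z.std()
        # Profile negative log-likelihood, up to an additive constant.
        nll = b * np.sum(np.log(xe)) + xe.size * np.log(sigma)
        if best is None or nll < best[0]:
            best = (nll, a, float(b), z)
    _, a_hat, b_hat, z_hat = best
    return a_hat, b_hat, z_hat

# Synthetic check with known a = 0.7 and b = 0.3.
rng = np.random.default_rng(1)
n = 5000
x = 2.0 + rng.exponential(size=n)
y = 0.7 * x + x ** 0.3 * rng.normal(0.0, 0.5, size=n)
a_hat, b_hat, z_hat = fit_conditional_model(x, y, u=2.0)
```

The returned residuals `z_hat` are what a nonparametric model for Z would be built from; the Gaussian assumption is discarded once a and b have been estimated.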

SLIDE 18

Theoretical Examples

$Y = aX + X^b Z$

Asymptotic Dependence: $a = 1$ and $b = 0$

Asymptotic Independence with $Y_j$: $a_j < 1$

Multivariate Normal Copula: $a_j = \rho_j^2$ and $b_j = \frac{1}{2}$ for $j = 1, \ldots, d$

SLIDE 19

Estimates of a

X: Danish Site

[Figure: map (axes East, North) of the estimates of a; colour scale ticks 0.25, 0.5, 0.75, 0.86, 1]

SLIDE 20

Estimates of a

X: UK Site

[Figure: map (axes East, North) of the estimates of a; colour scale ticks 0.25, 0.5, 0.75, 0.86, 1]

SLIDE 21

Which Sites are Asymptotically Dependent?

Test $a_j = 1$, $b_j = 0$

X: Danish Site

[Figure: map (axes East, North) with each site marked * or O]

SLIDE 24

Search for Parsimonious Model

Dimension of model parameters currently 259 × 258 × 2

Dimension reduction helpful/insightful

How many sites do we need to condition on to get all sites asymptotically dependent on a conditioning site?

[Figure: map (axes East, North) with six sites marked *]

SLIDE 25

Parsimonious Spatial Model

Partition $(X, Y) = (X_C, Y_C)$, where $X_C$ are the six conditioning sites and $Y_C$ the remaining sites

Then $[X_C, Y_C] = [X_C][Y_C \mid X_C]$, where $[X_C]$ is low dimensional and $[Y_C \mid X_C]$ is simpler due to the asymptotic dependence property

Extremes for $[Y_C]$ only arise when $[X_C]$ is extreme in at least one component

SLIDE 26

Spatial Risk Measure

E(#{Y > x} | X > x) where x is the 97% quantile

Comparison of empirical, global model, parsimonious model

[Figure: three maps (axes East, North), one per method; colour scale from 29 to 190]

SLIDE 27

Extrapolation of Spatial Risk Measure

E(#{Y > x} | X > x) where x is the 97% and 99.9% quantiles, for global model

[Figure: two maps (axes East, North), one per quantile; colour scale from 29 to 190]

SLIDE 28

Simulated Fields on Original Scale

Exceeds 1000 year level on Danish coast site

[Figure: three simulated fields shown as maps (axes East, North), each with a + marker; colour scale from 0.4 to 7]

SLIDE 29

Simulated Fields on Original Scale

Exceeds 1000 year level on UK coast site

[Figure: three simulated fields shown as maps (axes East, North), each with a + marker; colour scale from 0.4 to 7]

SLIDE 34

Handling Missing Data for River Flows

Partition $Y = (Y_M, Y_O)$ where $Y_M$ missing; $Y_O$ observed

Also $Z = (Z_M, Z_O)$

Then need to model $[Z_M \mid Z_O]$

Approach is:

Transform margins: $Z^N = T(Z) = \Phi^{-1}(\hat{F}(Z))$

Model dependence by MVN copula: $\begin{pmatrix} Z^N_M \\ Z^N_O \end{pmatrix} \sim \mathrm{MVN}\left( \mathbf{0}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right)$

Take a sample from this conditional distribution: $[\hat{Z}^N_M \mid Z^N_O]$

Back transform sample and downweight values in sample: $\hat{Z}_M = T^{-1}(\hat{Z}^N_M)$
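The infill steps can be sketched as follows, using the standard conditional distribution of a multivariate normal: mean $\Sigma_{12} \Sigma_{22}^{-1} z^N_O$ and covariance $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$. Several choices here are assumptions of the sketch, not details from the slides: $\hat{F}$ is taken to be the empirical distribution function of each margin, the correlation matrix is estimated from complete rows only, $T^{-1}$ is an empirical quantile, and the function names are hypothetical. The downweighting of infilled values is not shown.

```python
import numpy as np
from statistics import NormalDist

_norm = NormalDist()
_phi_inv = np.vectorize(_norm.inv_cdf, otypes=[float])

def normal_scores(col):
    """Z^N = Phi^{-1}(F_hat(Z)) from the observed (non-NaN) values of one margin."""
    out = np.full(col.shape, np.nan)
    obs = ~np.isnan(col)
    ranks = np.argsort(np.argsort(col[obs])) + 1
    out[obs] = _phi_inv(ranks / (obs.sum() + 1))
    return out

def infill(z, rng):
    """Replace NaNs in z (n rows, d columns) by one draw from [Z_M | Z_O]."""
    z = np.asarray(z, dtype=float)
    zn = np.column_stack([normal_scores(z[:, j]) for j in range(z.shape[1])])
    complete = ~np.isnan(zn).any(axis=1)
    sigma = np.corrcoef(zn[complete], rowvar=False)   # MVN copula correlation
    filled = z.copy()
    for i in np.flatnonzero(~complete):
        m = np.isnan(zn[i])     # missing components
        o = ~m                  # observed components
        if not o.any():
            continue            # nothing to condition on; left as NaN in this sketch
        s_mo = sigma[np.ix_(m, o)]
        s_oo = sigma[np.ix_(o, o)]
        s_mm = sigma[np.ix_(m, m)]
        w = np.linalg.solve(s_oo, s_mo.T).T           # Sigma_12 Sigma_22^{-1}
        mean = w @ zn[i, o]
        cov = s_mm - w @ s_mo.T
        draw = rng.multivariate_normal(mean, cov)
        # Back-transform: normal score -> probability -> empirical quantile.
        for k, j in enumerate(np.flatnonzero(m)):
            observed = z[~np.isnan(z[:, j]), j]
            filled[i, j] = np.quantile(observed, _norm.cdf(float(draw[k])))
    return filled

# Synthetic example: two dependent margins, first 100 values of the second missing.
rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
z = np.exp(rng.multivariate_normal([0.0, 0.0], cov, size=500))
z_missing = z.copy()
z_missing[:100, 1] = np.nan
z_filled = infill(z_missing, rng)
```

Because the back-transform uses empirical quantiles, infilled values always lie within the range of the observed values for that margin.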

SLIDE 35

Example of Handling Missing Data

Joint distribution model for Z = (Z1, Z2, Z3) with infilled sample to replace missing Z3 values

[Figure: two scatter plots; axis labels not recoverable]

SLIDE 36

Extrapolation with Missing Data

Recall conditional model is, for X > u: $Y = aX + X^b Z$

Extrapolation: simulate X > v and independently simulate Z, then join as above to give Y

[Figure: scatter plot; axis labels not recoverable]
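The extrapolation step can be sketched as below. Two assumptions are made for illustration: X > v is simulated as v plus a unit exponential (using the exponential tail of the standardised margins), and Z is resampled with replacement from a set of residuals, standing in for the nonparametric model. The function name, parameter values and residuals are hypothetical.

```python
import numpy as np

def simulate_extremes(a, b, z_resid, v, n_sim, rng):
    """Simulate (X, Y) with X > v from Y = a*X + X**b * Z."""
    a = np.asarray(a, dtype=float)            # length-d vector
    b = np.asarray(b, dtype=float)            # length-d vector
    z_resid = np.asarray(z_resid, dtype=float)  # shape (m, d)
    # X | X > v: exponential excess above v.
    x = v + rng.exponential(size=n_sim)
    # Z independent of X: resample whole rows to keep dependence within Z.
    idx = rng.integers(0, z_resid.shape[0], size=n_sim)
    z = z_resid[idx]
    y = a * x[:, None] + x[:, None] ** b * z
    return x, y

rng = np.random.default_rng(3)
a = [1.0, 0.5]                                  # made-up parameter values
b = [0.0, 0.3]
z_resid = 0.3 * rng.normal(size=(200, 2))       # stand-in for fitted residuals
v = 6.0
x_sim, y_sim = simulate_extremes(a, b, z_resid, v, n_sim=10000, rng=rng)

# Monte Carlo estimate of Pr(Y_j > v | X > v) for each component.
p = (y_sim > v).mean(axis=0)
```

In this toy setting the first component (a = 1, b = 0) exceeds v in most simulations while the second (a = 0.5) rarely does, reflecting the difference between asymptotic dependence and asymptotic independence.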

SLIDE 37

Simulation Study to Assess Infill Method

Consider 3 different patterns of missingness with X: full data; Y1: 50%; Y2: 90%; Y3: 80%

9 true distributions of Z

Methods:

Use overlapping data only (⋆)

Infill method (◦)

Compare estimators of $P_i = \Pr(Y_i > x \mid X > x)$ for $i = 1, 2, 3$ by RMSE efficiency relative to the full data case
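A relative RMSE efficiency could be computed as in the sketch below. The slides do not define the measure, so the orientation used here (RMSE of the full-data estimator divided by RMSE of the method, so that values below 1 indicate a loss relative to full data) is an assumption, and the numbers are made up solely to show the calculation.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean squared error of replicate estimates about the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))

def rmse_efficiency(method_estimates, full_data_estimates, truth):
    """RMSE of the full-data estimator divided by RMSE of the method."""
    return rmse(full_data_estimates, truth) / rmse(method_estimates, truth)

# Made-up replicate estimates of one probability P_i.
truth = 0.30
full = [0.29, 0.31, 0.30, 0.32]
method = [0.28, 0.33, 0.30, 0.34]
eff = rmse_efficiency(method, full, truth)
```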

SLIDE 38

Efficiency Results for Handling Missing Data

Results for P1, P2, P3 respectively

The infill method does well!

[Figure: three panels, each plotting eff (vertical axis from 0.6 to 1.0) against 1 to 9 on the horizontal axis]