

SLIDE 1

Detecting mixtures in multivariate extremes

S.H.A. Tendijck

Lancaster University, January 31, 2020

SLIDES 2-3

Motivating application

SLIDE 4

Two types of waves

Swell versus wind waves:

SLIDES 5-6

Contents

1. Crash course in underlying theory
2. My model

SLIDES 7-10

Univariate extremes

[Figure: density plot for univariate extremes with a high threshold; x axis ticks at 5 and 10, density axis 0.02 to 0.2]

SLIDES 11-14

Multivariate extremes

SLIDES 15-22

Conditional extremes

Heffernan-Tawn model: Y | (X > u) = αX + X^β Z for (X, Y) on standard margins and Z some residual distribution, independent of X.

Pros:

  • It can capture both asymptotic dependence (α = 1) and asymptotic independence (α < 1);
  • Many bivariate distributions follow this structure asymptotically;
  • It extends well to multivariate distributions.

Cons:

  • It doesn't capture mixture structures;
  • Data need to be on standard margins;
  • It is inconsistent in modelling X | Y and Y | X when both are large.
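
As a concrete illustration (not from the talk), here is a minimal simulation sketch of the Heffernan-Tawn form in Python; the parameter values, the exponential tail for X and the Gaussian residual are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Heffernan-Tawn form: Y | (X > u) = alpha * X + X**beta * Z.
# alpha, beta and the residual law below are illustrative assumptions.
alpha, beta, u, n = 0.7, 0.3, 2.0, 5000

# On standard (exponential-tailed) margins, X - u | X > u is
# approximately standard exponential.
x = u + rng.exponential(size=n)

# Residual Z, independent of X; Gaussian is an assumption.
z = rng.normal(size=n)

y = alpha * x + x**beta * z
```

With α = 1 and β = 0 the simulated points concentrate around the diagonal (asymptotic dependence); with α < 1 they fan away from it (asymptotic independence).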

SLIDES 23-27

Mixtures in extremes

[Figure: scatter of YL against XL showing two mixture components; XL axis 2 to 14, YL axis roughly −2 to 10]

The Heffernan-Tawn model extends to

  Y | (X > u) = α₁X + X^β₁ Z₁ with probability p;
                α₂X + X^β₂ Z₂ with probability 1 − p.

What do we want?

  • Fit the model;
  • Estimate the number of mixture components;
  • Estimate the mixture probabilities.

Methods:

1. Quantile-regression model;
2. Fitting a Heffernan-Tawn mixture model directly.
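
The simulation sketch above extends directly to two components; again all parameter values are illustrative assumptions rather than values from the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-component mixture of Heffernan-Tawn forms (illustrative values).
p, u, n = 0.3, 2.0, 5000
alpha = np.array([0.9, 0.3])   # component 1 occurs with probability p
beta = np.array([0.2, 0.2])

x = u + rng.exponential(size=n)
k = (rng.random(n) > p).astype(int)   # 0 w.p. p (component 1), 1 w.p. 1 - p
z = rng.normal(size=n)
y = alpha[k] * x + x**beta[k] * z
```

A scatter of y against x then shows the two "fans" visible in the slide's figure.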

SLIDES 28-32

Quantile Regression

How do we estimate the 90% conditional quantile of Y given X?

[Figure: scatter of Y against X with a candidate 90% quantile line; both axes roughly −2 to 12]

Minimise the L1 distance of the points to the line, while keeping 90% of them below it.
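
The usual way to formalise "minimise the L1 distance while keeping 90% below" is the pinball (check) loss, whose minimiser is the τ-th conditional quantile. A minimal sketch, assuming a simple linear form q(x) = a·x + b for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def pinball(residuals, tau):
    """Check loss: weight tau on points above the line, 1 - tau below it."""
    return np.mean(np.maximum(tau * residuals, (tau - 1) * residuals))

def fit_linear_quantile(x, y, tau=0.9):
    """Fit q_tau(x) = a * x + b by minimising the mean pinball loss."""
    def objective(theta):
        a, b = theta
        return pinball(y - (a * x + b), tau)
    return minimize(objective, x0=np.zeros(2), method="Nelder-Mead").x
```

At the optimum roughly a fraction τ of the points lie below the fitted line, which is exactly the picture on the slide.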

SLIDE 33

Overview

1. Crash course in underlying theory
2. My model

SLIDES 34-37

My model

We assume the Heffernan-Tawn model holds, i.e., Y | (X > u) = αX + X^β Z. Our quantile regression model is given by

  q_τ(x) = c + αx + x^β z_τ.

For stability, we fit simultaneously for τ = 0.05, 0.15, ..., 0.95. We get 13 estimated parameters: (α̂, β̂, ĉ, ẑ₁, ..., ẑ₁₀).

[Figure: data simulated from a logistic model with fitted quantile curves; X axis 2 to 12, Y axis roughly −5 to 15]
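
A sketch of how the simultaneous 13-parameter fit could look, reusing the pinball loss above. Sharing (α, β, c) across levels and giving each level its own z is my reading of the slides; the optimiser and starting values are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

TAUS = np.arange(0.05, 1.0, 0.10)   # 0.05, 0.15, ..., 0.95 (10 levels)

def pinball(residuals, tau):
    return np.mean(np.maximum(tau * residuals, (tau - 1) * residuals))

def fit_ht_quantiles(x, y):
    """Fit q_tau(x) = c + alpha*x + x**beta * z_tau for all levels at once;
    returns the 13 parameters (alpha, beta, c, z_1, ..., z_10)."""
    def objective(theta):
        alpha, beta, c = theta[:3]
        return sum(pinball(y - (c + alpha * x + x**beta * z), tau)
                   for tau, z in zip(TAUS, theta[3:]))
    theta0 = np.concatenate([[0.5, 0.1, 0.0], np.zeros(10)])
    return minimize(objective, theta0, method="Nelder-Mead",
                    options={"maxiter": 20000}).x
```

In practice one would also constrain β ≤ 1 and keep the ẑ_τ ordered so the fitted quantile curves cannot cross; the sketch omits both.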

SLIDES 38-41

My model

We assume a mixture HT model holds, i.e.,

  Y | (X > u) = α₁X + X^β₁ Z₁ with probability 1 − p,
                α₂X + X^β₂ Z₂ with probability p,

where α₁ > α₂. Our quantile regression model is given by

  q_τ(x) ∼ c₁ + α₁x + x^β₁ z_τ if τ > p,
           c₂ + α₂x + x^β₂ z_τ if τ < p.

For stability, we fit simultaneously for τ = 0.05, 0.15, ..., 0.95. We get 17 estimated parameters: (p̂, α̂₁, α̂₂, β̂₁, β̂₂, ĉ₁, ĉ₂, ẑ₁, ..., ẑ₁₀).

[Figure: data simulated from an asymmetric logistic model with fitted mixture quantile curves; X axis 2 to 14, Y axis roughly −10 to 15]
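
A corresponding sketch of the 17-parameter mixture fit, given data x, y as before. Since the change point p enters the objective only through which family each quantile level follows, a crude but robust approach is to profile over the ten possible splits of the levels; the talk does not spell out the optimisation, so this is an assumption on my part:

```python
import numpy as np
from scipy.optimize import minimize

TAUS = np.arange(0.05, 1.0, 0.10)

def pinball(residuals, tau):
    return np.mean(np.maximum(tau * residuals, (tau - 1) * residuals))

def fit_split(x, y, n_lower):
    """Fit the mixture model with the lowest n_lower levels on family 2
    and the remaining levels on family 1; returns (loss, parameters)."""
    def objective(theta):
        a1, a2, b1, b2, c1, c2 = theta[:6]
        total = 0.0
        for i, (tau, z) in enumerate(zip(TAUS, theta[6:])):
            if i < n_lower:                  # tau < p: family 2 (alpha_2)
                q = c2 + a2 * x + x**b2 * z
            else:                            # tau > p: family 1 (alpha_1)
                q = c1 + a1 * x + x**b1 * z
            total += pinball(y - q, tau)
        return total
    theta0 = np.concatenate([[0.8, 0.3, 0.1, 0.1, 0.0, 0.0], np.zeros(10)])
    res = minimize(objective, theta0, method="Nelder-Mead",
                   options={"maxiter": 40000})
    return res.fun, res.x

# Profile over the candidate change points; p_hat lies between
# TAUS[n_lower - 1] and TAUS[n_lower] for the best split.
losses = {k: fit_split(x, y, k)[0] for k in range(1, 10)}
```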

SLIDES 42-47

Estimating the number of components

[Figure: the same data with the best fits using 1, 2 and 3 mixture components; X axis 2 to 11, Y axis roughly −6 to 12]

Question: How can we compare? Method: 10-fold cross-validation.

[Figure: cross-validation statistic (scale 10^4) against the number of components, 1 to 9]

[Figure: densities of the cross-validation statistic (scale 10^4) for 1 to 9 components]
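
A sketch of the 10-fold cross-validation loop for choosing the number of components; fit_k_components and heldout_pinball_score are hypothetical stand-ins for a k-component version of the fit above and its held-out pinball score, and taking the held-out pinball loss as the CV statistic is my reading of the slides:

```python
import numpy as np

def cv_statistic(x, y, k, n_folds=10, seed=0):
    """Mean held-out score of a k-component fit over 10 folds."""
    folds = np.random.default_rng(seed).permutation(len(x)) % n_folds
    scores = []
    for f in range(n_folds):
        train, test = folds != f, folds == f
        theta = fit_k_components(x[train], y[train], k)                # hypothetical
        scores.append(heldout_pinball_score(x[test], y[test], theta))  # hypothetical
    return float(np.mean(scores))

# Choose the number of components minimising the CV statistic:
# best_k = min(range(1, 10), key=lambda k: cv_statistic(x, y, k))
```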

SLIDES 48-53

Assessing the model fit

[Figure: scatter of Y against X with fitted quantile curves; X axis 2 to 12, Y axis roughly −5 to 20]

  Method                 p̂              95% confidence interval
  Simulation             1.11 × 10⁻⁵     (1.00, 1.21) × 10⁻⁵
  Quantile Regression    0.90 × 10⁻⁵     ??

SLIDES 54-64

Problems

Is this method already perfect? No, there are just a couple of minor issues:

1. Cross-validation statistics are not necessarily convex;
2. It is not trivial to fit this framework into a Bayesian setting;
3. It is not trivial to quantify uncertainty;
4. Overfitting is not really penalised;
5. It is not clear how to extend it efficiently to d dimensions;
6. It is not very quick;
7. It is only suited to picking up mixtures that fan out;
8. Standard likelihood-based tests for the number of mixture components are not (directly) applicable;
9. Fitting a Heffernan-Tawn mixture model using Dirichlet processes

SLIDE 65

Detecting mixtures in multivariate extremes

S.H.A. Tendijck

Lancaster University, January 31, 2020

SLIDE 66

Estimating the change point

How do we estimate the change point using only 10 quantile levels? For example, assume that it is optimal to split the quantile levels into {0.05, 0.15, ..., 0.45} and {0.55, 0.65, ..., 0.95}. We then have estimates (α₁, β₁, c₁) for the first set and (α₂, β₂, c₂) for the second. Define

  Q = {q_z : q_z(x) = c₁ + α₁x + x^β₁ z, or q_z(x) = c₂ + α₂x + x^β₂ z, z ∈ ℝ}.

Pick p = 0.5; then bisect on the quantile level:

1. Choose the q ∈ Q such that q best represents the p-th quantile of Y given X;
2. Find out to which of the two models the fitted regression quantile belongs;
3. Next iteration: estimate the change point in [0.45, 0.5] or [0.5, 0.55];
4. Repeat.
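
A minimal sketch of that bisection, assuming the two line families have already been fitted; classify_quantile is a hypothetical helper that finds the best q ∈ Q at level p and reports which family it came from:

```python
def estimate_change_point(x, y, lo=0.45, hi=0.55, tol=1e-3):
    """Bisect on the quantile level to locate the change point p.

    classify_quantile (hypothetical) fits the best q in Q at level p
    and returns "upper" if it belongs to the family fitted on the
    upper quantile levels, else "lower".
    """
    while hi - lo > tol:
        p = (lo + hi) / 2
        if classify_quantile(x, y, p) == "upper":
            hi = p   # level p already behaves like the upper family
        else:
            lo = p   # level p still behaves like the lower family
    return (lo + hi) / 2
```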

SLIDE 67

Estimating the change point

[Figure: densities of change-point estimates for two methods (Meth1, Meth2) with one true change point; quantile axis 0.42 to 0.6]