
Multivariate and Functional Robust Fusion Methods for Big Data

  • B. Ghattas

joint work with A. Cholaquidis and R. Fraiman

Université d’Aix-Marseille

badihghattas@gmail.com


Outline

1. Example
2. A general setup for RFM
3. Some applications of RFM
4. Simulation results


The problem

We address an important problem in Big Data: how to combine estimators computed on different subsamples through robust fusion procedures, when we are unable to deal with the whole sample at once.

Our idea: a classic "divide and conquer" strategy.

Cases: multivariate location and scatter matrix, the covariance operator for functional data, and clustering problems.

Estimating the median

To estimate the median of a huge set of iid random variables {X1, . . . , Xn} with common density fX, we split the sample into m subsamples of size l, with n = ml. We compute the median of each subsample, obtaining m random variables Y1, . . . , Ym, and then take the median of the set Y1, . . . , Ym. This clearly does not coincide with the median of the whole original sample {X1, . . . , Xn}, but it will be close.

What else can we say about this estimator regarding its efficiency and robustness?
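The median-of-medians scheme takes only a few lines; a minimal Python sketch (our own illustration, not the authors' R code):

```python
import numpy as np

rng = np.random.default_rng(0)

def median_of_medians(x, m):
    """Split x into m subsamples, take each subsample's median,
    then return the median of those m medians."""
    parts = np.array_split(x, m)          # subsamples of (near-)equal size
    return np.median([np.median(p) for p in parts])

# The fused estimate is close to, but not equal to, the full-sample median.
x = rng.uniform(size=100_001)
theta_full = np.median(x)                 # median of the whole sample
theta_rfm = median_of_medians(x, m=101)   # median of subsample medians
```

Only the subsample medians ever need to be held together, which is the point when the whole sample does not fit in memory.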


Each of the m variables Yi is the median of l iid variables with density fX. Suppose l = 2k + 1; then Yi has density

g_Y(y) = ((2k+1)! / (k!)^2) F_X(y)^k (1 − F_X(y))^k f_X(y).

If fX is uniform on [0, 1], this becomes

h_Y(y) = ((2k+1)! / (k!)^2) y^k (1 − y)^k 1_{[0,1]}(y),

which is the Beta(k + 1, k + 1) density.


Asymptotically, the empirical median θ̂ = med(X1, . . . , Xn) satisfies θ̂ ≈ N(θ, V(θ̂)) with V(θ̂) = 1/(4 n f_X(θ)^2), while for θ̃_RFM, the median of medians, θ̃_RFM ≈ N(θ, V(θ̃_RFM)) with V(θ̃_RFM) = 1/(4 m h_Y(θ)^2).

In the uniform case both are centred at 1/2, with f_X(1/2) = 1 and

h_Y(1/2) = (1/2)^{2k} (2k+1)!/(k!)^2 ∼ 2√(k/π),

so the relative loss of efficiency is

V(θ̂)/V(θ̃_RFM) = m h_Y(1/2)^2 / n = 4k/(π(2k+1)) → 2/π.
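This efficiency calculation is easy to check by Monte Carlo (a sketch with our own choice of m and k; the variance ratio should approach 2/π ≈ 0.64):

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 200, 100                  # m subsamples of size l = 2k + 1
l = 2 * k + 1
R = 2000                         # Monte Carlo replications

full, fused = [], []
for _ in range(R):
    x = rng.uniform(size=(m, l))
    meds = np.median(x, axis=1)          # Y_1, ..., Y_m
    fused.append(np.median(meds))        # median of medians
    full.append(np.median(x))            # median of the whole sample

v_full, v_fused = np.var(full), np.var(fused)
ratio = v_full / v_fused                 # relative efficiency of the fusion
```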


The RFM

Let {X1, . . . , Xn} be iid random elements in a metric space E, and θ a parameter to estimate.

(a) Split the sample into m subsamples with n = ml:
{X1, . . . , Xl}, {Xl+1, . . . , X2l}, . . . , {X(m−1)l+1, . . . , Xml}.
(b) Compute a robust estimate of θ on each subsample, obtaining θ̂1, . . . , θ̂m.
(c) Compute the final estimate θ̃_RFM by combining θ̂1, . . . , θ̂m in a robust way; for instance, θ̃_RFM can be the deepest point, or the average of the 40% deepest points, among θ̂1, . . . , θ̂m.

Table: Parameter estimation using RFM

What are the consistency, efficiency, robustness and computational-time properties of the RFM?
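Steps (a)-(c) can be sketched end to end (our own Python illustration; `spatial_depth` is the depth used later in the talk, the subsample estimator here is simply the coordinatewise median, and the 40% fraction follows the example above):

```python
import numpy as np

def spatial_depth(theta, thetas):
    """Empirical spatial depth of the point theta within the cloud thetas."""
    diffs = thetas - theta
    norms = np.linalg.norm(diffs, axis=1)
    units = diffs[norms > 0] / norms[norms > 0, None]
    return 1.0 - np.linalg.norm(units.mean(axis=0))

def rfm(sample, m, estimator, top=0.4):
    """Steps (a)-(c): split, estimate robustly on each subsample,
    then average the `top` fraction of deepest estimates."""
    parts = np.array_split(sample, m)                      # (a)
    thetas = np.array([estimator(p) for p in parts])       # (b)
    depths = np.array([spatial_depth(t, thetas) for t in thetas])
    keep = np.argsort(depths)[-max(1, int(top * m)):]      # deepest 40%
    return thetas[keep].mean(axis=0)                       # (c)

rng = np.random.default_rng(2)
x = rng.normal(size=(10_000, 3))
x[:500] += 50                                              # 5% gross outliers
est = rfm(x, m=50, estimator=lambda p: np.median(p, axis=0))
```

Subsamples entirely made of outliers yield estimates with low depth, so the fusion step discards them.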


The Depth function

Let X be a random variable taking values in a Banach space (E, ‖·‖) with probability distribution P_X, and let x ∈ E. The depth of x with respect to P_X is defined as

D(x, P_X) = 1 − ‖ E_{P_X} [ (X − x)/‖X − x‖ ] ‖.   (1)

(See Chaudhuri [1996], Vardi and Zhang [2000], and the extension to a very general setup by Chakraborty and Chaudhuri [2014].)

We can use it for the "fusion" step of RFM with a suitable norm.
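An empirical version of (1) replaces the expectation by an average over the sample; a minimal sketch with the Euclidean norm:

```python
import numpy as np

def depth(x, X):
    """Empirical version of (1): D(x, P_X) = 1 - ||mean of (X - x)/||X - x||||."""
    d = X - x
    n = np.linalg.norm(d, axis=1)
    d = d[n > 0] / n[n > 0, None]        # unit vectors pointing at the sample
    return 1.0 - np.linalg.norm(d.mean(axis=0))

rng = np.random.default_rng(3)
X = rng.normal(size=(2_000, 2))
d_center = depth(np.zeros(2), X)          # near the spatial median: depth ~ 1
d_far = depth(np.array([10.0, 10.0]), X)  # far outside the cloud: depth ~ 0
```

At the spatial median the unit vectors cancel, so the depth is close to 1; far away they all point the same way and the depth is close to 0.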


Breakdown point

Following Donoho [1982], the finite-sample breakdown point:

Definition. Let x = {x1, . . . , xn} be a dataset, θ an unknown parameter lying in a metric space Θ, and θ̂n = θ̂n(x) an estimate based on x. Let Xp be the set of all datasets y of size n having n − p elements in common with x:

Xp = {y : card(y) = n, card(x ∩ y) = n − p}.

Then the breakdown point of θ̂n at x is ε*_n(θ̂n, x) = p*/n, where

p* = max{p ≥ 0 : ∀y ∈ Xp, θ̂n(y) is bounded, and also bounded away from the boundary ∂Θ if ∂Θ ≠ ∅}.

It is the maximum proportion of outliers (located at the worst possible positions) a sample can contain before the estimate breaks down, in the sense that it can become arbitrarily large (or arbitrarily close to the boundary of the parameter space).


BP Analysis

Consider the case where the robust estimate on each subsample has breakdown point 0.5. Let Bi = 1 if observation i is an outlier and 0 otherwise, and assume the Bi are iid ∼ B(p). Let

Sj = Σ_{s=1}^{l} B_{(j−1)l+s}

be the number of outliers in subsample j, for j = 1, . . . , m. The RFM estimator breaks down if and only if Sj is greater than k (recall that l = 2k + 1) for more than m/2 of the subsamples.


To take a glance at the behaviour of the BP: n = 30000, Bi ∼ B(p). Split the sample into m subsamples and compute Sj for each one. Calculate the proportion of subsamples containing more than l/2 outliers, i.e. the percentage of times the estimator breaks down. Repeat this experiment 5000 times and report the average of this proportion.

m     p = 0.45   p = 0.49   p = 0.495   p = 0.499
5     n/a        0.0020     0.0820      0.3892
10    n/a        0.0088     0.1564      0.5352
30    n/a        0.0052     0.1426      0.5186
50    n/a        0.0080     0.1598      0.5412
100   n/a        0.0192     0.2162      0.6084
150   n/a        0.0278     0.2728      0.6780

(The values for p = 0.45 are missing from the extracted slide.)

As expected, the best possible choice is to take m as small as possible.
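A sketch of this experiment (our own code; we take "more than l/2" and "more than m/2" strictly, and near p = 0.5 the numbers are sensitive to that tie-handling, so they only qualitatively match the table):

```python
import numpy as np

rng = np.random.default_rng(4)

def prob_break(n, m, p, reps=2000):
    """Fraction of runs in which more than m/2 subsamples contain
    more than half outliers, i.e. in which the RFM breaks down."""
    l = n // m
    S = rng.binomial(l, p, size=(reps, m))   # outlier counts per subsample
    broken = (S > l / 2).sum(axis=1)         # subsamples that break
    return (broken > m / 2).mean()

# Breakdown is rare for p well below 0.5 and grows with m as p nears 0.5.
low = prob_break(30_000, 10, 0.45)
near = prob_break(30_000, 150, 0.499)
```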


Three applications

Estimating the multivariate location and scatter matrix; estimating the covariance operator for functional data; and clustering. Solutions to many other problems can be derived from these cases (principal components, for example, both for multivariate and functional data).


RFM for mean and covariance

Let {X1, . . . , Xn} be iid in R^d. For the location parameter, use simple robust estimates θ̂1, . . . , θ̂m on the subsamples (see for instance Maronna et al. [2006]). For the depth function we propose the empirical version

D(θ, P_m) = 1 − ‖ (1/m) Σ_{j=1}^{m} (θ̂j − θ)/‖θ̂j − θ‖ ‖,

where P_m is the empirical distribution of {θ̂1, . . . , θ̂m} and ‖·‖ is the Euclidean norm. Analogously, for the scatter matrix we use the depth function

D(Σ, P_m) = 1 − ‖ (1/m) Σ_{j=1}^{m} (Σ̂j − Σ)/‖Σ̂j − Σ‖ ‖,

where Σ̂1, . . . , Σ̂m are robust estimators of the scatter matrix, ‖Σ‖ = max_{1≤i≤d} Σ_{j=1}^{d} |Σij|, and P_m is the empirical distribution of {Σ̂1, . . . , Σ̂m}.
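The scatter-matrix depth can be sketched directly (our own illustration; `mat_norm` is the maximum absolute row sum from the definition above):

```python
import numpy as np

def mat_norm(A):
    """Norm used on scatter matrices: max_i sum_j |A_ij|."""
    return np.abs(A).sum(axis=1).max()

def mat_depth(Sigma, Sigmas):
    """Empirical depth of Sigma within the m subsample estimates."""
    units = []
    for S in Sigmas:
        d = mat_norm(S - Sigma)
        if d > 0:
            units.append((S - Sigma) / d)
    return 1.0 - mat_norm(np.mean(units, axis=0))

rng = np.random.default_rng(5)
# m subsample scatter estimates near the identity, one of them corrupted.
Sigmas = [np.eye(3) + 0.01 * rng.normal(size=(3, 3)) for _ in range(20)]
Sigmas[0] = 100 * np.eye(3)                       # a broken estimate
deepest = max(Sigmas, key=lambda S: mat_depth(S, Sigmas))
```

The corrupted estimate gets depth near 0, so the deepest point is one of the clean estimates.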


Covariance Operators

Several robust and non-robust estimators exist (see for instance Chakraborty and Chaudhuri [2014]). A simple robust estimator (Gordaliza [1991]) can be used on each of the m subsamples; it may be implemented using parallel computing. It is based on the notion of impartial trimming of the norm in the Hilbert-Schmidt space (where the covariance operators are defined): fix a trimming proportion α, drop a fraction α of the data, and take the average of the rest. The RFM estimator is then defined as the deepest point among the m robust estimators.
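Exact impartial trimming is a combinatorial problem; a standard approximation alternates averaging the kept points with keeping the (1 − α) fraction closest to the current average. A sketch on multivariate data (our own simplification; for covariance operators the same idea applies to empirical covariances under the Hilbert-Schmidt norm):

```python
import numpy as np

def trimmed_mean(X, alpha, n_iter=10):
    """Impartially trimmed mean (sketch): alternately average the kept
    points and keep the (1 - alpha) fraction closest to that average."""
    keep = np.arange(len(X))
    for _ in range(n_iter):
        center = X[keep].mean(axis=0)
        dist = np.linalg.norm(X - center, axis=1)
        keep = np.argsort(dist)[: int(np.ceil((1 - alpha) * len(X)))]
    return X[keep].mean(axis=0)

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(size=(900, 4)),             # clean data near 0
               rng.normal(loc=30.0, size=(100, 4))])  # 10% gross outliers
center = trimmed_mean(X, alpha=0.2)
```

After one concentration step the outliers fall outside the kept fraction, so the average stabilises near the clean centre.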

Robust fusion for cluster analysis

ITkM

Given a sample {X1, . . . , Xn} ⊂ R^d and a trimming level 0 < α < 1, ITkM looks for a set {m̂1, . . . , m̂k} ⊂ R^d and a partition C0, C1, . . . , Ck of the space that minimize the loss function

(1/(n − [nα])) Σ_{j=1}^{k} Σ_{Xi ∈ Cj} ‖Xi − m̂j‖².

The set C0 is the set of trimmed data (with cardinality [nα]).
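A Lloyd-style sketch of this loss (our own simplified illustration of α-ITkM with random restarts, not the authors' implementation):

```python
import numpy as np

def itkm(X, k, alpha, n_iter=30, n_start=8, seed=0):
    """Trimmed k-means sketch: Lloyd iterations in which the [n*alpha]
    points farthest from their nearest centre are trimmed (the set C0).
    Returns the best of n_start random restarts by trimmed loss."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_keep = n - int(n * alpha)
    best = None
    for _ in range(n_start):
        centres = X[rng.choice(n, k, replace=False)].copy()
        for _ in range(n_iter):
            d = np.linalg.norm(X[:, None, :] - centres[None], axis=2)
            lab, near = d.argmin(axis=1), d.min(axis=1)
            kept = np.argsort(near)[:n_keep]      # C1..Ck; the rest is C0
            for j in range(k):
                pts = X[kept][lab[kept] == j]
                if len(pts):
                    centres[j] = pts.mean(axis=0)
        loss = np.mean(near[kept] ** 2)           # trimmed loss
        if best is None or loss < best[0]:
            best = (loss, centres, lab, kept)
    return best

rng = np.random.default_rng(7)
X = np.vstack([rng.normal((0, 0), 0.5, size=(200, 2)),
               rng.normal((6, 0), 0.5, size=(200, 2)),
               rng.uniform(-20, 20, size=(40, 2))])   # scattered outliers
loss, centres, lab, kept = itkm(X, k=2, alpha=0.1, seed=1)
```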


RFM for Clustering

INPUT: data, k, α1, α2.

1. Split the sample into m subsamples (recall that n = ml).
2. To each subsample, apply the empirical version of α-ITkM with α = α1, obtaining M̂1, . . . , M̂m, each one a set of k points in R^d.
3. Apply the empirical version of α-ITkM with α = α2 to the set ∪_{i=1}^{m} M̂i.
4. Obtain the output of the algorithm (M̂_RFM, r̂_RFM).
5. Build the clusters by assigning each observation to the nearest cluster centre.
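The five steps can be sketched end to end (a compact stand-in of our own: `lloyd_trimmed` replaces a real α-ITkM implementation):

```python
import numpy as np

def lloyd_trimmed(X, k, alpha, iters=25, starts=8, seed=0):
    """Compact stand-in for one alpha-ITkM run (best of a few restarts)."""
    rng = np.random.default_rng(seed)
    n_keep = len(X) - int(len(X) * alpha)
    best = None
    for _ in range(starts):
        C = X[rng.choice(len(X), k, replace=False)].copy()
        for _ in range(iters):
            d = np.linalg.norm(X[:, None] - C[None], axis=2)
            lab, near = d.argmin(1), d.min(1)
            kept = np.argsort(near)[:n_keep]
            for j in range(k):
                p = X[kept][lab[kept] == j]
                if len(p):
                    C[j] = p.mean(0)
        loss = np.mean(near[kept] ** 2)
        if best is None or loss < best[0]:
            best = (loss, C)
    return best[1]

def rfm_cluster(X, k, m, a1, a2, seed=0):
    """Steps 1-5: per-subsample alpha1-ITkM, then alpha2-ITkM on the
    union of the m*k centres, then nearest-centre assignment."""
    parts = np.array_split(X, m)                                 # step 1
    M = np.vstack([lloyd_trimmed(p, k, a1, seed=seed + 1 + i)    # step 2
                   for i, p in enumerate(parts)])
    C = lloyd_trimmed(M, k, a2, seed=seed)                       # steps 3-4
    d = np.linalg.norm(X[:, None] - C[None], axis=2)
    return C, d.argmin(1)                                        # step 5

rng = np.random.default_rng(9)
X = np.vstack([rng.normal((0, 0), 0.5, size=(300, 2)),
               rng.normal((10, 0), 0.5, size=(300, 2)),
               rng.uniform(-20, 20, size=(30, 2))])
rng.shuffle(X)                        # so each subsample sees both clusters
C, labels = rfm_cluster(X, k=2, m=5, a1=0.2, a2=0.2, seed=3)
```

Only the m·k subsample centres enter the fusion step, so the second ITkM run is cheap regardless of n.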


Location and scatter matrix for finite-dimensional spaces

Using an 8-core PC (Intel Core i7-3770 CPU, 8 GB of RAM, 64-bit) with R v3.3.0 under Ubuntu. Data: 5-dimensional Gaussian distribution with covariance Σij = 0.2 for i ≠ j. Outliers: 5-dimensional Cauchy distribution with independent coordinates centred at 50, in proportions p = 0.13 and p = 0.2. Sample sizes n ∈ {0.1e6, 5e6, 10e6} and m ∈ {100, 500, 1000, 10000}. Each simulation case is replicated K = 5 times and the average reported. The estimators obtained by the RFM are the values which maximize the corresponding depth functions; maximization is done in both cases over the set of the m estimates obtained from the subsamples.


Location and Covariance, ...

MLE = the empirical mean of the whole sample; D-MLE = the deepest mean; Av-ROB = the average of the robust means; RFM = the deepest robust estimator.

(n in millions)

                   p = 0.13                          p = 0.20
n     m      MLE    D-MLE   Av-ROB    RFM      MLE     D-MLE   Av-ROB   RFM
0.1   100    14.7   14.7    0.00696   0.0273   22.7    22.4    0.0063   0.0270
0.1   500    15.1   14.5    0.00810   0.0303   22.5    22.3    0.0077   0.0353
0.1   1000   14.3   14.5    0.00678   0.0461   21.8    22.3    0.0054   0.0495
0.1   10000  16.6   11.5    2.97000   0.1750   22.1    22.0    8.1000   0.5130
5     100    14.8   14.5    0.00083   0.0042   22.8    22.3    0.0011   0.0031
5     500    15.1   14.6    0.00078   0.0051   22.8    22.3    0.0009   0.0058
5     1000   14.5   14.6    0.00115   0.0072   25.2    22.4    0.0009   0.0069
5     10000  14.4   14.5    0.00145   0.0128   21.3    22.3    0.0011   0.0127
10    100    14.5   14.5    0.00088   0.0024   21.7    22.4    0.0009   0.0027
10    500    14.5   14.5    0.00076   0.0041   23.5    22.4    0.0008   0.0039
10    1000   16.1   14.5    0.00074   0.0053   21.9    22.3    0.0007   0.0044
10    10000  14.3   14.5    0.00093   0.0083   3.27e8  22.3    0.0009   0.0099

In general, the performance of most estimators degrades as m grows.


Results for Covariance

MLE = MLE estimator; Av-MLE = the mean of the MLE estimators from the subsamples; D-MLE = the deepest among the MLE estimators; ROB = the global robust estimate; RFM = the robust fusion estimate; Av-ROB = the average of the robust estimates.

Table: p = 0.2

n     m      T0      T1       MLE      Av-MLE   D-MLE   ROB     Av-ROB    RFM
0.1   100    0.82    6.08     1.50e7   1.50e7   2230    0.428   0.428     0.528
0.1   500    0.79    17.80    6.52e5   6.52e5   2060    0.430   0.441     0.556
0.1   1000   0.83    34.40    8.01e5   8.01e5   2020    0.435   0.455     0.686
0.1   10000  0.57    1760.00  1.18e5   1.18e5   2150    0.436   701.000   1.720
5.0   100    31.60   33.30    1.47e7   1.47e7   12300   0.412   0.412     0.422
5.0   500    30.00   64.50    1.64e7   1.64e7   4250    0.415   0.415     0.431
5.0   1000   31.10   151.00   4.55e8   4.55e8   3060    0.413   0.414     0.448
5.0   10000  32.80   2530.00  1.76e8   1.76e8   2110    0.414   0.419     0.522
10.0  100    355.00  98.10    1.14e8   1.14e8   29000   0.413   0.413     0.417
10.0  500    101.00  143.00   1.51e9   1.51e9   6260    0.412   0.413     0.434
10.0  1000   127.00  140.00   4.01e7   4.01e7   3930    0.415   0.415     0.437
10.0  10000  47.00   2250.00  5.33e24  5.33e24  2250    0.414   0.417     0.447

Covariance operator

Simulations

Simulation model used in Kraus and Panaretos [2012]:

X(t) = μ(t) + √2 Σ_{k=1}^{10} λk ak sin(2πkt) + √2 Σ_{k=1}^{10} νk bk cos(2πkt),

where νk = 1/3^k, λk = k^{−3}, and ak, bk are independent standard Gaussian variables.

For the main observations we use μ(t) = 0; for the outliers, μ(t) = 2 − 8 sin(πt). For t we used an equally spaced grid of T = 20 points in [0, 1]. The true covariance is

Cov(s, t) = Σ_{k=1}^{10} Ak(s)Ak(t) + Bk(s)Bk(t),

where Ak(t) = √2 λk sin(2πkt) and Bk(t) = √2 νk cos(2πkt).

n ∈ {0.1e6, 1e6, 5e6, 10e6} and m ∈ {100, 500, 1000, 10000}. The proportion of outliers was fixed to p = 0.13 and p = 0.15. Each simulation case is replicated K = 5 times and the average reported.
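The model can be simulated directly (a sketch; our reading of the extracted slide takes ν_k = 1/3^k, and we compare the empirical covariance of the clean curves with the true one):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 20
t = np.linspace(0, 1, T)                 # equally spaced grid on [0, 1]
k = np.arange(1, 11)
lam = k ** -3.0                          # lambda_k = k^-3
nu = 3.0 ** -k                           # nu_k = 1/3^k (our reading)

def sample_curves(n, mu):
    a = rng.normal(size=(n, 10))
    b = rng.normal(size=(n, 10))
    sin_part = np.sqrt(2) * (a * lam) @ np.sin(2 * np.pi * np.outer(k, t))
    cos_part = np.sqrt(2) * (b * nu) @ np.cos(2 * np.pi * np.outer(k, t))
    return mu(t) + sin_part + cos_part

X = sample_curves(5000, mu=lambda t: np.zeros_like(t))          # main data
Y = sample_curves(500, mu=lambda t: 2 - 8 * np.sin(np.pi * t))  # outliers

# True covariance: sum_k A_k(s)A_k(t) + B_k(s)B_k(t)
A = np.sqrt(2) * lam[:, None] * np.sin(2 * np.pi * np.outer(k, t))
B = np.sqrt(2) * nu[:, None] * np.cos(2 * np.pi * np.outer(k, t))
true_cov = A.T @ A + B.T @ B
emp_cov = np.cov(X, rowvar=False, bias=True)
```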


Simulated dataset

[Figure: simulated curves plotted over the grid on [0, 1].]

T0 = average time in seconds for the whole-sample robust estimate; T1 = average time in seconds for RFM (estimates over subsamples + fusion). MLE = MLE estimator; Av-MLE = the mean of the MLE estimators from the subsamples; D-MLE = the deepest among the MLE estimators; ROB = the global robust estimate; RFM = the robust fusion estimate; Av-ROB = the average of the robust estimates.

Table: Cov. Operator estimates, p = 0.2, T = 20

n     m     T0    T1     MLE   Av-MLE  D-MLE  ROB    Av-ROB  RFM
0.05  20    572   17.90  30.5  30.5    30.9   0.879  3.96    1.45
0.05  50    649   7.88   30.5  30.5    31.3   0.876  7.34    2.10
0.05  100   633   4.61   30.5  30.5    31.6   0.839  8.86    2.43
0.05  1000  478   19.50  30.5  30.5    32.3   0.864  13.10   7.08
0.10  20    1970  69.10  30.4  30.4    30.6   0.914  3.83    1.36
0.10  50    2030  28.10  30.4  30.4    31.1   0.921  4.32    1.55
0.10  100   2020  15.10  30.4  30.4    31.3   0.840  8.44    2.35
0.10  1000  1840  21.60  30.4  30.4    32.9   0.961  12.10   5.20

For p = 0.15, Av-ROB still behaves well, better than RFM. But if the proportion of outliers increases to p = 0.2, RFM clearly outperforms all the other estimators.

Clustering

The dataset

Model used in Cuesta-Albertos et al. [1997]: bivariate Gaussian distributions with the following parameters for the clusters and the outliers, respectively:

μ1 = (0, 0), μ2 = (0, 10), μ3 = (6, 0), μ4 = (2, 10/3),
Σ1 = Σ2 = Σ3 = 1.5 · Id, Σ4 = 20 · Id.

The outliers were generated with μ4, Σ4 and size n4. The cluster sizes are fixed at n1 = 15, n2 = 30, n3 = 30, n4 = 40. Outliers lying inside the 75% confidence ellipsoids of the clusters were replaced by others not belonging to that area; the outliers represent almost 35% of the whole sample. We use this base simulation and multiply each ni by a factor fac ∈ {10, 100, 1000, 10000}, with m ∈ {10, 50, 100, 1000, 10000} under the restriction m < fac. Lastly, for ITkM we test α1 ∈ {0.2, 0.35, 0.45}, whereas for the fusion we fix α2 = 0.1.


The dataset - illustration


Figure: True clusters, clusters obtained by global ITkM, and clusters obtained by RFM; n = 11500, m = 100. Blue points = outliers.


The matching Error

The matching error is

ME = min_{s ∈ S} (1/n) Σ_{i=1}^{n} 1{yi ≠ s(ŷi)},

where S is the set of all possible permutations of the class labels, yi is the true cluster of observation i, and ŷi is the one assigned by the algorithm.
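For small k the minimum over permutations can be computed exactly (a minimal sketch):

```python
import numpy as np
from itertools import permutations

def matching_error(y_true, y_pred, k):
    """ME: smallest misclassification rate over all relabelings
    s of the k predicted cluster labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    best = 1.0
    for s in permutations(range(k)):
        relabeled = np.array(s)[y_pred]          # apply s to predicted labels
        best = min(best, np.mean(y_true != relabeled))
    return best

# The same partition under a label swap has zero matching error.
y = np.array([0, 0, 1, 1, 2, 2])
yhat = np.array([1, 1, 2, 2, 0, 0])        # permuted labels: ME = 0
err_perm = matching_error(y, yhat, k=3)
yhat2 = np.array([1, 1, 2, 0, 0, 0])       # one mistake after relabeling
err_one = matching_error(y, yhat2, k=3)
```

For large k the exhaustive search over k! permutations becomes infeasible and an assignment solver would be needed instead.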


n        m      T0        T1        T2         ME1     ME2

α1 = 0.2
1150     10     2.899     1.34      0.555      0.1539  0.1678
11500    10     21.200    21.69     6.835      0.1594  0.1603
11500    100    21.200    14.65     4.242      0.1594  0.1693
115000   10     274.900   263.80    75.110     0.1585  0.1585
115000   100    274.900   218.10    56.440     0.1585  0.1591
115000   1000   274.900   141.50    37.510     0.1585  0.1693
1150000  10     3452.000  3149.00   873.400    0.1582  0.1582
1150000  100    3452.000  2609.00   680.700    0.1582  0.1583
1150000  1000   3452.000  2158.00   546.900    0.1582  0.1590
1150000  10000  3452.000  1434.00   374.700    0.1582  0.1689

α1 = 0.35
1150     10     3.447     1.426     0.5412     0.1287  0.1310
11500    10     37.870    33.380    9.8990     0.1037  0.1071
11500    100    37.870    15.300    4.2860     0.1037  0.1343
115000   10     427.700   391.100   109.6000   0.1049  0.1050
115000   100    427.700   307.200   85.7000    0.1049  0.1071
115000   1000   427.700   137.700   38.3600    0.1049  0.1331
1150000  10     4925.000  4284.000  1166.0000  0.1052  0.1053
1150000  100    4925.000  3660.000  928.2000   0.1052  0.1055
1150000  1000   4925.000  3052.000  792.9000   0.1052  0.1074
1150000  10000  4925.000  1397.000  372.2000   0.1052  0.1336

α1 = 0.45
1150     10     2.723     1.27      0.5158     0.1330  0.1567
11500    10     55.580    34.12     9.8050     0.1370  0.1403
11500    100    55.580    13.11     3.6500     0.1370  0.1723
115000   10     698.900   586.60    170.4000   0.1325  0.1330
115000   100    698.900   323.90    86.3500    0.1325  0.1355
115000   1000   698.900   122.50    33.5300    0.1325  0.1729
1150000  10     7190.000  7087.00   2115.0000  0.1327  0.1328
1150000  100    7190.000  5654.00   1508.0000  0.1327  0.1330
1150000  1000   7190.000  3287.00   829.6000   0.1327  0.1360
1150000  10000  7190.000  1258.00   328.1000   0.1327  0.1726


Conclusion and ...

A general framework adaptable to a wide variety of problems. Encouraging results when working in the presence of outliers. Other attractive problems: convex hulls, detecting communities in graphs.

THANK YOU


Bibliography

Anirvan Chakraborty and Probal Chaudhuri. The spatial distribution in infinite dimensional spaces and related quantiles and depths. Annals of Statistics, 42(3):1203–1231, 2014. doi: 10.1214/14-AOS1226.

Probal Chaudhuri. On a geometric notion of quantiles for multivariate data. Journal of the American Statistical Association, 91(434):862–872, 1996.

J. A. Cuesta-Albertos, A. Gordaliza, and C. Matrán. Trimmed k-means: an attempt to robustify quantizers. Annals of Statistics, 25(2):553–576, 1997. doi: 10.1214/aos/1031833664.

D. L. Donoho. Breakdown properties of multivariate location estimators. PhD thesis, Dept. of Statistics, Harvard University, 1982.

Alfonso Gordaliza. Best approximations to random variables based on trimming procedures. Journal of Approximation Theory, 64(2):162–180, 1991. doi: 10.1016/0021-9045(91)90072-I.

David Kraus and Victor M. Panaretos. Dispersion operators and resistant second-order functional data analysis. Biometrika, 99(4):813–832, 2012. doi: 10.1093/biomet/ass037.

R. Maronna, R. Martin, and V. Yohai. Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics, 2006.

Yehuda Vardi and Cun-Hui Zhang. The multivariate L1-median and associated data depth. Proceedings of the National Academy of Sciences, 97(4):1423–1426, 2000. doi: 10.1073/pnas.97.4.1423.
