Level sets estimation of random compact sets P. Heinrich, R. S. - - PowerPoint PPT Presentation


Level sets estimation of random compact sets. P. Heinrich, R. S. Stoica and C. V. Tran, Université Lille 1 - Laboratoire Paul Painlevé. Workshop in Spatial Statistics and Image Analysis in Biology, Avignon, May 9-11 2012.


SLIDE 1

Level sets estimation of random compact sets

  • P. Heinrich, R. S. Stoica and C. V. Tran

Université Lille 1 - Laboratoire Paul Painlevé

Workshop in Spatial Statistics and Image Analysis in Biology

Avignon, May 9-11 2012

SLIDE 2

◮ Introduction : motivating example
◮ Level sets : a tool for compact random sets averaging
◮ Estimation of level sets
◮ Examples of application
◮ Conclusions and perspectives

SLIDE 3

A practical application (1)

Pattern detection in spatial data :

◮ the data d : image analysis, epidemiology, galaxy catalogues
◮ detect and characterise the pattern “hidden” in the data : objects, cluster pattern or filamentary network
◮ hypothesis : the pattern is the outcome γ of a stochastic process Γ
◮ possible solution in this context : probabilistic modelling and maximisation

SLIDE 4

A practical application (2)

Gibbs modelling framework

◮ Markov random fields, marked point processes, etc.
◮ general structure of the probability density :

$$h(\gamma|\theta) = \frac{\exp\left[-U_d(\gamma|\theta) - U_i(\gamma|\theta)\right]}{\alpha(\theta)}$$

and also the necessary mathematical details so that everything is well defined ...

SLIDE 5

A practical application (3)

Gibbs modelling framework (continued)

◮ $U_d(\gamma|\theta)$ : this term is related to the location of the objects in the data field (inhomogeneous process)
◮ $U_i(\gamma|\theta)$ : this term is related to the object interaction and to the morphology of the pattern (prior model, regularisation term)
◮ α(θ) : normalisation constant (not always available analytically)
◮ pattern estimator :

$$\hat{\gamma} = \arg\max_{\gamma \in \Omega} \{h(\gamma|\theta)\} = \arg\min_{\gamma \in \Omega} \{U_d(\gamma|\theta) + U_i(\gamma|\theta)\} \qquad (1)$$

SLIDE 6

A practical application (4)

Some concluding remarks

◮ simulated annealing algorithm : convergence towards the uniform distribution on the solution sub-space given by (1)
◮ the model parameters are not always known ...
◮ the convergence is difficult to establish
◮ ... or the solution is not always unique (continuous models and/or priors on the model parameters)
◮ ⇒ a real need to average the obtained solutions in order to obtain a much more robust solution

Idea : use level sets as a tool for averaging random patterns

SLIDE 7

Level sets : basic notions and definitions (1)

Random compact sets and coverage function :

◮ (Ω, A, P) : probability space
◮ ($W = [0, 1]^d$, B, ν) : measure space (... where the data field lives) with B the corresponding Borel σ-algebra and ν the Lebesgue measure
◮ C : the class of compact sets in W

A random compact set Γ in W is a random map from Ω to C that is measurable in the sense

∀C ∈ C, {ω : Γ(ω) ∩ C ≠ ∅} ∈ A

The coverage function is given by : p(w) = P(w ∈ Γ)

SLIDE 8

Level sets : basic notions and definitions (2)

Level or Quantile sets : for α ∈ [0, 1] the (deterministic) α-level set is

$$Q_\alpha = \{w \in W : p(w) > \alpha\}$$

or for simplicity {p > α}.

Vorob’ev expectation : the Borel set $E_V\Gamma$ such that

$$\nu(E_V\Gamma) = E[\nu(\Gamma)] \quad \text{and} \quad \{p > \alpha^*\} \subset E_V\Gamma \subset \{p \ge \alpha^*\},$$

where $\alpha^* = \inf\{\alpha \in [0, 1] : \nu(Q_\alpha) \le E[\nu(\Gamma)]\}$. The Vorob’ev expectation is the $\alpha^*$-level set that matches the mean volume of Γ.
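
As a minimal sketch of these definitions, consider a toy case that is not in the slides: W = [0, 1] and Γ = [0, U] with U uniform on [0, 1], so that p(w) = 1 − w and E[ν(Γ)] = 1/2. The Python code below evaluates p at the centres of m equal cells and searches for α∗ among the values taken by p. The function names and the cell-centre discretisation are illustrative assumptions.

```python
def level_volume(p_values, cell_volume, alpha):
    """nu(Q_alpha) on the grid: total volume of cells where p > alpha."""
    return cell_volume * sum(1 for p in p_values if p > alpha)


def vorobev_threshold(p_values, cell_volume, mean_volume, tol=1e-12):
    """alpha* = inf{alpha in [0, 1] : nu(Q_alpha) <= E[nu(Gamma)]}.

    On a finite grid, alpha -> nu(Q_alpha) is a right-continuous step
    function that only jumps at values taken by p, so the infimum is
    attained at 0, at 1, or at one of those values.
    """
    candidates = sorted({0.0, 1.0, *p_values})
    for alpha in candidates:
        if level_volume(p_values, cell_volume, alpha) <= mean_volume + tol:
            return alpha
    return 1.0  # not reached when all p_values lie in [0, 1]


# Toy example (assumption): Gamma = [0, U], U ~ Uniform(0, 1), W = [0, 1].
m = 100
cell_volume = 1.0 / m
centres = [(k + 0.5) / m for k in range(m)]
p_values = [1.0 - w for w in centres]   # coverage function p(w) = 1 - w
mean_volume = 0.5                       # E[nu(Gamma)] = E[U]

alpha_star = vorobev_threshold(p_values, cell_volume, mean_volume)
# Cells of the open level set {p > alpha*}; with ties at alpha*, part of
# {p = alpha*} would have to be added to match the mean volume exactly.
vorobev_cells = [w for w, p in zip(centres, p_values) if p > alpha_star]
vorobev_volume = cell_volume * len(vorobev_cells)
```

In this toy case the threshold comes out close to 1/2 and the selected cells fill the left half of W, whose volume matches the mean volume of Γ.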

SLIDE 9

Some known results and properties (1)

[Plot omitted. Labels: ν(W), F−(α0) and F(α0) on the vertical axis; α0, α1, α2 and 1 on the horizontal axis.]

Figure: Behaviour of the function F(α) = ν(Qα)

Remarks :

◮ F is càdlàg with constant regions (plateaux)
◮ constant regions of p(w) ⇒ discontinuities of ν(Qα)
◮ constant regions of ν(Qα) ⇒ discontinuities of p(w)

SLIDE 10

Some known results and properties (2)

Vorob’ev expectation :

◮ it is unique provided $F(\alpha) = \nu(Q_\alpha) = \nu(\{p > \alpha\})$ is continuous at $\alpha^*$ ; then we have $E_V\Gamma = \{p \ge \alpha^*\}$
◮ it minimises $B \mapsto E[\nu(B \triangle \Gamma)]$ under the constraint $\nu(B) = E[\nu(\Gamma)]$, where △ is the symmetric difference (Molchanov, 05).

More generally, on level sets :

◮ p(w) not always available in an analytical closed form
◮ the level sets cannot be computed for all the points w ∈ W

⇒ discretisation should be considered

SLIDE 11

Plug-in estimation (1)

Definition

◮ consider n i.i.d. copies Γ1, Γ2, . . . , Γn of Γ
◮ the empirical counterpart of p(w) :

$$p_n(w) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{w \in \Gamma_i\}}$$

◮ the plug-in estimator :

$$Q_{n,\alpha} = \{p_n > \alpha\}$$
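
The plug-in construction can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the random compact sets are hypothetical disks in W = [0, 1]² with random centre and radius (this model is not from the slides), and the level set is evaluated on a finite list of points rather than on all of W.

```python
import random


def sample_disk(rng):
    """Hypothetical random compact set: a disk (cx, cy, radius) inside [0, 1]^2."""
    return (rng.uniform(0.3, 0.7), rng.uniform(0.3, 0.7), rng.uniform(0.1, 0.3))


def contains(disk, w):
    """Indicator 1{w in Gamma_i} for a disk."""
    cx, cy, rad = disk
    return (w[0] - cx) ** 2 + (w[1] - cy) ** 2 <= rad ** 2


def empirical_coverage(sets, w):
    """p_n(w) = (1/n) * sum_i 1{w in Gamma_i}."""
    return sum(contains(g, w) for g in sets) / len(sets)


def plug_in_level_set(sets, points, alpha):
    """Q_{n,alpha} = {p_n > alpha}, restricted to the given evaluation points."""
    return [w for w in points if empirical_coverage(sets, w) > alpha]


rng = random.Random(0)
sets = [sample_disk(rng) for _ in range(200)]          # n = 200 i.i.d. copies
m = 20
points = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]

q_low = plug_in_level_set(sets, points, 0.2)
q_high = plug_in_level_set(sets, points, 0.8)
```

Since α ↦ {p_n > α} is decreasing for inclusion, every point of `q_high` also belongs to `q_low`.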

SLIDE 12

Plug-in estimation (2)

Properties :

the problem has been studied extensively in the literature

◮ some references : (Molchanov, 87, 90, 98), (Cuevas, 97, 06) and many others
◮ $L^1$-consistency under weak assumptions → p(w) does not need to be continuous
◮ Hausdorff distance : similar consistency results under some extra assumptions
◮ rates of convergence and asymptotic normality : regularity conditions on p(w)

Aim of our work

◮ plug-in estimator that takes into account the discretisation effects
◮ estimator for the Vorob’ev expectation → its definition contains another quantity that needs approximation ...

SLIDE 13

A new level-set estimator (1)

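Discretisation : for any Borel set B in W and $r \in 2^{-\mathbb{N}}$, its corresponding grid approximation is

$$B^r = \bigcup_{w \in B \cap r\mathbb{Z}^d} [w, w + r)^d.$$

Regularity : the “upper box counting dimension” of ∂B is

$$\dim_{\mathrm{box}}(\partial B) = \limsup_{r \to 0} \frac{\log N_r(\partial B)}{-\log r},$$

with $N_r(\partial B) = \mathrm{Card}\{w \in r\mathbb{Z}^d : [w, w + r)^d \cap \partial B \neq \emptyset\}$.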
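
Both quantities are easy to compute for a simple shape. The sketch below takes, as an assumption for illustration only, B to be a disk of radius 0.3 centred in W = [0, 1]²; it computes ν(B^r) by counting grid points of rℤ² that fall in B, and counts the boxes that meet the circle ∂B. Closed boxes are used in the intersection test for simplicity, which can only overcount relative to the half-open boxes of the definition. All function names are hypothetical.

```python
import math


def grid_points(r):
    """Points of r*Z^2 lying in [0, 1)^2, for r = 2^-k."""
    m = round(1 / r)
    return [(i * r, j * r) for i in range(m) for j in range(m)]


def grid_volume(indicator, r):
    """nu(B^r): r^2 times the number of grid points w with w in B."""
    return r * r * sum(1 for w in grid_points(r) if indicator(w))


def boundary_box_count(cx, cy, radius, r):
    """Number of closed boxes [w, w + r]^2 that meet the circle of given radius."""
    count = 0
    for x, y in grid_points(r):
        # Smallest and largest distance from the centre to the box.
        dx_near = max(x - cx, 0.0, cx - (x + r))
        dy_near = max(y - cy, 0.0, cy - (y + r))
        dx_far = max(abs(x - cx), abs(x + r - cx))
        dy_far = max(abs(y - cy), abs(y + r - cy))
        if math.hypot(dx_near, dy_near) <= radius <= math.hypot(dx_far, dy_far):
            count += 1
    return count


cx, cy, radius = 0.5, 0.5, 0.3


def in_disk(w):
    return math.hypot(w[0] - cx, w[1] - cy) <= radius


r1, r2 = 2 ** -5, 2 ** -6
vol = grid_volume(in_disk, r2)
n1 = boundary_box_count(cx, cy, radius, r1)
n2 = boundary_box_count(cx, cy, radius, r2)
# Crude two-scale estimate of the box counting dimension of the circle.
slope = math.log(n2 / n1) / math.log(r1 / r2)
```

The set B^r △ B is contained in the union of the boxes that meet ∂B, so the volume error is at most N_r(∂B) · r², and the two-scale slope should be close to 1, the dimension of a circle.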

SLIDE 14

A new level-set estimator (2)

Proposition

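Assume that $\dim_{\mathrm{box}}(\partial B) < d$. For all ε > 0, there exists $r_\varepsilon$ such that

$$0 < r < r_\varepsilon \;\Rightarrow\; \nu(B^r \triangle B) \le r^{\,d - \dim_{\mathrm{box}}(\partial B) - \varepsilon}.$$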

Proposition

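Assume that $\dim_{\mathrm{box}}(\partial \Gamma) \le d - \kappa$ with probability one for some κ > 0. For all α such that ν({p = α}) = 0,

(i) with probability 1,

$$\lim_{\substack{r \to 0 \\ n \to \infty}} \nu\left(Q^r_{n,\alpha} \triangle Q_\alpha\right) = 0$$

(ii) for all ε > 0,

$$E\left[\nu\left(Q^r_{n,\alpha} \triangle Q_\alpha\right)\right] \le r^{\kappa} + 2e^{-2n\varepsilon^2} + F(\alpha - \varepsilon) - F(\alpha + \varepsilon).$$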

The proof is an extension of the result in (Cuevas, 06).
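
To get a feel for the three terms of the bound in (ii), the sketch below evaluates it in a toy setting that is an assumption and not taken from the slides: d = 1 and Γ = [0, U] with U uniform on [0, 1], so that F(α) = 1 − α on [0, 1] and κ = 1 (the boundary of Γ has at most two points). The exponential term has the same form as a Hoeffding-type deviation bound. Since ε is free, the code takes the smallest value over a finite grid of ε; the grid and the function names are illustrative.

```python
import math


def F(alpha):
    """Toy volume function F(alpha) = nu({p > alpha}) for p(w) = 1 - w on [0, 1]."""
    return min(1.0, max(0.0, 1.0 - alpha))


def bound(r, n, alpha, eps, kappa=1.0):
    """r^kappa + 2 exp(-2 n eps^2) + F(alpha - eps) - F(alpha + eps)."""
    return (r ** kappa
            + 2.0 * math.exp(-2.0 * n * eps ** 2)
            + F(alpha - eps) - F(alpha + eps))


def best_bound(r, n, alpha, kappa=1.0, steps=200):
    """Smallest value of the bound over a finite grid of eps in (0, 0.5]."""
    eps_grid = [0.5 * (k + 1) / steps for k in range(steps)]
    return min(bound(r, n, alpha, eps, kappa) for eps in eps_grid)


coarse = best_bound(r=2 ** -4, n=100, alpha=0.5)
fine = best_bound(r=2 ** -8, n=10000, alpha=0.5)
```

A small ε shrinks the term F(α − ε) − F(α + ε) but inflates the exponential term, so the best ε balances the two; refining the grid and increasing n together lowers the bound.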

SLIDE 15

Vorob’ev expectation estimator (1)

Kovyazin’s mean : the empirical counterpart of the Vorob’ev expectation. That is the Borel set $K_n$ such that

$$\nu(K_n) = \frac{1}{n} \sum_{i=1}^{n} \nu(\Gamma_i) \quad \text{and} \quad \{p_n > \alpha^*_n\} \subset K_n \subset \{p_n \ge \alpha^*_n\},$$

where

$$\alpha^*_n = \inf\{\alpha \in [0, 1] : \nu(\{p_n > \alpha\}) \le \nu(K_n)\}.$$

Theorem

Assume that $\nu(\{p = \alpha^*\}) = 0$. Then, with probability one,

$$\lim_{n \to \infty} \nu(K_n \triangle E_V\Gamma) = 0.$$

The proof revisits the result given by (Kovyazin, 86).

SLIDE 16

Vorob’ev expectation estimator (2)

Grid approximation of $K_n$ : this is the estimator we propose. That is the Borel set $K_{n,r}$ such that

$$\{p_n > \alpha^*_{n,r}\}^r \subset K_{n,r} \subset \{p_n \ge \alpha^*_{n,r}\}^r,$$

where

$$\alpha^*_{n,r} = \inf\{\alpha \in [0, 1] : \nu(\{p_n > \alpha\}^r) \le \nu(K_n)\}.$$

Some remarks :

◮ quite strong assumption : $\nu(K_n)$ is computed exactly ...
◮ an alternative idea may consider directly the discretisation of Γ or $K_n$, but this does not guarantee a mean volume equal to $\nu(K_n)$ ...
◮ still, in practice ...
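
The grid estimator can be sketched on a one-dimensional toy model, which is an assumption made here for illustration: W = [0, 1] and Γ_i = [0, U_i] with U_i i.i.d. uniform, so that ν(Γ_i) = U_i is known exactly and the Vorob’ev expectation is [0, 1/2]. The code computes the threshold on the grid rℤ ∩ [0, 1) and returns the grid points of the two sets that bracket the estimator; the function names are hypothetical.

```python
import random


def empirical_coverage(endpoints, w):
    """p_n(w) for Gamma_i = [0, u_i]: fraction of samples with w <= u_i."""
    return sum(1 for u in endpoints if w <= u) / len(endpoints)


def grid_level_volume(p_on_grid, r, alpha):
    """nu({p_n > alpha}^r) in dimension 1: r times the number of grid points kept."""
    return r * sum(1 for p in p_on_grid if p > alpha)


def grid_kovyazin(endpoints, r):
    """Threshold alpha*_{n,r} and the grid points of the two bracketing sets."""
    grid = [k * r for k in range(round(1 / r))]       # r*Z intersected with [0, 1)
    p_on_grid = [empirical_coverage(endpoints, w) for w in grid]
    mean_volume = sum(endpoints) / len(endpoints)     # nu(K_n), computed exactly
    # The volume is a step function of alpha, so the infimum is attained
    # at 0, at 1, or at a value taken by p_n on the grid.
    candidates = sorted({0.0, 1.0, *p_on_grid})
    alpha = next(a for a in candidates
                 if grid_level_volume(p_on_grid, r, a) <= mean_volume)
    lower = [w for w, p in zip(grid, p_on_grid) if p > alpha]
    upper = [w for w, p in zip(grid, p_on_grid) if p >= alpha]
    return alpha, mean_volume, lower, upper


rng = random.Random(1)
endpoints = [rng.random() for _ in range(2000)]
r = 2 ** -6
alpha_nr, mean_volume, lower, upper = grid_kovyazin(endpoints, r)
lower_volume = r * len(lower)   # volume of {p_n > alpha*_{n,r}}^r
upper_volume = r * len(upper)   # volume of {p_n >= alpha*_{n,r}}^r
```

The mean volume always lies between the volumes of the two bracketing sets, and in this toy model the lower set is an interval starting at 0 whose length approaches 1/2 as n grows and r shrinks.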

SLIDE 17

Consistency of the Vorob’ev estimator

Theorem

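Assume that $\dim_{\mathrm{box}}(\partial \Gamma) \le d - \kappa$ with probability one for some κ > 0, and that $\nu(\{p = \alpha^*\}) = 0$ and $\nu(\{p = \beta^*\}) = 0$ with $\beta^* = \sup\{\alpha \in [0, 1] : \nu(\{p > \alpha\}) \ge E[\nu(\Gamma)]\}$. Then, we have almost surely

$$\lim_{\substack{r \to 0 \\ n \to \infty}} \nu(K_{n,r} \triangle E_V\Gamma) = 0.$$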

Proof.

We write that $\nu(K_{n,r} \triangle E_V\Gamma) \le \nu(K_{n,r} \triangle K_n) + \nu(K_n \triangle E_V\Gamma)$ and use Theorem 1 and two lemmas to conclude. For technical details, a draft is available on request ...

SLIDE 18

Cosmic filaments : simulated annealing detection

(Stoica, Martinez and Saar, 07, 10)

Figure: a) Original data. b) Cylinder configuration detected.

SLIDE 19

Cosmic filaments : level sets averaging


Figure: a) Behaviour of the level set volume. b) Estimated Vorob’ev expectation.

SLIDE 20

Epidemiology (veterinary context)

Disease : sub-clinical mastitis in dairy herds

◮ points → farm locations
◮ to each farm → disease score (continuous variable)
◮ cluster pattern detection : regions where there is a lack of hygiene or rigour in farm management

Figure: The spatial distribution of the farms outlines almost the entire French territory (INRA Avignon).

SLIDE 21

Epidemiology : sub-clinical mastitis data

(Stoica, Gay and Kretzschmar, 07)

Figure: Disease data scores and coordinates for the year 1996 : a) cluster pattern (disk configuration) detected ; b) Level sets.

SLIDE 22

Conclusion :

◮ estimator including the discretisation effects
◮ averaging the shape of the pattern ...

Perspectives :

◮ ... provided the model is correct ...
◮ relax hypotheses
◮ what is the variance of the pattern ?

Acknowledgements :

this work was done together with wonderful co-authors and also with the help of some very generous people ... Some of them are here with us today :) ...

SLIDE 23

GDR Géométrie Stochastique

Aim :

◮ network of scientists
◮ no obligations at all ...
◮ bringing together mathematicians sharing common research interests, but not only ... also scientists from the corresponding application domains ...

Contact :

◮ Pierre.Calka@univ-rouen.fr
◮ web page : http://gdr-geostoch.math.cnrs.fr/