SLIDE 1

Probability on spaces of functions with applications to inverse problems and data assimilation

Jan Mandel

Department of Mathematical and Statistical Sciences University of Colorado Denver

European Mathematical Society School on Mathematical Modeling, Numerical Analysis and Scientific Computing Kacov (Czech Republic), May 29 – June 3, 2016

SLIDE 2

Why stochastics in scientific computing?

  • Funding trends: new dimensions, links, applications.
  • Uncertainty is everywhere. Data and models cannot be 100% certain.
  • Old paradigm: data in, results out. But we live in a dynamic world.
  • Hot topics: uncertainty quantification, data assimilation, inverse problems.
  • Statistics of the solutions of partial differential equations.

SLIDE 3

An eclectic collection of books I saw in Summer 2012 at the Parallel Algorithms Group at CERFACS says it all.

SLIDE 4

Purpose

For numerical analysts and practitioners of scientific computing. We assume little previous probability and some analysis. It never hurts to go through the basics, particularly if we want to steal them for our purposes!

SLIDE 5

Random variables

Random variable (r.v.) X = measurable function (Ω, Σ) → R. Sample space Ω, σ-algebra Σ of its subsets = events. Probability is a measure µ on the subsets in Σ, called measurable sets. Measurability of X means that {ω ∈ Ω | X(ω) ∈ A} ∈ Σ for all Borel A ⊂ R (open A is enough). Recall that a measure is a function that assigns numbers to subsets, with some properties. Example: volume, the Lebesgue measure on Rⁿ. But unlike in real analysis, non-measurable functions are common. Example: dice. Usually it is left unsaid what Ω is.

SLIDE 6

Distribution and density

The distribution of a r.v. X is the measure µ_X on R,

$$\mu_X(A) = \mu(\{\omega \in \Omega \mid X(\omega) \in A\}) = \mu(X^{-1}(A)),$$

usually written simply as µ_X(A) = Pr(X ∈ A).

The probability density p of a r.v. X is the Radon-Nikodym derivative of its distribution µ_X w.r.t. the Lebesgue measure λ:

$$\mu_X(A) = \int_A p\,d\lambda \iff p = \frac{d\mu_X}{d\lambda}.$$

Examples: 1. Dice. 2. Normal (Gaussian) random variable:

$$\mu_X(A) = \Pr\{X \in A\} = \int_A \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx.$$
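
A quick numeric sanity check (not from the slides; the interval A and sample size are hypothetical): integrating the standard normal density over a set A matches the empirical frequency Pr(X ∈ A).

```python
import numpy as np

# Check mu_X(A) = int_A exp(-x^2/2)/sqrt(2 pi) dx against an empirical frequency.
a, b = -1.0, 2.0                                  # A = [a, b], a hypothetical set
x = np.linspace(a, b, 100001)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)        # standard normal density
prob_quadrature = np.trapz(p, x)                  # int_A p dx
samples = np.random.randn(10**6)                  # realizations of X
prob_empirical = np.mean((samples >= a) & (samples <= b))
print(prob_quadrature, prob_empirical)            # both approx 0.8186
```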

SLIDE 7

Random elements in functional spaces

All remains the same. A random element X with values in a Banach space V is a measurable function X : Ω → V. That is, ∀B ∈ B(V): X^{-1}(B) ∈ Σ, where B(V) is the σ-algebra of all Borel sets in V (open sets are enough). A random function is simply a random element with values in a space V of functions.

SLIDE 8

Probability distributions on functional spaces

The distribution of a random element X with values in a Banach space V is the measure on V defined by

$$\mu_X(A) = \mu(X^{-1}(A)) = \Pr(X \in A).$$

Note that µ_X(V) = Pr(X ∈ V) = 1, i.e., µ_X is a probability measure. Note: we will often assume that V = H is a separable Hilbert space. Otherwise things can get complicated.

SLIDE 9

Mean of a random element

The mean value (if it exists) of a random variable X : Ω → R is the integral with respect to the probability measure µ on Ω,

$$E(X) = \int_\Omega X\,d\mu = \int_\Omega X(\omega)\,\mu(d\omega).$$

In Rⁿ, the mean is defined entry by entry, E(X) = (E(X₁), ..., E(X_n)), which is the same as requiring, for all coordinate unit vectors e_i,

$$\langle e_i, E(X)\rangle = E(\langle e_i, X\rangle).$$

For H = Rⁿ this is the same as: E(X) ∈ H such that

$$\langle v, E(X)\rangle = E(\langle v, X\rangle) \quad \forall v \in H,$$

which uses no coordinates, and we take that for the definition of E(X) in general. Note that the mean is defined weakly.

SLIDE 10

Covariance of a random element

In Rⁿ, we have the usual covariance entry by entry, Cov(X, Y) = (C_{ij}),

$$C_{ij} = \langle e_i, \mathrm{Cov}(X,Y)\,e_j\rangle = E\bigl((X_i - E(X_i))(Y_j - E(Y_j))\bigr),$$

that is,

$$\langle u, \mathrm{Cov}(X,Y)\,v\rangle = E\bigl(\langle u, X - E(X)\rangle\,\langle v, Y - E(Y)\rangle\bigr)$$

with H = Rⁿ, u = e_i, v = e_j, but there are no explicit coordinates any more. Thus, Cov(X, Y) is defined weakly by

$$\langle u, \mathrm{Cov}(X,Y)\,v\rangle = E\bigl(\langle u, X - E(X)\rangle\,\langle v, Y - E(Y)\rangle\bigr) \quad \forall u, v \in H.$$

Cov(X) stands for Cov(X, X). Clearly, if Cov(X) exists, it is self-adjoint and positive semidefinite.

SLIDE 11

Lp spaces

L^p = L^p(Ω, H) is the space of all H-valued random elements X with the finite moment

$$E(|X|^p) = \int_\Omega |X|^p\,d\mu < \infty.$$

Note: |X| is the norm on H and has single bars. We reserve double bars for stochastic norms: the space L^p equipped with the norm

$$\|X\|_p = \bigl(E(|X|^p)\bigr)^{1/p}$$

is a Banach space, and L²(Ω, H) equipped with the inner product

$$E(\langle U, V\rangle) = \int_\Omega \langle U(\omega), V(\omega)\rangle\,d\mu(\omega)$$

is a Hilbert space.

SLIDE 12

Mean and covariance in Lp

If p > r ≥ 1, then L^p(Ω, H) ⊂ L^r(Ω, H). If X ∈ L¹(Ω, H), its mean E(X) exists by the Riesz representation theorem: the linear functional v ∈ H ↦ E⟨v, X⟩ ∈ R is bounded. Likewise, if X ∈ L²(Ω, H), its covariance Cov(X) exists. Elements of L²(Ω, H) are historically called "second order" random elements.

SLIDE 13

The first surprises

Cov(X) must be a compact operator with finite trace. Thus, there is no distribution with covariance equal to the identity in infinite dimension. There is no Lebesgue measure in infinite dimension. Thus, there is no standard density in infinite dimension.

SLIDE 14

There is no measure on infinite dimensional Hilbert space that is translation or rotation invariant

Proof: There are infinitely many orthonormal vectors. Put a ball with radius 1/2 at the end of each. The balls are disjoint (the distance between their centers is √2 > 1), all have the same positive measure by invariance, and fit in a ball at zero with radius 3/2, which, therefore, cannot have a finite measure. In particular, N(0, I) cannot be a (σ-additive) probability measure such that balls are measurable.

SLIDE 15

How to define N (0, I) on a separable Hilbert space

  • µ(whole space) = 1 and rotation invariant ⇒ balls are not measurable
  • µ is defined on the algebra C of cylinder sets with finite-dimensional Borel measurable base
  • A ball is an intersection of countably many cylinder sets
  • µ cannot be extended to a σ-algebra, which would contain the balls
  • µ cannot be σ-additive, only finitely additive; otherwise the balls would be measurable
  • Consider 1D bases ⇒ weak random variable

SLIDE 16

Gaussian and weak random vectors on Hilbert space

Gaussian random elements are determined by their distributions on the finite-dimensional bases of the cylinder sets. 1D bases are enough: X : Ω → H such that ∀v ∈ H the function ω ∈ Ω ↦ ⟨v, X(ω)⟩ ∈ R is measurable is called a weak random variable (not necessarily Gaussian).

The mean m of X is defined weakly by ⟨m, v⟩ = E[⟨v, X⟩] ∀v ∈ H. The covariance C of X is defined weakly by ⟨Cu, v⟩ = E[⟨u, X⟩⟨v, X⟩] ∀u, v ∈ H.

A Gaussian X is a (strong) random element (its distribution is a measure) ⇔ C has finite trace. X ∼ N(0, I) (white noise) is a weak random element only: ⟨X, v⟩ ∼ N(0, 1) for all unit vectors v ∈ H, and E[⟨u, X⟩⟨v, X⟩] = 0 ∀u, v ∈ H, u ⊥ v.

SLIDE 17

Covariance matrices and exponential forms

Assume the covariance C is nonsingular. In finite dimension,

$$e^{-\frac{1}{2}|u|^2_{C^{-1}}} \propto \text{density of } N(0, C)$$

(∝ means proportional), and |·|_{C^{-1}} is a norm, defined by

$$|u|^2_{C^{-1}} = \langle C^{-1}u, u\rangle = \langle C^{-1/2}u, C^{-1/2}u\rangle = \bigl|C^{-1/2}u\bigr|^2$$

$$\Rightarrow\ -\tfrac{1}{2}|u|^2_{C^{-1}} \in \mathbb{R}\ \Rightarrow\ e^{-\frac{1}{2}|u|^2_{C^{-1}}} > 0.$$

In infinite dimension, C is compact ⇒ C^{-1/2} is defined only on the dense subspace D(C^{-1/2}) = R(C^{1/2}), called the Cameron-Martin space of the Gaussian measure with covariance C.

SLIDE 18

More surprises: Cameron-Martin space and Feldman-Hájek theorem

In an infinite-dimensional Hilbert space, the measure N(0, C) of the Cameron-Martin space R(C^{1/2}) is 0. Equivalently, X ∼ N(0, C) ⇒ X ∉ R(C^{1/2}) almost surely (a.s.). Corollary: $e^{-\frac{1}{2}|X|^2_{C^{-1}}} = 0$ a.s.

Feldman-Hájek theorem: On an infinite-dimensional Hilbert space, any two Gaussian measures ν₁ and ν₂ are either equivalent or mutually singular (i.e., H = H₁ + H₂, ν₁(H₁) = ν₂(H₂) = 0). Note: In finite dimension, all Gaussian measures with nonsingular covariance are equivalent. Special case: N(u₁, C) and N(u₂, C) are equivalent ⇔ u₁ − u₂ ∈ R(C^{1/2}).

SLIDE 19

From least squares to Bayes theorem

Inverse problem H(u) ≈ d with background knowledge u ≈ u_f:

$$|u - u_f|^2_{Q^{-1}} + |d - H(u)|^2_{R^{-1}} \to \min_u$$

u_f = forecast, what we think the state should be; d = data; H = observation operator; H(u) = what the data should be given the state u.

$$\underbrace{e^{-\frac{1}{2}|u-u_a|^2_{(Q_a)^{-1}}}}_{\propto\, p_a(u),\ \text{analysis (posterior) density}} = \underbrace{e^{-\frac{1}{2}|d-H(u)|^2_{R^{-1}}}}_{\propto\, p(d|u),\ \text{data likelihood}}\ \underbrace{e^{-\frac{1}{2}|u-u_f|^2_{Q^{-1}}}}_{\propto\, p(u),\ \text{forecast (prior) density}} \to \max_u$$

∝ means proportional. The maximizer u is the Maximum A Posteriori Probability (MAP) estimate. This is the same as the Bayes theorem for Gaussian probability densities.

SLIDE 20

First the Bayes theorem for discrete states

Consider a system in k possible states A₁, ..., A_k. A model forecasts the probabilities Pr(A₁), ..., Pr(A_k). A data event D arrives, and (e.g., from instrument properties) we know Pr(D|A_i), the conditional probabilities of getting data in the event D given the system state A_i. We want to get the improved Pr(A_i|D), the probabilities of the states A_i given the data D (the analysis). By the definition of conditional probabilities,

$$\Pr(D \cap A_i) = \Pr(D|A_i)\Pr(A_i) = \Pr(A_i|D)\Pr(D), \qquad \Pr(A_i|D) = \frac{\Pr(D|A_i)\Pr(A_i)}{\Pr(D)},$$

giving (∝ means proportional) Pr(A_i|D) ∝ Pr(D|A_i)Pr(A_i). Normalize by $\sum_{i=1}^k \Pr(A_i|D) = 1$:

$$\Pr(A_i|D) = \frac{\Pr(D|A_i)\Pr(A_i)}{\sum_{j=1}^k \Pr(D|A_j)\Pr(A_j)}.$$
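
A minimal sketch of this update in code (the numbers are hypothetical, for a three-state system):

```python
import numpy as np

# Discrete Bayes update: Pr(A_i|D) proportional to Pr(D|A_i) Pr(A_i),
# normalized so the posterior probabilities sum to 1.
prior = np.array([0.5, 0.3, 0.2])        # Pr(A_i), the model forecast
likelihood = np.array([0.1, 0.6, 0.4])   # Pr(D|A_i), from instrument properties
posterior = likelihood * prior
posterior /= posterior.sum()             # normalize: sum_i Pr(A_i|D) = 1
print(posterior)                         # the analysis probabilities
```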

SLIDE 21

Bayes theorem for probability densities

Consider a system state u ∈ Rⁿ with probability density p_f(u) (the forecast, from a model). Data d comes from a distribution with probability density p(d|u), conditional on the system being in state u. As in the discrete case, p(u|d), the probability density (analysis) of u given the data d, is

$$p_a(u) = p(u|d) = \frac{p(d|u)\,p_f(u)}{\int_{\mathbb{R}^n} p(d|u)\,p_f(u)\,du} \propto p(d|u)\,p_f(u).$$

SLIDE 22

Bayes theorem for Gaussian probability densities

Gaussian case: u ∼ N(ū, Q), d = H(u) + e, e ∼ N(0, R). Then

forecast density: $p_f(u) \propto e^{-\frac{1}{2}|u - \bar u|^2_{Q^{-1}}}$,
data likelihood: $p(d|u) \propto e^{-\frac{1}{2}|H(u) - d|^2_{R^{-1}}}$,
analysis density: $p_a(u) \propto e^{-\frac{1}{2}|u - \bar u|^2_{Q^{-1}}}\, e^{-\frac{1}{2}|H(u) - d|^2_{R^{-1}}}$.

H is called the observation operator in data assimilation, and the forward operator in inverse problems. It can be quite complicated, e.g., solving a PDE. H(u) − d is the mismatch between the data and the model (called the "innovation" in data assimilation). Single answer: the Maximum A Posteriori Probability (MAP) estimate: p_a(u) → max. Note: In infinite dimension, the MAP estimate can still be defined in a generalized sense (Dashti et al., 2013 [7]).
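
For a linear observation operator H, the maximization is a linear least squares problem with the normal equations (Q⁻¹ + HᵀR⁻¹H)u = Q⁻¹u_f + HᵀR⁻¹d. A minimal numeric sketch; all matrices and data below are hypothetical:

```python
import numpy as np

# MAP estimate for the Gaussian case with linear H:
# minimize |u - uf|^2_{Q^-1} + |d - H u|^2_{R^-1}.
n, m = 3, 2
Q = np.eye(n)                                  # forecast covariance (assumed)
R = 0.5 * np.eye(m)                            # data error covariance (assumed)
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])                # observe the first two entries
uf = np.zeros(n)                               # forecast mean
d = np.array([1.0, -1.0])                      # data
Qi, Ri = np.linalg.inv(Q), np.linalg.inv(R)
u_map = np.linalg.solve(Qi + H.T @ Ri @ H, Qi @ uf + H.T @ Ri @ d)
print(u_map)                                   # posterior mean = MAP (Gaussian case)
```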

SLIDE 23

Bayes theorem in infinite dimension

Forecast and analysis are probability distributions. Bayes theorem: p_a(u) ∝ p(d|u) p_f(u). No Lebesgue measure, no densities. Integrate over an arbitrary measurable set A instead:

$$\int_A p_a(u)\,du \propto \int_A p(d|u)\,p_f(u)\,du \qquad\leadsto\qquad \mu_a(A) \propto \int_A p(d|u)\,d\mu_f(u).$$

The data likelihood is the Radon-Nikodym derivative: $p(d|u) \propto \frac{d\mu_a}{d\mu_f}$. Normalize:

$$\mu_a(A) = \frac{\int_A p(d|u)\,d\mu_f(u)}{\int_V p(d|u)\,d\mu_f(u)}.$$

But how do we know that ∫_V p(d|u) dµ_f(u) > 0?

SLIDE 24

Infinite dimensional Gaussian case

  • Data likelihood: d = Hu + ε, ε ∼ N(0, R), $p(d|u) = e^{-\frac{1}{2}|Hu - d|^2_{R^{-1}}}$
  • µ_f is a Gaussian measure on U, data d ∈ V
  • The state space U and the data space V are separable Hilbert spaces
  • All good when the data is finite dimensional; then $p(d|u) = e^{-\frac{1}{2}|Hu - d|^2_{R^{-1}}} > 0$ always.
  • Difficulties when the data is infinite dimensional (a.k.a. functional data)

SLIDE 25

Infinite-dimensional data, Gaussian measure error bad

  • The simplest example: µ_f = N(0, Q), H = I, d = 0, R = Q, U = V. The whole state is observed, and the data error distribution = the state error distribution. Come half-way? Wrong.
  • $p(d|u) = \mathrm{const}\; e^{-\frac{1}{2}|u|^2_{R^{-1}}} = \mathrm{const}\; e^{-\frac{1}{2}\langle R^{-1/2}u,\, R^{-1/2}u\rangle}$
  • The data likelihood p(d|u) > 0 if u ∈ R^{1/2}(V) = D(R^{-1/2})
  • p(d|u) = e^{-∞} = 0 if u ∉ R^{1/2}(V) = Q^{1/2}(V)
  • Q^{1/2}(V) is the Cameron-Martin space of the measure N(0, Q)
  • But µ_f = N(0, Q) ⇒ µ_f(Q^{1/2}(V)) = 0. Thus, ∫_V p(d|u) dµ_f(u) = 0.

SLIDE 26

Commutative case

  • The state and data covariances commute ⇒ same eigenvectors e_i: Qe_i = q_i e_i with ∑_{i=1}^∞ q_i < ∞, and Re_i = r_i e_i.
  • Recall that $p(d|u) = e^{-\frac{1}{2}|d - u|^2_{R^{-1}}}$ and µ_f = N(0, Q).

Theorem (Kasanický and Mandel, 2016). ∫_V p(d|u) dµ_f(u) > 0 for any d ⇔ ∃α > 0 : ∀i : r_i > α.

That is, (in the commutative case) Bayesian estimation is well posed for any data value if and only if the eigenvalues of the data error covariance are bounded away from zero.

  • In particular, R cannot be trace class: ∑_{i=1}^∞ r_i = ∞, and the data error distribution cannot be a probability measure.

SLIDE 27

Infinite-dimensional data, white noise error good

  • All is good when the data is finite-dimensional and R is not singular.
  • More generally, all is good when the data covariance R is bounded below:

$$\langle u, Ru\rangle \ge \alpha|u|^2\ \forall u,\ \alpha > 0 \;\Rightarrow\; |u|^2_{R^{-1}} < \infty\ \forall u \;\Rightarrow\; p(d|u) = e^{-\frac{1}{2}|Hu-d|^2_{R^{-1}}} > 0\ \forall u \;\Rightarrow\; \int_U p(d|u)\,d\mu_f(u) > 0.$$

  • But if V is infinite dimensional, then N(0, R) is not a probability measure on V: the trace condition is violated, Tr(R) = ∑_{i=1}^∞ r_i = ∞.
  • But this is not a problem. The data likelihood p(d|u) is just a function of u on the state space U for a fixed d.
  • For a fixed u, p(·|u) does not need to be a probability density. That was just an example.

SLIDE 28

Positive data likelihood

  • Theorem. If the forecast µ_f is a probability measure on the state space V, and, for a fixed realization of the data d, the function u ↦ p(d|u) is µ_f-measurable and 0 < p(d|·) ≤ C µ_f-a.s. (in fact, p(d|·) > 0 on some set of positive µ_f-measure is enough), then the analysis measure

$$\mu_a(A) = \frac{\int_A p(d|u)\,d\mu_f(u)}{\int_V p(d|u)\,d\mu_f(u)}$$

is well defined.

  • Proof. Because µ_f(V) = 1, from the assumption, 0 ≤ ∫_V p(d|u) dµ_f(u) ≤ C. If ∫_V p(d|u) dµ_f(u) = 0, then p(d|·) = 0 µ_f-a.s., which contradicts the assumption; hence ∫_V p(d|u) dµ_f(u) > 0.
SLIDE 29

Examples of positive data likelihood

  • White noise: (V, ⟨·, ·⟩) is a Hilbert space and $p(d|u) = e^{-\frac{1}{2}\langle d - Hu,\, d - Hu\rangle}$
  • Pointwise Gaussian: µ_f is a random field on a domain D ⊂ R², the data is a function d : D → R, and $p(d|u) = e^{-\frac{1}{2}\int_D |d(x) - u(x)|^2\,dx}$
  • Pointwise positive: $p(d|u) = e^{\int_D f(x,\,d(x),\,u(x))\,dx}$ (the satellite sensing application will be like that)

SLIDE 30

Karhunen-Loève expansion - derivation

Consider X ∈ L²(Ω, H), C = Cov(X). C is compact self-adjoint ⇒ there exists a complete orthonormal system of eigenvectors: Cu_n = λ_n u_n. For fixed ω ∈ Ω, expand X − E(X) in an abstract Fourier series, convergent in H:

$$X(\omega) = E(X) + \sum_{n=1}^\infty \theta_n(\omega)\,u_n, \qquad \theta_n(\omega) = \langle X(\omega) - E(X), u_n\rangle.$$

Then E(θ_n) = E(⟨X − E(X), u_n⟩) = 0 and

$$E(\theta_m\theta_n) = E\bigl(\langle X - E(X), u_m\rangle\,\langle X - E(X), u_n\rangle\bigr) = \langle u_m, Cu_n\rangle = \begin{cases}\lambda_n & \text{if } m = n,\\ 0 & \text{if } m \neq n.\end{cases}$$

Write θ_n = λ_n^{1/2}ξ_n; then E(ξ_mξ_n) = ⟨ξ_m, ξ_n⟩_{L²(Ω)} = δ_{mn}.

SLIDE 31

Karhunen-Loève expansion

If X ∈ L²(Ω, H), C = Cov(X), then there exists an expansion

$$X = E(X) + \sum_{n=1}^\infty \lambda_n^{1/2}\,\xi_n\,u_n, \qquad E(\xi_n) = 0,$$

convergent a.s. in H and doubly orthogonal: ⟨u_m, u_n⟩_H = δ_{mn}, ⟨ξ_m, ξ_n⟩_{L²(Ω)} = δ_{mn}. (See [14] for further details.)

SLIDE 32

Applications of the Karhunen-Loève expansion: Data analysis

Estimate the covariance from data (realizations of X) by the sample covariance. This is called Principal Component Analysis (PCA), the Karhunen-Loève transform (KLT), or Proper Orthogonal Decomposition (POD). The components with the several largest eigenvalues λ_n are responsible for most of the variance in the random element X. The practical computation is done by the SVD of the data minus the sample mean, which is much less expensive and less prone to numerical errors than computing the sample covariance first.
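
A short sketch of the SVD route described above, on synthetic data (the sizes are hypothetical): the columns are realizations; subtract the sample mean and take the SVD; the squared, scaled singular values are the sample covariance eigenvalues.

```python
import numpy as np

# PCA / KLT / POD by SVD of the centered data matrix.
n, N = 100, 20                                  # state dimension, realizations
X = np.random.randn(n, N)                       # synthetic data, one column each
Xc = X - X.mean(axis=1, keepdims=True)          # subtract the sample mean
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s**2 / (N - 1)                    # sample covariance eigenvalues
modes = U                                       # columns = KL modes (eigenvectors)
```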

SLIDE 33

Applications of the Karhunen-Loève expansion: Generating random smooth functions

Prescribe the covariance so that it has suitable eigenvectors, and use the Karhunen-Loève expansion to generate the random element X (rather, the first few terms, to generate a version of X in finite dimension). For example, a random Fourier series on [0, π], with H = L²(0, π):

$$F(x) = \sum_{n=1}^\infty \lambda_n^{1/2}\,\xi_n \sin(nx), \qquad \xi_n \sim N(0, 1), \qquad \lambda_n \to 0.$$

The smoothness of F depends on how fast λ_n → 0. We need at least ∑_{n=1}^∞ λ_n < ∞ for F to be well defined and F ∈ L²(0, π) a.s.
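
A sketch of generating such a random function from the first n_max terms; the decay λ_n = n^(-2s) with s > 1/2 is an assumption, chosen so that ∑λ_n < ∞.

```python
import numpy as np

# Truncated random Fourier series F(x) = sum_n lambda_n^(1/2) xi_n sin(n x)
# on [0, pi]; smoothness is controlled by the decay of lambda_n.
n_max, s = 100, 1.0
x = np.linspace(0.0, np.pi, 500)
n = np.arange(1, n_max + 1)
lam = n ** (-2.0 * s)                    # sum(lam) < inf since 2s > 1
xi = np.random.randn(n_max)              # xi_n ~ N(0, 1)
F = np.sin(np.outer(x, n)) @ (np.sqrt(lam) * xi)
```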

SLIDE 34

Applications of the Karhunen-Loève expansion: Numerical methods for stochastic PDEs

Use the first few terms of the Karhunen-Loève expansion to build the random coefficients and the assumed form of the solution (called the trial space in variational methods), and the test functions. The solution is found numerically as a deterministic function of a small number of random variables ξ₁, ..., ξ_n. This is the foundation of methods such as stochastic Galerkin [2], stochastic collocation [9], and polynomial chaos. The first application like that I know of is Babuška 1961 [1].

SLIDE 35

Independent random elements

  • Definition. Let (Ω, Σ, µ) be a probability space. Events A, B ∈ Σ are independent if µ(A ∩ B) = µ(A)µ(B). Random elements X, Y : Ω → V are independent if for any Borel sets S, T ⊂ V, the events X^{-1}(S) and Y^{-1}(T) are independent. i.i.d. means independent, identically distributed.
  • Example: random elements defined on different sample spaces, considered on the product sample space, are independent.
  • The standard property carries over: if X, Y are independent random elements with values in a Hilbert space, then Cov(X, Y) = 0.

SLIDE 36

Strong law of large numbers

  • Theorem [13, Corollary 7.10]. Let (X_i) be i.i.d. random elements with values in a Banach space B with E(|X₁|) < ∞, and E_n = (1/n)(X₁ + ⋯ + X_n). Then E_n → E(X₁) almost surely.
  • Strong, but not as useful for our purposes: in computations, we like explicit rates of convergence.

SLIDE 37

L2 law of large numbers

  • Lemma. Let X_k ∈ L²(Ω, H) be i.i.d. Then

$$\|E_n(X_k) - E(X_1)\|_2 \le \frac{1}{\sqrt n}\,\|X_1 - E(X_1)\|_2 \le \frac{2}{\sqrt n}\,\|X_1\|_2,$$

and E_n(X_k) → E(X₁) in L² as n → ∞.

  • Proof. The usual proof carries over. First assume E(X₁) = 0. Then

$$\|E_n(X_k)\|_2^2 = E\!\left(\Bigl|\frac{1}{n}\sum_{k=1}^n X_k\Bigr|^2\right) = \frac{1}{n^2}\sum_{k=1}^n\sum_{\ell=1}^n E(\langle X_k, X_\ell\rangle) = \frac{1}{n}\,E\bigl(|X_1|^2\bigr) = \frac{1}{n}\,\|X_1\|_2^2.$$

SLIDE 38

In the general case, from the triangle inequality,

$$\|E_n(X_k) - E(X_1)\|_2 = \frac{1}{\sqrt n}\,\|X_1 - E(X_1)\|_2 \le \frac{1}{\sqrt n}\bigl(\|X_1\|_2 + \|E(X_1)\|_2\bigr).$$

SLIDE 39

Lp laws of large numbers

Lemma. Let H be a Hilbert space, X_i ∈ L^p(Ω, H) i.i.d., p ≥ 2, and E_n(X_k) = (1/n)∑_{k=1}^n X_k. Then

$$\|E_n(X_k) - E(X_1)\|_p \le \frac{C_p}{\sqrt n}\,\|X_1 - E(X_1)\|_p \le \frac{2C_p}{\sqrt n}\,\|X_1\|_p,$$

where C_p depends on p only.

Note: L^p laws of large numbers do not hold on a general Banach space. Related questions are studied in a field called the geometry of Banach spaces.

Note: L^p laws of large numbers do not hold on the space [H] of all bounded linear operators on a Hilbert space. [H] is a Banach space, and any Banach space can be isomorphically embedded in [H] for some Hilbert space H.

SLIDE 40

Covariance by tensor product

This is important in ensemble methods, where the covariance is estimated by the sample covariance. In Rⁿ, the product xyᵀ is a rank-one matrix. To get a coordinate-free definition, write it as a linear operator, xyᵀ : z ↦ x(yᵀz) = x⟨y, z⟩. In Hilbert space, define the tensor product by

$$x \otimes y \in [H], \qquad (x \otimes y)z = x\,\langle y, z\rangle \quad \forall z \in H.$$

Now

$$\mathrm{Cov}(X, Y) = E\bigl((X - E(X)) \otimes (Y - E(Y))\bigr) = E(X \otimes Y) - E(X) \otimes E(Y).$$
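
In Rⁿ the tensor product is the outer product, and the sample covariance of the next slide can be written directly; a small sketch with synthetic samples:

```python
import numpy as np

# Tensor (outer) product form of the covariance in R^n:
# Cov(X) = E(X ⊗ X) - E(X) ⊗ E(X), with x ⊗ y = np.outer(x, y).
n, N = 4, 1000
X = np.random.randn(N, n) * np.array([3.0, 2.0, 1.0, 0.5])  # synthetic samples
mean = X.mean(axis=0)
C = sum(np.outer(x, x) for x in X) / N - np.outer(mean, mean)
print(np.round(C, 2))   # approx diag(9, 4, 1, 0.25)
```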

SLIDE 41

Laws of large numbers for sample covariance

Write the sample mean operator as E_N(X_k) = (1/N)∑_{k=1}^N X_k. The covariance can be estimated by the sample covariance

$$C_N(X_k, Y_k) = E_N\bigl((X_k - E_N(X_k)) \otimes (Y_k - E_N(Y_k))\bigr) = E_N(X_k \otimes Y_k) - E_N(X_k) \otimes E_N(Y_k).$$

We already know that E_N(X_k) → E(X₁) in L^p at the rate N^{-1/2}, p ≥ 2. But E_N(X_k ⊗ Y_k) = (1/N)∑_{k=1}^N X_k ⊗ Y_k are in [H], and there is no L^p law of large numbers in [H].

SLIDE 42

Lp law of large numbers for the sample covariance

Let X_i ∈ L^p(Ω, H) be i.i.d. and

$$C_N([X_1, \ldots, X_N]) = \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i - \left(\frac{1}{N}\sum_{i=1}^N X_i\right)\otimes\left(\frac{1}{N}\sum_{i=1}^N X_i\right).$$

But [H] is not a Hilbert space. Use L²(Ω, V_HS) with V_HS the space of Hilbert-Schmidt operators,

$$|A|^2_{HS} = \sum_{n=1}^\infty \langle Ae_n, Ae_n\rangle < \infty, \qquad \langle A, B\rangle_{HS} = \sum_{n=1}^\infty \langle Ae_n, Be_n\rangle,$$

where {e_n} is any complete orthonormal sequence in H. In finite dimension, the Hilbert-Schmidt norm becomes the Frobenius norm

$$|A|^2_{HS} = \sum_{i,j} |a_{ij}|^2.$$

SLIDE 43

Lp law of large numbers for the sample covariance (cont.)

Let X_i ∈ L^{2p}(Ω, V) be i.i.d. and V a separable Hilbert space. Then, from the L^p law of large numbers in the Hilbert space V_HS of Hilbert-Schmidt operators on V,

$$\|C_N([X_1,\ldots,X_N]) - \mathrm{Cov}(X_1)\|_p \le \bigl\|\,|C_N([X_1,\ldots,X_N]) - \mathrm{Cov}(X_1)|_{HS}\,\bigr\|_p \le \frac{\mathrm{const}(p)}{\sqrt N}\,\|X_1\|^2_{2p},$$

where the constant depends on p only. Note that since V is separable, V_HS is also separable.

SLIDE 44

Continuity of the sample covariance

Write the N-th sample covariance of [X₁, X₂, ...] as

$$C_N(X) = \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i - \left(\frac{1}{N}\sum_{i=1}^N X_i\right)\otimes\left(\frac{1}{N}\sum_{i=1}^N X_i\right).$$

Bound: if the X_i ∈ L^{2p} are identically distributed, then

$$\|C_N(X)\|_p \le \|X_1 \otimes X_1\|_p + \|X_1\|^2_p \le \|X_1\|^2_{2p} + \|X_1\|^2_p.$$

Continuity of the sample covariance: if the [X_i, U_i] ∈ L⁴ are identically distributed, then

$$\|C_N(X) - C_N(U)\|_2 \le \sqrt 8\,\|X_1 - U_1\|_4\,\bigl(\|X_1\|_4 + \|U_1\|_4\bigr).$$

SLIDE 45

Data assimilation - Analysis cycle

  • Goal: inject data into a running model.

forecast → (analysis step, with data) → analysis → (advance in time) → forecast → (analysis step, with data) → analysis → ...

  • The Kalman filter was used already in Apollo 11 navigation, and is now in GPS, computer vision, weather forecasting, remote sensing, ...

SLIDE 46

From least squares to Kalman filter

Inverse problem Hu ≈ d with background knowledge u ≈ u_f:

$$|u - u_f|^2_{Q^{-1}} + |d - Hu|^2_{R^{-1}} \to \min_u$$

u_f = forecast, what we think the state should be; d = data; H = linear observation operator; Hu = what the data should be given the state u, if there were no errors.

$$\underbrace{e^{-\frac{1}{2}\left(|u-u_f|^2_{Q^{-1}} + |d-Hu|^2_{R^{-1}}\right)}}_{\propto\, p_a(u) = p(u|d),\ \text{analysis (posterior) density}} = \underbrace{e^{-\frac{1}{2}|u-u_f|^2_{Q^{-1}}}}_{\propto\, p(u),\ \text{forecast (prior) density}}\ \underbrace{e^{-\frac{1}{2}|d-Hu|^2_{R^{-1}}}}_{\propto\, p(d|u),\ \text{data likelihood}} \to \max_u$$

The analysis density $p_a(u) \propto e^{-\frac{1}{2}|u - u_a|^2_{(Q_a)^{-1}}}$ has mean and covariance

$$u_a = u_f + K(d - Hu_f), \qquad Q_a = (I - KH)Q,$$

where K = K(Q) = QHᵀ(HQHᵀ + R)⁻¹ is called the Kalman gain.
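
A minimal sketch of this analysis step in code (a direct transcription of the formulas above; for large problems one would not form or invert these matrices explicitly):

```python
import numpy as np

# Kalman analysis step: K = Q H^T (H Q H^T + R)^{-1},
# ua = uf + K (d - H uf), Qa = (I - K H) Q.
def kalman_analysis(uf, Q, H, d, R):
    S = H @ Q @ H.T + R                    # innovation covariance
    K = Q @ H.T @ np.linalg.inv(S)         # Kalman gain
    ua = uf + K @ (d - H @ uf)             # analysis mean
    Qa = (np.eye(len(uf)) - K @ H) @ Q     # analysis covariance
    return ua, Qa
```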

SLIDE 47

Kalman filter (KF)

  • Add advancing in time; the model is an operator u ↦ Mu + b: the mean and covariance are advanced as Mu + b and MQMᵀ.
  • Hard to advance the covariance matrix when the model is nonlinear.
  • Need linear operators, or tangents and adjoints.
  • Can't maintain the covariance matrix for large problems, such as those from discretizations of PDEs.

SLIDE 48

Ensemble forecasting to the rescue

  • Ensemble weather forecast: independent randomized simulations. If it rains in 30% of them, then say that "the chance of rain is 30%".
  • Ensemble Kalman filter (EnKF, Evensen 1994 [8]): replace the covariance in the KF by the sample covariance of the ensemble, then apply the KF formula for the mean to each ensemble member independently.
  • But the analysis covariance was wrong: too small, even if the forecast covariance were exact.
  • Fix: randomize also the data, by sampling from the data error distribution (Burgers, Evensen, van Leeuwen, 1998 [5]).
  • Then all is good and we should get the right ensembles... at least in the Gaussian case, and up to the approximation by the sample covariance.

SLIDE 49

The EnKF with data randomization is statistically exact if the covariance is exact

  • Lemma. Let U^f ∼ N(u_f, Q_f), D = d + E, E ∼ N(0, R), and

$$U^a = U^f + K(D - HU^f), \qquad K = Q_f H^T(HQ_fH^T + R)^{-1}.$$

Then U^a ∼ N(u_a, Q_a), i.e., U^a has the correct analysis distribution from the Bayes theorem and the Kalman filter.
  • Proof. Computation in Burgers, Evensen, van Leeuwen, 1998 [5].
  • Corollary. If the forecast ensemble [U^f_1, ..., U^f_N] is a sample from the Gaussian forecast distribution, and the exact covariance is used, then the analysis ensemble [U^a_1, ..., U^a_N] is a sample from the analysis distribution.
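
A sketch of the randomized EnKF analysis step described above (columns of Xf are the forecast ensemble members; the data perturbation is the Burgers-Evensen-van Leeuwen fix):

```python
import numpy as np

# EnKF analysis with data randomization: sample covariance in the gain,
# data perturbed independently for each ensemble member.
def enkf_analysis(Xf, H, d, R):
    n, N = Xf.shape
    QN = np.cov(Xf)                                  # sample covariance, n x n
    K = QN @ H.T @ np.linalg.inv(H @ QN @ H.T + R)   # gain with QN in place of Q
    E = np.random.multivariate_normal(np.zeros(len(d)), R, size=N).T
    D = d[:, None] + E                               # randomized data D = d + E
    return Xf + K @ (D - H @ Xf)                     # analysis ensemble
```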

SLIDE 50

EnKF properties

  • The model is needed only as a black box.
  • The ensemble members interact through the ensemble mean and covariance only.
  • The EnKF was derived for Gaussian distributions, but the algorithm does not depend on this.
  • So it is often used for nonlinear models and non-Gaussian distributions anyway.
  • Folklore: "high-dimensional problems require large ensembles", a.k.a. the "curse of dimensionality". Not necessarily so.
  • Curse of dimensionality: slow convergence in high dimension. With arbitrary probability distributions, sure. But probability distributions in practice are not arbitrary: the state is a discretization of a random function. The smoother the functions, the faster the EnKF convergence.

SLIDE 51

Curse of dimensionality? Not for probability measures!

[Figure: filter error vs. model size (both on log scale) for the EnKF with N = 10, m = 25, for three covariance eigenvalue laws.]

Constant covariance eigenvalues λ_k = 1 and the inverse law λ_k = 1/k do not give probability measures in the limit, because ∑_{k=1}^∞ λ_k = ∞. The inverse square law λ_k = 1/k² gives a probability measure, because ∑_{k=1}^∞ λ_k < ∞.

Observation: m = 25 uniformly sampled data points on [0, 1], N = 10 ensemble members, sin(kπx) as eigenvectors. From the thesis Beezley (2009) [4], Fig. 4.7.

SLIDE 52

Convergence of EnKF in the large ensemble limit

  • Laws of large numbers guarantee that the EnKF gives correct results for large ensembles, in the Gaussian case: Le Gland et al. (2011) [12] in L^p entry by entry, Mandel et al. (2011) [18] in probability (in the lecture notes).
  • In general, the EnKF converges to a mean-field limit (Le Gland et al. [12], Law et al. [11]).
  • Mean-field approximation: the effect of all other particles on any one particle is replaced by a single averaged effect.
  • Mean-field limit: large number of particles; the influence of each becomes negligible.
  • Here: dimension-independent L^p, coordinate-free.

SLIDE 53

Convergence of the EnKF to the Kalman filter

  • Generate the initial ensemble X^{(0)}_N = [X^{(0)}_{N1}, ..., X^{(0)}_{NN}], i.i.d. and Gaussian.
  • Run the EnKF, get ensembles [X^{(k)}_{N1}, ..., X^{(k)}_{NN}], advanced in time by linear models: X^{k+1,f}_{Ni} = M_k X^k_{Ni} + f_k.
  • For theoretical purposes, define the reference ensembles [U^{(k)}_{N1}, ..., U^{(k)}_{NN}] by U^{(0)}_N = X^{(0)}_N and U^{(k)}_N by the EnKF, except with the exact covariance of the filtering distribution instead of the sample covariance.
  • We already know that U^{(k)}_N is a sample (i.i.d.) from the Gaussian filtering distribution (as it would be delivered by the Kalman filter).
  • We will show by induction that X^{(k)}_{Ni} → U^{(k)}_{Ni} in L^p for all 1 ≤ p < ∞ as N → ∞, for all analysis cycles k.
  • In which sense? We need this for all i = 1, ..., N, but the U_{Ni} change and N changes too.

SLIDE 54

Exchangeability of EnKF ensembles

  • Definition. An ensemble [X₁, ..., X_N] is exchangeable if its joint distribution is permutation invariant.
  • Lemma. All ensembles generated by the EnKF are exchangeable. Proof: The initial ensemble is i.i.d., thus exchangeable, and the analysis step is permutation invariant.
  • Corollary. The random elements (X^{(k)}_{N1}, U^{(k)}_{N1}), ..., (X^{(k)}_{NN}, U^{(k)}_{NN}) are also exchangeable. (Recall that the U_{Ni} are obtained using the exact covariance from the Kalman filter rather than the sample covariance.)
  • Corollary. X^{(k)}_{N1} − U^{(k)}_{N1}, ..., X^{(k)}_{NN} − U^{(k)}_{NN} are identically distributed, so we only need to consider the convergence ‖X^{(k)}_{N1} − U^{(k)}_{N1}‖_p → 0.

SLIDE 55

Convergence of the EnKF to mean-field limit

Nonlinear transformation of the ensemble as a vector of exchangeable random variables: [X₁, X₂, ..., X_N] ↦ [Y₁, Y₂, ..., Y_N]. L^p continuity of the model. Drop the superscript m.

Mean-field filter:

$$Y_k^{\text{mean field}} = X_k^{\text{mean field}} + K(Q)\,\bigl(D_k - HX_k^{\text{mean field}}\bigr), \qquad Q = \mathrm{Cov}(X_1).$$

Randomized EnKF:

$$Y_k^{\text{randomized}} = X_k^{\text{randomized}} + K(Q_N)\,\bigl(D_k - HX_k^{\text{randomized}}\bigr), \qquad Q_N = C_N\bigl(X_1^{\text{randomized}}, \ldots, X_N^{\text{randomized}}\bigr),$$

with the same D_k.

SLIDE 56

Convergence of the EnKF to mean-field limit (cont)

Subtract, and use the continuity of the Kalman gain:

$$\|K(Q) - K(Q_N)\|_p \le \mathrm{const}\,\|Q - Q_N\|_{2p}.$$

Use the L^p law of large numbers for C_N(X_1^{mean field}, ..., X_N^{mean field}).

A priori bound: ‖X^m_k‖_p ≤ const(m) for all m, from ‖(HQH* + R)^{-1}‖ ≤ 1/α by R ≥ αI.

Induction over m: X_1^{m, randomized} → X_1^{m, mean field} in all L^p, 1 ≤ p < ∞.

  • Dimension independent, but the constants grow with every cycle. Cascade in p: we need L^{2p} bounds in the previous cycle to get L^p.
  • Future: combine with long-term stability results? (Law, Stuart). Need to assume more about the model (ergodicity) and sufficient observations.

SLIDE 57

Convergence of the square root EnKF

  • Start from a random ensemble, then update the ensemble by a deterministic algorithm that guarantees the analysis sample mean and sample covariance given by the Kalman formulas applied to the forecast sample mean and covariance. This involves a factorization of the Gram matrix, hence "square root". No data randomization.
  • The map updating the covariance from forecast to analysis is locally Lipschitz continuous in all L^p.
  • The sample mean and sample covariance of the initial ensemble converge in all L^p to the exact background with the standard Monte Carlo rate 1/√N, by L^p laws of large numbers.
  • Because the model operator and the update (analysis) step operators are locally Lipschitz continuous in all L^p, at every step the EnKF ensemble converges in all L^p to the exact filtering mean and covariance, with the standard Monte Carlo convergence rate 1/√N (Kwiatkowski and Mandel, 2015 [10]).
  • Convergence in terms of the sample mean and sample covariance only. Nothing is said about individual ensemble members. Linear case only.

SLIDE 58

Application: Assimilation of Active Fires detection from polar-orbiting satellites

Funded projects/goals:

  • Wildland fire simulation and forecasting with assimilation of satellite Active Fires detection data. Running autonomously for any fire anywhere any time in the US, distributed on the web as a fire forecast layer over the weather forecast. (NASA)
  • Wildfire danger and simulation system for Israel (operational). (Israel Homeland Security)
  • Optimal placement of sensors for large experimental burns in 2017 to improve smoke modeling. (U.S. Joint Fire Science Program)
  • Data assimilation methodology. (NSF and Czech Grant Agency)

With Adam K. Kochanski, Sher Schranz, and Martin Vejmelka


SLIDE 59
SLIDE 60

WRF-SFIRE components

[Diagram: the coupled WRF-SFIRE system.]
  • Atmosphere model WRF ⇄ surface fire spread model SFIRE: wind in, heat and vapor fluxes out
  • Fuel moisture model, driven by surface air temperature, relative humidity, and rain
  • Chemical transport model WRF-Chem: fire emissions (smoke)
  • Data sources: RAWS fuel moisture stations, VIIRS/MODIS fire detection, HRRR forecast
  • Data assimilation of fire detection and satellite moisture sensing

SLIDE 61

Wildland Fire Behavior and Risk Forecasting. PI: Sher Schranz, CSU/CIRA. As of March 1, 2016.

The model: WRF-SFIRE. WRF is a large open source project headed by NCAR.

[Image: 2013 Patch Springs Fire, UT.]

SLIDE 62

Representation of the fire area by a level set function

  • The level set function is given on the center nodes of the fire mesh
  • Interpolated linearly, parallel to the mesh lines
  • The fireline connects the points where the interpolated values are zero

SLIDE 63

Evolving the fireline by the level set method

Level set function L, fire area: L < 0. Level set equation:

$$\frac{\partial L}{\partial t} = -R\,|\nabla L|$$

Right-hand side < 0 → the level set function goes down → the fire area grows.
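
A toy sketch of one explicit time step of this equation, assuming (hypothetically) a constant spread rate R and a centered-difference gradient norm; production codes such as WRF-SFIRE use upwind differencing, and the periodic wrap of np.roll is only acceptable while the fire stays away from the boundary.

```python
import numpy as np

# One forward Euler step of dL/dt = -R |grad L| on a uniform mesh.
def level_set_step(L, R, dx, dy, dt):
    Lx = (np.roll(L, -1, axis=0) - np.roll(L, 1, axis=0)) / (2 * dx)
    Ly = (np.roll(L, -1, axis=1) - np.roll(L, 1, axis=1)) / (2 * dy)
    return L - dt * R * np.sqrt(Lx**2 + Ly**2)

# Example: circular ignition; the zero contour (the fireline) expands.
x, y = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 1, 101), indexing="ij")
L = np.sqrt((x - 0.5)**2 + (y - 0.5)**2) - 0.05   # signed distance, fire: L < 0
for _ in range(100):
    L = level_set_step(L, R=0.5, dx=0.01, dy=0.01, dt=0.005)
print("burned fraction:", (L < 0).mean())
```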

SLIDE 64

The fire model: fuel consumption

[Figure: remaining fuel fraction vs. time since ignition.]

Time constant of the fuel:
  • 30 s: grass burns quickly
  • 1000 s: dead & down branches (~40% decrease in mass over 10 min)

SLIDE 65

Integrating the fuel left over mesh cells, with submesh fire region representation

SLIDE 66

Coupling with WRF-ARW

  • WRF-ARW is explicit in time, semi-implicit in the vertical direction only
  • Physics packages, including the fire, are called only in the last Runge-Kutta substep
  • The fire module inputs wind and outputs heat and vapor fluxes
  • Runge-Kutta order 3 integration in time

SLIDE 67

The fire model is running on a finer mesh than the atmosphere model

SLIDE 68

WRF parallel infrastructure - MPI and OpenMP

  • Distributed memory (DM): halo exchanges between grid patches; each patch runs in one MPI process; the programmer only lists the variables to exchange
  • Shared memory (SM): OpenMP loops over tiles within the patch
  • Computational routines are tile callable
  • The fire model executes on the same horizontal tiles as the atmosphere model, in the same threads

[Diagram: MPI processes with OpenMP threads on multicore nodes; example: 2 MPI processes, 4 threads each; patch, halo, tile.]

The parallel infrastructure constrains the algorithms used.

SLIDE 69

Parallelism in WRF-Fire: implementing a PDE solver in the WRF physics layer, which is meant for pointwise calculations

SLIDE 70

Parallel performance

  • 2009 Harmanli fire (Bulgaria), satellite data. Janus cluster, University of Colorado/NCAR, Intel X5660 processors. 180×180×41 and 221×221×41 atmosphere grids at 50 m resolution, fire mesh 5 m. Time step 0.3 s.
  • Faster than real time on 120 cores.

SLIDE 71

Crime and punishment

  • The Euler equations require inflow velocity boundary conditions only. Yet WRF imposes lateral boundary conditions (from the global weather state) all around.
  • WRF makes up for it by nudging at the boundary, artificial viscosity, and smoothing in a layer around the boundary. It's a balancing act.
  • We are using WRF in a regime it was not meant for: up to 1 MW/m² fire heat flux, very fine meshes.

[Figure: color = vertical wind component + fire; arrows = horizontal wind at level 18 (approx. 2 km altitude); smoothed strip, red dot = instability. 2009 Harmanli fire, Bulgaria; model setup: Georgi Jordanov, Nina Dobrinkova, Jon Beezley.]

SLIDE 72

So what do the data look like?

SLIDE 73

Satellite fire detection data (Level 3 product, for public consumption). 2010 Fourmile Canyon Fire, Boulder, CO.

SLIDE 74

Aqua MODIS scanning

SLIDE 75

Level 2 data, for science use: fixed-size granules, about 5 min of flight each.

SLIDE 76

The data

  • Level 2 MODIS/VIIRS Active Fires (for now, resampled as GeoTIFF)
  • Intersection of the granules with the domain of interest
  • No fire detection is also information!
  • Except when data is missing (e.g., cloud)
  • Shown with the fire perimeter and wind field from the simulation

SLIDE 77

Fire detection with simulated fire perimeter and wind vectors (detail)

[Map legend: fire detection 0-6 hrs old; low confidence; nominal confidence; high confidence; water; ground (no fire); cloud (no detection possible).]

SLIDE 78

Occasional imperfect data

The granule intersects the domain partially. Spurious band ca. 1000 km long.

SLIDE 79

Data assimilation: connecting the data with the model

  • The Active Fires data resolution, 375 m - 1 km, is much coarser than the fuel data (Landfire) and the fire models, 30 m - 200 m
  • False positives, false negatives, geolocation errors, and artifacts are possible
  • Confidence levels are given when fire is detected
  • Unfortunately there is no confidence level for land or water detection (no fire)
  • Missing values: clouds, instrument error
  • Improve the model in a statistical sense, not for direct use such as initialization

SLIDE 80

Comparing Active Fires detection data and fire arrival time from the simulation

SLIDE 81

Measuring how well the model fits the data

  • The fire spread model state is encoded as the fire arrival time T
  • The fire is very likely to be detected at a location when it arrives there, and for a few hours after
  • The probability of detection quickly decreases afterwards
  • The fire arrival time T from the model + the sensor properties allow us to compute the probability of fire detection or non-detection in every pixel
  • Plug in the actually observed data ⇒ data likelihood
  • Data likelihood = the probability of the data values given the state T

SLIDE 82

Probability of fire detection

[Based on Morisette et al. 2005; Schroeder et al. 2008, 2014]

[Figure: actively burning area in the pixel as a proxy for the radiative heat flux in the pixel; probability of fire detection vs. time since fire arrival (h) and heat flux.]

SLIDE 83

Data likelihood of Active Fires detection

[Figure: log probability of positive detection vs. time since fire arrival.]

  • Fire is likely to be detected for a few hours after arrival at the location.
  • The probability of detection quickly decreases afterwards.
  • The tails model the uncertainty of the data and the model in time and space.

The logarithm of the probability is more convenient: instead of multiplying probabilities, add their logarithms.

SLIDE 84

Probability of no fire detection

Determined by the probability of detection: probability(yes|T) + probability(no|T) = 1.

SLIDE 85

Evaluating the fit of the model state to the data: Data likelihood

  • The model state is encoded as the fire arrival time T.
  • Data likelihood = probability of the data given the model state T.
  • Computing with logarithms is more convenient: add, not multiply.

$$\log(\text{data likelihood}) = \sum_{\text{granules}}\ \sum_{\text{grid cells}} \text{confidence level}(\text{cell}) \cdot \log \text{data likelihood}(\text{cell}) = \sum_{\text{granules}}\ \sum_{\text{grid cells}} c(\text{cell})\, f\bigl(\text{cell},\, T^{\text{granule}} - T\bigr)$$

  • Cloud mask or missing data in a cell is implemented by c(cell) = 0.
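
A schematic of this sum in code; the detection log-likelihood function f below is a hypothetical stand-in (the real one comes from the sensor properties of the previous slides), and c = 0 encodes the cloud mask:

```python
import numpy as np

# log p(data|T) = sum over granules, sum over grid cells of
#                 c(cell) * f(T_granule - T(cell)).
def f(dt_hours):
    # hypothetical log detection probability vs. time since fire arrival
    return -0.5 * ((dt_hours - 1.0) / 2.0) ** 2

def log_likelihood(T, granule_times, confidences):
    # T: fire arrival time on the grid; confidences: c per granule and cell,
    # with c = 0 for cloud mask or missing data
    return sum(np.sum(c * f(t - T)) for t, c in zip(granule_times, confidences))
```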
SLIDE 86

Automatic ignition from Active Fires data by maximum likelihood

  • Find the most probable fire given the data. (Much like what a human expert might do.)
  • Used for ignition from the first few MODIS/VIIRS detections (James Haley, PhD student, in progress)
  • But this ignores any previous modeling: not data assimilation yet

$$\log \Pr(\text{data}\,|\,T) = \sum_{\text{granules}}\ \sum_{\text{grid cells}} c(\text{cell})\, f\bigl(\text{cell},\, T^{\text{granule}} - T,\, x,\, y\bigr) \to \max_T$$

SLIDE 87

Data assimilation

  • A basic data assimilation cycle to ingest data:
    – Stop the simulation
    – Modify the model state (called the forecast, T_f) to reconcile the model with the new data that arrived since the last cycle; get a new estimated state (called the analysis, T)
    – Continue the simulation from the analysis T
  • We do not want to follow the data or the model exactly. Rather, split the difference, taking into account their relative uncertainties.
  • We want the data likelihood large and the difference between the analysis and the forecast small.
  • This is how weather forecasting works nowadays: every few hours (usually 1 to 6), new data are ingested like this.

SLIDE 88

Assimilation of MODIS/VIIRS Active Fire detection: maximum probability estimate

  • We want the data likelihood Pr(data|T) large and the difference T − T_f between the analysis and the forecast small
  • Penalize the difference T − T_f in a suitable norm
  • Choose the norm to penalize non-smooth changes, measuring the values of derivatives:

$$\|u\|^2 = \|u\|^2_{A^{-1}}, \qquad A = \left(-\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}\right)^{-a}, \quad a > 1,$$

$$\log \Pr(\text{data}\,|\,T) - \|T - T_f\|^2 \to \max_T$$

  • Efficient preconditioned gradient method, easily running on a laptop (not a supercomputer); only one or two iterations needed

SLIDE 89

Penalty by powers of Laplacian

  • The penalty ‖T − T_f‖²_{A⁻¹} is equivalent to the prior assumption that the error in T is a Gaussian random field with mean T_f and covariance A:

$$T \sim e^{-\frac{1}{2}\|T - T_f\|^2_{A^{-1}}} \iff T = T_f + \sum_k \theta_k \lambda_k^{1/2}\, T_k, \qquad \theta_k \sim N(0, 1), \qquad A T_k = \lambda_k T_k.$$

  • Here,

$$A = \left(-\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}\right)^{-p}, \qquad p > 1.$$

  • With zero boundary conditions on an a × b rectangle, the eigenvectors and eigenvalues are of the form

$$T_{jk}(x, y) \propto \sin\frac{j\pi x}{a}\,\sin\frac{k\pi y}{b}, \qquad \lambda_{jk} \propto \left(\left(\frac{j\pi}{a}\right)^2 + \left(\frac{k\pi}{b}\right)^2\right)^{-p} \to 0.$$

  • λ_k → 0 fast ⇒ the random field is smooth.
  • Evaluate the action of powers of A by the Fast Fourier Transform (FFT).
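
A sketch of sampling this prior on a mesh, using the sine eigenfunctions directly (mesh size and parameters are assumptions; an efficient implementation would use the discrete sine transform instead of dense matrix products):

```python
import numpy as np

# Sample T = Tf + sum_{j,k} theta_jk lambda_jk^(1/2) T_jk with
# T_jk = sin(j pi x / a) sin(k pi y / b) and
# lambda_jk = ((j pi / a)^2 + (k pi / b)^2)^(-p).
a = b = 1.0
p = 2.0
nx = ny = 64
x = np.linspace(0.0, a, nx)
y = np.linspace(0.0, b, ny)
j = np.arange(1, nx + 1)
k = np.arange(1, ny + 1)
lam = ((j[:, None] * np.pi / a) ** 2 + (k[None, :] * np.pi / b) ** 2) ** (-p)
theta = np.random.randn(nx, ny)                 # theta_jk ~ N(0, 1)
Sx = np.sin(np.outer(x, j) * np.pi / a)         # sine modes in x
Sy = np.sin(np.outer(y, k) * np.pi / b)         # sine modes in y
T = Sx @ (theta * np.sqrt(lam)) @ Sy.T          # one sample, with Tf = 0
```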

SLIDE 90

Minimization by preconditioned steepest descent

$$J(T) = \frac{\alpha}{2}\,\|T - T_f\|^2_{A^{-1}} - \sum_k c_k\, f_k\bigl(T^S_k - T(x_k, y_k)\bigr) \to \min_{C(T - T_f) = 0}$$

$$\nabla J(T) = \alpha\, A^{-1}(T - T_f) - F(T), \qquad F(T) = \sum_k c_k\, \frac{d}{dt} f_k\bigl(T^S_k - T(x_k, y_k)\bigr).$$

But ∇J(T) is a terrible descent direction, since A⁻¹ is ill conditioned: no progress at all! Better: the preconditioned descent direction

$$A\nabla J(T) = \alpha(T - T_f) - AF(T), \qquad AF(T) = \left(-\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}\right)^{-p} F(T),$$

p > 1: spatial smoothing of the forcing from the log likelihood maximization. T at the ignition point does not change ⇒ take the descent direction δ from the saddle point problem

$$A\delta + C\lambda = \frac{\partial}{\partial t} f(T^S - T, x, y), \qquad C^T\delta = 0.$$

Now one descent iteration is enough.

SLIDE 91

Data assimilation results, 2013 Patch Springs fire, UT

[Figures: forecast (left) and analysis (right).]

SLIDE 92

But fire is coupled with the atmosphere

[Diagram: atmosphere ⇄ fire propagation; heat release and heat flux up, wind down.]

  • The heat flux from the fire changes the state of the atmosphere over time.
  • Then the fire model state changes by data assimilation.
  • The atmospheric state is no longer compatible with the fire.
  • How to change the state of the atmosphere model in response to data assimilation into the fire model?
  • And not break the atmospheric model.
  • Replay the fire from the given fire arrival time.

SLIDE 93

Spin up the atmospheric model after the fire model state is updated by data assimilation

[Diagram: forecast fire simulation, coupled atmosphere-fire → active fire detection → fire arrival time changed by data assimilation → atmosphere out of sync with fire → rerun the atmosphere model from an earlier time, replaying the heat fluxes derived from the changed fire arrival time → continue the coupled fire-atmosphere simulation → atmosphere and fire in sync again.]

SLIDE 94

Cycling with atmospheric model spin up

[Figure: forecast at time 1; analysis at time 1; assimilation.]

SLIDE 95

Cycling with atmospheric model spin up

[Figure: forecast; analysis; spin up.]

SLIDE 96

Cycling with atmospheric model spin up

[Figure: replay the fire heat fluxes; use the forecast; interpolate the fire arrival time; continue the coupled model from the analysis fire state and the atmosphere state adapted to the fire; use the analysis.]

SLIDE 97

Progress of the optimization algorithm

SLIDE 98

Conclusion

  • A simple and efficient method, implemented by FFT; 2-3 iterations are sufficient to minimize the cost function numerically, and 1 iteration is already pretty good
  • Pixels under cloud cover do not contribute to the cost function
  • Robust for missing and irregular data
  • Standard Bayesian data assimilation framework: forecast + data = analysis
  • Future:
    – Automatic ignition from detection by maximum likelihood
    – Native swath coordinates: data likelihood computed for the native pixels, not the cells of the computational mesh
    – Better forecast probability distribution to allow for firefighting activities; now a bit too resistant to change

SLIDE 99

References

[1] Ivo Babuška. On randomised solutions of Laplace's equation. Časopis Pěst. Mat., 86:269–276, 1961.
[2] Ivo Babuška and Panagiotis Chatzipantelidis. On solving elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Engrg., 191(37-38):4093–4122, 2002.
[3] A. V. Balakrishnan. Applied Functional Analysis. Springer-Verlag, New York, 1976.
[4] Jonathan D. Beezley. High-Dimensional Data Assimilation and Morphing Ensemble Kalman Filters with Applications in Wildfire Modeling. PhD thesis, University of Colorado Denver, 2009.
[5] Gerrit Burgers, Peter Jan van Leeuwen, and Geir Evensen. Analysis scheme in the ensemble Kalman filter. Monthly Weather Review, 126:1719–1724, 1998.
[6] Giuseppe Da Prato. An Introduction to Infinite-Dimensional Analysis. Springer-Verlag, Berlin, 2006.
[7] M. Dashti, K. J. H. Law, A. M. Stuart, and J. Voss. MAP estimators and their consistency in Bayesian nonparametric inverse problems. Inverse Problems, 29(9):095017, 2013.
[8] Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research, 99(C5):10143–10162, 1994.
[9] B. Ganis, H. Klie, M. F. Wheeler, T. Wildey, I. Yotov, and D. Zhang. Stochastic collocation and mixed finite elements for flow in porous media. Comput. Methods Appl. Mech. Engrg., 197:3547–3559, 2008.
[10] Evan Kwiatkowski and Jan Mandel. Convergence of the square root ensemble Kalman filter in the large ensemble limit. SIAM/ASA Journal on Uncertainty Quantification, 3(1):1–17, 2015.
[11] Kody J. H. Law, Hamidou Tembine, and Raul Tempone. Deterministic methods for filtering, part I: Mean-field ensemble Kalman filtering. arXiv:1409.0628, 2014.
[12] F. Le Gland, V. Monbet, and V.-D. Tran. Large sample asymptotics for the ensemble Kalman filter. In Dan Crisan and Boris Rozovskiĭ, editors, The Oxford Handbook of Nonlinear Filtering, pages 598–631. Oxford University Press, 2011.
[13] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), Vol. 23. Springer-Verlag, Berlin, 1991.
[14] Michel Loève. Probability Theory. Third edition. D. Van Nostrand Co., Inc., Princeton, N.J.-Toronto, Ont.-London, 1963.
[15] J. Mandel, S. Amram, J. D. Beezley, G. Kelman, A. K. Kochanski, V. Y. Kondratenko, B. H. Lynn, B. Regev, and M. Vejmelka. New features in WRF-SFIRE and the wildfire forecasting and danger system in Israel. Natural Hazards and Earth System Sciences Discussions, 2(2):1759–1797, 2014.
[16] J. Mandel, A. K. Kochanski, M. Vejmelka, and J. D. Beezley. Data assimilation of satellite fire detection in coupled atmosphere-fire simulations by WRF-SFIRE. In Domingos Xavier Viegas, editor, Advances in Forest Fire Research, pages 716–724. Coimbra University Press, 2014.
[17] Jan Mandel, Jonathan D. Beezley, and Adam K. Kochanski. Coupled atmosphere-wildland fire modeling with WRF 3.3 and SFIRE 2011. Geoscientific Model Development, 4:591–610, 2011.
[18] Jan Mandel, Loren Cobb, and Jonathan D. Beezley. On the convergence of the ensemble Kalman filter. Applications of Mathematics, 56:533–541, 2011.
[19] A. M. Stuart. Inverse problems: a Bayesian perspective. Acta Numer., 19:451–559, 2010.