
Probability on spaces of functions with applications to inverse problems and data assimilation
Jan Mandel
Department of Mathematical and Statistical Sciences, University of Colorado Denver
European Mathematical Society School on Mathematical


  1. First the Bayes theorem for discrete states
Consider a system in $k$ possible states $A_1, \dots, A_k$. A model forecasts the probabilities $\Pr(A_1), \dots, \Pr(A_k)$. A data event $D$ arrives and (e.g., from instrument properties) we know $\Pr(D \mid A_i)$, the conditional probabilities of getting data in the event $D$ given system state $A_i$. We want the improved $\Pr(A_i \mid D)$, the probabilities of the states $A_i$ given the data $D$ (the analysis). By the definition of conditional probability,
$$\Pr(D \cap A_i) = \Pr(D \mid A_i)\Pr(A_i) = \Pr(A_i \mid D)\Pr(D),$$
so
$$\Pr(A_i \mid D) = \frac{\Pr(D \mid A_i)\Pr(A_i)}{\Pr(D)} \propto \Pr(D \mid A_i)\Pr(A_i),$$
where $\propto$ means proportional. Normalizing so that $\sum_{i=1}^{k} \Pr(A_i \mid D) = 1$ gives
$$\Pr(A_i \mid D) = \frac{\Pr(D \mid A_i)\Pr(A_i)}{\sum_{j=1}^{k}\Pr(D \mid A_j)\Pr(A_j)}.$$
A minimal numerical sketch follows below.
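As a minimal sketch of the discrete Bayes update (the numbers are made up for illustration), the analysis is one multiplication and one normalization:

```python
import numpy as np

# Hypothetical example: 3 possible states and the probability of observing
# the data event D in each state.
prior = np.array([0.5, 0.3, 0.2])          # forecast Pr(A_i)
likelihood = np.array([0.1, 0.7, 0.4])     # Pr(D | A_i), from instrument properties

posterior = likelihood * prior             # Pr(D | A_i) Pr(A_i)
posterior /= posterior.sum()               # normalize so the probabilities sum to 1

print(posterior)                           # analysis Pr(A_i | D)
```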

  2. Bayes theorem for probability densities
Consider a system state $u \in \mathbb{R}^n$ with probability density $p_f(u)$ (the forecast, from a model). Data $d$ comes from some distribution with probability density $p(d \mid u)$ conditional on the system being in state $u$. As in the discrete case, $p(u \mid d)$, the probability density (analysis) of $u$ given the data $d$, is
$$p_a(u) = p(u \mid d) = \frac{p(d \mid u)\,p_f(u)}{\int_{\mathbb{R}^n} p(d \mid u)\,p_f(u)\,du} \propto p(d \mid u)\,p_f(u).$$

  3. Bayes theorem for Gaussian probability densities
Gaussian case: $u \sim N(\bar u, Q)$, $d = H(u) + e$, $e \sim N(0, R)$. Then
forecast density: $p_f(u) \propto e^{-\frac{1}{2}|u - \bar u|^2_{Q^{-1}}}$,
data likelihood: $p(d \mid u) \propto e^{-\frac{1}{2}|H(u) - d|^2_{R^{-1}}}$,
analysis density: $p_a(u) \propto e^{-\frac{1}{2}|u - \bar u|^2_{Q^{-1}}}\, e^{-\frac{1}{2}|H(u) - d|^2_{R^{-1}}}$.
$H$ is called the observation operator in data assimilation, and the forward operator in inverse problems. It can be quite complicated, e.g., solving a PDE. $H(u) - d$ is the mismatch between the data and the model (called the "innovation" in data assimilation).
Single answer: the Maximum a Posteriori (MAP) estimate, $p_a(u) \to \max$. A small numerical sketch follows below.
Note. In infinite dimension, the MAP estimate can still be defined in a generalized sense (Dashti et al., 2013).
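A minimal sketch of the Gaussian Bayes update in one dimension, evaluating the unnormalized analysis density on a grid and taking the MAP estimate; the numbers and the nonlinear observation operator $H$ below are made up for illustration:

```python
import numpy as np

u = np.linspace(-5.0, 5.0, 2001)            # grid of candidate states
u_mean, Q = 1.0, 1.0                        # forecast mean and variance
d, R = 2.0, 0.25                            # observed data and data error variance
H = lambda u: u + 0.1 * u**3                # hypothetical nonlinear forward operator

log_pf = -0.5 * (u - u_mean) ** 2 / Q       # log forecast density (up to a constant)
log_like = -0.5 * (H(u) - d) ** 2 / R       # log data likelihood (up to a constant)
log_pa = log_pf + log_like                  # log analysis density (up to a constant)

u_map = u[np.argmax(log_pa)]                # maximum a posteriori estimate
print("MAP estimate:", u_map)
```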

  4. Bayes theorem in infinite dimension
Forecast and analysis are probability distributions. Bayes theorem: $p_a(u) \propto p(d \mid u)\, p_f(u)$. There is no Lebesgue measure, hence no densities. Integrate over an arbitrary measurable set $A$ instead:
$$\int_A p_a(u)\,du \propto \int_A p(d \mid u)\,p_f(u)\,du, \qquad \mu_a(A) \propto \int_A p(d \mid u)\,d\mu_f(u).$$
The data likelihood is the Radon-Nikodym derivative: $p(d \mid u) \propto \dfrac{d\mu_a}{d\mu_f}(u)$.
• Normalize: $\mu_a(A) = \dfrac{\int_A p(d \mid u)\,d\mu_f(u)}{\int_V p(d \mid u)\,d\mu_f(u)}$.
• But how do we know that $\int_V p(d \mid u)\,d\mu_f(u) > 0$?

  5. Infinite dimensional Gaussian case
• Data likelihood: $d = Hu + \varepsilon$, $\varepsilon \sim N(0, R)$, $p(d \mid u) = e^{-\frac{1}{2}|Hu - d|^2_{R^{-1}}}$.
• $\mu_f$ is a Gaussian measure on $U$, the data $d \in V$.
• The state space $U$ and the data space $V$ are separable Hilbert spaces.
• All is good when the data is finite dimensional; then $p(d \mid u) = e^{-\frac{1}{2}|Hu - d|^2_{R^{-1}}} > 0$ always.
• Difficulties arise when the data is infinite dimensional (a.k.a. functional data).

  6. Infinite-dimensional data, Gaussian measure error: bad
• The simplest example: $\mu_f = N(0, Q)$, $H = I$, $d = 0$, $R = Q$, $U = V$. The whole state is observed, and the data error distribution equals the state error distribution. Meet half-way? Wrong.
• $p(d \mid u) = \mathrm{const}\; e^{-\frac{1}{2}|u|^2_{R^{-1}}} = \mathrm{const}\; e^{-\frac{1}{2}\langle R^{-1/2}u,\, R^{-1/2}u\rangle}$
• The data likelihood $p(d \mid u) > 0$ if $u \in R^{1/2}(V)$, the domain of $R^{-1/2}$.
• $p(d \mid u) = e^{-\infty} = 0$ if $u \notin R^{1/2}(V) = Q^{1/2}(V)$.
• $Q^{1/2}(V)$ is the Cameron-Martin space of the measure $N(0, Q)$.
• But $\mu_f = N(0, Q)$ implies $\mu_f\big(Q^{1/2}(V)\big) = 0$. Thus $\int_V p(d \mid u)\,d\mu_f(u) = 0$.

  7. Commutative case
• The state and data covariances commute ⇒ they have the same eigenvectors $e_i$.
• $Q e_i = q_i e_i$, $\sum_{i=1}^\infty q_i < \infty$, $R e_i = r_i e_i$.
• Recall that $p(d \mid u) = e^{-\frac{1}{2}|d - u|^2_{R^{-1}}}$, $\mu_f = N(0, Q)$.
Theorem (Kasanický and Mandel, 2016). $\int_V p(d \mid u)\,d\mu_f(u) > 0$ for any $d$ $\iff$ $\exists\,\alpha > 0: \forall i: r_i > \alpha$.
That is, in the commutative case, Bayesian estimation is well posed for any data value if and only if the eigenvalues of the data error covariance are bounded away from zero.
• In particular, $R$ cannot be trace class, $\sum_{i=1}^\infty r_i = \infty$, and the data error distribution cannot be a probability measure.

  8. Infinite-dimensional data, white noise error: good
• All is good when the data is finite-dimensional and $R$ is not singular.
• More generally, when the data covariance $R$ is bounded below, $\langle u, Ru\rangle \ge \alpha |u|^2$ for all $u$ with $\alpha > 0$, then $|u|^2_{R^{-1}} < \infty$ for all $u$, so $p(d \mid u) = e^{-\frac{1}{2}|Hu - d|^2_{R^{-1}}} > 0$ for all $u$, and $\int_U p(d \mid u)\,d\mu_f(u) > 0$.
• But if $V$ is infinite dimensional, then $N(0, R)$ is not a probability measure on $V$: the trace condition is violated, $\operatorname{Tr}(R) = \sum_{i=1}^\infty r_i = \infty$.
• But this is not a problem. The data likelihood $p(d \mid u)$ is just a function of $u$ on the state space $U$ for a fixed $d$.
• For a fixed $u$, $p(\cdot \mid u)$ does not need to be a probability density. That was just an example.

  9. Positive data likelihood
Theorem. Suppose the forecast $\mu_f$ is a probability measure on the state space $V$ and, for a fixed realization of the data $d$, the function $u \mapsto p(d \mid u)$ is $\mu_f$-measurable and $0 < p(d \mid \cdot) \le C$ $\mu_f$-a.s. (in fact, $p(d \mid \cdot) > 0$ on some set of positive $\mu_f$-measure is enough). Then the analysis measure
$$\mu_a(A) = \frac{\int_A p(d \mid u)\,d\mu_f(u)}{\int_V p(d \mid u)\,d\mu_f(u)}$$
is well defined.
Proof. Because $\mu_f(V) = 1$, the assumption gives $0 \le \int_V p(d \mid u)\,d\mu_f(u) \le C$. If $\int_V p(d \mid u)\,d\mu_f(u) = 0$, then $p(d \mid \cdot) = 0$ $\mu_f$-a.s., contradicting the assumption; hence $\int_V p(d \mid u)\,d\mu_f(u) > 0$. ∎

  10. Examples of positive data likelihood
• White noise: $(V, \langle\cdot,\cdot\rangle)$ is a Hilbert space and $p(d \mid u) = e^{-\frac{1}{2}\langle d - Hu,\, d - Hu\rangle}$.
• Pointwise Gaussian: $\mu_f$ is a random field on a domain $D \subset \mathbb{R}^2$, the data is a function $d: D \to \mathbb{R}$, and $p(d \mid u) = e^{-\frac{1}{2}\int_D |d(x) - u(x)|^2\,dx}$.
• Pointwise positive: $p(d \mid u) = e^{\int_D f(x,\, d(x),\, u(x))\,dx}$ (the satellite sensing application will be like that).

  11. Karhunen-Loève expansion - derivation
Consider $X \in L^2(\Omega, H)$, $C = \operatorname{Cov}(X)$. $C$ is compact and self-adjoint, so there exists a complete orthonormal system of eigenvectors: $C u_n = \lambda_n u_n$. For fixed $\omega \in \Omega$, expand $X - E(X)$ in an abstract Fourier series, convergent in $H$:
$$X(\omega) = E(X) + \sum_{n=1}^\infty \theta_n(\omega)\, u_n, \qquad \theta_n(\omega) = \langle X(\omega) - E(X),\, u_n\rangle.$$
Then $E(\theta_n) = E(\langle X - E(X), u_n\rangle) = 0$ and
$$E(\theta_m \theta_n) = E\big(\langle X - E(X), u_m\rangle\, \langle X - E(X), u_n\rangle\big) = \langle u_m, C u_n\rangle = \begin{cases}\lambda_n & \text{if } m = n,\\ 0 & \text{if } m \ne n.\end{cases}$$
Write $\theta_n = \lambda_n^{1/2}\xi_n$; then $E(\xi_m \xi_n) = \langle \xi_m, \xi_n\rangle_{L^2(\Omega)} = \delta_{mn}$.

  12. Karhunen-Loève expansion
If $X \in L^2(\Omega, H)$ and $C = \operatorname{Cov}(X)$, then there exists an expansion
$$X = E(X) + \sum_{n=1}^\infty \lambda_n^{1/2}\, \xi_n\, u_n, \qquad E(\xi_n) = 0,$$
convergent a.s. in $H$ and doubly orthogonal: $\langle u_m, u_n\rangle_H = \delta_{mn}$, $\langle \xi_m, \xi_n\rangle_{L^2(\Omega)} = \delta_{mn}$. (See [14] for further details.)

  13. Applications of the Karhunen-Loève expansion: Data analysis
Estimate the covariance from data (realizations of $X$) by the sample covariance. This is called Principal Component Analysis (PCA), the Karhunen-Loève transform (KLT), or Proper Orthogonal Decomposition (POD). The components with the several largest eigenvalues $\lambda_n$ are responsible for most of the variance in the random element $X$. The practical computation is done by SVD of the data minus the sample mean, which is much less expensive and less prone to numerical errors than computing the sample covariance first. A sketch follows below.
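A minimal sketch of PCA / the discrete Karhunen-Loève transform via SVD, on made-up data; no sample covariance matrix is ever formed:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical samples: rows are realizations of the random element
data = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 50))

centered = data - data.mean(axis=0)               # subtract the sample mean
_, s, vt = np.linalg.svd(centered, full_matrices=False)

eigenvalues = s**2 / (data.shape[0] - 1)          # eigenvalues of the sample covariance
modes = vt                                        # rows are the KL modes / principal components

print("largest eigenvalues:", eigenvalues[:5])
```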

  14. Applications of the Karhunen-Loève expansion: Generating random smooth functions
Prescribe the covariance so that it has suitable eigenvectors, and use the Karhunen-Loève expansion to generate the random element $X$ (rather, the first few terms, to generate a version of $X$ in finite dimension). For example, a random Fourier series on $[0, \pi]$, with $H = L^2(0, \pi)$:
$$F(x) = \sum_{n=1}^\infty \lambda_n^{1/2}\, \xi_n \sin(nx), \qquad \xi_n \sim N(0, 1),$$
with $\lambda_n \to 0$. The smoothness of $F$ depends on how fast $\lambda_n \to 0$. We need at least $\sum_{n=1}^\infty \lambda_n < \infty$ for $F$ to be well defined and $F \in L^2(0, \pi)$ a.s. A sketch of truncating the series follows below.
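A minimal sketch of sampling a random smooth function from a truncated random Fourier series; the eigenvalue decay $\lambda_n = 1/n^4$ is an assumption chosen to make the samples smooth:

```python
import numpy as np

rng = np.random.default_rng(1)
n_terms = 100
x = np.linspace(0.0, np.pi, 500)

n = np.arange(1, n_terms + 1)
lam = 1.0 / n**4                                   # prescribed eigenvalues, summable
xi = rng.standard_normal(n_terms)                  # i.i.d. N(0, 1) coefficients

# F(x) = sum_n sqrt(lambda_n) * xi_n * sin(n x), truncated at n_terms
F = (np.sqrt(lam) * xi) @ np.sin(np.outer(n, x))
print(F.shape)                                     # one realization on the grid
```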

  15. Applications of the Karhunen-Loève expansion: Numerical methods for stochastic PDEs
Use the first few terms of the Karhunen-Loève expansion to build the random coefficients and the assumed form of the solution (called the trial space in variational methods), as well as the test functions. The solution is found numerically as a deterministic function of a small number of random variables $\xi_1, \dots, \xi_n$. This is the foundation of methods such as stochastic Galerkin [2], stochastic collocation [9], and polynomial chaos. The first application like this that I know of is Babuška 1961 [1].

  16. Independent random elements
Definition. Let $(\Omega, \Sigma, \mu)$ be a probability space. Events $A, B \in \Sigma$ are independent if $\mu(A \cap B) = \mu(A)\mu(B)$. Random elements $X, Y: \Omega \to V$ are independent if for any Borel sets $S, T \subset V$, the events $X^{-1}(S)$ and $Y^{-1}(T)$ are independent. "i.i.d." means independent, identically distributed.
Example: independent random elements defined on different sample spaces $\Omega$.
The standard property carries over: if $X, Y$ are independent random elements with values in a Hilbert space, then $\operatorname{Cov}(X, Y) = 0$.

  17. Strong law of large numbers
Theorem [13, Corollary 7.10]. Let $(X_i)$ be i.i.d. random elements with values in a Banach space $B$ with $E(|X_1|) < \infty$, and let $E_n = \frac{1}{n}(X_1 + \cdots + X_n)$. Then $E_n \to E(X_1)$ almost surely.
Strong, but not as useful for our purposes: in computations, we like explicit rates of convergence.

  18. $L^2$ law of large numbers
Lemma. Let $X_k \in L^2(\Omega, H)$ be i.i.d. Then
$$\|E_n(X_k) - E(X_1)\|_2 \le \frac{1}{\sqrt{n}}\|X_1 - E(X_1)\|_2 \le \frac{2}{\sqrt{n}}\|X_1\|_2,$$
and $E_n(X_k) \to E(X_1)$ in $L^2$ as $n \to \infty$.
Proof. The usual proof carries over. Suppose first that $E(X_1) = 0$. Then
$$\|E_n(X_k)\|_2^2 = E\Big(\Big|\frac{1}{n}\sum_{k=1}^n X_k\Big|^2\Big) = \frac{1}{n^2}\sum_{k=1}^n \sum_{\ell=1}^n E(\langle X_k, X_\ell\rangle) = \frac{1}{n^2}\, n\, E\big(|X_1|^2\big) = \frac{1}{n}\|X_1\|_2^2,$$
since the cross terms vanish by independence and zero mean.

  19. In the general case, applying the zero-mean case to $X_k - E(X_1)$ and the triangle inequality,
$$\|E_n(X_k) - E(X_1)\|_2 = \frac{1}{\sqrt{n}}\|X_1 - E(X_1)\|_2 \le \frac{1}{\sqrt{n}}\big(\|X_1\|_2 + \|E(X_1)\|_2\big) \le \frac{2}{\sqrt{n}}\|X_1\|_2.$$
A numerical check of the $1/\sqrt{n}$ rate follows below.
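A minimal empirical check of the $1/\sqrt{n}$ rate for $H = \mathbb{R}^d$, with made-up i.i.d. Gaussian samples; the dimensions and sample sizes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_trials = 10, 500
true_mean = np.ones(d)

for n in (10, 100, 1000):
    # n_trials independent sample means of n i.i.d. vectors each
    samples = true_mean + rng.standard_normal((n_trials, n, d))
    errors = np.linalg.norm(samples.mean(axis=1) - true_mean, axis=1)
    l2_error = np.sqrt(np.mean(errors**2))        # estimate of || E_n(X_k) - E(X_1) ||_2
    print(n, l2_error, l2_error * np.sqrt(n))     # last column should be roughly constant
```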

  20. $L^p$ laws of large numbers
Lemma. Let $H$ be a Hilbert space, $X_i \in L^p(\Omega, H)$ i.i.d., $p \ge 2$, and $E_n(X_k) = \frac{1}{n}\sum_{k=1}^n X_k$. Then
$$\|E_n(X_k) - E(X_1)\|_p \le \frac{C_p}{\sqrt{n}}\|X_1 - E(X_1)\|_p \le \frac{2 C_p}{\sqrt{n}}\|X_1\|_p,$$
where $C_p$ depends on $p$ only.
Note: $L^p$ laws of large numbers do not hold on a general Banach space. Related questions are studied in a field called geometry of Banach spaces.
Note: $L^p$ laws of large numbers do not hold on the space $[H]$ of all bounded linear operators on a Hilbert space. $[H]$ is a Banach space, and any Banach space can be isomorphically embedded in $[H]$ for some Hilbert space $H$.

  21. Covariance by tensor product
This is important in ensemble methods, where the covariance is estimated by the sample covariance. In $\mathbb{R}^n$, the product $x y^T$ is a rank-one matrix. To get a coordinate-free definition, write it as a linear operator, $x y^T: z \mapsto x\,(y^T z) = x \langle y, z\rangle$. In a Hilbert space, define the tensor product by $x \otimes y \in [H]$, $(x \otimes y) z = x \langle y, z\rangle$ for all $z \in H$. Now
$$\operatorname{Cov}(X, Y) = E\big((X - E(X)) \otimes (Y - E(Y))\big) = E(X \otimes Y) - E(X) \otimes E(Y).$$

  22. Laws of large numbers for sample covariance
Write the sample mean operator as $E_N(X_k) = \frac{1}{N}\sum_{k=1}^N X_k$. The covariance can be estimated by the sample covariance
$$C_N(X_k, Y_k) = E_N\big((X_k - E_N(X_k)) \otimes (Y_k - E_N(Y_k))\big) = E_N(X_k \otimes Y_k) - E_N(X_k) \otimes E_N(Y_k).$$
We already know that $E_N(X_k) \to E(X_1)$ in $L^p$ at the rate $N^{-1/2}$, $p \ge 2$. But the averages $E_N(X_k \otimes Y_k) = \frac{1}{N}\sum_{k=1}^N X_k \otimes Y_k$ are in $[H]$, and there is no $L^p$ law of large numbers in $[H]$. A sketch of the sample covariance written through outer products follows below.
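A minimal sketch of the sample covariance written through tensor (outer) products, $C_N = E_N(X \otimes X) - E_N(X) \otimes E_N(X)$, on made-up vector samples:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 50, 5
X = rng.standard_normal((N, d))                     # N realizations in R^d

mean = X.mean(axis=0)                               # E_N(X_k)
outer_mean = np.einsum('ki,kj->ij', X, X) / N       # E_N(X_k ⊗ X_k)
C_N = outer_mean - np.outer(mean, mean)             # sample covariance operator

# agrees with the usual (1/N-normalized) sample covariance
assert np.allclose(C_N, np.cov(X, rowvar=False, bias=True))
```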

  23. $L^p$ law of large numbers for the sample covariance
Let $X_i \in L^p(\Omega, H)$ be i.i.d. and
$$C_N([X_1, \dots, X_N]) = \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i - \Big(\frac{1}{N}\sum_{i=1}^N X_i\Big) \otimes \Big(\frac{1}{N}\sum_{i=1}^N X_i\Big).$$
But $[H]$ is not a Hilbert space. Use $L^2(\Omega, V_{HS})$ with $V_{HS}$ the space of Hilbert-Schmidt operators,
$$|A|^2_{HS} = \sum_{n=1}^\infty \langle A e_n, A e_n\rangle < \infty, \qquad \langle A, B\rangle_{HS} = \sum_{n=1}^\infty \langle A e_n, B e_n\rangle,$$
where $\{e_n\}$ is any complete orthonormal sequence in $H$. In finite dimension, the Hilbert-Schmidt norm becomes the Frobenius norm $|A|^2_{HS} = \sum_{i,j} |a_{ij}|^2$.

  24. $L^p$ law of large numbers for the sample covariance (cont.)
Let $X_i \in L^{2p}(\Omega, V)$ be i.i.d. and $V$ a separable Hilbert space. Then, from the $L^p$ law of large numbers in the Hilbert space $V_{HS}$ of Hilbert-Schmidt operators on $V$,
$$\|C_N([X_1, \dots, X_N]) - \operatorname{Cov}(X_1)\|_p \le \big\|\, |C_N([X_1, \dots, X_N]) - \operatorname{Cov}(X_1)|_{HS} \big\|_p \le \frac{\mathrm{const}(p)}{\sqrt{N}}\, \|X_1\|^2_{2p},$$
where the constant depends on $p$ only. Note that since $V$ is separable, $V_{HS}$ is also separable.

  25. Continuity of the sample covariance
Write the $N$-th sample covariance of $[X_1, X_2, \dots]$ as
$$C_N(X) = \frac{1}{N}\sum_{i=1}^N X_i \otimes X_i - \Big(\frac{1}{N}\sum_{i=1}^N X_i\Big) \otimes \Big(\frac{1}{N}\sum_{i=1}^N X_i\Big).$$
Bound: if $X_i \in L^{2p}$ are identically distributed, then
$$\|C_N(X)\|_p \le \|X_1 \otimes X_1\|_p + \|X_1\|^2_{2p} \le 2\,\|X_1\|^2_{2p}.$$
Continuity of the sample covariance: if $[X_i, U_i] \in L^4$ are identically distributed, then
$$\|C_N(X) - C_N(U)\|_2 \le \sqrt{8}\, \|X_1 - U_1\|_4 \sqrt{\|X_1\|^2_4 + \|U_1\|^2_4}.$$

  26. Data assimilation - Analysis cycle
• Goal: inject data into a running model.
$$\text{forecast} \xrightarrow{\ \text{analysis step}\ } \text{analysis} \xrightarrow{\ \text{advance in time}\ } \text{forecast} \xrightarrow{\ \text{analysis step}\ } \text{analysis} \dots$$
with data entering at each analysis step.
• The Kalman filter was used already in Apollo 11 navigation, and now in GPS, computer vision, weather forecasting, remote sensing, ...

  27. From least squares to Kalman filter
Inverse problem $Hu \approx d$ with background knowledge $u \approx u_f$:
$$|u - u_f|^2_{Q^{-1}} + |d - Hu|^2_{R^{-1}} \to \min_u$$
where $u_f$ = forecast (what we think the state should be), $d$ = data, $H$ = linear observation operator, and $Hu$ = what the data should be given the state $u$ if there were no errors. Equivalently,
$$\underbrace{e^{-\frac{1}{2}|u - u_f|^2_{Q^{-1}}}}_{\substack{\propto\, p(u)\\ \text{forecast (prior) density}}}\ \underbrace{e^{-\frac{1}{2}|d - Hu|^2_{R^{-1}}}}_{\substack{\propto\, p(d \mid u)\\ \text{data likelihood}}} = \underbrace{e^{-\frac{1}{2}\left(|u - u_f|^2_{Q^{-1}} + |d - Hu|^2_{R^{-1}}\right)}}_{\substack{\propto\, p_a(u) = p(u \mid d)\\ \text{analysis (posterior) density}}} \to \max_u$$
The analysis density is $p_a(u) \propto e^{-\frac{1}{2}|u - u_a|^2_{Q_a^{-1}}}$; its mean and covariance are
$$u_a = u_f + K(d - H u_f), \qquad Q_a = (I - KH)\,Q,$$
where $K = K(Q) = Q H^T (H Q H^T + R)^{-1}$ is called the Kalman gain. A sketch of the analysis update follows below.
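A minimal sketch of the Kalman analysis (update) step for a linear observation operator; all dimensions and matrices below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 2                                  # state and data dimensions

u_f = rng.standard_normal(n)                 # forecast mean
Q = np.eye(n)                                # forecast covariance
H = rng.standard_normal((m, n))              # linear observation operator
R = 0.1 * np.eye(m)                          # data error covariance
d = H @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)   # synthetic data

K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R) # Kalman gain
u_a = u_f + K @ (d - H @ u_f)                # analysis mean
Q_a = (np.eye(n) - K @ H) @ Q                # analysis covariance

print(u_a)
print(Q_a)
```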

  28. Kalman filter (KF)
• Add advancing in time: the model is an operator $u \mapsto Mu + b$; the mean and covariance are advanced as $Mu + b$ and $M Q M^T$.
• It is hard to advance the covariance matrix when the model is nonlinear.
• One needs linear operators, or tangents and adjoints.
• One can't maintain the covariance matrix for large problems, such as those from discretizations of PDEs.

  29. Ensemble forecasting to the rescue
• Ensemble weather forecast: independent randomized simulations. If it rains in 30% of them, then say that "the chance of rain is 30%".
• Ensemble Kalman filter (EnKF, Evensen 1994): replace the covariance in the KF by the sample covariance of the ensemble, then apply the KF formula for the mean to each ensemble member independently.
• But the analysis covariance came out wrong – too small, even if the covariance were exact.
• Fix: randomize also the data, by sampling from the data error distribution (Burgers, Evensen, van Leeuwen, 1998).
• Then all is good and we should get the right ensembles... at least in the Gaussian case, and up to the approximation by the sample covariance. A sketch of the randomized analysis step follows below.
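A minimal sketch of one EnKF analysis step with data randomization (perturbed observations), applying the Kalman update with the sample covariance; all sizes and matrices are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, N = 4, 2, 20                               # state dim, data dim, ensemble size

X_f = rng.standard_normal((N, n)) + 1.0          # forecast ensemble, rows are members
H = rng.standard_normal((m, n))                  # linear observation operator
R = 0.1 * np.eye(m)                              # data error covariance
d = rng.standard_normal(m)                       # observed data

Q_N = np.cov(X_f, rowvar=False, bias=True)       # sample covariance of the ensemble
K = Q_N @ H.T @ np.linalg.inv(H @ Q_N @ H.T + R) # Kalman gain with sample covariance

# perturb the data independently for each member (Burgers, Evensen, van Leeuwen 1998)
D = d + rng.multivariate_normal(np.zeros(m), R, size=N)
X_a = X_f + (D - X_f @ H.T) @ K.T                # analysis ensemble

print(X_a.mean(axis=0))                          # analysis sample mean
```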

  30. The EnKF with data randomization is statistically exact if the covariance is exact
Lemma. Let $U^f \sim N(u_f, Q_f)$, $D = d + E$, $E \sim N(0, R)$, and $U^a = U^f + K(D - H U^f)$, $K = Q_f H^T(H Q_f H^T + R)^{-1}$. Then $U^a \sim N(u_a, Q_a)$, i.e., $U^a$ has the correct analysis distribution from the Bayes theorem and the Kalman filter.
Proof. Computation in Burgers, Evensen, van Leeuwen, 1998.
Corollary. If the forecast ensemble $[U^f_1, \dots, U^f_N]$ is a sample from the Gaussian forecast distribution, and the exact covariance is used, then the analysis ensemble $[U^a_1, \dots, U^a_N]$ is a sample from the analysis distribution.

  31. EnKF properties
• The model is needed only as a black box.
• The ensemble members interact through the ensemble mean and covariance only.
• The EnKF was derived for Gaussian distributions, but the algorithm does not depend on this.
• So it is often used for nonlinear models and non-Gaussian distributions anyway.
• Folklore: "high-dimensional problems require large ensembles", a.k.a. the "curse of dimensionality". Not necessarily so.
• Curse of dimensionality: slow convergence in high dimension. With arbitrary probability distributions, sure. But probability distributions in practice are not arbitrary: the state is a discretization of a random function. The smoother the functions, the faster the EnKF convergence.

  32. Curse of dimensionality? Not for probability measures!
[Figure: filter error vs. model size for the EnKF with N = 10 ensemble members and m = 25 data points; from the thesis Beezley (2009), Fig. 4.7.]
Constant covariance eigenvalues $\lambda_k = 1$ and the inverse law $\lambda_k = 1/k$ do not give probability measures in the limit because $\sum_{k=1}^\infty \lambda_k = \infty$. The inverse square law $\lambda_k = 1/k^2$ gives a probability measure because $\sum_{k=1}^\infty \lambda_k < \infty$.
Observation setup: m = 25 uniformly sampled data points on [0, 1], N = 10 ensemble members, $\sin(k\pi x)$ as eigenvectors.

  33. Convergence of EnKF in the large ensemble limit
• Laws of large numbers guarantee that the EnKF gives correct results for large ensembles, in the Gaussian case: Le Gland et al. (2011) in $L^p$ entry-by-entry, Mandel et al. (2011) in probability (in the lecture notes).
• In general, the EnKF converges to a mean-field limit (Le Gland et al. 2011, Law et al. 2015).
  - mean-field approximation: the effect of all other particles on any one particle is replaced by a single averaged effect
  - mean-field limit: large number of particles, the influence of each becomes negligible
• Here: dimension-independent, in $L^p$, coordinate-free.

  34. Convergence of the EnKF to the Kalman filter
• Generate the initial ensemble $X^{(0)}_N = [X^{(0)}_{N1}, \dots, X^{(0)}_{NN}]$ i.i.d. and Gaussian.
• Run the EnKF, get ensembles $[X^{(k)}_{N1}, \dots, X^{(k)}_{NN}]$, advanced in time by linear models: $X^{k+1,f}_{Ni} = M_k X^{k}_{Ni} + f_k$.
• For theoretical purposes, define the reference ensembles $[U^{(k)}_{N1}, \dots, U^{(k)}_{NN}]$ by $U^{(0)}_N = X^{(0)}_N$ and $U^{(k)}_N$ by the EnKF, except with the exact covariance of the filtering distribution instead of the sample covariance.
• We already know that $U^{(k)}_N$ is a sample (i.i.d.) from the Gaussian filtering distribution (as it would be delivered by the Kalman filter).
• We will show by induction that $X^{(k)}_{Ni} \to U^{(k)}_{Ni}$ in $L^p$ for all $1 \le p < \infty$ as $N \to \infty$, for all analysis cycles $k$.
• In which sense? We need this for all $i = 1, \dots, N$, but the $U_{Ni}$ change and $N$ changes too.

  35. Exchangeability of EnKF ensembles
Definition. An ensemble $[X_1, \dots, X_N]$ is exchangeable if its joint distribution is permutation invariant.
Lemma. All ensembles generated by the EnKF are exchangeable.
Proof: The initial ensemble is i.i.d., thus exchangeable, and the analysis step is permutation invariant.
Corollary. The random elements $\big[(X^{(k)}_{N1}, U^{(k)}_{N1}), \dots, (X^{(k)}_{NN}, U^{(k)}_{NN})\big]$ are also exchangeable. (Recall that the $U_{Ni}$ are obtained using the exact covariance from the Kalman filter rather than the sample covariance.)
Corollary. $X^{(k)}_{N1} - U^{(k)}_{N1}, \dots, X^{(k)}_{NN} - U^{(k)}_{NN}$ are identically distributed, so we only need to consider the convergence $\big\|X^{(k)}_{N1} - U^{(k)}_{N1}\big\|_p \to 0$.

  36. Convergence of the EnKF to the mean-field limit
Nonlinear transformation of the ensemble as a vector of exchangeable random variables, $[X_1, X_2, \dots, X_N] \mapsto [Y_1, Y_2, \dots, Y_N]$, with $L^p$ continuity of the model. Drop the superscript $m$.
Mean-field filter: $Y^{\text{Mean field}}_k = X^{\text{Mean field}}_k + K(Q)\,(D_k - H X^{\text{Mean field}}_k)$, $Q = \operatorname{Cov}(X_1)$.
Randomized EnKF: $Y^{\text{Randomized}}_k = X^{\text{Randomized}}_k + K(Q_N)\,(D_k - H X^{\text{Randomized}}_k)$, $Q_N = C_N\big([X^{\text{Randomized}}_1, \dots, X^{\text{Randomized}}_N]\big)$, with the same $D_k$.

  37. Convergence of the EnKF to the mean-field limit (cont.)
Subtract, and use the continuity of the Kalman gain: $\|K(Q) - K(Q_N)\|_p \le \mathrm{const}\, \|Q - Q_N\|_{2p}$.
Apply the $L^p$ law of large numbers for $C_N\big([X^{\text{Mean field}}_1, \dots, X^{\text{Mean field}}_N]\big)$.
A priori bound $\|X^m_k\|_p \le \mathrm{const}(m)$ for all $m$, from $\|(H Q H^* + R)^{-1}\| \le \frac{1}{\alpha}$ by $R \ge \alpha I$.
Induction over $m$: $X^{m,\text{Randomized}}_1 \to X^{m,\text{Mean field}}_1$ in all $L^p$, $1 \le p < \infty$.
• Dimension independent, but the constants grow with every cycle. Cascade in $p$: we need $L^{2p}$ bounds in the previous cycle to get $L^p$.
• Future: combine with long-term stability results? (Law, Stuart). One needs to assume more about the model (ergodicity) and sufficient observations.

  38. Convergence of the square root EnKF
• Start from a random ensemble, then update the ensemble by a deterministic algorithm that guarantees the analysis sample mean and sample covariance given by the Kalman formulas applied to the forecast sample mean and covariance. This involves a factorization of the Gram matrix, hence "square root". No data randomization.
• The map updating the covariance from forecast to analysis is locally Lipschitz continuous in all $L^p$.
• The sample mean and sample covariance of the initial ensemble converge in all $L^p$ to the exact background with the standard Monte Carlo rate $1/\sqrt{N}$, by the $L^p$ laws of large numbers.
• Because the model operator and the update (analysis) step operators are locally Lipschitz continuous in all $L^p$, at every step the EnKF ensemble converges in all $L^p$ to the exact filtering mean and covariance, with the standard Monte Carlo convergence rate $1/\sqrt{N}$ (Kwiatkowski and M., 2015).
• Convergence is in terms of the sample mean and sample covariance only. Nothing is said about individual ensemble members. Linear case only.

  39. Application: Assimilation of active fires detection from polar-orbiting satellites
With Adam K. Kochanski, Sher Schranz, and Martin Vejmelka
Funded projects/Goals:
• Wildland fire simulation and forecasting with assimilation of satellite Active Fires detection data. Running autonomously for any fire anywhere any time in the US, distributed on the web as a fire forecast layer over the weather forecast. (NASA)
• Wildfire danger and simulation system for Israel (operational) (Israel Homeland Security)
• Optimal placement of sensors for large experimental burns in 2017 to improve smoke modeling (U.S. Joint Fire Science Program)
• Data assimilation methodology (NSF and Czech Grant Agency)

  40. WRF-SFIRE components
[Diagram: HRRR forecast drives the atmosphere model WRF and the chemical transport model WRF-Chem. WRF passes wind, surface air temperature, relative humidity, and rain to SFIRE (fuel moisture model and surface fire spread model), which returns heat and vapor fluxes to WRF and fire emissions (smoke) to WRF-Chem. Data assimilation inputs: RAWS fuel moisture stations, satellite moisture sensing, VIIRS/MODIS fire detection.]

  41. Wildland Fire Behavior and Risk Forecasting
The model: WRF-SFIRE. PI: Sher Schranz, CSU/CIRA. As of: March 1, 2016.
WRF is a large open source project headed by NCAR.
[Image: 2013 Patch Springs Fire, UT]

  42. Representation of the fire area by a level set function
• The level set function is given on the center nodes of the fire mesh
• Interpolated linearly, parallel to the mesh lines
• The fireline connects the points where the interpolated values are zero

  43. Evolving the fireline by the level set method
Level set function $L$. Fire area: $L < 0$. Level set equation:
$$\frac{\partial L}{\partial t} = -R\,|\nabla L|,$$
where $R$ is the fire spread rate. Right-hand side $< 0$ → the level set function goes down → the fire area grows. A sketch of one explicit time step follows below.
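A minimal sketch of one explicit time step of $\partial L/\partial t = -R\,|\nabla L|$ on a uniform grid with a simple central-difference gradient; the grid size, spread rate, and initial circular fireline are made up, and this is only an illustration of the idea, not the WRF-SFIRE discretization:

```python
import numpy as np

nx, ny, h, dt, R = 100, 100, 10.0, 1.0, 0.5     # grid, spacing [m], step [s], spread rate [m/s]

x, y = np.meshgrid(np.arange(nx) * h, np.arange(ny) * h, indexing="ij")
L = np.sqrt((x - 500.0)**2 + (y - 500.0)**2) - 100.0   # signed distance to a circular fireline

Lx, Ly = np.gradient(L, h)                      # central differences
L = L - dt * R * np.sqrt(Lx**2 + Ly**2)         # L decreases, the fire area (L < 0) grows

print("burned area fraction:", np.mean(L < 0.0))
```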

  44. The fire model: fuel consumption
[Plot: fuel remaining as a function of time since the fuel ignition time.]
Time constant of the fuel:
• 30 sec – grass burns quickly
• 1000 sec – dead & down branches (~40% decrease in mass over 10 min)

  45. Integrating fuel left over mesh cells, with submesh fire region representation

  46. Coupling with WRF-ARW
• WRF-ARW is explicit in time, semi-implicit in the vertical direction only (Runge-Kutta order 3 integration in time)
• Physics packages, including fire, are called only in the last Runge-Kutta substep
• The fire module inputs wind and outputs heat and vapor flux

  47. The fire model is running on a finer mesh than the atmosphere model

  48. WRF parallel infrastructure - MPI and OpenMP
• Distributed memory (DM): MPI patches with halo exchanges between grid patches; each patch runs in one MPI process; the programmer only lists the variables to exchange in the halo
• Shared memory (SM): OpenMP loops over tiles within the patch
• Computational routines are tile-callable
• The fire model executes on the same horizontal tiles as the atmosphere model, in the same OpenMP threads (multicore)
Example: 2 MPI processes, 4 threads each. The parallel infrastructure constrains the algorithms used.

  49. Parallelism in WRF-Fire: implementing a PDE solver in the WRF physics layer, which was meant for pointwise calculations

  50. Parallel performance
• 2009 Harmanli fire (Bulgaria), satellite data. Janus cluster, University of Colorado/NCAR, Intel X5660 processors. 180x180x41 and 221x221x41 atmosphere grids at 50 m resolution, fire mesh 5 m. Time step 0.3 s.
• Faster than real time on 120 cores.

  51. Crime and punishment
• The Euler equations require inflow velocity boundary conditions only. Yet WRF imposes lateral boundary conditions (from the global weather state) all around.
• WRF makes up for it by nudging at the boundary, artificial viscosity, and smoothing in a layer around the boundary. It's a balancing act.
• We are using WRF in a regime it was not meant for: up to 1 MW m$^{-2}$ fire heat flux, very fine meshes.
[Figure: vertical wind component (color) and horizontal wind (arrows) at level 18, approx. 2 km altitude; red dot = instability; smoothed strip near the boundary. 2009 Harmanli fire, Bulgaria. Model setup: Georgi Jordanov, Nina Dobrinkova, Jon Beezley.]

  52. So what do the data look like?

  53. Satellite Fire Detection data (Level 3 product, for public consumption)
[Image: 2010 Fourmile Canyon Fire, Boulder, CO]

  54. Aqua MODIS scanning

  55. Level 2 data – for science use: Fixed-size granules about 5 min of flight each

  56. The data
• Level 2 MODIS/VIIRS Active Fires (for now, resampled as GeoTIFF)
• Intersection of the granules with the domain of interest
• No fire detection is also information!
• Except when data are missing (e.g., cloud)
• Shown with the fire perimeter and wind field from the simulation

  57. Fire detection with simulated fire perimeter and wind vectors (detail)
[Map legend: Water; Ground – no fire; Cloud – no detection possible; Fire detection 0-6 hrs old; Low confidence; Nominal confidence; High confidence]

  58. Occasional imperfect data
The granule intersects the domain only partially. Spurious band ca. 1000 km long.

  59. Data assimilation: Connecting the data with the model
• Active Fires data resolution (375 m-1 km) is much coarser than the fuel data (Landfire) and the fire models (30 m-200 m)
• False positives, false negatives, geolocation errors, and artifacts are possible
• Confidence levels are given when fire is detected
• Unfortunately there is no confidence level for land or water detection (no fire)
• Missing values: clouds, instrument error
• Improve the model in a statistical sense, not for direct use such as initialization

  60. Comparing Active Fires detection data and fire arrival time from simulation

  61. Measuring how well the model fits the data
• The fire spread model state is encoded as the fire arrival time $T$
• The fire is very likely to be detected at a location when it arrives there, and for a few hours after
• The probability of detection quickly decreases afterwards
• The fire arrival time $T$ from the model + sensor properties allow us to compute the probability of fire detection or non-detection in every pixel
• Plug in the actually observed data => data likelihood
• Data likelihood = the probability of the data values given the state $T$

  62. Probability of fire detection
[Based on Morisette et al. 2005; Schroeder et al. 2008, 2014]
[Plots: probability of fire detection as a function of the actively burning area in the pixel, a proxy for the radiative heat flux in the pixel; and heat flux as a function of time since fire arrival (h).]

  63. Data likelihood of active fires detection
The logarithm of the probability is more convenient: instead of multiplying probabilities, add their logarithms.
[Plot: log probability of positive detection vs. time since fire arrival.]
• Fire is likely to be detected for a few hours after arrival at the location.
• The probability of detection quickly decreases afterwards.
• The tails model the uncertainty of the data and the model in time and space.

  64. Probability of no fire detection
Determined by the probability of detection: probability(yes | $T$) + probability(no | $T$) = 1.

  65. Evaluating the fit of the model state to the data: Data likelihood
• The model state is encoded as the fire arrival time $T$.
• Data likelihood = probability of the data given the model state $T$.
• Computing with logarithms is more convenient - add, not multiply.
$$\log(\text{data likelihood}) = \sum_{\text{granules}}\ \sum_{\text{grid cells}} \text{confidence level(cell)} \cdot \log(\text{data likelihood(cell)}) = \sum_{\text{granules}}\ \sum_{\text{grid cells}} c(\text{cell})\, f\big(\text{cell},\, T_{\text{granule}} - T\big)$$
• A cloud mask or missing data in a cell is implemented by $c(\text{cell}) = 0$. A sketch of this computation follows below.
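A minimal sketch of the confidence-weighted log data likelihood summed over granules and grid cells; the grid shapes, the confidence field, and the function f (log likelihood of detection as a function of time since fire arrival) are all made up for illustration:

```python
import numpy as np

def log_likelihood(T, granules):
    """T: fire arrival time on the grid; granules: list of (t_granule, c, f)."""
    total = 0.0
    for t_granule, c, f in granules:
        # c = 0 encodes the cloud mask / missing data, so such cells drop out
        total += np.sum(c * f(t_granule - T))
    return total

# hypothetical example: one granule, a capped bump-shaped log likelihood of detection
ny, nx = 50, 50
T = np.full((ny, nx), 1e9)                             # mostly unburned
T[20:30, 20:30] = 3600.0                               # fire arrives after 1 hour in a patch
c = np.ones((ny, nx)); c[:5, :] = 0.0                  # cloud mask in a strip
f = lambda dt: -np.minimum((dt / 3600.0 - 2.0)**2, 10.0)  # peaks ~2 h after arrival, bounded tails

print(log_likelihood(T, [(4 * 3600.0, c, f)]))
```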

  66. Automatic ignition from Active Fires data by maximum likelihood
• Find the most probable fire given the data (much like what a human expert might do):
$$\log \Pr(\text{data} \mid T) = \sum_{\text{granules}}\ \sum_{\text{grid cells}} c(\text{cell})\, f\big(\text{cell},\, T_{\text{granule}} - T,\, x,\, y\big) \to \max_T$$
• Used for ignition from the first few MODIS/VIIRS detections (James Haley, PhD student, in progress)
• But this ignores any previous modeling – not data assimilation yet

  67. Data Assimilation
• A basic data assimilation cycle to ingest data:
  – Stop the simulation
  – Modify the model state (called the forecast, $T^f$) to reconcile the model with new data that arrived since the last cycle, obtaining a new estimated state (called the analysis, $T$)
  – Continue the simulation from the analysis $T$
• We do not want to follow the data or the model exactly. Rather, split the difference, taking into account their relative uncertainties.
• We want the data likelihood large and the difference between the analysis and the forecast small.
• This is how weather forecasting works nowadays – every few hours (usually 1 to 6), new data are ingested like this.

  68. Assimilation of MODIS/VIIRS Active Fire detection: maximum probability estimate
• We want the data likelihood $\Pr(\text{data} \mid T)$ large and the difference $T - T^f$ between the analysis and the forecast small.
• Penalize the difference $T - T^f$ in a suitable norm:
$$\log \Pr(\text{data} \mid T) - \|T - T^f\|^2 \to \max_T$$
• Choose the norm to penalize non-smooth changes – measuring the values of derivatives,
$$\|u\|^2 \approx \int u \left(-\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}\right)^{a} u\, dx\, dy, \qquad a > 1.$$
• An efficient preconditioned gradient method, easily running on a laptop (not a supercomputer); only one or two iterations needed.

  69. Penalty by powers of the Laplacian
• The penalty $\|T - T^f\|^2_{A^{-1}}$ is equivalent to the prior assumption that the error in $T$ is a Gaussian random field with mean $T^f$ and covariance $A$:
$$T \sim e^{-\frac{1}{2}\|T - T^f\|^2_{A^{-1}}} \iff T = T^f + \sum_k \theta_k\, \lambda_k^{1/2}\, T_k, \quad \theta_k \sim N(0, 1), \quad A T_k = \lambda_k T_k,$$
and $\lambda_k \to 0$ fast ⇒ the random field is smooth.
• Here
$$A = \left(-\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}\right)^{-p}, \qquad p > 1, \qquad \lambda_{jk} \propto \left(\left(\frac{j\pi}{a}\right)^2 + \left(\frac{k\pi}{b}\right)^2\right)^{-p} \to 0.$$
• With zero boundary conditions on a rectangle, the eigenvectors are of the form $T_{jk}(x, y) \propto \sin\frac{j\pi x}{a}\sin\frac{k\pi y}{b}$.
• Evaluate the action of powers of $A$ by the Fast Fourier Transform (FFT). A sketch follows below.
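A minimal sketch of applying a power of the Laplacian with zero boundary conditions on a rectangle via the discrete sine transform (FFT-based); the grid size, rectangle dimensions, and exponent are assumptions, and this illustrates the spectral idea rather than the WRF-SFIRE implementation:

```python
import numpy as np
from scipy.fft import dstn, idstn

def apply_neg_laplacian_power(u, a, b, power):
    """Apply (-d^2/dx^2 - d^2/dy^2)^power spectrally, zero Dirichlet boundary conditions."""
    nx, ny = u.shape
    j = np.arange(1, nx + 1)[:, None]                     # sine mode indices
    k = np.arange(1, ny + 1)[None, :]
    lam = (j * np.pi / a)**2 + (k * np.pi / b)**2         # Laplacian eigenvalues
    u_hat = dstn(u, type=1, norm="ortho")                 # sine coefficients
    return idstn(lam**power * u_hat, type=1, norm="ortho")

# smooth a rough forcing field F by A F = (-Laplacian)^(-p) F, p > 1
rng = np.random.default_rng(6)
F = rng.standard_normal((63, 63))                         # hypothetical rough forcing
smoothed = apply_neg_laplacian_power(F, a=1.0, b=1.0, power=-1.5)
print(smoothed.shape)
```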

  70. Minimization by preconditioned steepest descent
$$J(T) = \alpha\,\|T - T^f\|^2_{A^{-1}} - \sum_k c_k\, f_k\big(T^S_k - T(x_k, y_k)\big) \to \min, \qquad C(T - T^f) = 0,$$
$$\nabla J(T) = \alpha\, A^{-1}(T - T^f) - F(T), \qquad F(T) = \sum_k c_k\, \frac{d}{dt} f_k\big(T^S_k - T(x_k, y_k)\big).$$
But $\nabla J(T)$ is a terrible descent direction, $A$ is ill conditioned - no progress at all! Better: the preconditioned descent direction
$$A\,\nabla J(T) = \alpha\,(T - T^f) - A F(T), \qquad A F(T) = \left(-\frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}\right)^{-p} \sum_k c_k\, \frac{d}{dt} f_k\big(T^S_k - T(x_k, y_k)\big),$$
with $p > 1$: spatial smoothing of the forcing from the log likelihood maximization. $T$ at the ignition point does not change ⇒ the descent direction $\delta$ comes from the saddle point problem
$$A\,\delta + C\lambda = \frac{\partial}{\partial t} f\big(T^S - T,\, x,\, y\big), \qquad C^T \delta = 0.$$
Now one descent iteration is enough.

  71. Data assimilation results
2013 Patch Springs fire, UT
[Images: Forecast and Analysis fire arrival time]

  72. But fire is coupled with the atmosphere
[Diagram: Atmosphere ↔ Fire; the fire releases heat (heat flux) to the atmosphere, the atmosphere feeds wind back to the fire heat propagation.]
• Heat flux from the fire changes the state of the atmosphere over time.
• Then the fire model state changes by data assimilation.
• The atmospheric state is no longer compatible with the fire.
• How to change the state of the atmosphere model in response to data assimilation into the fire model? And not break the atmospheric model.
• Replay the fire from the given fire arrival time.

  73. Spin up the atmospheric model after the fire model state is updated by data assimilation
[Diagram: coupled atmosphere-fire model → forecast fire arrival time → fire arrival time changed by data assimilation of Active Fire detection → atmosphere out of sync with the fire → rerun the atmosphere model from an earlier time, replaying heat fluxes derived from the changed fire arrival time → atmosphere and fire in sync again → continue the coupled fire-atmosphere simulation.]

  74. Cycling with atmospheric model spin up
[Diagram: Forecast at time 1 → Assimilation → Analysis at time 1]

  75. Cycling with atmospheric model spin up
[Diagram: Analysis → Spin up → Forecast]

  76. Cycling with atmospheric model spin up
Continue the coupled model from the analysis fire state and an atmosphere state adapted to the fire.
[Diagram labels: Use forecast – Interpolate the fire arrival time – Replay the fire heat fluxes – Use analysis]

  77. Progress of the optimization algorithm

  78. Conclusion
• A simple and efficient method – implemented by FFT; 2-3 iterations are sufficient to minimize the cost function numerically, and 1 iteration is already pretty good
• Pixels under cloud cover do not contribute to the cost function
• Robust for missing and irregular data
• Standard Bayesian data assimilation framework: forecast + data = analysis
• Future:
  – Automatic ignition from detection by maximum likelihood
  – Native swath coordinates, data likelihood computed for native pixels, not the cells of the computational mesh
  – Better forecast probability distribution to allow firefighting activities – now a bit too resistant to change

  79. References
[1] Ivo Babuška. On randomised solutions of Laplace's equation. Časopis Pěst. Mat., 86:269–276, 1961.
[2] Ivo Babuška and Panagiotis Chatzipantelidis. On solving elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Engrg., 191(37-38):4093–4122, 2002.
[3] A. V. Balakrishnan. Applied Functional Analysis. Springer-Verlag, New York, 1976.
[4] Jonathan D. Beezley. High-Dimensional Data Assimilation and Morphing Ensemble Kalman Filters with Applications in Wildfire Modeling. PhD thesis, University of Colorado Denver, 2009.
[5] Gerrit Burgers, Peter Jan van Leeuwen, and Geir Evensen. Analysis scheme in the ensemble Kalman filter. Monthly Weather Review, 126:1719–1724, 1998.
[6] Giuseppe Da Prato. An Introduction to Infinite-Dimensional Analysis. Springer-Verlag, Berlin, 2006.

  80.
[7] M. Dashti, K. J. H. Law, A. M. Stuart, and J. Voss. MAP estimators and their consistency in Bayesian nonparametric inverse problems. Inverse Problems, 29(9):095017, 27, 2013.
[8] Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research, 99(C5):10143–10162, 1994.
[9] B. Ganis, H. Klie, M. F. Wheeler, T. Wildey, I. Yotov, and D. Zhang. Stochastic collocation and mixed finite elements for flow in porous media. Comput. Methods Appl. Mech. Engrg., 197:3547–3559, 2008.
[10] Evan Kwiatkowski and Jan Mandel. Convergence of the square root ensemble Kalman filter in the large ensemble limit. SIAM/ASA Journal on Uncertainty Quantification, 3(1):1–17, 2015.
[11] Kody J. H. Law, Hamidou Tembine, and Raul Tempone. Deterministic methods for filtering, part I: Mean-field ensemble Kalman filtering. arXiv:1409.0628, 2014.
[12] F. Le Gland, V. Monbet, and V.-D. Tran. Large sample asymptotics for the ensemble Kalman filter. In Dan Crisan and Boris Rozovskii, editors, The Oxford Handbook of Nonlinear Filtering, pages 598–631. Oxford University Press, 2011.
