Rare-event probability estimation and convex programming – Z. I. Botev


  1. Rare-event probability estimation and convex programming – Z. I. Botev

  2. Problem formulation
     We consider the problem of estimating rare-event probabilities of the form
     $$ \ell = P(S(X) > \gamma), \qquad X = (X_1, \dots, X_d), $$
     where S : R^d → R and X_1, ..., X_d are random variables with joint density f(x).
     Almost any high-dimensional integration problem can be cast into this framework: option pricing, insurance value at risk, ABC, Ising model in physics, queuing models, counting problems in operations research and CS.
     Dichotomy: Frequently, one can easily simulate the rare event, but it is not clear how one can use the simulated data to estimate ℓ.
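
A minimal crude Monte Carlo sketch (not from the slides) illustrates why this is hard; the choices S(x) = x_1 + ... + x_d, f = N(0, I_d), and all numbers below are assumptions made only to make the example concrete.

```python
# Crude Monte Carlo sketch for ell = P(S(X) > gamma).
# Assumptions for illustration only: S(x) = x_1 + ... + x_d and X ~ N(0, I_d).
import numpy as np

rng = np.random.default_rng(0)
d, gamma, n = 10, 12.0, 10**6

X = rng.standard_normal((n, d))   # X_1, ..., X_n iid from f
S = X.sum(axis=1)                 # S(X_j)
ell_hat = np.mean(S > gamma)      # crude Monte Carlo estimate of ell
print(ell_hat)
# For large gamma almost no samples hit {S(X) > gamma}, so the relative
# error of this estimator explodes; this is the dichotomy described above.
```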

  3. Bird’s eye view of existing methods
     - importance sampling: state-dependent and state-independent; no need to simulate the rare event under the original probability law in order to estimate the probability of its occurrence efficiently.
     - adaptive importance sampling: the cross-entropy method
     - conditioning
     - splitting: the splitting method typically yields exact or approximate realizations of the rare event
     With the exception of splitting, all of these treat the problem of simulating the rare event and estimating the rare-event probability as essentially separate problems.

  4. Standard importance sampling
     We propose to estimate the quantity ℓ using maximum likelihood methods, where the data upon which the likelihood function depends is generated from computer simulation of the rare event under the original probability law.
     To introduce the idea it is convenient to think of ℓ as a normalization constant ℓ_s of the conditional density
     $$ f_s(x) = \frac{f(x)\, \mathbb{I}\{S(x) > \gamma\}}{\ell_s} = \frac{w_s(x)}{\ell_s}. $$

  5. The typical importance sampling scheme
     Let f_1(x) = w_1(x)/ℓ_1 be another density whose normalizing constant ℓ_1 is known and {x : f_1(x) > 0} ⊇ {x : f_s(x) > 0}. Then, the natural estimator is
     $$ \ell_s^{*} = \frac{1}{n} \sum_{j=1}^{n} \frac{f(X_j)\, \mathbb{I}\{S(X_j) > \gamma\}}{f_1(X_j)}, \qquad X_1, \dots, X_n \overset{\mathrm{iid}}{\sim} f_1. $$
     For good performance we need the tails of f_1 to be at least as heavy as the tails of f_s. Frequently, the choice of f_1 is dictated by asymptotic analysis of ℓ(γ) as γ ↑ ∞.
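
A sketch of this estimator for one concrete case, assuming d = 1, f = N(0,1), S(x) = x, and a mean-shifted proposal f_1 = N(γ, 1); the shifted proposal is a common, asymptotically motivated choice, not something prescribed by the slides.

```python
# Standard importance sampling sketch under illustrative assumptions:
# d = 1, f = N(0,1), S(x) = x, and a mean-shifted proposal f_1 = N(gamma, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
gamma, n = 4.0, 10**5

X = rng.normal(loc=gamma, scale=1.0, size=n)    # X_1, ..., X_n iid from f_1
weights = norm.pdf(X) / norm.pdf(X, loc=gamma)  # likelihood ratio f(X_j) / f_1(X_j)
ell_star = np.mean(weights * (X > gamma))       # the estimator from the slide
print(ell_star, norm.sf(gamma))                 # estimate vs. exact P(X > gamma)
```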

  6. The typical importance sampling scheme
     It is well known that the zero-variance importance sampling density for estimating ℓ_s is the conditional pdf f_s. So we may try to use f_s itself as an importance sampling density. We consider the estimator
     $$ \hat\ell_s = \frac{1}{n} \sum_{j=1}^{n} \frac{w_s(X_j)}{\lambda_1 f_1(X_j) + \lambda_s \underbrace{w_s(X_j)/\hat\ell_s}_{\approx\, f_s(X_j)}}, \qquad X_1, \dots, X_n \overset{\mathrm{iid}}{\sim} \bar f, $$
     where the normalizing constant ℓ_s on the right is replaced with ℓ̂_s, giving rise to a nonlinear equation for ℓ̂_s.
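
A sketch of this solve-the-equation estimator under illustrative assumptions: d = 1, f = N(0,1), S(x) = x, γ = 4, a deliberately light-tailed reference f_1 = N(2, 1), and exact samples from f_s drawn as a truncated normal; the nonlinear equation is solved here with a bracketing root finder, which is an illustrative choice rather than a method named in the slides.

```python
# Solve-the-equation sketch under illustrative assumptions: d = 1, f = N(0,1),
# S(x) = x, gamma = 4, reference density f_1 = N(2,1) (its tail is much lighter
# than standard IS would require), and exact f_s samples via a truncated normal.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(0)
gamma, n1, ns = 4.0, 50_000, 50_000
n = n1 + ns
lam1, lams = n1 / n, ns / n

X1 = rng.normal(loc=2.0, size=n1)                             # samples from f_1
Xs = truncnorm.rvs(gamma, np.inf, size=ns, random_state=rng)  # samples from f_s
X = np.concatenate([X1, Xs])                                  # pooled sample from f-bar

w_s = norm.pdf(X) * (X > gamma)   # w_s(x) = f(x) I{S(x) > gamma}
f1 = norm.pdf(X, loc=2.0)         # f_1(x)

def gap(ell):
    # ell solves: ell = (1/n) sum_j w_s(X_j) / (lam1 f_1(X_j) + lams w_s(X_j)/ell)
    return np.mean(w_s / (lam1 * f1 + lams * w_s / ell)) - ell

ell_hat = brentq(gap, 1e-12, 1.0)   # root-find the nonlinear equation
print(ell_hat, norm.sf(gamma))      # estimate vs. exact tail probability
```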

  7. Comparison of estimators
     Compare now the traditional importance sampling estimator with the proposed solve-the-equation type. For the solve-the-equation type, the support and tail restrictions on f_1 are no longer necessary for good performance. The only requirement is that {x : f_1(x) × f_s(x) > 0} ≠ ∅, that is, that the supports of f_1 and f_s overlap.
     This is an example of using computer-simulated data of the rare event to obtain an M-type estimator of ℓ. Next, we try to generalize.

  8. Suppose we are given the sequence of densities
     $$ f_t(x) = \frac{w_t(x)}{\ell_t} = \frac{f(x)\, H_t(x)}{\ell_t}, \qquad t = 1, \dots, s, $$
     where f is a known density, {H_t} are known functions, and the ℓ_t are probabilities acting as normalizing constants to {w_t}. We are interested in estimating ℓ_{k'} = E_f H_{k'}(X) for some k'.
     We assume that for at least one f_t, say f_1, the corresponding normalizing constant ℓ_1 is known, and, without loss of generality, equal to unity. We call f_1 a reference density.

  9. Connectivity
     To proceed, suppose we are given a graph with s nodes and an edge between nodes i and j if and only if
     $$ \mathbb{E}_f\, \mathbb{I}\{H_i(X) > 0\} \times \mathbb{I}\{H_j(X) > 0\} > 0. \qquad (1) $$
     We assume that there exists a path between any two nodes. We call condition (1) on the supports of {f_t} Vardi's connectivity condition.
     Assume we have the iid samples
     $$ X_{t,1}, \dots, X_{t,n_t} \overset{\mathrm{iid}}{\sim} f_t(x), \qquad t = 1, \dots, s. $$

  10. Connectivity
     [Figure: an example support graph on four nodes A, B, C, D illustrating the connectivity condition.]

  11. Mixture model
     Conceptually, this is the same as sampling n = n_1 + ··· + n_s variables, with stratification, from the mixture
     $$ \bar f(x) \overset{\mathrm{def}}{=} \frac{1}{n} \sum_{t=1}^{s} n_t f_t(x) = \sum_{t=1}^{s} \lambda_t f_t(x), \qquad \lambda_t = n_t / n. $$
     Let the pooled sample be denoted by X_1, ..., X_n, where the first n_1 samples are outcomes from f_1, the next n_2 are samples from f_2, and so on. Define the vector of parameters
     $$ z = (z_1, \dots, z_s) = \bigl( -\ln(1/\lambda_1),\; -\ln(\ell_2/\lambda_2),\; \dots,\; -\ln(\ell_s/\lambda_s) \bigr). $$
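
To make the later sketches concrete, here is one illustrative way to build such a pooled, stratified sample and the associated matrices, assuming nested functions H_t(x) = I{S(x) > γ_t}, d = 1, f = N(0,1), and S(x) = x (these choices and all variable names are assumptions for the demo, not from the slides); each f_t is then a normal truncated to (γ_t, ∞), and γ_1 = −∞ makes f_1 = f the reference density with ℓ_1 = 1.

```python
# Illustrative construction of a pooled, stratified sample and the matrices
# used in the sketches below. Assumptions (not from the slides): nested
# functions H_t(x) = I{S(x) > gamma_t}, d = 1, f = N(0,1), S(x) = x.
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(0)
gammas = np.array([-np.inf, 1.0, 2.0, 3.0, 4.0])   # levels gamma_1, ..., gamma_s
s = len(gammas)
n_k = np.full(s, 20_000)                           # sample sizes n_1, ..., n_s
lam = n_k / n_k.sum()                              # lambda_t = n_t / n

samples = [rng.standard_normal(n_k[0])]            # X_{1,j} ~ f_1 = f
samples += [truncnorm.rvs(g, np.inf, size=m, random_state=rng)
            for g, m in zip(gammas[1:], n_k[1:])]  # X_{t,j} ~ f_t (truncated normal)
X = np.concatenate(samples)                        # pooled sample X_1, ..., X_n

H = (X[None, :] > gammas[:, None]).astype(float)   # H[t, j] = H_t(X_j)
W = norm.pdf(X)[None, :] * H                       # W[k, j] = w_k(X_j) = f(X_j) H_k(X_j)
```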

  12. Empirical Likelihood Estimator
     Now consider the likelihood of the observed data X_{t,1}, ..., X_{t,n_t}, t = 1, ..., s, as a function of z:
     $$ \prod_{k=1}^{s} \prod_{j=1}^{n_k} f_k(X_{k,j}) = \prod_{k=1}^{s} \prod_{j=1}^{n_k} \frac{w_k(X_{k,j})}{\lambda_k e^{-z_k}} = \prod_{k=1}^{s} \prod_{j=1}^{n_k} \frac{w_k(X_{k,j})}{\lambda_k e^{-z_k}\, \bar f(X_{k,j})} \times \prod_{k=1}^{s} \prod_{j=1}^{n_k} \bar f(X_{k,j}). $$

  13. Empirical Likelihood Estimator
     The last display yields the partial log-likelihood as a function of z:
     $$ \sum_{k=1}^{s} \sum_{j=1}^{n_k} \ln \frac{w_k(X_{k,j})}{\lambda_k e^{-z_k}\, \bar f(X_{k,j})} = \sum_{k=1}^{s} \sum_{j=1}^{n_k} \Bigl[ \ln\bigl(w_k(X_{k,j})/\lambda_k\bigr) + z_k - \ln \bar f(X_{k,j}) \Bigr] $$
     $$ = \mathrm{const.} - \sum_{j=1}^{n} \ln \bar f(X_j) + \sum_{k=1}^{s} n_k z_k = \mathrm{const.} - \sum_{j=1}^{n} \ln\Bigl( \sum_{k=1}^{s} w_k(X_j)\, e^{z_k} \Bigr) + n \sum_{k=1}^{s} \lambda_k z_k. $$

  14. Empirical Likelihood Estimator
     Under the connectivity condition of Vardi and the iid assumption on the sample, it can be shown that the maximum of the partial log-likelihood is the same as the maximum of the complete likelihood. In other words, the unique nonparametric maximum likelihood estimate of z (and hence of ℓ) solves the almost surely convex optimization program (with z_1 = ln λ_1 fixed)
     $$ \hat z = \operatorname*{argmin}_{z}\, \widehat D(z), \qquad \widehat D(z) \overset{\mathrm{def}}{=} \sum_{j=1}^{n} \ln\Bigl( \sum_{k=1}^{s} w_k(X_j)\, e^{z_k} \Bigr) - \sum_{k=1}^{s} n_k z_k. \qquad (2) $$
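
A sketch of solving program (2) numerically, assuming a precomputed s-by-n matrix W with W[k, j] = w_k(X_j) over the pooled sample and the vector n_k of sample sizes (for instance as built in the sketch after slide 11); the generic BFGS call is an illustrative choice, not the optimizer used by the author.

```python
# Sketch of minimizing D-hat(z) from program (2), with z_1 fixed.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def solve_convex_program(W, n_k):
    s, n = W.shape
    lam = n_k / n_k.sum()
    z1 = np.log(lam[0])                  # z_1 = ln(lambda_1) is fixed because ell_1 = 1
    with np.errstate(divide="ignore"):
        logW = np.log(W)                 # log w_k(X_j); -inf where w_k(X_j) = 0

    def D(z_rest):
        z = np.concatenate(([z1], z_rest))
        # D(z) = sum_j ln( sum_k w_k(X_j) e^{z_k} ) - sum_k n_k z_k
        return logsumexp(logW + z[:, None], axis=0).sum() - n_k @ z

    res = minimize(D, x0=np.zeros(s - 1), method="BFGS")
    z_hat = np.concatenate(([z1], res.x))
    return lam / np.exp(z_hat)           # ell_t = lambda_t e^{-z_t}

# usage: ell_hat = solve_convex_program(W, n_k)
```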

  15. Almost surely??
     Define the matrix
     $$ G_{i,j} = \mathbb{E}_f\, \mathbb{I}\{H_i(X) > 0\} \times \mathbb{I}\{H_j(X) > 0\} $$
     and its empirical counterpart
     $$ \widehat G_{i,j} = \frac{1}{n} \sum_{k=1}^{n} \mathbb{I}\{H_i(X_k) > 0\} \times \mathbb{I}\{H_j(X_k) > 0\}. $$
     Under the connectivity condition, the matrix G is irreducible, that is, for any pair (i, j) we have (G^k)_{i,j} > 0 for some k = 1, 2, ..., and since Ĝ → G almost surely, with probability one the function D̂(z) is strictly convex and the maximum likelihood program has a unique solution.
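
A sketch of the empirical matrix Ĝ together with a check of Vardi's connectivity condition, assuming a precomputed s-by-n matrix H with H[t, j] = H_t(X_j) over the pooled sample (for instance as built in the sketch after slide 11; the function name and graph routine below are illustrative choices).

```python
# Empirical overlap matrix G-hat and a connectivity check of its support graph.
import numpy as np
from scipy.sparse.csgraph import connected_components

def vardi_connected(H):
    ind = (H > 0).astype(float)          # I{H_t(X_j) > 0}
    G_hat = ind @ ind.T / H.shape[1]     # G-hat_{i,j} = (1/n) sum_k I{H_i > 0} I{H_j > 0}
    # Edge (i, j) whenever G-hat_{i,j} > 0; Vardi's condition asks for one component.
    n_comp, _ = connected_components(G_hat > 0, directed=False)
    return n_comp == 1, G_hat
```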

  16. M-estimator & Method of Moments
     To execute the optimization, we compute the (s − 1) × 1 gradient ∇D̂(z), which is a vector with components
     $$ [\nabla \widehat D(z)]_t = \sum_{j=1}^{n} \frac{w_t(X_j)\, e^{z_t}}{\sum_{k=1}^{s} w_k(X_j)\, e^{z_k}} - n_t, \qquad t = 2, \dots, s. $$
     Using the prior information that ℓ_1 = 1 or z_1 = ln(λ_1), these estimating equations are equivalent to solving the (s − 1)-dimensional system for the unknown z_2, ..., z_s:
     $$ \frac{1}{n} \sum_{j=1}^{n} \frac{w_t(X_j)\, e^{\hat z_t}}{\sum_{k=1}^{s} w_k(X_j)\, e^{\hat z_k}} = \lambda_t, \qquad t = 2, \dots, s. $$

  17. Moment-matching master equation
     The last system is equivalent to:
     $$ \hat\ell_t = \sum_{j=1}^{n} \frac{A_{t,j}}{\sum_{k=1}^{s} A_{k,j}\, n_k / \hat\ell_k}, \qquad A_{t,j} = H_t(X_j), \qquad t = 2, \dots, s. \qquad (3) $$
     It is sometimes easier to directly minimize D̂, instead of solving the nonlinear system. The system can be solved using a Jacobi/Gauss-Seidel type iteration. The process is similar to finding eigenvalues via power iteration.

  18. Jacobi/Gauss-Seidel iteration algorithm
     Require: matrix A and initial starting point ℓ = (ℓ_1, ..., ℓ_s) = (1, ..., 1)
     Set ε = ∞ and ℓ* ← ℓ
     while ε > 10^{-10} do
         for i = 2, ..., s do
             ℓ_i ← Σ_{j=1}^{n} A_{i,j} / ( Σ_{k=1}^{s} A_{k,j} n_k / ℓ*_k )
         ε ← max_i |ℓ_i − ℓ*_i| / ℓ_i
         Set ℓ* ← ℓ
     return the vector of estimated probabilities ℓ̂ ← ℓ
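
A direct Python transcription of the Jacobi variant above (the denominators use the previous sweep's ℓ*), assuming a precomputed s-by-n matrix A with A[t, j] = H_t(X_j) and the vector n_k of sample sizes; the variable names mirror the slide, everything else is an illustrative choice.

```python
# Jacobi iteration for the moment-matching master equation (3), with ell_1 = 1 fixed.
import numpy as np

def jacobi_ell(A, n_k, tol=1e-10, max_iter=10_000):
    s, n = A.shape
    ell = np.ones(s)                        # initial point ell = (1, ..., 1); ell_1 stays 1
    for _ in range(max_iter):
        ell_star = ell.copy()
        denom = (n_k / ell_star) @ A        # sum_k A[k, j] n_k / ell*_k, for each j
        ell = np.r_[ell_star[0], (A[1:] / denom).sum(axis=1)]
        eps = np.max(np.abs(ell - ell_star) / ell)
        if eps <= tol:
            break
    return ell                              # estimated probabilities ell-hat

# usage: ell_hat = jacobi_ell(A, n_k)
```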
