
SLIDE 1

Rare-event probability estimation and convex programming

  • Z. I. Botev

SLIDE 2

Problem formulation

We consider the problem of estimating rare-event probabilities of the form

ℓ = P(S(X) > γ), X = (X1, . . . , Xd),

where S : Rd → R and X1, . . . , Xd are random variables with joint density f(x). Almost any high-dimensional integration problem can be cast into this framework:

  • option pricing, insurance value at risk, ABC, Ising models in physics, queuing models,
  • counting problems in operations research and CS.

Dichotomy: Frequently, one can easily simulate the rare event, but it is not clear how one can use the simulated data to estimate ℓ.

SLIDE 3

Bird’s eye view of existing methods

  • importance sampling: state-dependent and state-independent; no need to simulate the rare event under the original probability law in order to efficiently estimate the probability of its occurrence.
  • adaptive importance sampling: cross entropy method.
  • conditioning.
  • splitting: the splitting method typically yields exact or approximate realizations of the rare event.

With the exception of splitting, all of these treat the problem of simulating the rare event and estimating the rare-event probability as essentially separate problems.

SLIDE 4

Standard importance sampling

We propose to estimate the quantity ℓ using maximum likelihood methods, where the data upon which the likelihood function depends is generated from computer simulation of the rare-event under the original probability law. To introduce the idea it is convenient to think of ℓ as a normalization constant ℓs of the conditional density

$$ f_s(x) = \frac{f(x)\, I\{S(x) > \gamma\}}{\ell_s} = \frac{w_s(x)}{\ell_s} . $$

SLIDE 5

The typical importance sampling scheme

Let f1(x) = w1(x)/ℓ1 be another density whose normalizing constant ℓ1 is known and

{x : f1(x) > 0} ⊇ {x : fs(x) > 0}.

Then, the natural estimator is

ℓ∗

s = 1

n

n

  • j=1

f(Xj)I{S(Xj) > γ} f1(Xj) , X1, . . . , Xn

iid

∼ f1 .

For good performance we need the tails of f1 to be at least as heavy as the tails of fs. Frequently, the choice of f1 is dictated by asymptotic analysis of ℓ(γ) as γ ↑ ∞.
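As an illustration (ours, not the slides'), here is a minimal Python sketch of this estimator for a toy one-dimensional problem with f = N(0, 1), S(x) = x, and the shifted proposal f1 = N(γ, 1); the function name and the choice of proposal are assumptions made for the example.

```python
import numpy as np
from scipy import stats

def standard_is(gamma, n=10**5, seed=0):
    """Standard importance sampling estimate of ell = P(X > gamma), X ~ N(0,1),
    using the shifted proposal f1 = N(gamma, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=gamma, size=n)                     # X_1,...,X_n iid ~ f1
    w = stats.norm.pdf(x) / stats.norm.pdf(x, loc=gamma)  # likelihood ratio f(X)/f1(X)
    return np.mean(w * (x > gamma))                       # (1/n) sum_j w(X_j) I{S(X_j) > gamma}

print(standard_is(4.0), stats.norm.sf(4.0))  # estimate vs exact tail probability
```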

SLIDE 6

The typical importance sampling scheme

It is well known that the zero-variance importance sampling density for estimating ℓs is the conditional pdf fs. So we may try to use fs itself as an importance sampling density. We consider the estimator

$$ \hat\ell_s = \frac{1}{n} \sum_{j=1}^{n} \frac{w_s(X_j)}{\underbrace{\lambda_1 f_1(X_j) + \lambda_s\, w_s(X_j)/\hat\ell_s}_{\approx\, \bar f(X_j)}} , \qquad X_1, \dots, X_n \overset{\text{iid}}{\sim} \bar f , $$

where the normalizing constant ℓs on the right is replaced with ℓ̂s, giving rise to a nonlinear equation for ℓ̂s.
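A minimal sketch (ours) of the resulting fixed-point iteration for the same toy problem f = N(0, 1), S(x) = x, with s = 2: half the sample is drawn from f1 = N(γ, 1) and half from fs itself via exact inverse-cdf draws from the tail. All names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gamma, n = 4.0, 10**5
n1 = ns = n // 2
lam1, lams = n1 / n, ns / n                 # mixture weights lambda_t = n_t / n

# Pooled sample from f_bar: n1 draws from f1 = N(gamma,1), ns from fs = N(0,1) | X > gamma.
x1 = rng.normal(loc=gamma, size=n1)
xs = stats.norm.isf(rng.uniform(size=ns) * stats.norm.sf(gamma))  # exact tail draws
x = np.concatenate([x1, xs])

ws = stats.norm.pdf(x) * (x > gamma)        # w_s(x) = f(x) I{S(x) > gamma}
f1 = stats.norm.pdf(x, loc=gamma)

ell = 1.0                                   # initial guess for ell_s
for _ in range(50):                         # fixed-point iteration on the nonlinear equation
    ell = np.mean(ws / (lam1 * f1 + lams * ws / ell))
print(ell, stats.norm.sf(gamma))            # estimate vs exact
```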

SLIDE 7

Comparison of estimators

Compare now the traditional importance sampling estimator with the proposed solve-the-equation type. For the solve-the-equation type, the support and tail restrictions on f1 are no longer necessary for good performance. The only requirement is that {x : f1(x) × fs(x) > 0} ≠ ∅, that is, that the supports of f1 and fs overlap. This is an example of using computer-simulated data of the rare event to obtain an M-type estimator of ℓ. Next, we try to generalize.

SLIDE 8

Suppose we are given the sequence of densities

$$ f_t(x) = \frac{w_t(x)}{\ell_t} = \frac{f(x)\, H_t(x)}{\ell_t} , \qquad t = 1, \dots, s , $$

where f is a known density, {Ht} are known functions, and ℓt are probabilities acting as normalizing constants to {wt}. We are interested in estimating ℓk′ = Ef Hk′(X) for some k′.

We assume that for at least one ft, say f1, the corresponding normalizing constant ℓ1 is known, and, without loss of generality, equal to unity. We call f1 a reference density.

SLIDE 9

Connectivity

To proceed, suppose we are given a graph with s nodes and an edge between nodes i and j if and only if

$$ E_f\, I\{H_i(X) > 0\}\, I\{H_j(X) > 0\} > 0 . \tag{1} $$

We assume that there exists a path between any two nodes. We call condition (1) on the supports of {ft} Vardi’s connectivity condition. Assume we have the iid sample

$$ X_{t,1}, \dots, X_{t,n_t} \overset{\text{iid}}{\sim} f_t(x) , \qquad t = 1, \dots, s . $$

SLIDE 10

Connectivity

[Figure: example support graph on four nodes A, B, C, D.]

SLIDE 11

Mixture model

Conceptually, this is the same as sampling n = n1 + · · · + ns variables with stratification from the mixture

$$ \bar f(x) = \frac{1}{n} \sum_{t=1}^{s} n_t f_t(x) = \sum_{t=1}^{s} \lambda_t f_t(x) , \qquad \lambda_t \overset{\text{def}}{=} n_t / n . $$

Let the pooled sample be denoted by

X1, . . . , Xn,

where the first n1 samples are outcomes from f1, the next n2 are samples from f2, and so on. Define the vector of parameters

$$ z = (z_1, \dots, z_s) = \big( -\ln(1/\lambda_1),\, -\ln(\ell_2/\lambda_2),\, \dots,\, -\ln(\ell_s/\lambda_s) \big) . $$

SLIDE 12

Empirical Likelihood Estimator

Now consider the likelihood of the observed data Xt,1, . . . , Xt,nt, t = 1, . . . , s as a function of z:

$$ \prod_{k=1}^{s} \prod_{j=1}^{n_k} f_k(X_{k,j}) = \prod_{k=1}^{s} \prod_{j=1}^{n_k} \frac{w_k(X_{k,j})}{\lambda_k e^{-z_k}} = \prod_{k=1}^{s} \prod_{j=1}^{n_k} \frac{w_k(X_{k,j})}{\lambda_k e^{-z_k}\, \bar f(X_{k,j})} \times \prod_{k=1}^{s} \prod_{j=1}^{n_k} \bar f(X_{k,j}) . $$

SLIDE 13

Empirical Likelihood Estimator

The last display yields the partial log-likelihood as a function of z:

$$ \ln \prod_{k=1}^{s} \prod_{j=1}^{n_k} \frac{w_k(X_{k,j})}{\lambda_k e^{-z_k}\, \bar f(X_{k,j})} = \sum_{k=1}^{s} \sum_{j=1}^{n_k} \Big[ \ln\big( w_k(X_{k,j})/\lambda_k \big) + z_k - \ln \bar f(X_{k,j}) \Big] $$

$$ = \text{const.} - \sum_{j=1}^{n} \ln \bar f(X_j) + \sum_{k=1}^{s} n_k z_k = \text{const.} - \sum_{j=1}^{n} \ln\Big( \sum_{k=1}^{s} w_k(X_j)\, e^{z_k} \Big) + n \sum_{k=1}^{s} \lambda_k z_k . $$

SLIDE 14

Empirical Likelihood Estimator

Under the connectivity condition of Vardi and the iid assumption on the sample, it can be shown that the maximum of the partial log-likelihood is the same as the maximum of the complete likelihood. In other words, the unique nonparametric maximum likelihood estimate of z (and hence of ℓ) solves the almost surely convex optimization program (with z1 = ln λ1 fixed)

$$ \hat z = \operatorname*{argmin}_{z} D(z) , \qquad D(z) \overset{\text{def}}{=} \sum_{j=1}^{n} \ln\Big( \sum_{k=1}^{s} w_k(X_j)\, e^{z_k} \Big) - \sum_{k=1}^{s} n_k z_k . \tag{2} $$
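A hedged numerical sketch of solving (2): given a matrix W with W[j, k] = w_k(X_j) and the counts n_k, we minimize D over z_2, . . . , z_s with z_1 = ln λ1 fixed, using the gradient derived two slides below. The function and variable names are ours; the probabilities are recovered afterwards as ℓ̂_t = λ_t e^{−ẑ_t}.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def solve_mle(W, counts):
    """Minimize D(z) = sum_j ln(sum_k w_k(X_j) e^{z_k}) - sum_k n_k z_k over z_2..z_s,
    with z_1 = ln(lambda_1) fixed. W has shape (n, s); counts[k] = n_{k+1} (0-based)."""
    n, s = W.shape
    lam = counts / n
    with np.errstate(divide="ignore"):          # log(0) -> -inf is harmless inside logsumexp
        logW = np.log(W)

    def full(z_free):                           # prepend the fixed component z_1 = ln(lambda_1)
        return np.concatenate([[np.log(lam[0])], z_free])

    def D(z_free):
        z = full(z_free)
        return logsumexp(logW + z, axis=1).sum() - counts @ z

    def grad(z_free):                           # [grad D]_t = sum_j w_t e^{z_t}/sum_k w_k e^{z_k} - n_t
        z = full(z_free)
        p = np.exp(logW + z - logsumexp(logW + z, axis=1, keepdims=True))
        return (p.sum(axis=0) - counts)[1:]

    res = minimize(D, np.zeros(s - 1), jac=grad, method="BFGS")
    return lam * np.exp(-full(res.x))           # ell_hat_t = lambda_t e^{-z_hat_t}
```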

SLIDE 15

Almost surely??

Define the matrix

$$ G_{i,j} = E_f\, I\{H_i(X) > 0\}\, I\{H_j(X) > 0\} $$

and its empirical counterpart

$$ \hat G_{i,j} = \frac{1}{n} \sum_{k} I\{H_i(X_k) > 0\}\, I\{H_j(X_k) > 0\} . $$

Under the connectivity condition, the matrix G is irreducible, that is, for any pair (i, j) we have G^k_{i,j} > 0 for some k = 1, 2, . . . . Since Ĝ → G almost surely, with probability one the function D(z) is strictly convex and the maximum likelihood program has a unique solution.
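As a small illustrative check (ours, not the slides'), connectivity can be tested directly from the empirical overlap matrix Ĝ: the support graph is connected exactly when (I + B)^(s−1) has no zero entries, where B is the 0/1 adjacency matrix.

```python
import numpy as np

def vardi_connected(G_hat):
    """Check Vardi's connectivity condition from the empirical matrix G_hat:
    connected iff every pair of nodes is reachable within s-1 steps."""
    B = (G_hat > 0).astype(np.int64)                    # adjacency of the support graph
    s = B.shape[0]
    R = np.linalg.matrix_power(np.eye(s, dtype=np.int64) + B, s - 1)
    return bool(np.all(R > 0))
```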

SLIDE 16

M-estimator & Method of Moments

To execute the optimization, we compute the (s − 1) × 1 gradient ∇D(z), which is a vector with components:

$$ [\nabla D]_t(z) = \sum_{j=1}^{n} \frac{w_t(X_j)\, e^{z_t}}{\sum_{k=1}^{s} w_k(X_j)\, e^{z_k}} - n_t , \qquad t = 2, \dots, s . $$

Using the prior information that ℓ1 = 1, or z1 = ln(λ1), these estimating equations are equivalent to solving the (s − 1)-dimensional system for the unknown z2, . . . , zs:

$$ \frac{1}{n} \sum_{j=1}^{n} \frac{w_t(X_j)\, e^{z_t}}{\sum_{k=1}^{s} w_k(X_j)\, e^{z_k}} = \lambda_t , \qquad t = 2, \dots, s . $$

SLIDE 17

Moment-matching master equation

The last bit is equivalent to:

$$ \hat\ell_t = \sum_{j=1}^{n} \frac{A_{t,j}}{\sum_{k=1}^{s} A_{k,j}\, n_k / \hat\ell_k} , \qquad A_{t,j} = H_t(X_j) , \quad t = 2, \dots, s . \tag{3} $$

It is sometimes easier to directly minimize D, instead of solving the nonlinear system. The system can be solved using Jacobi/Gauss-Seidel type iteration. The process is similar to finding eigenvalues via power iteration.

SLIDE 18

Jacobi/Gauss-Seidel iteration algorithm

Require: matrix A and initial starting point ℓ = (ℓ1, . . . , ℓs) = (1, . . . , 1)
Set ε ← ∞ and ℓ∗ ← ℓ
while ε > 10^{−10} do
    for i = 2, . . . , s do
        ℓi ← Σ_{j=1}^{n} A_{i,j} / ( Σ_{k=1}^{s} A_{k,j} n_k / ℓ∗_k )
    ε ← max_i |ℓi − ℓ∗_i| / ℓi
    Set ℓ∗ ← ℓ
return the vector of estimated probabilities ℓ̂ ← ℓ
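A direct Python transcription of this iteration (a sketch under our naming, with ℓ1 = 1 kept fixed and A stored as the s × n array A[t, j] = H_{t+1}(X_j) in 0-based indexing):

```python
import numpy as np

def jacobi_probabilities(A, counts, tol=1e-10, max_iter=10**5):
    """Jacobi-type iteration for the moment-matching equations (3).
    A has shape (s, n); counts[t] = n_{t+1}; ell[0] = ell_1 = 1 stays fixed."""
    s, n = A.shape
    ell = np.ones(s)
    for _ in range(max_iter):
        ell_old = ell.copy()
        denom = (counts / ell_old) @ A              # sum_k A[k, j] n_k / ell*_k, for each j
        ell = np.concatenate([[1.0], (A[1:] / denom).sum(axis=1)])
        if np.max(np.abs(ell - ell_old) / ell) < tol:
            break
    return ell
```

The Gauss-Seidel variant on the slide would reuse each freshly updated ℓi inside the inner loop; the Jacobi form above is the simplest vectorized version.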

SLIDE 19

Error estimates

Using the properties of maximum likelihood estimators, one can derive the asymptotic covariance matrix of (ℓ̂2, . . . , ℓ̂s). First, define the s × s matrix O# with entries

$$ O^{\#}_{i,j} = \int \frac{f_i(x)\, f_j(x)}{\sum_{k=1}^{s} \lambda_k f_k(x)}\, dx = E_{\bar f}\, \frac{\big( w_i(X)/\ell_i \big) \big( w_j(X)/\ell_j \big)}{\big( \sum_{k=1}^{s} w_k(X)\, \lambda_k/\ell_k \big)^{2}} = E_{\bar f}\, \frac{\big( I\{S_i(X) > \gamma\}/\ell_i \big) \big( I\{S_j(X) > \gamma\}/\ell_j \big)}{\big( \sum_{k=1}^{s} I\{S_k(X) > \gamma\}\, \lambda_k/\ell_k \big)^{2}} , \qquad i, j = 1, \dots, s . $$

This matrix has an obvious plug-in estimator.

SLIDE 20

Error estimates (continued)

Next, define the matrices (with the specified dimensions)

$$ J = \begin{pmatrix} \mathbf{0}^{\top} \\ I_{(s-1)\times(s-1)} \end{pmatrix} , \qquad s \times (s-1) , $$

$$ O = J^{\top} O^{\#} J , \qquad (s-1) \times (s-1) \ \text{(the lower right submatrix of } O^{\#} \text{)} , $$

$$ \Lambda = \mathrm{diag}(\lambda_2, \dots, \lambda_s) , \qquad (s-1) \times (s-1) , $$

$$ L = \mathrm{diag}(\ell_2, \dots, \ell_s) , \qquad (s-1) \times (s-1) . $$

SLIDE 21

Asymptotic distribution

Then, as n ↑ ∞ with the λi = ni/(n1 + · · · + ns) > 0 held fixed (and assuming ℓ1 = 1),

$$ \sqrt{n}\, J^{\top} ( \hat\ell - \ell ) \xrightarrow{d} \mathcal N\big( 0 ,\; L\, (O^{-1} - \Lambda)^{-1} L \big) . $$
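A small numpy sketch (ours) of the corresponding plug-in computation, assuming Ohash_hat is the estimated s × s matrix O# from the previous slides, and ell_hat, lam hold (ℓ̂1, . . . , ℓ̂s) and (λ1, . . . , λs):

```python
import numpy as np

def asymptotic_cov(Ohash_hat, ell_hat, lam):
    """Plug-in estimate of the asymptotic covariance L (O^{-1} - Lambda)^{-1} L
    of sqrt(n) (ell_hat - ell), restricted to components 2..s."""
    O = Ohash_hat[1:, 1:]                 # O = J^T O# J, the lower right block of O#
    Lam = np.diag(lam[1:])
    L = np.diag(ell_hat[1:])
    return L @ np.linalg.inv(np.linalg.inv(O) - Lam) @ L
```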

SLIDE 22

Example

Consider the estimation of the rare-event probability

$$ \ell_2 = P\big( e^{X_1} + \cdots + e^{X_d} > \gamma \big) , $$

which is the normalizing constant of the density

$$ f_2(x) = \frac{f(x)\, I\{S_2(x) > \gamma\}}{\ell_2} , \qquad x = (x_1, \dots, x_d) , $$

where S2(x) = e^{x1} + · · · + e^{xd}, and f is the density of the multivariate normal distribution with mean µ and covariance matrix Σ = (Σi,j) with

$$ \frac{\Sigma_{i,j}}{\sqrt{\Sigma_{i,i}\, \Sigma_{j,j}}} = \varrho \qquad \text{for all } i \neq j . $$

SLIDE 23

Reference density

Estimate ℓ2 via the M-estimator with s = 2 and reference density ((X1, . . . , Xd) ∼ N(µ, Σ))

$$ f_1(x) = \frac{f(x) \sum_{j=1}^{d} I\{e^{x_j} > \gamma\}}{\ell_1} , \qquad \ell_1 \overset{\text{def}}{=} \sum_{j=1}^{d} P\big( e^{X_j} > \gamma \big) . $$

Sample iid copies from the reference density using the mixture representation

$$ f_1(x) = \sum_{j=1}^{d} \frac{P(X_j > \ln\gamma)}{\ell_1} \cdot \frac{f(x)\, I\{x_j > \ln\gamma\}}{P(X_j > \ln\gamma)} . $$
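A hedged sketch (our names and helper) of this two-step sampler: pick a coordinate j with probability P(X_j > ln γ)/ℓ1, draw X_j from the corresponding truncated normal by inverse cdf, then draw the remaining coordinates from the conditional multivariate normal given X_j.

```python
import numpy as np
from scipy import stats

def sample_reference(n, mu, Sigma, gamma, seed=0):
    """Draw n iid copies from f1 via its mixture representation."""
    rng = np.random.default_rng(seed)
    d, t = len(mu), np.log(gamma)
    sd = np.sqrt(np.diag(Sigma))
    tail = stats.norm.sf(t, loc=mu, scale=sd)      # P(X_j > ln gamma), j = 1..d
    ell1 = tail.sum()
    X = np.empty((n, d))
    for i in range(n):
        j = rng.choice(d, p=tail / ell1)           # mixture component
        xj = stats.norm.isf(rng.uniform() * tail[j], loc=mu[j], scale=sd[j])  # X_j | X_j > ln gamma
        rest = np.arange(d) != j
        beta = Sigma[rest, j] / Sigma[j, j]        # regression coefficients of X_rest on X_j
        cond_mean = mu[rest] + beta * (xj - mu[j])
        cond_cov = Sigma[np.ix_(rest, rest)] - np.outer(beta, Sigma[j, rest])
        X[i, rest] = rng.multivariate_normal(cond_mean, cond_cov)
        X[i, j] = xj
    return X, ell1
```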

SLIDE 24

Reference density

The reference density has the property that ℓ2/ℓ1 → 1 as γ ↑ ∞, which makes it possible to show that the M-estimator has vanishing relative error. Note, however, that the reference density cannot be used as an importance sampling pdf in a traditional scheme due to its support. Moreover, the reference density does not capture the effect of the correlation coefficient ̺: the parameter ̺ does not appear anywhere in it. This illustrates the advantages of the proposed method.

SLIDE 25

Moment-matching master equation

$$ \hat\ell_2 = \sum_{j=1}^{n} \frac{w_2(X_j)}{n_1\, w_1(X_j)/\ell_1 + n_2\, w_2(X_j)/\hat\ell_2} = \frac{p_0}{\tfrac{0 \cdot n_1}{\ell_1} + \tfrac{n_2}{\hat\ell_2}} + \frac{p_1}{\tfrac{n_1}{\ell_1} + \tfrac{n_2}{\hat\ell_2}} + \frac{p_2}{\tfrac{2 n_1}{\ell_1} + \tfrac{n_2}{\hat\ell_2}} + \cdots + \frac{p_d}{\tfrac{d\, n_1}{\ell_1} + \tfrac{n_2}{\hat\ell_2}} , $$

where pk is the number of Xj’s which yield Σ_{i=1}^{d} I{X_{j,i} > ln γ} = k. Hence, our estimator ℓ̂2 solves the equation:

$$ p_0 + \frac{p_1}{\tfrac{\hat\ell_2}{\ell_1} \tfrac{n_1}{n_2} + 1} + \frac{p_2}{\tfrac{2 \hat\ell_2}{\ell_1} \tfrac{n_1}{n_2} + 1} + \cdots + \frac{p_d}{\tfrac{d\, \hat\ell_2}{\ell_1} \tfrac{n_1}{n_2} + 1} = n_2 . $$
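Since the left-hand side is decreasing in ℓ̂2, a scalar root-finder locates the solution immediately. A minimal sketch (ours), assuming the counts p_0, . . . , p_d have been tallied from the pooled sample and p_0 < n_2 so the bracket is valid:

```python
import numpy as np
from scipy.optimize import brentq

def solve_ell2(p, n1, n2, ell1):
    """Solve p_0 + sum_{k>=1} p_k / (k ell2 n1/(n2 ell1) + 1) = n2 for ell2.
    p[k] = number of pooled samples with exactly k coordinates above ln(gamma)."""
    k = np.arange(len(p))
    def g(ell2):
        return p[0] + np.sum(p[1:] / (k[1:] * ell2 * n1 / (n2 * ell1) + 1.0)) - n2
    return brentq(g, 1e-300, 1.0)   # g(0+) = n1 > 0; g(1) < 0 when p_0 < n2
```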

SLIDE 26

Numerical results

Empirical performance of the M-estimator and ISVE algorithms for various values of the threshold parameter γ = 5 × 10^{c+3}, c = 1, . . . , 14, with ̺ = 0.999. Both algorithms use a sample size of n = n1 + n2 = 5 × 10^5. WNRV stands for work-normalized relative variance.

γ          | asym. approx.    | M-estim.         | ISVE estim.      | rel. err. % (M) | rel. err. % (ISVE) | WNRV (M) | WNRV (ISVE)
5 × 10^4   | 0.000355         | 0.000409         | 0.000406         | 0.23            | 1.71               | 0.00044  | 15248
5 × 10^5   | 1.794 × 10^−5    | 2.212 × 10^−5    | 2.177 × 10^−5    | 0.23            | 3.09               | 0.00043  | 50267
5 × 10^6   | 5.586 × 10^−7    | 7.156 × 10^−7    | 6.807 × 10^−7    | 0.23            | 5.32               | 0.00042  | 1.4 × 10^5
5 × 10^7   | 1.057 × 10^−8    | 1.384 × 10^−8    | 1.444 × 10^−8    | 0.23            | 11.74              | 0.00042  | 7.2 × 10^5
5 × 10^8   | 1.205 × 10^−10   | 1.590 × 10^−10   | 1.254 × 10^−10   | 0.23            | 2.35               | 0.00042  | 29064
5 × 10^9   | 8.230 × 10^−13   | 1.086 × 10^−12   | 3.781 × 10^−12   | 0.23            | 76.90              | 0.00040  | 3.13 × 10^7
5 × 10^10  | 3.347 × 10^−15   | 4.372 × 10^−15   | 3.346 × 10^−15   | 0.22            | 0.10               | 0.00040  | 56.12
5 × 10^11  | 8.087 × 10^−18   | 1.046 × 10^−17   | 8.083 × 10^−18   | 0.22            | 0.024              | 0.00039  | 2.99
5 × 10^12  | 1.158 × 10^−20   | 1.483 × 10^−20   | 1.158 × 10^−20   | 0.22            | 0.0018             | 0.00039  | 0.016
5 × 10^13  | 9.827 × 10^−24   | 1.245 × 10^−23   | 1.641 × 10^−23   | 0.22            | 40.12              | 0.00039  | 8.38 × 10^6
5 × 10^14  | 4.930 × 10^−27   | 6.170 × 10^−27   | 5.028 × 10^−27   | 0.22            | 1.94               | 0.00039  | 19790
5 × 10^15  | 1.462 × 10^−30   | 1.804 × 10^−30   | 1.462 × 10^−30   | 0.22            | 0.00037            | 0.00038  | 0.00073
5 × 10^16  | 2.562 × 10^−34   | 3.123 × 10^−34   | 2.563 × 10^−34   | 0.22            | 0.00020            | 0.00038  | 0.00020
5 × 10^17  | 2.651 × 10^−38   | 3.198 × 10^−38   | 2.652 × 10^−38   | 0.22            | 0.00010            | 0.00037  | 5.21 × 10^−5

SLIDE 27

Effect of correlation on estimate

Effect of the correlation parameter ̺ = 1 − 0.5^c, c = 1, . . . , 10, on the rare-event probability. The circles represent the MCIS estimates and the dots lying on the line represent the ISVE estimates. The line itself is the asymptotic approximation of the rare-event probability. Both M-estim. and ISVE use a sample size of n = n1 + n2 = 5 × 10^6.

[Figure: estimated probability (× 10^−30, vertical axis, roughly 1.5 to 1.85) against − log2(1 − ̺) = 1, . . . , 10 (horizontal axis); legend: ISVE, MCIS.]

SLIDE 28

Effect of correlation on probability

Empirical performance of the M-estim. and ISVE algorithms for the threshold parameter γ = 5 × 10^15 with ̺ = 1 − 0.5^c, c = 1, . . . , 10. The asymptotic approximation here is ≈ 1.462 × 10^−30. Both M-estim. and ISVE use a sample size of n = n1 + n2 = 5 × 10^6.

̺           | M-estim.          | ISVE estim.       | rel. err. % (M) | rel. err. % (ISVE) | WNRV (M) | WNRV (ISVE)
1 − 0.5^1   | 1.4624 × 10^−30   | 1.4624 × 10^−30   | 0.063           | 3.58 × 10^−14      | 0.00027  | 5.89 × 10^−22
1 − 0.5^2   | 1.4629 × 10^−30   | 1.4624 × 10^−30   | 0.063           | 1.00 × 10^−5       | 0.00027  | 4.64 × 10^−5
1 − 0.5^3   | 1.4758 × 10^−30   | 1.4624 × 10^−30   | 0.063           | 0.00018            | 0.00028  | 0.015
1 − 0.5^4   | 1.5318 × 10^−30   | 1.4624 × 10^−30   | 0.064           | 9.54 × 10^−5       | 0.00029  | 0.0042
1 − 0.5^5   | 1.6194 × 10^−30   | 1.4624 × 10^−30   | 0.066           | 0.00011            | 0.00031  | 0.0053
1 − 0.5^6   | 1.6958 × 10^−30   | 1.4624 × 10^−30   | 0.068           | 0.00021            | 0.00032  | 0.019
1 − 0.5^7   | 1.7489 × 10^−30   | 1.4743 × 10^−30   | 0.069           | 0.78               | 0.00033  | 2.8 × 10^5
1 − 0.5^8   | 1.7788 × 10^−30   | 1.4624 × 10^−30   | 0.069           | 0.00010            | 0.00033  | 0.0050
1 − 0.5^9   | 1.7959 × 10^−30   | 1.4624 × 10^−30   | 0.070           | 0.00011            | 0.00034  | 0.0054
1 − 0.5^10  | 1.8054 × 10^−30   | 1.4624 × 10^−30   | 0.070           | 0.00010            | 0.00034  | 0.0048

SLIDE 29

Conclusions and Future work

  • Apply to counting problems and Bayesian model choice problems.
  • Finding λ using optimal design of experiments theory?!
  • Splitting is an example where the moment-matching system of equations is exactly solvable.

Thank you!
