

slide-1
SLIDE 1

Probabilistic Optimal Estimation and Filtering

Least Squares and Randomized Algorithms

Fabrizio Dabbene¹, Mario Sznaier², Roberto Tempo¹

¹CNR-IEIIT, Politecnico di Torino

²Northeastern University, Boston

Workshop on Uncertain Dynamical Systems, Udine, Italy, 23-26 August 2011


slide-2
SLIDE 2

Motivation: Identification for Robust Control

The classical approach to system identification is based on statistical assumptions about the measurement error, and provides estimates of a stochastic nature
Worst-case identification, on the other hand, assumes only the knowledge of deterministic error bounds, and provides guaranteed estimates, thus being in principle better suited for robust control design
However, a main limitation of such deterministic bounds is that they often turn out to be overly conservative, leading to estimates of limited use


slide-3
SLIDE 3

Motivation: Identification for Robust Control

A rapprochement

We propose a rapprochement of the two paradigms, stochastic and worst-case, by introducing probabilistically optimal estimates
The main idea is to “exclude” sets of measure at most ǫ (accuracy) from the set of deterministic estimates
We thus decrease the so-called worst-case radius of information at the expense of a probabilistic “risk”
We compute a trade-off curve which shows how the radius of information decreases as a function of the accuracy


slide-4
SLIDE 4

The IBC setting for systems ID and filtering

Estimation problem: “Given an unknown element x, find an estimate of the function S(x), based on a priori information K and on measurements of the function I(x) corrupted by additive noise q.”

Ingredients (sets)

A problem element set X, with prior information K ⊆ X
A measurement space Y
A solution space Z

Ingredients (operators)

An information operator I : X → Y
Additive uncertainty/noise: y = Ix + q
A solution operator S : X → Z



slide-6
SLIDE 6

IBC Setting for System ID

Estimation algorithm

An algorithm A is a mapping (in general nonlinear) from Y into Z, i.e. A : Y → Z
An algorithm provides an approximation A(y) of Sx using the available information y ∈ Y about x ∈ K
The outcome of such an algorithm is called an estimator, and the notation ẑ = A(y) is used


slide-7
SLIDE 7

The IBC setting for systems ID and filtering

Illustration of the considered framework


slide-8
SLIDE 8

The setup of this talk

The IBC setting for systems ID and filtering

The problem element set X is R^n
The information operator I : X → Y is linear
The uncertainty q ∈ Q ⊆ R^m, where Q is a bounding set
The solution set Z is R^s, and the solution operator S : X → Z is linear

Assumption (Sufficient information)

We assume that the information operator I is a one-to-one mapping, i.e. m ≥ n and rank I = n
For the sake of simplicity, we assume that the three sets X, Y, Z are equipped with the same ℓp norm


slide-9
SLIDE 9

Example

System parameter identification

The parameter identification problem has the objective of identifying a linear system from noisy measurements. The problem elements are the input-output pairs ξ = ξ(t, x) of a dynamic system, parametrized by an unknown parameter vector x ∈ K ⊆ X with given basis functions ϕᵢ(t):

\xi(t, x) = \sum_{i=1}^{n} x_i \varphi_i(t) = \Phi^T(t)\, x

Suppose then that m noisy measurements of ξ(t, x) are available at t₁ < t₂ < · · · < tₘ:

y = Ix + q = [\Phi(t_1) \cdots \Phi(t_m)]^T x + q.   (1)

The solution operator is the identity, Sx = x, and Z ≡ X. In this context, one usually assumes unknown-but-bounded errors |qᵢ| ≤ R, i = 1, . . . , m, that is, Q = B∞(R)
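As a concrete illustration, a minimal sketch of this measurement model in Python; the polynomial basis ϕᵢ(t) = t^{i−1}, the time grid, and all numerical values are our assumptions, not taken from the talk:

```python
import numpy as np

# Hedged sketch of the measurement model (1), with an assumed
# polynomial basis phi_i(t) = t**(i-1) and uniform bounded noise.
n, m, R = 3, 50, 0.5
rng = np.random.default_rng(0)

t = np.linspace(0.0, 1.0, m)               # measurement times t_1 < ... < t_m
Phi = np.vander(t, n, increasing=True)     # row k is Phi(t_k)^T = [1, t_k, t_k**2]
x_true = np.array([1.0, -2.0, 0.5])        # unknown parameter vector x

q = rng.uniform(-R, R, size=m)             # noise, uniform on B_inf(R)
y = Phi @ x_true + q                       # y = I x + q, I = [Phi(t_1) ... Phi(t_m)]^T
```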


slide-10
SLIDE 10

The consistency set I−1(y)

A key role is played by the following set, which represents the set of all problem elements x ∈ K ⊆ X compatible with (i.e. not invalidated by) the information Ix, the uncertainty q, and the bounding set Q

Consistency set I−1(y)

For given y ∈ Y, define

I^{-1}(y) := \{x \in K \mid \exists\, q \in Q : y = Ix + q\}   (2)

Under the sufficient information assumption, the set I⁻¹(y) is bounded. For instance, in the previous example we have

I^{-1}(y) = \{x \in K : \|y - [\Phi(t_1) \cdots \Phi(t_m)]^T x\|_\infty \le R\}

In system identification, I⁻¹(y) is sometimes referred to as the parameter feasible set.
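Continuing the sketch above, a hypothetical helper that checks membership in the parameter feasible set, assuming Phi, y, R from the previous block:

```python
import numpy as np

def in_consistency_set(x, Phi, y, R):
    """Check x against the parameter feasible set I^{-1}(y): x is consistent
    iff the residual y - Phi x can be explained by some q in B_inf(R),
    i.e. ||y - Phi x||_inf <= R."""
    return np.max(np.abs(y - Phi @ x)) <= R

# With the data generated above, the true parameter is never invalidated:
# in_consistency_set(x_true, Phi, y, R) -> True
```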


slide-11
SLIDE 11

The IBC setting for systems ID and filtering

Our setup for this talk

Illustration of the considered framework


slide-12
SLIDE 12

Worst-Case Setting



slide-13
SLIDE 13

Worst-Case Setting

Errors and optimal algorithms

Given perturbed information y ∈ Y, the worst-case error is defined as

r^{wc}(A, y) := \max_{x \in I^{-1}(y)} \|Sx - A(y)\|_p

This error is based on the available information y ∈ Y about x ∈ K, and it measures the approximation error between Sx and A(y)
An algorithm A^{wc} is called worst-case optimal if it minimizes the error r^{wc}(A, y) for any y ∈ Y:

r^{wc}(y) := r^{wc}(A^{wc}, y) := \inf_A r^{wc}(A, y)

A worst-case optimal estimator is given by ẑ^{wc} = A^{wc}(y)
The minimal error r^{wc}(y) is called the (local) worst-case radius of information


slide-14
SLIDE 14

Chebychev center and central algorithms

The Chebychev center z_c(H) of a set H ⊂ Z and its “radius” r_c(H) are defined by

\max_{h \in H} \|h - z_c(H)\| := \inf_{z \in Z} \max_{h \in H} \|h - z\| := r_c(H)

Optimal algorithms map the data y into the Chebychev center of the set SI⁻¹(y), i.e. z_c(SI⁻¹(y)) = ẑ^{wc}; for this reason they are also called central algorithms
For a given set H, the ℓp-Chebychev center z_c(H) and radius r_c(H) are the center and radius of the smallest ℓp ball enclosing H
In general z_c(H) may not be unique, and it does not necessarily belong to H; if H is centrally symmetric, then the origin is a Chebychev center of H



slide-16
SLIDE 16

Related literature

The computation of the worst-case radius of information r^{wc}(y) and the derivation of optimal algorithms A^{wc} have been the focal point of a vast literature in the system identification setting
If the norm is ℓ2 and K ≡ X, then the linear optimal estimator is the least squares algorithm A^{ls}(y) = Sx^{ls}, with

\|Ix^{ls} - y\|_2 := \min_{x \in X} \|Ix - y\|_2

In this case, A^{ls}(y) is the Chebychev center of the ellipsoid SI⁻¹(y)
For ℓ∞ norms, an optimal algorithm and the radius of information can be computed by solving 2n linear programs, corresponding to computing the center of the tightest hyperrectangle containing the polytope SI⁻¹(y)
Spline algorithms have also been introduced, defined as A^{sp}(y) = Sx^{sp}(y), where

\|Ix^{sp}(y) - y\|_\infty := \min_{x \in X} \|Ix - y\|_\infty
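For the ℓ∞ case with S = I_n, a hedged sketch of the 2n-LP construction using scipy.optimize.linprog; the helper name and implementation choices are ours, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

def linf_central_estimate(Phi, y, R):
    """Sketch of the 2n-LP construction (S = I_n): for each coordinate i,
    minimize and maximize x_i over I^{-1}(y) = {x : ||y - Phi x||_inf <= R};
    the central estimate is the center of the tightest hyperrectangle."""
    m, n = Phi.shape
    # ||y - Phi x||_inf <= R  <=>  Phi x <= y + R  and  -Phi x <= R - y
    A_ub = np.vstack([Phi, -Phi])
    b_ub = np.concatenate([y + R, R - y])
    lo, hi = np.empty(n), np.empty(n)
    for i in range(n):
        c = np.zeros(n); c[i] = 1.0
        lo[i] = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n).fun
        hi[i] = -linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n).fun
    center = (lo + hi) / 2          # worst-case optimal (central) estimate
    radius = np.max((hi - lo) / 2)  # worst-case radius of information
    return center, radius
```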


slide-17
SLIDE 17

Illustrating example

Consider the problem of estimating the parameters of an MA(3) model

\xi(k) = x_1 u(k) + x_2 u(k-1) + x_3 u(k-2) + q(k), \quad k = 1, \ldots, 50

where u(k) is known and q is uniformly distributed noise with |q(k)| ≤ 0.5

WC optimal (central) algorithm: blue; LS estimate: yellow ◦; spline algorithm: red ⋄
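A sketch of how this experiment can be set up; the input sequence and coefficient values are our assumptions (the slide does not list them), and the central estimate reuses linf_central_estimate from the previous sketch:

```python
import numpy as np

# Hedged sketch of the MA(3) identification experiment.
m, R = 50, 0.5
rng = np.random.default_rng(1)
u = rng.standard_normal(m + 2)                   # known input, assumed white
x_true = np.array([0.8, -0.4, 0.2])              # hypothetical MA(3) coefficients

Phi = np.column_stack([u[2:], u[1:-1], u[:-2]])  # rows [u(k), u(k-1), u(k-2)]
y = Phi @ x_true + rng.uniform(-R, R, size=m)

x_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]    # LS estimate (yellow circle)
x_wc, r_wc = linf_central_estimate(Phi, y, R)    # WC optimal central estimate
```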


slide-18
SLIDE 18

Worst-Case Setting

Pro and contra

The worst-case setting is ideal for robust control, since it provides explicit bounds on the parameter uncertainty
Unfortunately, in many problem instances, these bounds may be very large
Probabilistic approach: in recent years, a parallel approach to robust control has emerged, aimed at guaranteeing robust performance for most of the parameter values
The idea at the basis of this talk is: why not apply this approach directly to the identification problem, thus providing bounds guaranteed for most of the cases?


slide-19
SLIDE 19

Probabilistic setting



slide-20
SLIDE 20

Probabilistic setting

Random uncertainty

We still assume that q ∈ Q, but we also assume to have information on the distribution of q (note that this information is usually available)
Objective: to derive optimal algorithms and to compute the related errors when the uncertainty q is random
In this setting, the error of an algorithm is measured in a worst-case sense, but we disregard a set of measure at most ǫ ∈ (0, 1) from the consistency set I⁻¹(y)

Assumption (Random measurement uncertainty)

We assume that the measurement noise q is a real random vector with given probability density pQ(q) and support set Q ⊆ Rm. Denote by PQ(q) the probability distribution of q, and by µQ the probability measure generated by pQ(q) over the set Q.



slide-22
SLIDE 22

Induced measure over I−1(y) and SI−1(y)

Probabilistic setting

The probability measure over the set Q induces, by means of equation (1), a probability measure μ̃_{I⁻¹} over the set I⁻¹(y)
For any measurable set B ⊆ X, we can measure it “through” the probability measure μ_Q as follows:

\tilde\mu_{I^{-1}}(B) = \mu_Q\left(q \in Q \mid \exists\, x \in B \cap I^{-1}(y) : Ix + q = y\right)

This conditional measure is such that points outside the consistency set I⁻¹(y) have measure zero, and μ̃_{I⁻¹}(I⁻¹(y)) = 1; that is, the induced measure is concentrated over I⁻¹(y)
We denote by P̃_{I⁻¹} the induced probability distribution and by p̃_{I⁻¹} the density, both having support over I⁻¹(y)
The measure μ̃_{I⁻¹} is mapped by S into a measure μ̃_{SI⁻¹} over SI⁻¹(y), with pdf p̃_{SI⁻¹} and cdf P̃_{SI⁻¹}

slide-23
SLIDE 23

Probabilistic error and optimal algorithms

Probabilistic setting

Given perturbed information y ∈ Y and accuracy ǫ ∈ (0, 1), we define the probabilistic error (to level ǫ) as

r^{pr}(A, y, \epsilon) := \inf_{X_0 :\, \tilde\mu_{I^{-1}}(X_0) \le \epsilon} \ \max_{x \in I^{-1}(y) \setminus X_0} \|Sx - A(y)\|_p   (3)

Clearly, r^{pr}(A, y, ǫ) ≤ r^{wc}(A, y) for any algorithm A, data y ∈ Y and ǫ ∈ (0, 1), which implies a reduction of the approximation error in the probabilistic setting
An algorithm A^{pr} is called probabilistic optimal (to level ǫ) if it minimizes the error r^{pr}(A, y, ǫ) for any y ∈ Y and ǫ ∈ (0, 1):

r^{pr}(y, \epsilon) := r^{pr}(A^{pr}, y, \epsilon) = \inf_A r^{pr}(A, y, \epsilon)

The probabilistic optimal estimator is given by ẑ^{pr}(ǫ) := A^{pr}(y, ǫ)
The minimal error r^{pr}(y, ǫ) is called the probabilistic radius of information (to level ǫ)



slide-25
SLIDE 25

Problem definition

Probabilistic setting

Problem

Computation of r^{pr}(y, ǫ) and derivation of probabilistic optimal algorithms A^{pr} for different probability distributions P_Q and support sets Q.

In particular, we are interested in the following cases:
μ_Q is Gaussian
μ_Q is uniform on the ℓ2 norm ball B₂(R), and p = 2
μ_Q is uniform on the ℓ∞ norm ball B∞(R), and p = ∞


slide-26
SLIDE 26

Chance constraint formulation

Probabilistic setting

We introduce the violation probability, for given A and radius r:

v(r, A) := \tilde\mu_{I^{-1}}\left(\{x \in I^{-1}(y) \mid \|Sx - A(y)\| > r\}\right)

Equation (3) can be reformulated as a chance-constrained optimization problem:

r^{pr}(A, y, \epsilon) = \min \{r \mid v(r, A) \le \epsilon\}

A probabilistic optimal algorithm can be computed as

r^{pr}(y, \epsilon) = \min \{r \mid \inf_A v(r, A) \le \epsilon\} = \min \{r \mid v_o(r) \le \epsilon\}

where the “optimal” violation probability for given radius r is

v_o(r) := \inf_A \tilde\mu_{I^{-1}}\left(\{x \in I^{-1}(y) : \|Sx - A(y)\|_p > r\}\right)

slide-27
SLIDE 27

Chance constraint formulation

Related literature

Notably, the probabilistic setup and its chance-constrained formulation were already introduced in the book of Traub et al.
In [Traub et al:88] the connection with the average error setting, whose objective is to minimize the expected value of the estimation error E[g(Sx − A(y))], is outlined. To explain this connection, note that for any r > 0

v(r, A) = \int_{I^{-1}(y)} \mathbb{I}_r(Sx - A(y))\, \tilde p_X(x)\, dx

where \mathbb{I}_r(Sx - A(y)) is the indicator function of the event \|Sx - A(y)\| > r. Hence, the probabilistic estimator is equivalent to the average estimator that minimizes E[\mathbb{I}_r(Sx - A(y))]


slide-28
SLIDE 28

Random Uncertainty Normally Distributed

Optimality of least squares

We consider the case when the uncertainty q is normally distributed, i.e. Q ≡ R^m and q ∼ N(q̄, W)
In this case, the probabilistic optimal algorithm A^{pr} is the least squares algorithm A^{ls}

Theorem

Let K = R^n, Q = R^m, q ∼ N(q̄, W) and W = H^T H. Then

A^{pr}(y) = A^{ls}(y) = S(I^T H^T H I)^{-1} I^T H^T y

for any y ∈ Y and ǫ ∈ (0, 1). Moreover, the probabilistic radius r^{pr}(ǫ) := r^{pr}(A^{pr}, y, ǫ) does not depend on y
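A minimal numeric sketch of the least squares estimator in the theorem, taking S = I_n and computing x_ls as the minimizer of the weighted residual ‖H(Ix − y)‖₂, which is our reading of the closed form above:

```python
import numpy as np

def ls_estimator(I_op, y, H):
    """Weighted least squares: minimize ||H (I x - y)||_2, i.e. the closed
    form (I^T H^T H I)^{-1} I^T H^T (H y), with W = H^T H and S = identity."""
    x_ls, *_ = np.linalg.lstsq(H @ I_op, H @ y, rcond=None)
    return x_ls
```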


slide-29
SLIDE 29

Probabilistic radius for Gaussian measures

r^{pr}(ǫ) = ???

We are not aware of explicit ways to compute the radius r^{pr}(ǫ)
In Traub et al. the following bound on r^{pr}(ǫ) is provided:

r^{pr}(\epsilon) \le \sqrt{2 \ln(5/\epsilon)}\; r^{avg}

where r^{avg} is the optimal average radius of information, which can be computed in closed form as a function of the covariance matrix W
Also, the probabilistic radius r^{pr}(ǫ) seems to be closely related to the work of Campi and Weyer, who derive non-asymptotic (i.e. based on a finite number of measurements, as in our case) confidence ellipsoids for least squares estimates


slide-30
SLIDE 30

Random Uncertainty Uniformly Distributed

The induced measure is still uniform

We study the case when q is uniformly distributed over the set Q, i.e. q ∼ U(Q) and μ_Q ≡ μ_{U(Q)}
We assume that Q is a compact set; in particular, we are interested in the case when Q is the ℓp norm ball Bp(R)
Question: if μ_Q is the uniform measure over Q, what is the induced measure μ̃_{I⁻¹} over the set I⁻¹(y)?

Theorem

Let Q be a compact set. If q ∼ U(Q), then for any y ∈ Y

\tilde\mu_{I^{-1}} \equiv \mu_{U(I^{-1}(y))}

Moreover, the measure μ̃_{SI⁻¹} over Z is log-concave


slide-32
SLIDE 32

Weighted ℓ2 ball

Uniform case – Optimality of least squares

The random uncertainty q is uniformly distributed in a weighted ℓ2 ball of radius ρ; the set SI⁻¹(y) is hence an ellipsoid
In this case, the center of SI⁻¹(y) is a probabilistic optimal estimator, and it coincides with the least squares estimator A^{ls}

Theorem

Let K = R^n and μ_Q = U(Q), where Q = \{q : q^T W q \le \rho^2\} and W = H^T H. Then

A^{pr}(y) = A^{ls}(y) = S(I^T H^T H I)^{-1} I^T H^T y

slide-33
SLIDE 33

ℓ∞-ball, the case of S = In

Random Uncertainty Uniformly Distributed

We now concentrate on the very important case when Q = B∞(R)
In this case, LS is no longer optimal, and a probabilistic optimal estimator needs to be derived
To simplify the next developments, we first consider the case when S is the identity operator

Assumption (Parameter estimation problems)

We assume that S = I_n; this corresponds to the situation when one is interested in parameter estimation
The assumption can be relaxed to S being square, S ∈ R^{n×n}
We recall that in this case SI⁻¹(y) = I⁻¹(y) is a polytope equipped with a uniform measure


slide-34
SLIDE 34

Computation of violation probability

ℓ∞-ball, the case of S = In

Theorem (Computation of violation probability)

Let Q be a bounded convex set with p_Q = U_Q. Then, for given r > 0:
(i) The violation probability v_o(r) can be computed as the solution of

v_o(r) = \inf_{x_c} \mathrm{vol}[X_0(x_c, r)] / V_X   (4)

where X_0(x_c, r) := I^{-1}(y) \setminus B_p(x_c, r) and V_X = \mathrm{vol}[I^{-1}(y)]
(ii) The optimization problem (4) is quasi-convex in x_c, over the convex set Ω := \{x_c \mid I^{-1}(y) \cap B_p(x_c, r) \ne \emptyset\}
(iii) The function v_o(r) is a continuous non-increasing function of r

slide-35
SLIDE 35

Computation of violation probability

Sketch of proof

Recalling that μ̃_{I⁻¹} is the uniform measure over I⁻¹(y), from the definition of v_o(r) we write

v_o(r) = \frac{1}{V_X} \inf_A \mathrm{vol}\left[\{x \in I^{-1}(y) : \|x - A(y)\|_p > r\}\right]
       = \frac{1}{V_X} \inf_{x_c} \mathrm{vol}\left[\{x \in I^{-1}(y) : \|x - x_c\|_p > r\}\right]
       = \frac{1}{V_X} \inf_{x_c} \mathrm{vol}\left[\{x \in I^{-1}(y) \mid x \notin B_p(x_c, r)\}\right]
       = \frac{1}{V_X} \inf_{x_c} \mathrm{vol}\left[I^{-1}(y) \setminus B_p(x_c, r)\right]
       = \frac{1}{V_X} \inf_{x_c} \mathrm{vol}\left[X_0(x_c, r)\right]

from which point (i) follows


slide-36
SLIDE 36

Computation of violation probability

Sketch of proof

To prove quasi-convexity (point (ii)), note that (4) can be rewritten as

v_o(r) = 1 - \frac{1}{V_X} \sup_x \mathrm{vol}[D(x, r)]   (5)

where D(x, r) = I⁻¹(y) ∩ B_p(x, r)
This is the problem of maximizing the volume of the intersection of two convex sets, one of which can be translated by x
This problem can be shown to be quasi-concave in x over the set Ω where the intersection D(x, r) is non-empty
More specifically, in [Zalgaller:01] it is shown that the function φ(x) := (vol[D(x, r)])^{1/n} is concave over Ω (a direct consequence of the Brunn-Minkowski inequality)
Hence, it follows immediately that the function φ(x)^n is quasi-concave, since φ(x) is nonnegative for x ∈ Ω and the n-th power y^n is increasing for y ≥ 0


slide-37
SLIDE 37

Computation of probabilistic optimal estimate

Corollary

For given r > 0, a global minimizer for problem (4) can be computed as the solution of the following maximization problem:

\sup_{x \in \Omega} \varphi(x), \quad \text{with } \varphi(x) = (\mathrm{vol}[D(x, r)])^{1/n}   (6)

Moreover, problem (6) is concave over the convex set Ω, and thus any local maximizer is also global. For fixed r > 0, let x_o(r) be one such maximizer; then

v_o(r) = 1 - \mathrm{vol}[D(x_o(r), r)] / V_X

and the probabilistic radius of information (to level ǫ) can be found as the (unique) solution of the following one-dimensional “inversion” problem:

r^{pr}(y, \epsilon) = \min \{r \mid v_o(r) \le \epsilon\}

slide-38
SLIDE 38

Probabilistic radius and probabilistic optimal estimator

For given ǫ > 0, the inversion problem can be easily solved by bisection, thus providing a way of computing the optimal probabilistic radius of information r^{pr}(y, ǫ)
The corresponding optimal estimate x̂^{pr}(y, ǫ) can be directly computed as a minimizer x_o(r), for r = r^{pr}(y, ǫ), of the following problem

Problem (Maximum intersection)

(P-max-int): \sup_{x_c} \left(\mathrm{vol}[D(x_c, r)]\right)^{1/n}

If the set I⁻¹(y) is centrally symmetric, then it is easy to see from symmetry that the intersection with a ball is maximized when the sets are concentric; hence, the optimal estimator for the probabilistic case coincides with that of the worst-case setting
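A sketch of the bisection step, assuming a callable vo(r) (for example one of the approximations discussed in the following slides) and any starting radius r_hi with vo(r_hi) ≤ ǫ, such as the worst-case radius:

```python
def probabilistic_radius(v_o, eps, r_hi, tol=1e-4):
    """Solve min {r : v_o(r) <= eps} by bisection, using the fact that
    v_o is non-increasing in r (point (iii) of the theorem)."""
    lo, hi = 0.0, r_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v_o(mid) <= eps:
            hi = mid       # mid is feasible: shrink from above
        else:
            lo = mid       # mid violates the accuracy level: grow from below
    return hi
```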


slide-39
SLIDE 39

Solving (P-max-int): Volume oracle and oracle-polynomial-time algorithm

We first consider the case when one assumes to have a volume oracle that, given x, returns the volume of D(x, r) (and a subgradient of the function φ(x))
Problem (P-max-int) has been considered in [FukUno:07], where a strongly oracle-polynomial-time algorithm is derived for polytopic sets
Indeed, the fact that the problem is NP-hard does not make the intersection maximization problem pointless to investigate, since the volume of a polytope can be computed quickly for considerably complex polytopes in modest dimensions (say, up to n = 10)
Hence, for small n, one can use the method proposed in [FukUno:07]; this method has also been used in our examples for comparison with the other techniques proposed further on


slide-40
SLIDE 40

Solving (P-max-int): Volume oracle and oracle-polynomial-time algorithm

Function v_o(r) for a problem with n = 3 and m = 13, and corresponding sequence of optimal boxes maximizing the intersection with the polytope.


slide-41
SLIDE 41

Solving (P-max-int) : Example

Computation of v_o(r) for a problem with six parameters (n = 6) and m = 300

For ǫ = 0.05, the probabilistic radius is r^{pr}(y, ǫ) = 0.068, almost half of the worst-case radius r^{wc}(y) = 0.105


slide-42
SLIDE 42

Solving (P-max-int) : Randomized algorithms and stochastic optimization

We do not have a volume oracle in general! We use a randomized approach.

Algorithm (Randomized approximation of φ(x))

1. Generate N points uniformly in the ball B_p(x, r)
2. Count how many points N_g belong to the set I⁻¹(y) (this can be done in polynomial time for polytopes or ellipsoids)
3. An approximation of the function φ(x) is immediately obtained as

\varphi(x) \approx \hat\varphi(x) := \left(\frac{N_g}{N}\, \mathrm{vol}[B_p(x, r)]\right)^{1/n}

Note that the expected value with respect to the samples, E[φ̂(x)], coincides with φ(x).
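A sketch of this randomized approximation for the ℓ∞ ball and the polytope I⁻¹(y) = {x : ‖y − Φx‖∞ ≤ R} of the earlier example; the sample size and seed are arbitrary choices:

```python
import numpy as np

def phi_hat(x, r, Phi, y, R, N=10_000, rng=np.random.default_rng(2)):
    """Monte Carlo estimate of phi(x) = vol(D(x, r))^{1/n} for p = infinity."""
    n = x.size
    pts = x + rng.uniform(-r, r, size=(N, n))           # uniform in B_inf(x, r)
    inside = np.max(np.abs(y - pts @ Phi.T), axis=1) <= R
    Ng = np.count_nonzero(inside)                       # samples in I^{-1}(y)
    vol_ball = (2.0 * r) ** n                           # volume of B_inf(x, r)
    return ((Ng / N) * vol_ball) ** (1.0 / n)
```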


slide-43
SLIDE 43

Solving (P-max-int) : Stochastic optimization

SPSA

We consider the stochastic problem sup_x E[φ̂(x)], with E[φ̂(x)] = φ(x)
For its solution, we can resort to classical stochastic optimization algorithms (see Kushner & Yin), for instance FDSA, SPSA, ...

Algorithm (Simultaneous Perturbation Stochastic Approximation (SPSA))

Consider a starting point θ₀, and run the iteration

\theta_{k+1} = \theta_k + a_k\, [\Delta_k^{-1}]\, \frac{\hat\varphi^+ - \hat\varphi^-}{2 c_k}

where Δ_k ∈ {−1, 1}^n is a Bernoulli sequence, [\Delta_k^{-1}] := [\Delta_{k,1}^{-1} \cdots \Delta_{k,n}^{-1}]^T, and

\hat\varphi^\pm := \varphi(\theta_k \pm c_k \Delta_k) + \eta_k^\pm

where η_k^± is random noise
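A compact SPSA sketch for maximizing φ through the noisy evaluations φ̂ above; the gain sequences follow common textbook defaults (Spall's exponents), which the talk does not specify:

```python
import numpy as np

def spsa_maximize(f, theta0, iters=200, a=0.1, c=0.1,
                  rng=np.random.default_rng(3)):
    """SPSA ascent on a noisy objective f, e.g. f = phi_hat from above."""
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, iters + 1):
        a_k = a / k ** 0.602                                # step size
        c_k = c / k ** 0.101                                # perturbation size
        delta = rng.choice([-1.0, 1.0], size=theta.size)    # Bernoulli +-1
        # simultaneous-perturbation gradient estimate (1/delta = delta here)
        g = (f(theta + c_k * delta) - f(theta - c_k * delta)) / (2 * c_k * delta)
        theta = theta + a_k * g                             # ascent step
    return theta

# e.g. x_hat = spsa_maximize(lambda x: phi_hat(x, r, Phi, y, R), x_ls)
```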


slide-44
SLIDE 44

Solving (P-max-int) : Stochastic optimization

Convergence of SPSA

Theorem

Assume φ(x) is concave and non-differentiable (as in our case), and let ∂φ(x) be a subgradient of φ. If a_k → 0, \sum_k a_k = \infty and c_k → 0, \sum_k c_k = \infty, then, under mild conditions, θ_k converges to a point θ such that 0 ∈ −∂φ(θ)

Remarks:
(+) Simulations show that the method works for quite large n, m
(−) At present, we have convergence only for k → ∞; this is contrary to our philosophy, which aims at results valid for finite k
→ Working direction: derive (probabilistic) bounds on θ_k and φ(θ_k)
→ Is it possible to find conservative but guaranteed approaches? Idea: instead of maximizing the volume of the intersection, maximize an appropriate lower bound on this volume.


slide-45
SLIDE 45

Solving (P-max-int) : An SDP relaxation

Ellipsoidal approximation

Idea: to construct, for fixed r, the maximal volume ellipsoid contained in the intersection D(x_c, r), i.e. to solve the problem

\sup_{x_c, x_E, P_E} \mathrm{vol}[E(x_E, P_E)] \quad \text{subject to} \quad E(x_E, P_E) \subseteq D(x_c, r)   (7)

where E(x_E, P_E) := \{x \in R^n \mid x = x_E + P_E z,\ \|z\|_2 \le 1\}
The above problem is still non-concave, but a concave formulation is possible.


slide-46
SLIDE 46

Solving (P-max-int) : An SDP relaxation

Ellipsoidal approximation

Theorem

Let r > 0 be fixed. A global optimizer for problem (7) can be computed by solving

\inf_{x_c, x_E, P_E} -\log\det P_E \quad \text{subject to}

\begin{bmatrix} (R + y_i - I_i x_E) I_n & P_E I_i^T \\ \star & R + y_i - I_i x_E \end{bmatrix} \succeq 0, \quad i = 1, \ldots, m

\begin{bmatrix} (R - y_i + I_i x_E) I_n & -P_E I_i^T \\ \star & R - y_i + I_i x_E \end{bmatrix} \succeq 0, \quad i = 1, \ldots, m

\begin{bmatrix} (r + e_i^T(x_c - x_E)) I_n & P_E e_i \\ \star & r + e_i^T(x_c - x_E) \end{bmatrix} \succeq 0, \quad i = 1, \ldots, n

\begin{bmatrix} (r - e_i^T(x_c - x_E)) I_n & -P_E e_i \\ \star & r - e_i^T(x_c - x_E) \end{bmatrix} \succeq 0, \quad i = 1, \ldots, n

Let x_c^{sdp}(r) be a solution x_c of the above SDP, and define v^{sdp}(r) := 1 - \mathrm{vol}[D(x_c^{sdp}(r), r)] / V_X. Then v^{sdp}(r) \ge v_o(r)
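A hedged sketch of this program in cvxpy, assuming the polytope I⁻¹(y) = {x : ‖y − Φx‖∞ ≤ R} and S = I_n; the LMIs are stated here in their equivalent second-order-cone form ‖P_E a‖₂ ≤ b − aᵀx_E (ellipsoid inside a halfspace), and the solver choice is ours:

```python
import cvxpy as cp
import numpy as np

def sdp_center(Phi, y, R, r):
    """Maximal-volume ellipsoid {x_E + P z, ||z||_2 <= 1} inside
    D(x_c, r) = I^{-1}(y) ∩ B_inf(x_c, r); x_c is also a variable."""
    m, n = Phi.shape
    P = cp.Variable((n, n), PSD=True)
    xE = cp.Variable(n)
    xc = cp.Variable(n)
    cons = []
    for i in range(m):
        a = Phi[i]
        cons += [cp.norm(P @ a) <= R + y[i] - a @ xE,   #  I_i x <= y_i + R
                 cp.norm(P @ a) <= R - y[i] + a @ xE]   # -I_i x <= R - y_i
    for i in range(n):
        e = np.zeros(n); e[i] = 1.0
        cons += [cp.norm(P @ e) <= r + e @ (xc - xE),   # ellipsoid inside
                 cp.norm(P @ e) <= r - e @ (xc - xE)]   # the box B_inf(x_c, r)
    cp.Problem(cp.Maximize(cp.log_det(P)), cons).solve()
    return xc.value
```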


slide-47
SLIDE 47

Solving (P-max-int) : An SDP relaxation

Ellipsoidal approximation

This SDP relaxation thus leads to an easily computable suboptimal violation function v^{sdp}(r).

Ellipsoid-based SDP relaxation v^{sdp}(r) (red) and optimal violation v_o(r) (blue) for an example with m = 200 and n = 4. The convex relaxation (red) is always above the optimal one, as expected.


slide-48
SLIDE 48

Recovering tight bounds

From our developments, it follows that the probabilistic radius of information r^{pr}(y, ǫ) and the optimal estimator x̂^{pr}(y, ǫ) can be computed as the solution of a convex optimization problem
These quantities are to be interpreted as the center and radius of an ℓ∞ ball guaranteed to contain 1 − ǫ of the total volume of I⁻¹(y)

Theorem (Tight bound: ǫ-enclosing orthotope)

Let x̂^{pr}(y, ǫ) be a probabilistic optimal estimator guaranteeing a probabilistic radius of information r^{pr}(y, ǫ). Then tight bounds on the parameters h_i^-, h_i^+, i = 1, \ldots, n, can be computed as the solution of the following 2n linear programs:

h_i^- \ (h_i^+) = \inf \ (\sup)\ x_i, \quad i = 1, \ldots, n
\text{subject to } |x - \hat x^{pr}(y, \epsilon)| \le r^{pr}(y, \epsilon), \quad |Ix - y| \le R

(the inequalities are componentwise)
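A sketch of these 2n LPs, again with scipy.optimize.linprog; it mirrors linf_central_estimate above, with the box around the probabilistic estimate entering through the variable bounds:

```python
import numpy as np
from scipy.optimize import linprog

def eps_enclosing_orthotope(Phi, y, R, x_hat, r_pr):
    """Tight per-parameter bounds over I^{-1}(y) ∩ B_inf(x_hat, r_pr)."""
    m, n = Phi.shape
    A_ub = np.vstack([Phi, -Phi])                  # |y - Phi x|_inf <= R
    b_ub = np.concatenate([y + R, R - y])
    box = [(x_hat[i] - r_pr, x_hat[i] + r_pr) for i in range(n)]
    h_lo, h_hi = np.empty(n), np.empty(n)
    for i in range(n):
        c = np.zeros(n); c[i] = 1.0
        h_lo[i] = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=box).fun
        h_hi[i] = -linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=box).fun
    return h_lo, h_hi
```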


slide-49
SLIDE 49

The case of S ≠ I_n

When S is not the identity, our approach cannot be applied directly, since the measure on SI⁻¹(y) is no longer uniform
However, this measure is log-concave, and hence a result similar to the one proved in the previous section can still be established

Theorem

Let μ̃_{SI⁻¹} be the measure induced by μ_Q over Z, which is log-concave. Then

\varphi(z) = \left(\tilde\mu_{SI^{-1}}(D_S(z))\right)^{1/n}, \quad \text{where } D_S(z) = SI^{-1}(y) \cap B_p(z, r)

is still a concave function. This result shows that our problem is still well posed


slide-50
SLIDE 50

The case of S ≠ I_n

However, it is hard to compute the measure μ̃_{SI⁻¹}(D_S(z)), and even a direct stochastic approximation approach would require generating samples according to this measure
A solution is to go back to the space X and maximize the volume of the intersection

B_S^{-1}(z) \cap I^{-1}(y)

where B_S^{-1}(z) := \{x \in X \mid Sx \in B_p(z, r)\} is the back-image of the ball B_p(z, r) through S
Simple geometric reasoning shows that B_S^{-1}(z) is a cylinder; thus the computation of D_S(z) can be performed using the same techniques discussed in the previous sections


slide-51
SLIDE 51

Future work and extensions

A different approach is to approximately solve the chance-constrained problem using the discarded-constraints scenario approach of Calafiore and Campi:
Generate N vectors x^{(i)} uniformly distributed in SI⁻¹(y)
Construct the randomized optimization problem

\hat r_o(N, D) = \min_{z_c} r \quad \text{subject to } \|Sx^{(i)} - z_c\| \le r \text{ for all } i = 1, \ldots, N \text{ except } D \text{ discarded ones}

For given ǫ₁ < ǫ, one obtains

\mathrm{Prob}\left\{r^{pr}(\epsilon, y) \le \hat r_o(N, D) \le r^{pr}(\epsilon_1, y)\right\} \ge 1 - \delta(N, D)

Problems: (i) it is not easy to sample in I⁻¹(y); (ii) it is not easy to construct an optimal discard procedure
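As a rough illustration of the discarding step only, a greedy heuristic (drop the worst-fitting sample and re-solve); this is our simplification, not the authors' procedure. For the ℓ∞ norm the minimax center of a point cloud is its bounding-box center, so each re-solve is closed form:

```python
import numpy as np

def scenario_radius(samples, D):
    """Greedy discarded-constraints sketch: remove the D samples farthest
    (in Chebyshev distance) from the running minimax center."""
    pts = np.asarray(samples, dtype=float)      # N samples from S I^{-1}(y)
    for _ in range(D):
        zc = 0.5 * (pts.min(axis=0) + pts.max(axis=0))
        dist = np.max(np.abs(pts - zc), axis=1)
        pts = np.delete(pts, np.argmax(dist), axis=0)   # discard worst point
    zc = 0.5 * (pts.min(axis=0) + pts.max(axis=0))
    return zc, np.max(np.abs(pts - zc))         # center, radius estimate
```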


slide-52
SLIDE 52

Future work and extensions

Reduced complexity estimates

In various works, Garulli et al. consider the conditional estimation problem. In this case, one seeks an estimate of x using a lower-dimensional representation, described by an ℓ-dimensional linear manifold, with ℓ < n, i.e.

\hat x \in \mathcal{M} = \{x \in X \mid x = x_o + Mz,\ z \in R^{\ell}\}

That is, they consider the class of algorithms A_M that map Y → M ⊂ X. An optimal conditional central estimate is then given by the conditional Chebychev center

x_M = \arg \inf_{x_c \in \mathcal{M}} \ \sup_{x \in I^{-1}(y)} \|x - x_c\|_p

This amounts to finding the point x_c lying on the manifold M such that its distance from the farthest point of I⁻¹(y) is minimized.

Nonlinear systems

Extension to the case of nonlinear operators is under study.


slide-53
SLIDE 53

The end

...for now


THANK YOU!
