On confidence sets for compressed sensing inference problems - PowerPoint PPT Presentation


SLIDE 1

On confidence sets for compressed sensing inference problems

Richard Nickl

joint work with: A. Carpentier, D. Gross, J. Eisert

Paris, June 9th, 2015

Richard Nickl (Univ. Cambridge) Confidence in Compressed Sensing Paris, June 2015 1 / 23

SLIDE 2

The setting

Prototypical high-dimensional data

We observe noisy inner products

  Y_i = ⟨X_i, θ⟩ + ε_i,   i = 1, …, n.

The noise ε_i is distributed i.i.d. N(0, σ²), or sub-Gaussian (Bernoulli), with a known upper bound on the variance σ² > 0.

"Vector" model: the X_i are sensing vectors in C^p, ⟨a, b⟩ = Σ_{j=1}^p a_j* b_j, and ‖·‖ is the ℓ² norm. The number n of observations is small compared to the dimension p, but:

  • The vector θ ∈ C^p is k-sparse, θ ∈ M(k).

"Matrix" model: the X_i are d × d sensing matrices, ⟨A, B⟩ = tr(A*B), and ‖·‖ is the Frobenius norm. The number n of observations is small compared to the dimension d², but:

  • The d × d matrix θ has low rank k, θ ∈ M(k).
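As a concrete illustration, the vector model can be simulated in a few lines; this is a minimal sketch in the real case with Gaussian design, and all dimensions and the noise level are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, sigma = 200, 1000, 5, 0.5    # n observations, ambient dimension p, sparsity k

# k-sparse ground truth theta in R^p
theta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
theta[support] = rng.normal(size=k)

# Gaussian sensing vectors X_i; noisy inner products Y_i = <X_i, theta> + eps_i
X = rng.normal(size=(n, p))
Y = X @ theta + rng.normal(scale=sigma, size=n)
```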



SLIDE 4

Recovery rates in Compressed Sensing

When the design X satisfies the restricted isometry property (RIP), one can use ℓ¹-regularisation estimators θ̃ to recover the true θ from a minimal number of measurements in a computationally efficient way.

Minimax optimal performance over M(k)

For instance, the Lasso or the Dantzig selector achieve, with high probability, recovery rates in the norm ‖·‖ arising from ⟨·, ·⟩ of the form

  θ is k-sparse:    ‖θ̃ − θ‖² ≲ k log p / n ≡ r(k),
  θ has rank ≤ k:   ‖θ̃ − θ‖² ≲ kd / n ≡ r(k).

When X is formed of i.i.d. (sub-)Gaussian variables, the RIP holds.
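To see the first rate in action, one can run an ℓ¹ solver on simulated data from the vector model; the ISTA (iterative soft-thresholding) routine below is a self-contained stand-in for an off-the-shelf Lasso implementation, and the regularisation level and signal strengths are illustrative choices:

```python
import numpy as np

def ista_lasso(X, Y, lam, n_iter=1000):
    """Iterative soft-thresholding for min_t ||Y - X t||^2 / (2n) + lam * ||t||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n            # Lipschitz constant of the gradient
    t = np.zeros(p)
    for _ in range(n_iter):
        z = t - X.T @ (X @ t - Y) / (n * L)      # gradient step
        t = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding
    return t

rng = np.random.default_rng(0)
n, p, k, sigma = 200, 1000, 5, 0.5
theta = np.zeros(p)
# signal entries well above the noise level, for a clean illustration
theta[rng.choice(p, size=k, replace=False)] = rng.uniform(1.0, 2.0, size=k)
X = rng.normal(size=(n, p))
Y = X @ theta + rng.normal(scale=sigma, size=n)

theta_tilde = ista_lasso(X, Y, lam=2 * sigma * np.sqrt(np.log(p) / n))
err = np.sum((theta_tilde - theta) ** 2)         # should be of order k log p / n
```

With n = 200 ≈ 5 · k log p measurements, the squared error is a small fraction of ‖θ‖², in line with the rate k log p / n.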


SLIDE 7

Quantum state estimation via Compressed Sensing

A quantum state describing an N-particle physical system is encoded in a d × d positive semi-definite 'density matrix' θ of trace tr(θ) = 1. Here d = 2^N.

[Figure: a source prepares the state θ, on which the measurement X is performed.]

Quantum measurements have 'average outcomes' tr(E_i θ) = ⟨E_i, θ⟩, where the E_i are Pauli tensor matrices. Repeated experiments approximate these expectations up to some noise ε_i whose variance E[ε_i²] = σ² can usually be controlled experimentally. In experiments one attempts to prepare a pure quantum state: then θ ∈ M(1), at least approximately.
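The Pauli tensor basis is easy to construct explicitly; a small sketch for N = 2 qubits (d = 4), with a randomly drawn pure state standing in for the prepared state:

```python
import itertools
import numpy as np

# single-qubit Pauli matrices
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_tensor_basis(N):
    """All d^2 = 4^N tensor products of single-qubit Paulis, d = 2^N."""
    basis = []
    for combo in itertools.product([I2, sx, sy, sz], repeat=N):
        E = combo[0]
        for M in combo[1:]:
            E = np.kron(E, M)
        basis.append(E)
    return basis

N = 2
d = 2 ** N
basis = pauli_tensor_basis(N)                  # d^2 = 16 matrices

# rank-one (pure) density matrix theta = |psi><psi|, tr(theta) = 1
rng = np.random.default_rng(1)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
theta = np.outer(psi, psi.conj())

# noiseless 'average outcomes' <E_i, theta> = tr(E_i theta); real, since all
# matrices involved are Hermitian
means = np.array([np.trace(E @ theta) for E in basis])
```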


SLIDE 10

Random Pauli Design

Liu (2011) proved that if the X_i, i = 1, …, n, are random draws from {E_1, …, E_{d²}}, where the E_i constitute the Pauli tensor product basis of M_d(C), then the resulting design satisfies the matrix RIP. Recovery of low-rank quantum states θ ∈ M(k) is thus possible after n ≈ kd(log d)^γ ≡ m(k) ≪ d² measurements, a significant improvement over requiring d² basis coefficient measurements. See Gross (2011) and Gross et al. (2010, 2012). Likewise, in the vector model recovery is possible after n ≈ k log p ≡ m(k) measurements whenever θ is k-sparse.

SLIDE 11

Certificates (Active Learning)

→ For implementation one would like uncertainty quantification: given any ε > 0, we want a data-driven stopping time n̂ such that ∀θ: ‖θ̃_n̂ − θ‖ < ε, and whenever θ ∈ M(k), with high probability n̂ ≈ m(k)/ε².

[Figure: precision ε versus time horizon; for θ ∈ M(k₀) the protocol stops at n̂ = m(k₀)/ε², for θ ∈ M(k₁) at n̂ = m(k₁)/ε².]

→ Such 'certificates' are closely related to 'adaptive confidence sets' for θ.


SLIDE 15

Relation between adaptive confidence sets and certificates

For any α > 0 we want the confidence set C_n to cover θ:

  inf_{θ ∈ M(k)} P_θ(θ ∈ C_n) ≥ 1 − α − o(1),   [Honest Coverage]

where M(k), k ≤ d, is a 'maximal' model. For any sub-model M(k₀), 1 ≤ k₀ ≤ k, we want the diameter of C_n to satisfy

  sup_{θ ∈ M(k₀)} E_θ |C_n|² ≲ minimax rate r(k₀) over M(k₀).   [Optimal Diameter]

Certificate ⇒ Confidence set: given an adaptive certificate, one can obtain adaptive confidence sets. Idea: solve for the ε corresponding to n̂ = n.

Confidence set ⇒ Certificate: the converse is true if confidence sets exist 'sequentially' in n, with coverage guaranteed for all relevant values of n.


SLIDE 18

Design Assumptions

We now describe our results, which are proved under either of the following assumptions: the design X is (sub-)Gaussian and i.i.d.; in the matrix model, we also allow for random Pauli design.

Remark: for Pauli design, we tacitly assume the 'quantum shape constraint' θ ∈ Θ₊ ≡ {u ∈ C^{d×d} : u ⪰ 0, tr(u) = 1}.

[Matrix completion problems can also be handled in principle.]


SLIDE 20

(Negative) Results in the vector model

Vector model (θ ∈ R^p): Y_i = ⟨X_i, θ⟩ + ε_i, i = 1, …, n.

Theorem [N and van de Geer (2013)]

Let k ≤ p be arbitrary (going to infinity with n).

There are no adaptive and honest confidence sets over M(k). There are no adaptive certificates (also over M(k)).

Related negative results from non-parametric statistics:

[Low (1997), Cai and Low (2004), Juditsky and Lambert-Lacroix (2004), Robins and van der Vaart (2006), Hoffmann and Nickl (2011), Bull and Nickl (2012)]

SLIDE 21

Exact version of N & van de Geer (2013), vector model

Theorem

(i) A confidence set C_n with honest coverage over M(k) exists such that, for any 1 ≤ k₀ ≤ k,

  sup_{θ ∈ M(k₀)} E_θ |C_n|² ≲ k₀ log p / n + 1/√n.

(ii) If C_n is honest over the entire M(k), then necessarily, for any k₀ ≤ k,

  sup_{θ ∈ M(k₀)} E_θ |C_n|² ≥ c/√n.

(iii) A confidence set C_n that has, for any 1 ≤ k₀ ≤ k, diameter

  sup_{θ ∈ M(k₀)} E_θ |C_n|² ≲ k₀ log p / n = o(1/√n)

can be honest over M(k) only under the signal-strength assumption

  Σ_{j=k₀+1}^{k} θ²_(j) ≥ C/√n,   where θ²_(j) > θ²_(j+1) ∀j are the ordered squared coordinates.

SLIDE 22

(Positive) Results in the matrix model

Matrix model (θ ∈ C^{d×d}): Y_i = ⟨X_i, θ⟩ + ε_i, i = 1, …, n.

Theorem [Carpentier, Eisert, Gross, N (2015)]

The following two results hold: adaptive and honest confidence sets exist over M(d); adaptive certificates exist. No restrictions on the model in this setting!

SLIDE 23

Intuitions for the proofs I: decision theory

Consider the testing problem, for k > 1:

  H₀: θ ∈ M(1)  vs  H₁: θ ∈ M(k), ‖θ − M(1)‖ ≥ ρ.

[Figure: M(1) sitting inside M(k), with θ at distance ρ from M(1); here ρ ≫ r(1), marked NO.]

A general mechanism [Hoffmann & N (2011)] implies that if this testing rate is of greater magnitude than the estimation rate r(1) in the sub-model M(1), so when ρ ≫ r(1), then adaptive confidence sets do NOT exist over M(k) and its sub-models.

SLIDE 24

Intuitions for the proofs II: decision theory

Consider now a similar testing problem, for k₀ < d:

  H₀: θ ∈ M(k₀)  vs  H₁: θ ∈ M(d), ‖θ − M(k₀)‖ ≥ ρ.

[Figure: M(k₀) sitting inside the larger model, with θ at distance ρ from M(k₀); here ρ ≲ r(k₀), marked OK.]

For any k₀: if now the testing rate ρ is of smaller order than the estimation rate r(k₀) in a given sub-model M(k₀), that is, when ρ ≲ r(k₀), then valid confidence sets DO exist over M(d) that adapt to M(k₀).

SLIDE 25

Intuitions for the proofs III: vector case

Testing problem for k > k₀ ≥ 1:

  H₀: θ ∈ M(k₀)  vs  H₁: θ ∈ M(k), ‖θ − M(k₀)‖ ≥ ρ.

Lemma [N and van de Geer (2013)]

  Upper bound on ρ:  min( √(k log p / n), n^{−1/4}, p^{1/4}/√n ) + √(k₀ log p / n).
  Lower bound on ρ:  min( √(k log p / n), n^{−1/4}, p^{1/4}/√n ).

Related result: [Ingster, Tsybakov and Verzelen (2010)]

→ Adaptive confidence sets do NOT exist over M(k), k ≫ 1, since ρ ≫ r(1) = √(log p / n).

SLIDE 26

Intuitions for the proofs IV: matrix case

The analysis in Carpentier, Eisert, Gross, N (2015) and Carpentier and N (2015+) implicitly reveals that the testing rates for the problem

  H₀: θ ∈ M(k₀)  vs  H₁: θ ∈ M(d), ‖θ − M(k₀)‖ ≥ ρ

are bounded as follows:

Lemma

  Upper bound on ρ:  min( n^{−1/4}, √(d/n) ) + √(k₀ d / n).
  Lower bound on ρ:  min( n^{−1/4}, √(d/n) ).

Confidence sets thus DO exist over M(d) since for any k₀, k: ρ ≲ r(k₀) = √(k₀ d / n).


SLIDE 29

Confidence sets in practice: Unbiased risk estimation

Suppose we have two samples of size n at hand. We use one to compute our favourite estimator θ̃ of θ – say the matrix Lasso – and freeze this output. We use the second sample Y = (Y_1, …, Y_n)^T and its sampling operator Xθ = (tr(X_1 θ), …, tr(X_n θ))^T to define

  r̂_n = (1/n) ‖Y − X θ̃‖² − σ².

Then by (pretty basic) concentration-of-measure tools one can bound

  P_θ( |r̂_n − ‖θ̃ − θ‖²| > z_{n,α} ) ≤ α,   ∀n ∈ N,

for computable quantile constants z_{n,α} ≈ z_α (1/√n + d/n), z_α = O(log(1/α)).
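A numerical sketch of this sample-splitting construction, in the real matrix model with i.i.d. Gaussian design; the frozen estimator θ̃ below is a crude noisy surrogate, not the matrix Lasso, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n, sigma = 8, 1, 200, 0.1

# rank-k ground truth, normalised to trace 1 as for a density matrix
U = rng.normal(size=(d, k))
theta = U @ U.T
theta /= np.trace(theta)

# sample 1 would be used to build theta_tilde; here a crude surrogate, frozen
theta_tilde = theta + 0.05 * rng.normal(size=(d, d))

# sample 2: Y_i = <X_i, theta> + eps_i, and the unbiased risk estimator
X = rng.normal(size=(n, d, d))
Y = np.einsum('nij,ij->n', X, theta) + rng.normal(scale=sigma, size=n)
residuals = Y - np.einsum('nij,ij->n', X, theta_tilde)
r_hat = np.mean(residuals ** 2) - sigma ** 2   # estimates ||theta_tilde - theta||_F^2

true_risk = np.sum((theta_tilde - theta) ** 2)
```

Since E⟨X, A⟩² = ‖A‖_F² for a standard Gaussian matrix X, the statistic r̂_n is an unbiased estimator of ‖θ̃ − θ‖_F² and concentrates around it at the rate displayed above.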


SLIDE 31

An optimal confidence set for n < d2 measurements

Assuming σ² is known, the resulting confidence set

  C_n = { θ : ‖θ − θ̃‖² ≤ r̂_n + z_{n,α} }

covers an arbitrary d × d matrix θ with prescribed non-asymptotic probability

  P_θ(θ ∈ C_n) ≥ 1 − α,   ∀n ∈ N.

The squared diameter of C_n is of order

  E_θ |C_n|² = E_θ r̂_n + z_{n,α} = ‖θ − θ̃‖² + O(1/√n + kd/n).

The case relevant in quantum state estimation is n < d² ⟺ 1/√n < d/n ≤ kd/n, so when the matrix Lasso θ̃ has been used with n < d² measurements, the diameter of the above confidence set is optimal for any θ ∈ M(k), 1 ≤ k ≤ d.


SLIDE 33

Optimal confidence sets for n ≥ d2 measurements

When sampling n ≥ d² times from the Pauli basis, there is no need to randomise the observation scheme: all d² basis coefficients tr(E_i θ) can be measured. While less interesting in quantum applications, one can use a simple re-averaging trick to get an optimal confidence set also in this case.

In the case of isotropic Gaussian design, one can replace the above RSS statistic r̂_n by the following U-statistic:

  R̂_n = (2 / (n(n−1))) Σ_{i<j} Σ_{m,k} (Y_i X^i_{mk} − θ̃_{mk})(Y_j X^j_{mk} − θ̃_{mk}).

Using concentration inequalities for U-statistics one obtains confidence bounds as above, but now with quantile constants z_{α,n} = z_α d/n, compatible with any adaptive recovery rate kd/n ≥ d/n.
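A sketch of this U-statistic under isotropic Gaussian design: writing Z_i = Y_i X_i − θ̃, the statistic R̂_n is the average of the off-diagonal inner products ⟨Z_i, Z_j⟩ and is an unbiased estimator of ‖θ − θ̃‖_F² (dimensions and constants here are illustrative, and θ̃ is a mock pilot estimate):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, sigma = 6, 2000, 0.1

theta = rng.normal(size=(d, d))
theta /= np.linalg.norm(theta)                 # Frobenius-normalised truth
theta_tilde = theta + 0.2 * rng.normal(size=(d, d))   # a frozen pilot estimate

# isotropic Gaussian design and observations Y_i = <X_i, theta> + eps_i
X = rng.normal(size=(n, d, d))
Y = np.einsum('nij,ij->n', X, theta) + rng.normal(scale=sigma, size=n)

# Z_i = Y_i X_i - theta_tilde has mean theta - theta_tilde, so for i != j
# E<Z_i, Z_j> = ||theta - theta_tilde||_F^2
Z = (Y[:, None, None] * X - theta_tilde).reshape(n, -1)
G = Z @ Z.T                                    # Gram matrix of inner products
R_hat = (G.sum() - np.trace(G)) / (n * (n - 1))

true_risk = np.sum((theta - theta_tilde) ** 2)
```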


SLIDE 36

Adaptive sampling certificates for quantum states

Given ε > 0, check for a batch of m experiments whether its diameter satisfies |C_m| ≤ ε or not. If not, increase the number m of experiments in the next iteration. Terminate the protocol as soon as the desired level ε is reached.

Theorem (Carpentier, Eisert, Gross, N (2015))

Let θ ∈ Θ₊, let the X_i be sampled from the Pauli basis, and let ε be given. There exists an algorithm returning a d × d matrix θ̂ after n̂ ∈ N measurements such that, with high probability, for every 1 ≤ k ≤ d and every θ ∈ M(k):

  • the recovery error is at most ‖θ̂ − θ‖ ≤ ε, and
  • the stopping time n̂ satisfies, for some constant C = C(ε) = O(1/ε²), n̂ ≤ C kd(log d)^γ.
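The stopping protocol can be sketched as a doubling loop; `conf_diameter` below is a hypothetical stand-in for the confidence-set diameter |C_m| computed from a batch of m measurements, mocked here by the theoretical rate √(kd(log d)^γ/m) purely for illustration:

```python
import math

def certify(eps, k, d, gamma=1.0, m0=8):
    """Doubling protocol: grow the batch size m until the confidence
    diameter |C_m| falls below the target precision eps."""
    def conf_diameter(m):
        # hypothetical stand-in for |C_m|: the rate sqrt(k d (log d)^gamma / m)
        return math.sqrt(k * d * math.log(d) ** gamma / m)

    m = m0
    while conf_diameter(m) > eps:
        m *= 2                                 # double the number of measurements
    return m                                   # stopping time n_hat

n_hat = certify(eps=0.1, k=2, d=16)            # n_hat scales like k d log d / eps^2
```

Doubling overshoots the ideal sample size by at most a factor of 2, which is absorbed into the constant C(ε) of the theorem.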


SLIDE 39

Nuclear norm recovery

While the above theory for the Frobenius norm is already quite satisfactory, for applications to the quantum information sciences another notion of distance is of key importance: the trace (or nuclear) norm ‖·‖_{S1}. First, it equals the total variation distance on density matrices; but also

  ‖θ − θ′‖_{S1} = sup_{‖X‖_op = 1} tr(X(θ − θ′)),

the maximal expected discrepancy of arbitrary measurements arising from two different quantum states θ, θ′. The minimax rates for low-rank recovery in nuclear norm are slightly worse:

  ‖θ̃ − θ‖_{S1} ≈ k √(d/n)   whenever θ ∈ M(k);

see the paper by Ma and Wu (2013) (more or less) and Koltchinskii and Xia (2015+). Assume θ ∈ M(k) where k is such that the last quantity goes to zero, and that the design satisfies the RIP for this k.
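The variational characterisation is easy to check numerically; a sketch in the real symmetric case, where the supremum is attained at X = Σ_j sign(λ_j) u_j u_jᵀ built from the eigendecomposition of the difference:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5

# two random symmetric matrices standing in for states theta, theta'
A = rng.normal(size=(d, d))
A = (A + A.T) / 2
B = rng.normal(size=(d, d))
B = (B + B.T) / 2
D = A - B

nuc = np.linalg.norm(D, ord='nuc')             # nuclear norm = sum of singular values

# optimal test matrix: flip the sign of each eigendirection of D
lam, U = np.linalg.eigh(D)
X_opt = U @ np.diag(np.sign(lam)) @ U.T        # operator norm ||X_opt||_op = 1
attained = np.trace(X_opt @ D)                 # equals sum_j |lam_j| = nuc
```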


SLIDE 41

Trace-norm confidence sets and quantum shape constraints

Theorem (Carpentier, Eisert, Gross, N (2015))

There exists a confidence set C_n that is honest over M₊(k) = Θ₊ ∩ M(k),

  inf_{θ ∈ M₊(k)} P_θ(θ ∈ C_n) ≥ 1 − α − o(1),

and such that, with high probability,

  |C_n|_{S1} ≲ k₀ √(d (log d)^γ / n)   whenever θ ∈ M₊(k₀) for some 1 ≤ k₀ ≤ k.

However: a confidence set that is honest over all of M(k) – so without the quantum shape constraint – and that adapts to M(k₀) in nuclear norm does not exist for general choices of k₀, k.

SLIDE 43

Ideas behind the proof

As above, we can understand the problem better by studying the intrinsic composite hypothesis testing problem

  H₀: θ ∈ M(k₀)  vs  H₁: θ ∈ M(k) \ M(k₀).

Again, the key information-theoretic quantity to be checked is the detection boundary between H₀ and H₁. When the distance between H₀ and H₁ is measured in the ‖·‖_{S1} norm:

  θ ∈ H₁ \ H₀ ⇒ ‖H₀ − θ‖_{S1} ≥ Σ_{j=k₀+1}^{k} |λ_j|,

a (k − k₀) ≈ k-dimensional quantity. If we know a priori that θ ∈ Θ₊, then

  ‖H₀ − θ‖_{S1} = 1 − Σ_{j ≤ k₀} λ_j,

a k₀-dimensional object: the testing problem is then fundamentally easier.
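The shape-constraint identity can be sanity-checked numerically: for a density matrix θ with ordered eigenvalues λ_1 ≥ λ_2 ≥ …, the trace-norm distance to its rank-k₀ spectral truncation is exactly the tail mass 1 − Σ_{j≤k₀} λ_j (the truncation is the natural rank-k₀ candidate here; a sketch under that assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
d, k0 = 8, 2

# random density matrix: positive semi-definite with trace 1
G = rng.normal(size=(d, d))
theta = G @ G.T
theta /= np.trace(theta)

lam, U = np.linalg.eigh(theta)
lam, U = lam[::-1], U[:, ::-1]                 # eigenvalues in decreasing order

# rank-k0 spectral truncation of theta
theta_k0 = (U[:, :k0] * lam[:k0]) @ U[:, :k0].T

dist = np.linalg.norm(theta - theta_k0, ord='nuc')
tail = 1.0 - lam[:k0].sum()                    # 1 - sum of the k0 largest eigenvalues
```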

SLIDE 46

Conclusion

In some sense, for sparse recovery, dimension reduction is too radical: p → k log p, which creates problems for inference. In low-rank problems we only reduce p → kd = k√p, which can be coped with nicely by risk-estimation ideas. When performing Pauli measurements, and when considering trace-norm geometry, the 'quantum constraint' θ ∈ Θ₊ seems crucially necessary.

References

  • R. Nickl, S. van de Geer; Confidence sets in sparse regression, Annals of Statistics (2013).
  • A. Carpentier, J. Eisert, D. Gross, R. Nickl; Uncertainty quantification for matrix compressed sensing and quantum tomography problems, arXiv:1504.03234 (2015).
  • A. Carpentier, R. Nickl; in preparation (2015).