SLIDE 1


Minimax theory for a class of non-linear statistical inverse problems

Kolyan Ray (joint work with Johannes Schmidt-Hieber)

Leiden University

Van Dantzig Seminar 26 February 2016


SLIDE 2

We consider the following non-linear inverse problem:
$$dY_t = (h \circ Kf)(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \qquad t \in [0,1],$$
where $h$ is a known, strictly monotone link function, $K$ is a known (possibly ill-posed) linear operator, and $W$ is a standard Brownian motion. The non-linearity comes from $h$, which acts pointwise; if $h$ is the identity, we recover the classical linear inverse problem with Gaussian noise. We will look at several specific choices of $h$ (and $K$) motivated by statistical applications.
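To fix ideas, here is a minimal simulation sketch of a discretized version of this model. This is my illustration, not from the talk: it assumes $K = \mathrm{id}$ and $h(x) = 2\sqrt{x}$ (as in the examples on the next slides), and the grid size and signal are arbitrary choices.

```python
import numpy as np

# Minimal sketch: simulate a discretized version of
# dY_t = (h ∘ Kf)(t) dt + n^{-1/2} dW_t on a grid,
# with K = id and h(x) = 2*sqrt(x).
rng = np.random.default_rng(0)

n = 10_000                        # noise level is 1/sqrt(n)
m = 1_000                         # grid size for the discretization
t = np.linspace(0.0, 1.0, m, endpoint=False)
dt = 1.0 / m

f = 4 * t * (1 - t)               # a nonnegative signal f >= 0
drift = 2 * np.sqrt(f)            # (h ∘ Kf)(t) with K = id

# Increments dY_i = drift(t_i) dt + n^{-1/2} dW_i, with dW_i ~ N(0, dt)
dY = drift * dt + rng.normal(0.0, np.sqrt(dt), size=m) / np.sqrt(n)
Y = np.cumsum(dY)                 # observed path Y_t
```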

SLIDE 3

Asymptotic equivalence between two experiments roughly means that there is a transformation between the models that leads to no asymptotic loss of information about the parameter. It is useful because the transformed models are often easier to analyse. Many non-Gaussian statistical inverse problems can be rewritten, via asymptotic equivalence, as
$$dY_t = (h \circ Kf)(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \qquad t \in [0,1].$$
We study pointwise estimation in such models, which has been considered by numerous authors. We are particularly interested in the case where $f$ takes small (or zero) function values.

SLIDE 4

Let us first assume that $K$ is the identity for simplicity. We consider the following examples (under certain constraints):

Density estimation: we observe i.i.d. data $X_1, \ldots, X_n \sim f$.

Poisson intensity estimation: we observe a Poisson process on $[0,1]$ with intensity function $nf$.

Both can be rewritten with $h(x) = 2\sqrt{x}$ to give
$$dY_t = 2\sqrt{f(t)}\,dt + n^{-1/2}\,dW_t.$$
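As a quick sanity check on the variance-stabilizing role of $h$ (my addition, not from the slides), one can verify numerically that $2\sqrt{N}$ has variance approximately 1 for Poisson $N$, whatever the intensity:

```python
import numpy as np

# h(x) = 2*sqrt(x) is the variance-stabilizing transformation for counts:
# if N ~ Poisson(lam), then Var(2*sqrt(N)) ≈ 1 for moderately large lam,
# independent of lam.
rng = np.random.default_rng(1)
for lam in [5.0, 20.0, 100.0]:
    N = rng.poisson(lam, size=200_000)
    print(lam, np.var(2 * np.sqrt(N)))   # all approximately 1
```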

SLIDE 5


Binary regression (a further example): we observe $n$ independent Bernoulli random variables with success probabilities $P(X_i = 1) = f(i/n)$, where $f : [0,1] \to [0,1]$ is an unknown regression function. This can be rewritten with $h(x) = 2\arcsin\sqrt{x}$ to give
$$dY_t = 2\arcsin\sqrt{f(t)}\,dt + n^{-1/2}\,dW_t.$$

SLIDE 6

Spectral density estimation: we observe a random vector of length $n$ drawn from a stationary Gaussian distribution with spectral density $f$.

Gaussian variance estimation: we observe $X_1, \ldots, X_n$ independent with $X_i \sim N(0, f(i/n)^2)$, where $f \ge 0$ is unknown. This can be rewritten with $h(x) = 2^{-1/2}\log x$ to give
$$dY_t = \frac{1}{\sqrt{2}}\,\log f(t)\,dt + n^{-1/2}\,dW_t.$$

In each case, the choice of $h$ is linked to the variance-stabilizing transformation of the model.
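To make this link explicit, here is the standard delta-method calculation behind variance stabilization (a routine textbook addition, not from the slides; in the Gaussian variance case the constant depends on the chosen parametrization): if an estimator satisfies $\operatorname{Var}(\hat\theta_n) \approx \sigma^2(\theta)/n$, then $\operatorname{Var}(h(\hat\theta_n)) \approx h'(\theta)^2 \sigma^2(\theta)/n$, so choosing $h'(\theta) = \sigma(\theta)^{-1}$ stabilizes the variance at $1/n$.

```latex
% Variance stabilization: choose h with h'(\theta) = 1/\sigma(\theta).
\[
  \text{Poisson / density } (\sigma^2(x) = x):\quad
  h'(x) = x^{-1/2} \;\Longrightarrow\; h(x) = 2\sqrt{x},
\]
\[
  \text{Bernoulli } (\sigma^2(x) = x(1-x)):\quad
  h'(x) = \bigl(x(1-x)\bigr)^{-1/2} \;\Longrightarrow\; h(x) = 2\arcsin\sqrt{x}.
\]
```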

SLIDE 7

The linear operator $K$ is typically ill-posed (not continuously invertible). Perhaps the two most common examples for $h(x) = 2\sqrt{x}$ are:

Density deconvolution: we observe data $X_1 + \epsilon_1, \ldots, X_n + \epsilon_n$, where $X_i \sim f$ and $\epsilon_i \sim g$ for a known density $g$.

Poisson intensity estimation: $K$ is typically a convolution operator modelling the blurring of images by a so-called point spread function. The two-dimensional version of this problem has applications in photonic imaging.

In both cases $Kf(t) = (f * g)(t)$ for some known $g$, giving
$$dY_t = 2\sqrt{(f * g)(t)}\,dt + n^{-1/2}\,dW_t.$$
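A minimal sketch of the forward map on a grid (my illustration; the Gaussian kernel is a stand-in for an actual point spread function, and boundary effects are ignored):

```python
import numpy as np

# Discretized forward map Kf = f * g, with a Gaussian blur kernel g.
m = 1_000
t = np.arange(m) / m
f = 4 * t * (1 - t)                      # intensity / density on [0, 1]

s = (np.arange(m) - m // 2) / m          # symmetric grid for the kernel
g = np.exp(-0.5 * (s / 0.05) ** 2)       # Gaussian blur kernel
g /= g.sum() / m                         # normalize so g integrates to 1

Kf = np.convolve(f, g, mode='same') / m  # Riemann sum for (f * g)(t)
drift = 2 * np.sqrt(Kf)                  # (h ∘ Kf)(t) in the noise model
```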

SLIDE 8

We will discuss the case $h(x) = 2\sqrt{x}$ (density estimation, Poisson intensity estimation); the other cases are similar. What happens if we assume classical Hölder smoothness $f \in C^\beta$?

If $f \in C^\beta$ then $\sqrt{f} \in C^{\beta/2}$ for $\beta \le 2$.

SLIDE 9

However, this relation does not extend beyond $\beta = 2$:

Theorem (Bony et al. (2006)). There exists a function $f \in C^\infty$, $f \ge 0$, such that $\sqrt{f} \notin C^\beta$ for any $\beta > 1$.

So we cannot exploit higher-order Hölder regularity beyond $\beta = 2$. The problem arises from very small non-zero function values, where the derivatives of $\sqrt{f}$ can fluctuate greatly.

SLIDE 10

We propose an alternative restricted space:
$$\mathcal{H}^\beta = \bigl\{f \in C^\beta : f \ge 0,\ \|f\|_{\mathcal{H}^\beta} := \|f\|_{C^\beta} + |f|_{\mathcal{H}^\beta} < \infty\bigr\},$$
where $\|\cdot\|_{C^\beta}$ is the usual Hölder norm and
$$|f|_{\mathcal{H}^\beta} = \max_{1 \le j < \beta}\ \sup_{x \in [0,1]} \left(\frac{|f^{(j)}(x)|^{\beta}}{|f(x)|^{\beta - j}}\right)^{1/j} = \max_{1 \le j < \beta} \left\|\, \frac{|f^{(j)}|^{\beta}}{|f|^{\beta - j}} \,\right\|_\infty^{1/j}$$
is a seminorm ($|f|_{\mathcal{H}^\beta} = 0$ for $\beta \le 1$). The quantity $|f|_{\mathcal{H}^\beta}$ measures the flatness of a function near 0, in the sense that if $f(x)$ is small then the derivatives of $f$ must also be small in a neighbourhood of $x$. This can be thought of as a shape constraint.
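As a sketch of how the seminorm can be evaluated numerically (my addition; the test function, its analytic derivatives, and the grid are arbitrary choices):

```python
import numpy as np

# Evaluate |f|_{H^beta} = max_{1<=j<beta} sup_x
#   ( |f^(j)(x)|^beta / |f(x)|^(beta-j) )^(1/j)
# on a grid, for a function bounded away from 0 (where it is always finite).
beta = 2.5
x = np.linspace(0.0, 1.0, 100_001)
f = 2.0 + np.sin(2 * np.pi * x)                         # f >= 1 on [0, 1]
derivs = {1: 2 * np.pi * np.cos(2 * np.pi * x),         # f'
          2: -(2 * np.pi) ** 2 * np.sin(2 * np.pi * x)} # f''

terms = [(np.abs(fj) ** beta / f ** (beta - j)).max() ** (1.0 / j)
         for j, fj in derivs.items()]
print(max(terms))   # finite, as expected for f bounded away from 0
```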

SLIDE 11

$\mathcal{H}^\beta$ contains all $C^\beta$ functions uniformly bounded away from 0 (the typical assumption for such problems), as well as functions that take small values in a 'controlled' way, e.g. $(x - x_0)^\beta g(x)$ for $g \ge \epsilon > 0$ in $C^\infty$ (a worked check follows below).

Theorem. If $f \in \mathcal{H}^\beta$ then $\sqrt{f} \in \mathcal{H}^{\beta/2}$ for all $\beta \ge 0$.

In fact, it turns out that $\mathcal{H}^\beta = C^\beta$ for $0 < \beta \le 2$, which is why the corresponding relation holds for $C^\beta$ in that range.
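Here is the promised worked check (my addition) that the model example is flat in the above sense; for simplicity take $g \equiv 1$ and integer $\beta$, so only integer derivatives appear:

```latex
% Each seminorm term of f(x) = (x - x_0)^\beta is constant: for 1 <= j < beta,
\[
  \frac{|f^{(j)}(x)|^{\beta}}{|f(x)|^{\beta - j}}
  = \frac{\bigl(\beta(\beta-1)\cdots(\beta-j+1)\bigr)^{\beta}\,
          |x - x_0|^{(\beta - j)\beta}}{|x - x_0|^{\beta(\beta - j)}}
  = \bigl(\beta(\beta-1)\cdots(\beta-j+1)\bigr)^{\beta},
\]
% so |f|_{H^beta} is finite even though f vanishes at x_0.
```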

SLIDE 12

We propose a two-stage procedure (a code sketch follows below):

1. Let $[h(Kf)]_{HT}$ denote the hard wavelet thresholding estimator of $h(Kf)$. Estimate $Kf$ by the estimator $\widehat{Kf} = h^{-1}([h(Kf)]_{HT})$ (recall that $h$ is injective). Using this we have access to $\widehat{Kf}(t) = Kf(t) + \delta(t)$, where $\delta(t)$ is the noise level (which is the minimax rate with high probability).

2. Treat the above as a deterministic inverse problem with noise level $\delta$. Solve this for $f$ using classical methods (e.g. Tikhonov regularization, Bayesian methods, etc.).
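A minimal end-to-end sketch of step 1 (my illustration, not the paper's exact tuning): the regression analogue of the model with $K = \mathrm{id}$ and $h(x) = 2\sqrt{x}$, hard thresholding with the universal threshold via the PyWavelets (pywt) package; 'db4' is an arbitrary choice of wavelet.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(2)

n = 2 ** 12
t = np.arange(n) / n
f = 4 * t * (1 - t)                            # true signal, f >= 0
y = 2 * np.sqrt(f) + rng.standard_normal(n)    # noisy observations of h(f)

# Step 1: hard wavelet thresholding of h(Kf), then invert h.
coeffs = pywt.wavedec(y, 'db4')
lam = np.sqrt(2 * np.log(n))                   # universal threshold
coeffs = [coeffs[0]] + [pywt.threshold(c, lam, mode='hard')
                        for c in coeffs[1:]]
g_hat = pywt.waverec(coeffs, 'db4')[:n]        # estimates h(Kf) = 2*sqrt(f)
Kf_hat = np.clip(g_hat, 0.0, None) ** 2 / 4.0  # h^{-1}(u) = (u/2)^2

# Step 2: for general K, treat Kf_hat = Kf + delta as a deterministic
# inverse problem and solve for f by classical methods (e.g. Tikhonov);
# with K = id this step is trivial and Kf_hat already estimates f.
```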

SLIDE 13

For the noise level $\delta$ (step 1), without loss of generality set $K = \mathrm{id}$. We consider a pointwise, function-dependent rate for $f \in \mathcal{H}^\beta$:
$$r_{n,\beta}(f(x)) = \left(\frac{\log n}{n}\right)^{\frac{\beta}{\beta+1}} \vee \left(f(x)\,\frac{\log n}{n}\right)^{\frac{\beta}{2\beta+1}}.$$

Theorem. The estimator $\hat f = h^{-1}([h(f)]_{HT})$ satisfies
$$P_f\left(\sup_{x \in [0,1]} \frac{|\hat f(x) - f(x)|}{r_{n,\beta}(f(x))} \le C\right) \ge 1 - n^{-C'},$$
uniformly over $\bigcup_{\beta, R}\{f : \|f\|_{\mathcal{H}^\beta} \le R\}$, with $\beta, R$ ranging over compact sets.

The estimator adapts to $\mathcal{H}^\beta$-smoothness and to the local function size, uniformly over $x \in [0,1]$.

SLIDE 14

Recall
$$r_{n,\beta}(f(x)) = \left(\frac{\log n}{n}\right)^{\frac{\beta}{\beta+1}} \vee \left(f(x)\,\frac{\log n}{n}\right)^{\frac{\beta}{2\beta+1}}.$$

The $\log n$ factors are needed for adaptation in pointwise estimation (as usual). For $f(x) \gtrsim (\log n / n)^{\beta/(\beta+1)}$ we recover the usual nonparametric rate, albeit with pointwise dependence on the radius. For $f(x) \lesssim (\log n / n)^{\beta/(\beta+1)}$ we obtain rates faster than $n^{-1/2}$ when $\beta > 1$, i.e. superefficiency: for small function values, the variance dominates the bias. This is related to irregular models, e.g. nonparametric regression with one-sided errors (Jirak et al. (2014)). The small-value regime is caused by the non-linearity of $h(x) = \sqrt{x}$ near 0.
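A small numerical illustration of the two regimes (my addition; it takes the rate to be the maximum of the two terms, as reconstructed above):

```python
import numpy as np

# r_{n,beta}(f(x)) = max( (log n / n)^(beta/(beta+1)),
#                         (f(x) * log n / n)^(beta/(2*beta+1)) )
def rate(n, beta, fx):
    a = (np.log(n) / n) ** (beta / (beta + 1))           # small-value regime
    b = (fx * np.log(n) / n) ** (beta / (2 * beta + 1))  # usual regime
    return max(a, b)

n, beta = 10 ** 6, 2.0
print(rate(n, beta, 1.0))  # f(x) of order 1: usual rate ~ n^(-beta/(2beta+1))
print(rate(n, beta, 0.0))  # f(x) = 0: faster rate ~ n^(-beta/(beta+1))
print(n ** -0.5)           # for beta > 1 the zero-value rate beats n^(-1/2)
```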

SLIDE 15

The same rate has recently (and independently) been proved directly in the case of density estimation by Patschkowski and Rohde (2016) for $0 < \beta \le 2$. They consider classical Hölder smoothness $C^\beta$, which is why they get stuck at $\beta = 2$.

Suppose we take $f(x) = (x - 1/2)^2$. Then $f \in C^\infty([0,1]) \cap \mathcal{H}^2$, but $f \notin \mathcal{H}^\beta$ for any $\beta > 2$ (see the check below). Intuitively, $h(f(x)) = \sqrt{f(x)} = |x - 1/2|$ is $C^1$ but no more regular. We recover the rate based on this smoothness, which corresponds to $\beta/2 = 1$, but not faster. This corresponds to the correct flatness condition. We have more precise examples of such lower bounds, but they are not as intuitive.
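For completeness, here is the seminorm computation behind this example (my addition):

```latex
% Worked check for f(x) = (x - 1/2)^2 on [0,1].
% j = 1 term (the only one for beta = 2) is bounded:
\[
  \frac{|f'(x)|^{2}}{|f(x)|^{2-1}}
  = \frac{4\,(x - \tfrac12)^2}{(x - \tfrac12)^2} = 4
  \quad\Longrightarrow\quad f \in \mathcal{H}^2 .
\]
% For beta > 2 the j = 2 term enters and blows up at x = 1/2:
\[
  \left(\frac{|f''(x)|^{\beta}}{|f(x)|^{\beta-2}}\right)^{1/2}
  = \left(\frac{2^{\beta}}{|x - \tfrac12|^{2(\beta-2)}}\right)^{1/2}
  \xrightarrow[\;x \to 1/2\;]{} \infty
  \quad\Longrightarrow\quad f \notin \mathcal{H}^{\beta}.
\]
```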

SLIDE 16

The derivation of the upper bound relies on a careful analysis of the (local) smoothness of $h \circ f$; the resulting wavelet bounds, combined with the usual wavelet thresholding proof, give the result. We have the corresponding lower bound (without $\log n$ factors):

Theorem. For any $\beta > 0$, $R > 0$, $x_0 \in [0,1]$ and any sequence $(f_n^*)_n$ with $\limsup_{n \to \infty} \|f_n^*\|_{\mathcal{H}^\beta} < R$,
$$\liminf_{n \to \infty}\ \inf_{\hat f_n(x_0)}\ \sup_{\substack{f :\, \|f\|_{\mathcal{H}^\beta} \le R \\ \mathrm{KL}(f, f_n^*) \le 1}}\ P_f\left(\frac{|\hat f_n(x_0) - f(x_0)|}{r_{n,\beta}(f(x_0))} \ge C\right) > 0,$$
where the infimum is taken over all measurable estimators of $f(x_0)$.

SLIDE 17

We have replaced the whole parameter space $\{f : \|f\|_{\mathcal{H}^\beta} \le R\}$ with local parameter spaces $\{f : \|f\|_{\mathcal{H}^\beta} \le R,\ \mathrm{KL}(f, f_n^*) \le 1\}$ about every interior point $f_n^* \in \mathcal{H}^\beta$. This allows us to obtain local (function-dependent) rates.

Global rate: somewhere on the parameter space, the estimation rate cannot be improved.
Local rate: the estimation rate cannot be improved on a local neighbourhood of any point in the parameter space.

SLIDE 18

For example, consider $(f_n^*)$ with $f_n^*(x_0) \to 0$. The minimax lower bound over an $\mathcal{H}^\beta$-ball is $n^{-\beta/(2\beta+1)}$, while our upper bound gives faster rates (e.g. $n^{-\beta/(\beta+1)}$). The matching lower bound works since we restrict to the smaller spaces: the local parameter space $\{f \in \mathcal{H}^\beta : \mathrm{KL}(f, f_n^*) \le 1\}$ also contains only functions vanishing at $x_0$ for large $n$.

SLIDE 19

The lower bounds also give insight into the form of the rates. For $h(x) = \sqrt{x}$, the Kullback-Leibler divergence equals
$$\mathrm{KL}(f, g) = \frac{n}{2} \int (\sqrt{f} - \sqrt{g})^2.$$
If the functions are uniformly bounded away from 0, this behaves like the squared $L^2$ distance, which yields the classic nonparametric rate. If the functions are near 0, it behaves like the $L^1$ distance, which yields the rate for irregular models.
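The regime switch can be read off from an elementary identity and bound (a standard calculation, added for clarity):

```latex
% Two regimes of KL(f,g) = (n/2) \int (\sqrt{f} - \sqrt{g})^2:
\[
  (\sqrt{f} - \sqrt{g})^2 = \frac{(f - g)^2}{(\sqrt{f} + \sqrt{g})^2},
  \qquad
  (\sqrt{f} - \sqrt{g})^2 \le |f - g| ,
\]
% with equality in the second bound when f \wedge g = 0.  If f, g >= c > 0,
% the first form gives KL \asymp n \|f - g\|_2^2 (regular, L^2 geometry);
% near 0, the second form shows KL behaves like n \|f - g\|_1 (irregular,
% L^1 geometry).
```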

SLIDE 20

We now have access to $\widehat{Kf}$, which satisfies
$$\widehat{Kf}(t) = Kf(t) + \delta(t)$$
with high probability, where $|\delta(t)| = r_{n,\beta}(Kf(t))$. We solve this deterministic inverse problem using classical methods (e.g. Tikhonov regularization). The resulting rate depends on the noise level $\delta$, which we need to know to obtain rate-optimal procedures. However, we can use a plug-in estimate of the noise level:

Theorem. With high probability and uniformly over $t \in [0,1]$,
$$C^{-1}\, r_{n,\beta}\bigl(\widehat{Kf}(t)\bigr) \le r_{n,\beta}\bigl(Kf(t)\bigr) \le C\, r_{n,\beta}\bigl(\widehat{Kf}(t)\bigr).$$

SLIDE 21

Similar results hold in the other cases.

Binary regression: $dY_t = 2\arcsin\sqrt{f(t)}\,dt + n^{-1/2}\,dW_t$, with
$$r_{n,\beta}(f(x)) = \left(\frac{\log n}{n}\right)^{\frac{\beta}{\beta+1}} \vee \left(f(x)\bigl(1 - f(x)\bigr)\,\frac{\log n}{n}\right)^{\frac{\beta}{2\beta+1}}.$$

Spectral density estimation: $dY_t = \frac{1}{\sqrt{2}}\log f(t)\,dt + n^{-1/2}\,dW_t$, with
$$r_{n,\beta}(f(x)) = f(x) \wedge \left(\frac{f(x)^2}{n}\right)^{\frac{\beta}{2\beta+1}}$$
(the last rate up to subpolynomial factors).