On oracle inequalities related to high dimensional linear models - PowerPoint PPT Presentation

Spectral regularization for high dimensional linear models The Empirical Risk Minimization An oracle inequality for a known noise variance Unknown noise variance On oracle inequalities related to high dimensional linear models Yuri Golubev CNRS, Universit´ e de Provence Conference on Applied Inverse Problems July 21, Vienna Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization An oracle inequality for a known noise variance Unknown noise variance Outline of the talk 1 Spectral regularization for high dimensional linear models Ordered regularizations 2 The Empirical Risk Minimization Excess risk penalties 3 An oracle inequality for a known noise variance Short discussion 4 Unknown noise variance Example: the Tikhonov-Phillips regularization Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Ordered regularizations An oracle inequality for a known noise variance Unknown noise variance This talk deals with recovering θ = ( θ (1) , . . . , θ ( n )) ⊤ ∈ R n from the noisy data Y = A θ + σξ, where A is a known m × n - matrix with m ≥ n ξ ∈ R n is a standard white Gaussian noise with E ξ ( k ) ξ ( l ) = δ kl , k , l = 1 , . . . , m n is large (infinity) . σ may be known or unknown. Example: the linear model can be used to approximate the equation � y ( u ) = A ( u , v ) θ ( v ) dv + ε ( u ) . Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Ordered regularizations An oracle inequality for a known noise variance Unknown noise variance Maximum likelihood estimator The standard ML estimator is defined by m where � x � 2 = ˆ � Y − A θ � 2 , � x 2 ( k ) . θ 0 = arg min θ ∈ R n k =1 With a simple algebra we obtain /Moore (1920), Penrose (1955)/ θ 0 = ( A ⊤ A ) − 1 A ⊤ Y . ˆ Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Ordered regularizations An oracle inequality for a known noise variance Unknown noise variance Risk of the MP inversion The risk of this inversion is computed as follows: n θ 0 − θ � 2 = E � ( A ⊤ A ) − 1 A ⊤ ǫ � 2 = σ 2 E � ˆ � λ k , k =1 where λ k are the eigenvalues of ( A ⊤ A ) − 1 λ k A ⊤ A ψ k = ψ k , λ 1 ≤ λ 2 , . . . , ≤ λ n and ψ k ∈ R n are the eigenvectors of A ⊤ A . If A has a large condition number or n is large, the risk of ˆ θ 0 may be very large. Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Ordered regularizations An oracle inequality for a known noise variance Unknown noise variance Spectral regularization The basic idea in the spectral regularization is to suppress large λ k in the risk of ˆ θ 0 . We smooth ˆ θ 0 with the help of a properly chosen matrixes H α , α ∈ R + θ α = H α ˆ ˆ ( A ⊤ A ) − 1 � ( A ⊤ A ) − 1 A ⊤ Y , � θ 0 = H α n � ( A ⊤ A ) − 1 � � where H α ( s , l ) = H α ( λ k ) ψ s ( k ) ψ l ( k ) . k =1 Typically lim α → 0 H α ( λ ) = 1 , lim λ →∞ H α ( λ ) = 0 for all α > 0. α is called regularization parameter. Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Ordered regularizations An oracle inequality for a known noise variance Unknown noise variance Bias-variance decomposition For the risk of ˆ θ α we get a standard bias-variance decomposition n n θ α − θ � 2 = � 2 � θ, ψ k � 2 + σ 2 E � ˆ � � λ k H 2 � 1 − H α ( λ k ) α ( λ k ) , k =1 k =1 n � where � θ, ψ k � = θ ( l ) ψ k ( l ). l =1 Remarks: The spectral regularization may improve substantially ˆ θ 0 when � θ, ψ k � 2 are small for large k. The best regularization parameter depends on θ and therefore it should be data-driven. Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Ordered regularizations An oracle inequality for a known noise variance Unknown noise variance Spectral cut-off (requires the SVD) H α ( λ ) = 1 { αλ ≤ 1 } . Tikhonov’s regularization � � Y − A θ � 2 + α � θ � 2 � ˆ θ α = arg min θ or, equivalently, ˆ θ α = [ α I + A ⊤ A ] − 1 A ⊤ Y , H α ( λ ) = (1 + αλ ) − 1 . Landweber’s iterations (solve A ⊤ Y = A ⊤ A θ ) ˆ � ˆ I − a − 1 A ⊤ A θ i − 1 + a − 1 A ⊤ Y . � θ i = The iterations converge if a λ 1 < 1. It is easy to check that 1 − ( a λ ) − 1 � 1 /α , � H α ( λ ) = 1 − α = 1 / ( i + 1) . Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Ordered regularizations An oracle inequality for a known noise variance Unknown noise variance Ordered functions In the above examples the families of functions (smoothers) H α ( · ) , α ∈ R + are ordered (see Kneip (1995)) 0 ≤ H α ( λ ) ≤ 1 for all λ ∈ R + H α 1 ( λ ) ≥ H α 2 ( λ ) , α 1 ≤ α 2 . Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Excess risk penalties An oracle inequality for a known noise variance Unknown noise variance Our goal is to find the best estimate within the family spectral regularization methods ˆ θ α = H α [( A ⊤ A ) − 1 ]( A ⊤ A ) − 1 A ⊤ Y , α ∈ [0 , α ◦ ] . In other words, we are looking for ˆ α that minimizes α � 2 uniformly in θ ∈ R n . E � θ − ˆ θ ˆ This idea puts into practice with the help of the empirical risk minimization principle : θ α � 2 + σ 2 Pen ( α ) , R α [ Y ] , where R α [ Y ] = � ˆ θ 0 − ˆ α = arg min ˆ α and Pen ( α ) : (0 , α ◦ ] → R + is a given function of α . Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Excess risk penalties An oracle inequality for a known noise variance Unknown noise variance A good data-driven regularization should minimize in some sense the risk L α ( θ ) = E � θ − ˆ θ α � 2 . This is why, we are looking for a minimal penalty that ensures the following inequality L α ( θ ) � R α [ Y ] + C , where C is a random variable that doesn’t depend on α and θ . It is easy to check that n θ 0 � 2 = − σ 2 C = −� θ − ˆ � λ k ξ 2 ( k ) k =1 Traditional approach to solve this inequality is based on the unbiased risk estimation defining the penalty as a root of the equation L α ( θ ) = E R α [ Y ] + E C . Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Excess risk penalties An oracle inequality for a known noise variance Unknown noise variance Excess risk penalties Unfortunately, thus obtained penalty is not good for ill-posed problems (see e.g. Cavalier and Golubev (2006)). The main idea in this talk is to compute the penalty in a little bit different way, namely as a minimal root of the equation � � � � E sup L α ( θ ) − R α [ Y ] − C + ≤ K E L α ◦ ( θ ) − R α ◦ [ Y ] − C + , α ≤ α ◦ where [ x ] + = max { 0 , x } and K > 1 is a constant. Heuristic motivation: we are looking for the minimal penalty balancing the all excess risks. Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Short discussion An oracle inequality for a known noise variance Unknown noise variance It finds out that for ordered smoothers the penalty may be found as a solution of the marginal equation � � � � + , α ∈ [0 , α ◦ ] L α ( θ ) − R α [ Y ] − C + ≤ E L α ◦ ( θ ) − R α ◦ [ Y ] − C E To compute the penalty, we assume that it has the following structure n � Pen ( α ) = 2 λ k H α [ λ k ] + (1 + γ ) Q ( α ) , k =1 where 2 � n k =1 λ k H α [ λ k ] is the penalty related to the unbiased risk estimation. γ is a positive number and Q ( α ) , α > 0 is a positive function of α to be defined later on. Yuri Golubev Oracle inequalities

Spectral regularization for high dimensional linear models The Empirical Risk Minimization Short discussion An oracle inequality for a known noise variance Unknown noise variance The large deviation approach results in the following algorithm for computing n ρ 2 α ( k ) � Q ( α ) = 2 D ( α ) µ α 1 − 2 µ α ρ α ( k ) , k =1 where n � 2 , � D 2 ( α ) =2 λ 2 2 H α [ λ k ] − H 2 � α [ λ k ] k k =1 √ 2 D − 1 ( α ) λ k 2 H α [ λ k ] − H 2 � � ρ α ( k ) = α [ λ k ] , where µ α is a root of equation n 2 x 2 F [ µ α ρ α ( k )] = log D ( α ) D ( α ◦ ) , F ( x ) = 1 � 2 log(1 − 2 x ) + x + 1 − 2 x . k =1 Yuri Golubev Oracle inequalities

On oracle inequalities related to high dimensional linear models - PowerPoint PPT Presentation

Spectral regularization for high dimensional linear models The Empirical Risk Minimization An oracle inequality for a known noise variance Unknown noise variance On oracle inequalities related to high dimensional linear models Yuri Golubev

Oracle Buys AmberPoint Strengthens Oracle Fusion Middleware SOA Suite and Oracle Enterprise

Oracle eBusiness Suite 11i Integration Ulrich Janke Oracle Consulting Deutschland Page 1

Lecture 13: Oracle Turing Machines Arijit Bishnu 13.04.2010 Oracle Turing Machines

Inequalities for Symmetric Polynomials Curtis Greene October 24, 2009 Inequalities for Symmetric

Oracle SOA Suite Enterprise Service Bus Oracle Integration Product Management Multi Tiered

Oracle SOA Suite Enterprise Service Bus Oracle Integration Product Management Oracle ESB Header

Oracle Buys Ksplice Oracle Linux Enhanced with Zero Downtime Software Updates July 21, 2011

Oracle Database 11g Highly Available Grid made easy with Oracle Enterprise Manager Venkat

Oracle Partner Network (OPN) Specialisms Andy Butchart - Prject (EU) Ltd Frank Lauer - Oracle

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

5. Linear Inequalities and Elimination Searching for certificates Projection of polyhedra

Finite Mathematics MAT 141: Chapter 3 Notes Graphing Linear Inequalities Linear Programming

Wittens Laplacian and the Morse Inequalities Background Morse Inequalities Wittens Idea

Health inequalities slides Wirral January 2020 Version 1.1 Why health inequalities are

Health Inequalities: A postcode lottery Postcode Lottery Health Inequalities Health

Welcome Health inequalities What are health inequalities? Our presenters will be introducing the

!"#"$%"$&#'(')#+$+,("-)./(

Decision Theory, and Loss Functions CMSC 691 UMBC Some slides adapted from Hamed Pirsiavash

Using Probability of Exceedance to Compare the Resource Risk of Renewable and Gas-Fired

Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Kimball, B-11 Instructor : Karthik

Stochastic Algorithms in Machine Learning Aymeric DIEULEVEUT EPFL, Lausanne December 1st, 2017

Principled Deep Neural Network Training through Linear Programming Daniel Bienstock 1 , Gonzalo

The Shadow Cost of Bank Capital Requirements Roni Kisin Washington University in St. Louis Asaf

5. Bayesian decision theory Chlo-Agathe Azencot Centre for Computatjonal Biology, Mines

On oracle inequalities related to high dimensional linear models - PowerPoint PPT Presentation

Spectral regularization for high dimensional linear models The Empirical Risk Minimization An oracle inequality for a known noise variance Unknown noise variance On oracle inequalities related to high dimensional linear models Yuri Golubev

Oracle Buys AmberPoint Strengthens Oracle Fusion Middleware SOA Suite and Oracle Enterprise

Oracle eBusiness Suite 11i Integration Ulrich Janke Oracle Consulting Deutschland Page 1

Lecture 13: Oracle Turing Machines Arijit Bishnu 13.04.2010 Oracle Turing Machines

Inequalities for Symmetric Polynomials Curtis Greene October 24, 2009 Inequalities for Symmetric

Oracle SOA Suite Enterprise Service Bus Oracle Integration Product Management Multi Tiered

Oracle SOA Suite Enterprise Service Bus Oracle Integration Product Management Oracle ESB Header

Oracle Buys Ksplice Oracle Linux Enhanced with Zero Downtime Software Updates July 21, 2011

Oracle Database 11g Highly Available Grid made easy with Oracle Enterprise Manager Venkat

Oracle Partner Network (OPN) Specialisms Andy Butchart - Prject (EU) Ltd Frank Lauer - Oracle

CS 7616 Pattern Recognition Linear, Linear, Linear Aaron Bobick School of Interactive

5. Linear Inequalities and Elimination Searching for certificates Projection of polyhedra

Finite Mathematics MAT 141: Chapter 3 Notes Graphing Linear Inequalities Linear Programming

Wittens Laplacian and the Morse Inequalities Background Morse Inequalities Wittens Idea

Health inequalities slides Wirral January 2020 Version 1.1 Why health inequalities are

Health Inequalities: A postcode lottery Postcode Lottery Health Inequalities Health

Welcome Health inequalities What are health inequalities? Our presenters will be introducing the

!&quot;#&quot;$%&quot;$&amp;#'(')#*+$+,(&quot;-).*/(

Decision Theory, and Loss Functions CMSC 691 UMBC Some slides adapted from Hamed Pirsiavash

Using Probability of Exceedance to Compare the Resource Risk of Renewable and Gas-Fired

Machine Learning Theory (CS 6783) Tu-Th 1:25 to 2:40 PM Kimball, B-11 Instructor : Karthik

Stochastic Algorithms in Machine Learning Aymeric DIEULEVEUT EPFL, Lausanne December 1st, 2017

Principled Deep Neural Network Training through Linear Programming Daniel Bienstock 1 , Gonzalo

The Shadow Cost of Bank Capital Requirements Roni Kisin Washington University in St. Louis Asaf

5. Bayesian decision theory Chlo-Agathe Azencot Centre for Computatjonal Biology, Mines

!"#"$%"$&#'(')#+$+,("-)./(