

  1. Bayesian Methods for Variable Selection with Applications to High-Dimensional Data. Part 5: Functional Data & Wavelets. Marina Vannucci, Rice University, USA. ABS13-Italy, 06/17-21/2013.

  2. Part 5: Functional Data & Wavelets
  - Introduction to nonparametric regression
  - Brief intro to wavelets
  - Wavelet shrinkage
  - Functional data (multiple curves)
  - Curve regression models
  - Curve classification
  - Applications to near-infrared spectral data from chemometrics

  3. Nonparametric Regression Models
  - The functional form of the regression function is not a simple function of a few parameters. Technically: probability models with infinitely many parameters, or models on functional spaces.
  - Typical setting: given data $(X_i, Y_i)$, $i = 1, \ldots, n$, we wish to estimate $Y = f(X) + \epsilon$. Here $X$ can be a covariate, a set of covariates, time, a spatial location, ...
  - Basic idea: avoid simple (parametric) forms. Approaches are needed to penalize overfitting, since there are now many parameters.
  - Function estimation (avoiding linearity): splines, wavelets (basis functions).

  4. Basis Functions
  - A basis is a set of simple functions that can be added together in a weighted fashion to form more complicated functions.
  - Suppose $X$ is univariate. Simplest basis: the polynomial basis, $f(X) = \beta_0 + \beta_1 X + \ldots + \beta_p X^p$, a set of power functions $\{X, X^2, X^3, \ldots\}$ used to reconstruct $f$ from the data.
  - With a sufficiently large number of basis functions, we have essentially a nonparametric function estimator.
  - Matrix notation: $f = \sum_{j=0}^{p} B_j(X)\,\beta_j$, where $B_j(X) = X^j$ and the $\beta_j$ are the coefficients (weights).
  - The polynomial basis is easy to fit, but too simplistic for local features/variations.
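As a minimal pure-Python sketch of the slide's basis expansion (the function names here are hypothetical, chosen for illustration), the design matrix with columns $B_j(X) = X^j$ and the weighted sum $f = \sum_j B_j(X)\beta_j$ can be written as:

```python
def poly_design_matrix(xs, p):
    """Design matrix whose columns are the basis functions B_j(x) = x**j, j = 0..p."""
    return [[x ** j for j in range(p + 1)] for x in xs]

def poly_eval(beta, x):
    """Evaluate f(x) = sum_j beta_j * x**j, a weighted sum of basis functions."""
    return sum(b * x ** j for j, b in enumerate(beta))

# Example: f(x) = 1 + 2x + 3x^2 evaluated through the basis expansion
beta = [1.0, 2.0, 3.0]
value = poly_eval(beta, 2.0)  # 1 + 4 + 12 = 17
```

Fitting the weights then reduces to ordinary least squares on this design matrix, which is why the polynomial basis is "easy to fit".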

  5. Splines
  - Splines are one way to model $f$ flexibly, by writing $f(X) = B(X)\beta$ where the $B(\cdot)$ are basis functions.
  - Basis functions: many choices are available: truncated power basis, B-splines, thin-plate splines, penalized splines, ...
  - Truncated power basis of order $p$: $f(X) = \beta_0 + \beta_1 X + \ldots + \beta_p X^p + \sum_{k=1}^{K} \beta_{p+k}\,(X - \kappa_k)_+^p$, with the $\beta$'s the regression coefficients, the $\kappa_k$ the knots, and $K$ the number of knots. Linear/polynomial regression is recovered for $K = 0$.
  - Construction of splines involves specifying the knots: both their number and their locations.
  - Splines capture non-linear relationships between variables.
  - Kernel-based methods: Ruppert, Wand and Carroll (2003).
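The truncated power basis above translates directly into a design matrix. A small sketch (function and argument names are hypothetical, for illustration only): each row holds $1, x, \ldots, x^p$ followed by the truncated terms $(x - \kappa_k)_+^p$, where $(\cdot)_+$ denotes the positive part.

```python
def truncated_power_design(xs, p, knots):
    """Row per x: [1, x, ..., x**p, (x - k1)_+**p, ..., (x - kK)_+**p]."""
    rows = []
    for x in xs:
        row = [x ** j for j in range(p + 1)]          # polynomial part
        row += [max(x - k, 0.0) ** p for k in knots]  # truncated power terms
        rows.append(row)
    return rows

# Cubic truncated power basis (p = 3) with knots at 0.3 and 0.7
X = truncated_power_design([0.1, 0.5, 0.9], p=3, knots=[0.3, 0.7])
```

Regressing $Y$ on these columns gives a piecewise cubic fit that is smooth at the knots; with $K = 0$ the matrix reduces to plain polynomial regression.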

  6. Wavelets: Major Milestones
  The basic idea behind wavelets is to represent a generic function with simpler functions (building blocks) at different scales and locations.
  - 1807: Fourier's orthogonal decomposition of periodic signals.
  - 1946: Gabor's windowed Fourier transform (STFT).
  - 1984: A. Grossmann and J. Morlet introduce the continuous wavelet transform for the analysis of seismic signals.
  - 1985: Y. Meyer defines discrete orthonormal wavelet bases.
  - 1989: S. Mallat links wavelets to the theory of "multiresolution analysis" (MRA), a framework that allows the construction of orthonormal bases. A discrete wavelet transform is defined as a simple recursive algorithm that computes the wavelet decomposition of a signal from its approximation at a given scale.
  - 1989: I. Daubechies constructs wavelets with compact support and a varying degree of smoothness.
  - 1992: D. Donoho and I. Johnstone use wavelets to remove noise from data.

  7. Wavelets as "Small Waves"
  The basic idea behind wavelets is to represent a generic function with simpler functions (building blocks) at different scales and locations.
  [Figure: the Mexican hat wavelet $\psi(x)$, together with the dilated/translated versions $\sqrt{2}\,\psi(2x)$ ($a = 1/2$, $b = 0$), $\psi(x+4)$ ($a = 1$, $b = 4$), and $(1/\sqrt{2})\,\psi(x/2)$ ($a = 2$, $b = 0$).]
  - Mother wavelet $\psi$: an oscillatory function with zero mean, $\int \psi(x)\,dx = 0$.
  - Wavelets: $\psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k)$, with $j, k$ the scale and translation parameters.
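The two defining properties (zero mean, and a family generated by dilation/translation) can be checked numerically. A sketch using the Mexican hat wavelet (the standard normalized form, which I assume here since the slide does not spell it out; function names are hypothetical):

```python
import math

def mexican_hat(x):
    """Mexican hat mother wavelet (normalized second derivative of a Gaussian)."""
    c = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return c * (1.0 - x * x) * math.exp(-x * x / 2.0)

def psi_jk(x, j, k):
    """Dilated/translated wavelet psi_{j,k}(x) = 2**(j/2) * psi(2**j * x - k)."""
    return 2.0 ** (j / 2.0) * mexican_hat(2.0 ** j * x - k)

# Zero-mean check: a Riemann sum over [-10, 10] (the tails are negligible)
n, a, b = 20000, -10.0, 10.0
h = (b - a) / n
integral = sum(mexican_hat(a + i * h) for i in range(n + 1)) * h
```

The sum comes out numerically indistinguishable from zero, matching $\int \psi(x)\,dx = 0$.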

  8. Wavelets vs Fourier Representations
  Given $f$ and a basis $\{f_1, \ldots, f_n\}$, the series expansion is $f(x) = \sum_i a_i f_i(x)$, with coefficients $a_i = \langle f, f_i \rangle = \int f(x) f_i(x)\,dx$.
  - Fourier transforms measure the frequency content of a signal.
  - Wavelet transforms provide a time-scale analysis.

  9. Examples of Wavelet Families
  - Haar wavelets: the simplest family of wavelets, already known before the formulation of wavelet theory (Haar, 1909). $\psi(x) = 1$ for $0 \le x < 1/2$; $-1$ for $1/2 \le x < 1$; and $0$ otherwise.
  - Haar wavelets are constructed from $\psi$ via dilations and translations.
  - Given a generic function $f$, Haar wavelets approximate $f$ with piecewise constant functions (not continuous).
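A small sketch of the Haar mother wavelet and of the piecewise-constant approximation it induces (the averaging scheme below, a midpoint average over each dyadic interval, is my own illustrative choice; function names are hypothetical):

```python
def haar_psi(x):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

def haar_approx(f, level, x):
    """Piecewise-constant approximation of f on [0, 1): the average of f over
    the dyadic interval of width 2**(-level) containing x, computed here by a
    crude 16-point midpoint average."""
    width = 2.0 ** (-level)
    left = width * int(x / width)
    return sum(f(left + width * (i + 0.5) / 16) for i in range(16)) / 16

# Approximate g(x) = x^2 at level 3: constant on each interval of width 1/8
g = lambda x: x * x
approx_value = haar_approx(g, 3, 0.4)
```

Refining `level` shrinks the intervals, so the step-function approximation converges to $f$, while staying discontinuous at the dyadic breakpoints.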

  10. Daubechies Wavelets
  [Figure: the Daubechies 2, 4, 7 and 10 wavelets $\psi(x)$ and scaling functions $\phi(x)$.]
  - Compact support (good time localization).
  - Vanishing moments $\int x^l \psi(x)\,dx = 0$, $l = 0, 1, \ldots, N$, ensure the decay $|\langle f, \psi_{j,k} \rangle| \le C\, 2^{-jN}$ (good for compression).
  - Various degrees of smoothness: for large $N$, $\phi \in C^{\mu N}$, $\mu \approx 0.2$.

  11. Properties of Wavelets
  Wavelet series: $f(x) = \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(x)$, with coefficients $\{W_f(j,k) = \langle f, \psi_{j,k} \rangle = \int f(x)\psi_{j,k}(x)\,dx\}_{j,k \in \mathbb{Z}}$ describing features of $f$ at different locations and scales.
  Properties:
  - Small waves with zero mean
  - Time-frequency localization
  - Good at describing non-stationarity and discontinuities
  - Multi-scale decomposition of functions (MRA): sparsity, shrinkage
  - Recursive relationships among coefficients $\rightarrow$ DWT

  12. Wavelets in Practice (DWT)
  Consider a vector $Y$ of observations of $f$ at $n$ equispaced points: $y_i = f(t_i)$, $i = 1, \ldots, n$, with $n = 2^m$.
  Discrete wavelet transforms operate via recursive filters applied to $Y$: the low-pass filter $H$ produces the successive approximations $c^{(m)} \rightarrow c^{(m-1)} \rightarrow \cdots \rightarrow c^{(m-r)}$, while at each step the high-pass filter $G$ splits off the details $d^{(m-1)}, \ldots, d^{(m-r)}$:
  $H$: $c_{m-1,k} = \sum_l h_{l-2k}\, c_{m,l}$;  $G$: $d_{m-1,k} = \sum_l g_{l-2k}\, c_{m,l}$; ...
  Then, in practice, $Y \xrightarrow{\text{DWT}} d(Y) = (c^{(m-r)}, d^{(m-r)}, d^{(m-r+1)}, \ldots, d^{(m-1)})$, a discrete approximation of the $\langle f, \psi_{j,k} \rangle$ at scales $m-1, \ldots, m-r$.
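The recursive filtering above can be sketched in a few lines. This is a minimal illustration using the Haar filter taps (my choice for concreteness; Daubechies and other families just use longer `h` and `g`), not the slides' own code:

```python
import math

H = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass filter h
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass filter g

def dwt_step(c):
    """One pyramid level: c_{m-1,k} = sum_l h_{l-2k} c_{m,l}, and likewise
    d_{m-1,k} with g (downsampling by 2 via the index 2k)."""
    approx = [H[0] * c[2 * k] + H[1] * c[2 * k + 1] for k in range(len(c) // 2)]
    detail = [G[0] * c[2 * k] + G[1] * c[2 * k + 1] for k in range(len(c) // 2)]
    return approx, detail

def dwt(y, levels):
    """Recursive pyramid: returns [c^(m-r), d^(m-r), ..., d^(m-1)]."""
    c, details = list(y), []
    for _ in range(levels):
        c, d = dwt_step(c)
        details.insert(0, d)  # coarsest detail first, matching d(Y)'s ordering
    return [c] + details

coeffs = dwt([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0], levels=3)
```

Because the filters form an orthonormal transform, the coefficients preserve the energy (sum of squares) of the data, which is what makes coefficient-wise shrinkage well behaved.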

  13. Example
  [Figure: a noisy signal $f + \epsilon$ (top) and its DWT coefficients, plotted by coefficient index (bottom).]
  data $\equiv Y \xrightarrow{\text{DWT}} d(Y) = (c^{(3)}, d^{(3)}, \ldots, d^{(7)}, d^{(8)}, d^{(9)})$
  In matrix notation, $Y \rightarrow d(Y) = WY$, with $W$ determined by $\psi$ and such that $W'W = I$.
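The matrix form of the transform is easy to verify for a small case. A sketch (hypothetical helper names) building a single-level Haar $W$ and checking the orthogonality $W'W = WW' = I$:

```python
import math

def haar_matrix(n):
    """Single-level Haar DWT as an n x n matrix W (n even): the first n/2 rows
    are the low-pass averages, the last n/2 rows the high-pass differences."""
    s = 1 / math.sqrt(2)
    W = [[0.0] * n for _ in range(n)]
    for k in range(n // 2):
        W[k][2 * k], W[k][2 * k + 1] = s, s                      # approximation row
        W[n // 2 + k][2 * k], W[n // 2 + k][2 * k + 1] = s, -s   # detail row
    return W

def gram(W):
    """Compute W W' (equal to W'W = I here, since this W is square orthogonal)."""
    n = len(W)
    return [[sum(W[i][k] * W[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

I4 = gram(haar_matrix(4))  # should be the 4 x 4 identity
```

Orthogonality is the key practical point: the inverse DWT is just $Y = W'\,d(Y)$, and noise that is white in $Y$ stays white in the coefficients.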

  14. Summary on Wavelets
  - Choose the mother wavelet $\psi$ and obtain the wavelet family $\{\psi_{j,k}\}_{j,k \in \mathbb{Z}}$.
  - In theory, given $f$, calculate the wavelet coefficients $\langle f, \psi_{j,k} \rangle = \int f(x)\psi_{j,k}(x)\,dx$ and construct the wavelet series $f(x) = \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(x)$.
  - In practice, given data $Y = (y_1, \ldots, y_n)$, use the DWT to calculate approximate wavelet coefficients. The DWT uses fast recursive filtering; in matrix notation, for convenience, $Z = WY$.

  15. Covariance Matrix of Wavelet Coefficients
  Data: $y_i = f(t_i)$, $i = 1, \ldots, n$, with $n = 2^m$.
  DWT: $Y \xrightarrow{\text{DWT}} d(Y) = WY$, with $W$ determined by $\psi$ and $W'W = I$.
  Variances and covariances of the DWT coefficients: $\Sigma_{d(Y)} = W \Sigma_Y W'$, for a given $\Sigma_Y(i,j) = \gamma(|i-j|)$ with $\gamma(\tau)$ the autocovariance function of the process generating the data.
  Vannucci & Corradi (JRSS, Series B, 1999).
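The identity $\Sigma_{d(Y)} = W \Sigma_Y W'$ can be computed directly. A self-contained sketch with a single-level Haar $W$ and an AR(1)-type autocovariance $\gamma(\tau) = \rho^{|\tau|}$ (both are illustrative choices of mine, not taken from the slides):

```python
import math

def haar_matrix(n):
    """Single-level orthogonal Haar transform matrix (n even)."""
    s = 1 / math.sqrt(2)
    W = [[0.0] * n for _ in range(n)]
    for k in range(n // 2):
        W[k][2 * k], W[k][2 * k + 1] = s, s        # low-pass rows
        W[n // 2 + k][2 * k], W[n // 2 + k][2 * k + 1] = s, -s  # high-pass rows
    return W

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

n, rho = 4, 0.5
# Toeplitz covariance from the autocovariance gamma(tau) = rho**|tau|
Sigma_Y = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
W = haar_matrix(n)
Sigma_d = matmul(matmul(W, Sigma_Y), transpose(W))  # W Sigma_Y W'
```

The diagonal of `Sigma_d` gives the coefficient variances (here larger for the approximation rows than the detail rows, since differencing removes the positive serial correlation), which is what a Bayesian prior on the coefficients would be calibrated against.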
