SLIDE 1

Introduction to Functional Data Analysis

Elia Liitiäinen (eliitiai@cc.hut.fi)

Time Series Prediction Group Adaptive Informatics Research Centre Helsinki University of Technology, Finland

January 30, 2007

SLIDE 2

Introduction

Functional data occurs, for example, in time series analysis, chemometrics and econometrics. In many cases the number of samples available is small. Taking the structure of the inputs into account improves the results of statistical inference. Functional data analysis (FDA) is a framework that provides tools for this purpose.

SLIDE 3

Outline

1. General Considerations
2. Correlation Analysis
3. Interpolation

SLIDE 4

Goal of Functional Data Analysis

Exploratory data analysis: data provides new information and sheds light on known features.
Confirmatory analysis: hypothesis testing.
Prediction: forecasting future values.

SLIDE 5

Functional Data

Real-world phenomena are usually continuous at a small enough time scale. The worst-case dimension of functional data is infinite (white noise). For smooth functions with bounded derivative the intrinsic dimension is finite. Typically, for smooth functions the practical dimension is 10-20.

SLIDE 6

Noise

Functional data is typically noisy. In mathematical terms,

$$x_i(t) = y_i(t) + \epsilon(t). \tag{1}$$

To make things worse, the noise is often correlated over time: $\operatorname{Cov}(\epsilon(t_2), \epsilon(t_1)) \neq 0$ for $t_2 \neq t_1$.
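
The noise model in (1) can be simulated as a rough sketch. The AR(1) noise process, the sine signal, the parameter `rho` and all variable names below are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)       # fixed seed so the sketch is reproducible
N, T = 2000, 50                      # N curves, T time points each
rho = 0.8                            # assumed correlation between neighbouring time points

# AR(1) noise with unit variance: eps[:, j] = rho * eps[:, j-1] + innovation.
eps = np.empty((N, T))
eps[:, 0] = rng.standard_normal(N)
for j in range(1, T):
    eps[:, j] = rho * eps[:, j - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal(N)

t = np.linspace(0.0, 1.0, T)
y = np.sin(2 * np.pi * t)            # smooth signal shared by all curves (illustration only)
x = y + eps                          # eq. (1): observation = signal + noise

# Empirical covariance of the noise at two neighbouring time points.
cov_neighbours = np.cov(eps[:, 10], eps[:, 11])[0, 1]
print(cov_neighbours)
```

With this construction the covariance between neighbouring time points is close to `rho` rather than zero, which is the situation the slide warns about.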

SLIDE 7

Data Representation

The form of the curve is important. The first step in FDA is transformation of the inputs to remove noise. Basic tools include smoothing and interpolation.

SLIDE 8

Derivatives

Derivatives are important. Numerical differentiation amplifies noise. Interpolation or smoothing helps in this regard.
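
A minimal sketch of the effect, assuming a sine curve observed with small Gaussian noise and a moving average as the smoother (both are illustrative choices, not the methods used in the talk):

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed so the sketch is reproducible
t = np.linspace(0.0, 1.0, 201)
dt = t[1] - t[0]

y = np.sin(2 * np.pi * t)                    # assumed smooth underlying curve
dy_true = 2 * np.pi * np.cos(2 * np.pi * t)  # its exact derivative
x = y + 0.01 * rng.standard_normal(t.size)   # observation with small noise

# Differentiate the raw noisy curve with finite differences.
dx_raw = np.gradient(x, dt)

# Smooth first with a moving average (15-point window), then differentiate.
window = 15
kernel = np.ones(window) / window
x_smooth = np.convolve(x, kernel, mode="same")
dx_smooth = np.gradient(x_smooth, dt)

# Compare errors away from the boundaries, where the moving average is biased.
inner = slice(window, -window)
rmse_raw = np.sqrt(np.mean((dx_raw[inner] - dy_true[inner]) ** 2))
rmse_smooth = np.sqrt(np.mean((dx_smooth[inner] - dy_true[inner]) ** 2))
print(rmse_raw, rmse_smooth)
```

Although the noise level in `x` is only 0.01, the error of the raw finite-difference derivative is far larger, because dividing by the small step `dt` amplifies the noise. Smoothing before differentiating reduces it at the cost of a small bias.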

SLIDE 9

Covariance and Variance Functions

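$\{x_i(t)\}_{i=1}^{N}$ is a sample of functions.

Mean:

$$\bar{x}(t) = N^{-1} \sum_{i=1}^{N} x_i(t). \tag{2}$$

Variance function:

$$\operatorname{var}_X(t) = (N-1)^{-1} \sum_{i=1}^{N} [x_i(t) - \bar{x}(t)]^2. \tag{3}$$

Covariance function:

$$\operatorname{cov}_X(t_1, t_2) = (N-1)^{-1} \sum_{i=1}^{N} \{x_i(t_1) - \bar{x}(t_1)\}\{x_i(t_2) - \bar{x}(t_2)\}. \tag{4}$$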
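
When the curves are sampled on a common grid, equations (2) to (4) reduce to array operations. The sketch below assumes the sample is stored as an N x T array `X` (one curve per row); the synthetic data and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 100                       # N curves, each sampled at T grid points
t = np.linspace(0.0, 1.0, T)

# Synthetic sample: sine curves with random amplitude plus noise (illustration only).
X = rng.standard_normal((N, 1)) * np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((N, T))

mean_x = X.mean(axis=0)                          # eq. (2), one value per grid point
centered = X - mean_x
var_x = (centered ** 2).sum(axis=0) / (N - 1)    # eq. (3)
cov_x = centered.T @ centered / (N - 1)          # eq. (4), a T x T matrix
```

The diagonal of `cov_x` is the variance function, so `var_x` does not need to be computed separately once the covariance is available.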

SLIDE 10

Correlation

Correlation function:

$$\operatorname{corr}_X(t_1, t_2) = \frac{\operatorname{cov}_X(t_1, t_2)}{\sqrt{\operatorname{var}_X(t_1)\operatorname{var}_X(t_2)}}. \tag{5}$$

It is often useful to examine the plot of cross-correlation.
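
As a sketch, the correlation function (5) on a grid is the covariance matrix divided by the outer product of the pointwise standard deviations. The function name and the random-walk test curves are assumptions made for this example.

```python
import numpy as np

def correlation_function(X):
    """Pointwise correlation of curves stored as rows of X (shape N x T)."""
    N = X.shape[0]
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / (N - 1)
    std = np.sqrt(np.diag(cov))          # square root of the variance function
    return cov / np.outer(std, std)      # eq. (5)

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 30)).cumsum(axis=1)   # synthetic random-walk curves
corr = correlation_function(X)
```

The result is a T x T matrix that can be shown as an image or surface plot, with ones on the diagonal.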

SLIDE 11

Cross-correlation

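Now we have pairs of functions $(x_i, y_i)$. Cross-covariance:

$$\operatorname{cov}_{X,Y}(t_1, t_2) = (N-1)^{-1} \sum_{i=1}^{N} \{x_i(t_1) - \bar{x}(t_1)\}\{y_i(t_2) - \bar{y}(t_2)\}. \tag{6}$$

Cross-correlation:

$$\operatorname{corr}_{X,Y}(t_1, t_2) = \frac{\operatorname{cov}_{X,Y}(t_1, t_2)}{\sqrt{\operatorname{var}_X(t_1)\operatorname{var}_Y(t_2)}}. \tag{7}$$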
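
A sketch of (6) and (7) for paired curves on grids, with a scalar output treated as the special case of a one-point grid. The function name and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def cross_correlation(X, Y):
    """Cross-correlation between paired curves; X is N x T1, Y is N x T2."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cov_xy = Xc.T @ Yc / (N - 1)                 # eq. (6), shape T1 x T2
    sd_x = Xc.std(axis=0, ddof=1)                # sqrt of var_X(t1)
    sd_y = Yc.std(axis=0, ddof=1)                # sqrt of var_Y(t2)
    return cov_xy / np.outer(sd_x, sd_y)         # eq. (7)

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 20))
Y = 2.0 * X + 0.5 * rng.standard_normal((60, 20))   # Y depends on X at the same t
corr_xy = cross_correlation(X, Y)

# A scalar output is the special case T2 = 1.
y_scalar = X[:, 5:6] + 0.1 * rng.standard_normal((60, 1))
corr_scalar = cross_correlation(X, y_scalar)[:, 0]
```

In this synthetic example `corr_scalar` peaks at grid index 5, the only point that actually drives the scalar output.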

SLIDE 12

Case Study: Tecator Data

240 samples of absorbance spectra. In addition to the absorbance spectra we have fat content as output. The cross-correlation with the output can be misleading.

SLIDE 13

[Figure panels, each plotted against wavelength (850 to 1050): Absorbance, Variance, correlation function, Cross-correlation.]

Figure: From left to right: the inputs, the variance function, the correlation function and the cross-correlation with the scalar output.

SLIDE 14

Function Basis

A basis is a linearly independent set of functions $\{\omega_i\}_{i=1}^{\infty}$ that spans the function space. Example: the set of monomials $\{t^i\}_{i=0}^{\infty}$.

Basis expansion: the functional inputs $\{x_i(t)\}_{i=1}^{N}$ are approximated as (for some finite $K > 0$)

$$x_i(t) \approx \sum_{k=1}^{K} c_k \omega_k(t). \tag{8}$$

The weights $c_k$, one set per curve $x_i$, are solved by minimizing some cost function.
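
One common choice of cost function is the sum of squared residuals on the sampling grid. The sketch below assumes that choice, a monomial basis with K = 4, and a synthetic curve; none of these specifics come from the slides.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(4)
x = np.exp(-t) + 0.01 * rng.standard_normal(t.size)   # one noisy synthetic curve

K = 4
# Design matrix whose columns are the monomials 1, t, t^2, t^3 on the grid.
Phi = np.vander(t, K, increasing=True)

# Least-squares weights: minimise sum_j (x(t_j) - sum_k c_k * omega_k(t_j))^2.
c, *_ = np.linalg.lstsq(Phi, x, rcond=None)
x_hat = Phi @ c                                       # smoothed reconstruction, eq. (8)
```

The 50 noisy samples are replaced by 4 coefficients, and `x_hat` is a denoised version of the curve.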

SLIDE 15

Why use basis expansions?

Dimension reduction. Reduces computational demand in later stages of analysis. Noise removal.

SLIDE 16

Fourier Basis

The Fourier basis on $[0, 1]$ is $\{\sin 2\pi j t, \cos 2\pi j t\}_{j=1}^{\infty}$.

Sometimes good for periodic data. Lack of locality. Computational complexity $O(N \log N)$.
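
A sketch of fitting a truncated Fourier basis by least squares. The helper `fourier_design` is hypothetical, the constant column is an addition not listed on the slide (it is needed to represent a non-zero mean), and the test signal is synthetic.

```python
import numpy as np

def fourier_design(t, J):
    """Columns: constant, then sin(2*pi*j*t), cos(2*pi*j*t) for j = 1..J."""
    cols = [np.ones_like(t)]
    for j in range(1, J + 1):
        cols.append(np.sin(2 * np.pi * j * t))
        cols.append(np.cos(2 * np.pi * j * t))
    return np.column_stack(cols)

t = np.linspace(0.0, 1.0, 128, endpoint=False)        # evenly spaced periodic grid on [0, 1)
x = 1.0 + 0.5 * np.sin(2 * np.pi * t) - 0.25 * np.cos(2 * np.pi * 3 * t)

Phi = fourier_design(t, J=3)
coef, *_ = np.linalg.lstsq(Phi, x, rcond=None)        # order: 1, sin1, cos1, sin2, cos2, sin3, cos3

# On an evenly spaced periodic grid the same coefficients follow from the FFT,
# which is where an O(N log N) cost can come from.
F = np.fft.rfft(x) / t.size
sin1_fft = -2.0 * F[1].imag      # coefficient of sin(2*pi*t)
cos3_fft = 2.0 * F[3].real       # coefficient of cos(2*pi*3*t)
```

Every basis function is non-zero over the whole interval, which is the lack of locality mentioned above: a local change in the curve changes all coefficients.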

SLIDE 17

Wavelets

Under some conditions, the functions

$$\psi_{jk}(t) = 2^{j/2} \psi(2^j t - k) \tag{9}$$

form a basis. Wavelets are local. Fast computation.
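
As an illustration of (9), the sketch below uses the Haar mother wavelet, one standard choice for which the dilated and translated functions are orthonormal. The slides do not name a specific wavelet, so Haar is an assumption, as are the function names.

```python
import numpy as np

def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    up = np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0)
    down = np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0)
    return up - down

def psi(j, k, t):
    """Dilated and translated wavelet psi_jk(t) = 2^(j/2) * psi(2^j t - k), eq. (9)."""
    return 2.0 ** (j / 2) * haar_mother(2.0 ** j * t - k)

# Midpoint grid on [0, 1]; the integrands are piecewise constant with jumps
# on cell boundaries, so the Riemann sums below are exact.
n = 1024
t = (np.arange(n) + 0.5) / n

def inner(f, g):
    """Approximate L2 inner product on [0, 1]."""
    return np.sum(f * g) / n

norm_sq = inner(psi(2, 1, t), psi(2, 1, t))    # unit norm
cross = inner(psi(2, 1, t), psi(1, 0, t))      # orthogonal to a coarser wavelet
```

Here `psi(2, 1, t)` is non-zero only on [0.25, 0.5), which shows the locality that the Fourier basis lacks.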

SLIDE 18

Splines (1)

Consider the interval $[0, 1]$ and the breakpoints $\tau = \{\tau_l\}_{l=0}^{L}$ with $\tau_0 = 0$ and $\tau_L = 1$. A spline is a piecewise polynomial of degree $K$. At the breakpoints it is required that the values of the polynomials and their derivatives up to order $K - 1$ agree. Thus a spline is $K - 1$ times continuously differentiable. For $K = 1$, the spline is a piecewise linear function.

SLIDE 19

Splines (2)

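The number of intervals: $L$. Degrees of freedom:

$$LK - (L - 1)(K - 1) = K + L - 1, \tag{10}$$

that is, the number of interior knots plus the order. In this count $K$ is the number of coefficients of each polynomial piece, i.e. the order (degree plus one). It is not necessary to require the same smoothness at all the knots.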
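
The count in (10) can be checked with a small worked example. The helper name `spline_dof` is hypothetical, and $K$ is read as the order (number of coefficients per polynomial piece), as the phrase "interior knots plus the order" suggests.

```python
def spline_dof(L, K):
    """Degrees of freedom of a spline with L intervals and order K, eq. (10)."""
    free_coefficients = L * K            # K coefficients on each of the L pieces
    constraints = (L - 1) * (K - 1)      # K - 1 continuity conditions per interior breakpoint
    return free_coefficients - constraints

# Cubic spline (order K = 4) on L = 5 intervals: 20 - 12 = 8 = K + L - 1.
dof = spline_dof(5, 4)
print(dof)
```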

SLIDE 20

Spline Basis

Splines can be represented using a basis expansion

$$S(t) = \sum_{k=1}^{K+L-1} c_k B_k(t). \tag{11}$$

The basis is not orthonormal, but it is local, with the locality determined by $K$ (complexity grows linearly with respect to the number of data points). The coefficients can be used in regression and data analysis.
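
One standard way to build such a basis is the B-spline basis evaluated with the Cox-de Boor recursion. The slides do not say which construction was used, so the sketch below is an assumption, as are the function name and the choice of repeated end knots.

```python
import numpy as np

def bspline_basis(t, breakpoints, K):
    """Evaluate all order-K B-spline basis functions on t (Cox-de Boor recursion).

    Returns an array of shape (len(t), K + L - 1), where L is the number of intervals.
    Points equal to the last breakpoint evaluate to zero (half-open intervals).
    """
    t = np.asarray(t, dtype=float)
    # Repeat the end breakpoints so the knot vector has K copies at each end.
    knots = np.concatenate([[breakpoints[0]] * (K - 1), breakpoints, [breakpoints[-1]] * (K - 1)])
    # Order 1: indicator functions of the knot intervals.
    B = np.zeros((t.size, knots.size - 1))
    for i in range(knots.size - 1):
        B[:, i] = (t >= knots[i]) & (t < knots[i + 1])
    # Raise the order step by step; terms with zero-width support are skipped.
    for order in range(2, K + 1):
        B_new = np.zeros((t.size, knots.size - order))
        for i in range(knots.size - order):
            left = knots[i + order - 1] - knots[i]
            right = knots[i + order] - knots[i + 1]
            if left > 0:
                B_new[:, i] += (t - knots[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (knots[i + order] - t) / right * B[:, i + 1]
        B = B_new
    return B

breakpoints = np.linspace(0.0, 1.0, 6)             # L = 5 intervals
K = 4                                              # order 4, i.e. cubic pieces
t = np.linspace(0.0, 1.0, 200, endpoint=False)     # grid on [0, 1)
B = bspline_basis(t, breakpoints, K)
```

The matrix `B` has K + L - 1 = 8 columns, matching (11), and at most K of them are non-zero at any point. This sparsity is what keeps the cost linear in the number of data points.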

SLIDE 21

[Figure panels: Order 2, Order 3, Order 4 and Order 5 spline basis functions B(x) plotted against x.]

Figure: Spline basis for different orders.

SLIDE 22

Conclusion

Functional data occurs in the real world. Important tools include correlation plots, derivatives and basis expansions. Removal of noise is needed.