Modelling covariance kernels for nonstationary random fields

Christopher G. Small, University of Waterloo
University of Guelph, October 2007

1. Random fields and covariance kernels
2. The role of covariance kernels in semiparametric inference
3. The Karhunen-Loève expansion
4. The estimation problem reconsidered
5. An application

1. Random fields and covariance kernels

• For a random sample, inference about the mean $\mu$ of the variates depends upon knowledge or estimation of the variance $\sigma^2$. In practice the variance is harder to estimate than the mean.
• Moreover, heteroscedasticity is a problem for the optimal estimation of $\mu$. To optimally estimate $\mu$ we must model or estimate the variance function.
• For a random field $X(t)$, inference about the mean function $\mu(t)$ requires knowledge or estimation of the covariance kernel $\Gamma(s, t) = \mathrm{Cov}[X(s), X(t)]$.
• When $\Gamma$ is unknown we need to estimate it. What methods are available when the process is not second order stationary, i.e., when $\Gamma(s, t) \neq \gamma(s - t)$?

• By a random field we shall mean a family of random variables $X(t)$ indexed by some parameter $t \in \mathbb{R}^q$.
• In practice, we only observe a finite "piece" of this random field. So we shall assume that $t$ lies in some bounded subset $\mathcal{R}$ of $\mathbb{R}^q$.
• When $\mathcal{R}$ is a countable set (finite or denumerable, usually a lattice), we say that $X(t)$ is a discrete random field.
• When $\mathcal{R}$ is an open subset of $\mathbb{R}^q$, then $X(t)$ is said to be a continuous random field.

Stochastic processes $X(t)$, $t \geq 0$, are random fields ...

Example: Lynx Pelt Prices, HBC 1857-1911. Elton & Nicholson (1942).

[Figure: time series plot of the lynx pelt prices; vertical axis 0 to 140, horizontal axis 1850 to 1920.]

Random sets are also special cases where $X(t) \in \{0, 1\}$ ...

Example: Two-dimensional random set. Integrated circuit data, Mallory et al. (1983).

[Figure: two-dimensional random set for the integrated circuit data; horizontal axis 2 to 24, vertical axis 2 to 20.]

2. Role of covariance kernels in semiparametric inference

• Let $E[X(t)] = \mu_\theta(t)$ and $\mathrm{Cov}[X(s), X(t)] = \Gamma_\theta(s, t)$ be the mean function and covariance kernel respectively, where $s, t \in \mathcal{R}$ and $\theta \in \mathbb{R}^k$.
• Both $\mu_\theta$ and $\Gamma_\theta$ are assumed to be known real-valued functions of the unknown parameter $\theta \in \mathbb{R}^k$.

• With these semiparametric assumptions, $\theta$ can be estimated by a linear functional estimating equation of the form $L(X; \hat\theta) = 0$, where
$$L(X; \theta) = \int_{\mathcal{R}} [X(t) - \mu_\theta(t)]\, dA_\theta(t),$$
and $A_\theta$ is a vector-valued measure on $\mathcal{R}$ taking values in $\mathbb{R}^k$.

• For a discrete random field, where $t$ typically lies in a lattice, this reduces to an estimating function of the form
$$L(X; \theta) = \sum_{t \in \mathcal{R}} a_\theta(t)\, [X(t) - \mu_\theta(t)].$$
• For a continuous random field, typically $dA_\theta(t) = a_\theta(t)\, dt$, so that
$$L(X; \theta) = \int_{\mathcal{R}} a_\theta(t)\, [X(t) - \mu_\theta(t)]\, dt.$$
• In both cases, $a_\theta : \mathbb{R}^q \to \mathbb{R}^k$ is a vector-valued coefficient function which is functionally independent of $X$.
• In this talk, we will emphasize the continuous case. However, most remarks apply with appropriate modification to other types of random fields.

• The optimal estimating function is that which has a vector-valued measure $A_\theta$ satisfying
$$\int_{\mathcal{R}} \Gamma_\theta(s, t)\, dA_\theta(s) = \dot\mu_\theta(t),$$
where $\dot\mu_\theta(t)$ is the vector-valued partial derivative of $\mu_\theta(t)$ with respect to $\theta$.
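For a discrete random field on $n$ points, the optimality condition above becomes the linear system $\Gamma_\theta\, a = \dot\mu_\theta$. The following is a minimal sketch of that step, not the speaker's code. The linear mean $\mu_\theta(t) = \theta_0 + \theta_1 t$, the exponential kernel, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def optimal_weights(gamma, mu_dot):
    """Solve gamma @ a = mu_dot for the (n, k) weight matrix a."""
    return np.linalg.solve(gamma, mu_dot)

def estimating_function(x, mu, a):
    """Discrete L(X; theta) = sum_t a(t) [X(t) - mu(t)], a k-vector."""
    return a.T @ (x - mu)

# Hypothetical setup: 25 sites on [0, 1], exponential kernel, linear mean.
t = np.linspace(0.0, 1.0, 25)
gamma = np.exp(-np.abs(t[:, None] - t[None, :]) / 0.3)
mu_dot = np.column_stack([np.ones_like(t), t])  # d mu / d theta, shape (n, 2)
a = optimal_weights(gamma, mu_dot)

# Simulate one realization with the assumed kernel.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, 2.0])
x = mu_dot @ theta_true + np.linalg.cholesky(gamma) @ rng.standard_normal(t.size)

# Because this mean is linear in theta, L(X; theta) = 0 has a closed-form root.
theta_hat = np.linalg.solve(a.T @ mu_dot, a.T @ x)
```

With a mean that is nonlinear in $\theta$, or a kernel that depends on $\theta$, the solve for `a` would have to be repeated at every iteration.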

There are two problems with implementing this optimal solution:
• Problem 1. Note that the equation for $A_\theta$ must be solved for each value of the parameter $\theta$, and so must be solved repeatedly within any iterative algorithm that solves the equation $L(\hat\theta) = 0$.
– For example, when $\theta \in \mathbb{R}^2$, a discrete random field on a $20 \times 20$ lattice requiring as few as ten iterations over $\theta$ will need the solution to 800 simultaneous non-sparse linear equations, ten successive times in a row, just to produce a single approximation to $\hat\theta$.
• Problem 2. In practice, we do not know $\Gamma_\theta$. This must usually be estimated as well!

3. The Karhunen-Loève expansion

• The solution to both of these problems can be obtained using the Karhunen-Loève expansion.
• Let $b_1(t), b_2(t), \ldots$ be the set of eigenfunctions for the kernel $\Gamma$ satisfying
$$\int_{\mathcal{R}} b_j(s)\, \Gamma(s, t)\, ds = \sigma_j^2\, b_j(t)$$
for $j = 1, 2, \ldots$. Here, the parameter $\theta$ is suppressed in the notation for simplicity.
• Since $\Gamma$ is symmetric, the eigenfunctions $b_j$ can be chosen to be real and orthonormal.
• Since $\Gamma$ is positive definite, the eigenvalues will also be positive. So we can write the $j$th eigenvalue as $\sigma_j^2$.
• Provided that the kernel function $\Gamma$ is complete, the set of standardised eigenfunctions of $\Gamma$ will form an orthonormal basis for $L^2(\mathcal{R})$.
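The eigenproblem above can be approximated numerically by discretizing the integral. The sketch below uses a midpoint rule and, as an illustrative assumption not taken from the talk, the Brownian-motion kernel $\Gamma(s, t) = \min(s, t)$ on $[0, 1]$, whose eigenvalues $\sigma_j^2 = 1 / ((j - 1/2)^2 \pi^2)$ are known in closed form and serve as a check.

```python
import numpy as np

def kl_eigen(kernel, grid, h):
    """Approximate eigenvalues and eigenfunctions of a covariance kernel.

    The integral operator is replaced by the matrix h * K (midpoint rule).
    Eigenvectors are rescaled so that h * sum_i b_j(t_i)^2 = 1, which is the
    discrete analogue of orthonormality in L^2.
    """
    k = kernel(grid[:, None], grid[None, :])
    vals, vecs = np.linalg.eigh(h * k)
    order = np.argsort(vals)[::-1]  # largest eigenvalue first
    return vals[order], vecs[:, order] / np.sqrt(h)

n = 400
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h  # midpoints of n equal cells on [0, 1]
vals, funcs = kl_eigen(np.minimum, grid, h)

# Closed-form eigenvalues of the min(s, t) kernel, for comparison.
exact = 1.0 / (((np.arange(1, 4) - 0.5) * np.pi) ** 2)
```

The columns of `funcs` play the role of $b_1, b_2, \ldots$ evaluated on the grid, and `vals` the role of $\sigma_1^2, \sigma_2^2, \ldots$.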

• Using the completeness condition, we may write
$$X(t) = \sum_{j=1}^{\infty} Y_j\, b_j(t),$$
where $Y_1, Y_2, \ldots$ satisfy
$$Y_j = \int_{\mathcal{R}} X(t)\, b_j(t)\, dt.$$
• Let $E(Y_j) = \mu_j$ for all $j$.
• We have $\mathrm{Var}(Y_j) = \sigma_j^2$.
• We will also need
$$\dot\mu(t) = \sum_{j=1}^{\infty} \dot\mu_j\, b_j(t), \quad \text{where} \quad \dot\mu_j = \int_{\mathcal{R}} \dot\mu(t)\, b_j(t)\, dt.$$
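The variance statement follows directly from the eigenfunction equation. As a short worked step (assuming the order of expectation and integration may be exchanged), the coefficients are in fact uncorrelated:

$$
\begin{aligned}
\mathrm{Cov}(Y_j, Y_k)
&= \int_{\mathcal{R}} \int_{\mathcal{R}} b_j(s)\, \Gamma(s, t)\, b_k(t)\, ds\, dt \\
&= \sigma_j^2 \int_{\mathcal{R}} b_j(t)\, b_k(t)\, dt \\
&= \sigma_j^2\, \delta_{jk},
\end{aligned}
$$

where $\delta_{jk}$ is the Kronecker delta. Taking $j = k$ gives $\mathrm{Var}(Y_j) = \sigma_j^2$.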

• Writing out $X$ in terms of the Karhunen-Loève expansion, we obtain an equivalent expression for $L(\theta)$, namely
$$L(\theta) = \sum_{j=1}^{\infty} \sigma_j^{-2}(\theta)\, \dot\mu_j(\theta)\, [Y_j(\theta) - \mu_j(\theta)],$$
which is a rather standard looking quasi-likelihood equation, with the exception that the random variables $Y_j$ are also functions of the parameter $\theta$.
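One way to see the equivalence, sketched here for the continuous case with $\theta$ suppressed, is to check that the coefficient function $a(t) = \sum_j \sigma_j^{-2}\, \dot\mu_j\, b_j(t)$ satisfies the optimality condition and then substitute it into $L$:

$$
\begin{aligned}
\int_{\mathcal{R}} \Gamma(s, t)\, a(s)\, ds
&= \sum_{j=1}^{\infty} \sigma_j^{-2}\, \dot\mu_j \int_{\mathcal{R}} b_j(s)\, \Gamma(s, t)\, ds
= \sum_{j=1}^{\infty} \dot\mu_j\, b_j(t) = \dot\mu(t), \\
L &= \int_{\mathcal{R}} a(t)\, [X(t) - \mu(t)]\, dt
= \sum_{j=1}^{\infty} \sigma_j^{-2}\, \dot\mu_j\, [Y_j - \mu_j].
\end{aligned}
$$

The last step uses $Y_j = \int_{\mathcal{R}} X(t)\, b_j(t)\, dt$ and $\mu_j = E(Y_j) = \int_{\mathcal{R}} \mu(t)\, b_j(t)\, dt$.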

4. The estimation problem reconsidered

Proposed solution to Problem 1:
• We need only sum the first few terms of the K.-L. expansion. Since $\sum_j \sigma_j^2 < \infty$, we choose terms with the most significant leading eigenvalues. Say, the first $m$ terms.
• Instead of $Y_j(\theta)$, choose $Y_j^* = Y_j(\theta^*)$, where $\theta^*$ is some simple consistent approximation to $\theta$ (possibly, but not necessarily, an estimator). However, consider $\theta^*$ as fixed, not random.

• Reduce the problem of estimating $\theta$ to that of estimation given $Y_1^*, Y_2^*, \ldots, Y_m^*$ as data. The GEE has the form
$$\sum_{j=1}^{m} [\sigma_j^*(\theta)]^{-2}\, \dot\mu_j^*(\theta)\, [Y_j^* - \mu_j^*(\theta)] = 0,$$
where $\mu_j^*(\theta) = E_\theta(Y_j^*)$, $[\sigma_j^*(\theta)]^2 = \mathrm{Var}_\theta(Y_j^*)$, and $\dot\mu_j^*(\theta) = \partial \mu_j^*(\theta) / \partial\theta$.
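Once the $m$ coefficients $Y_j^*$ are treated as data, the equation above is a small $k$-dimensional root-finding problem. A minimal sketch using Fisher scoring follows; the solver name, the exponential mean model, and the constants are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def solve_gee(y, mu, mu_dot, var, theta0, tol=1e-10, max_iter=50):
    """Fisher scoring for sum_j mu_dot_j (y_j - mu_j) / var_j = 0.

    mu(theta) -> (m,), mu_dot(theta) -> (m, k), var(theta) -> (m,).
    """
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(max_iter):
        d = mu_dot(theta)
        w = 1.0 / var(theta)                 # inverse-variance weights
        score = d.T @ (w * (y - mu(theta)))  # left-hand side of the GEE
        info = d.T @ (w[:, None] * d)        # expected information
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Hypothetical model: mu_j(theta) = exp(theta) * c_j, with fixed variances s2_j.
c = np.array([1.0, 0.5, 0.25, 0.125])
s2 = np.array([1.0, 0.5, 0.25, 0.125])
mu = lambda th: np.exp(th[0]) * c
mu_dot = lambda th: (np.exp(th[0]) * c)[:, None]
var = lambda th: s2

y = np.exp(0.5) * c  # noiseless "data", so the root is theta = 0.5
theta_hat = solve_gee(y, mu, mu_dot, var, theta0=[0.0])
```

Each iteration only involves $m$ terms and a $k \times k$ solve, in contrast to the full lattice system described under Problem 1.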

Proposed solution to Problem 2:
• If the covariance kernel $\Gamma$ is an unknown function of $\theta$, we can estimate it directly.
• This is often done by assuming that $\Gamma(s, t) = \gamma(s - t)$.
• But such a stationarity assumption
– is artificial if $\mu(t)$ is not constant;
– requires special constraints on $\gamma$ to make $\Gamma$ nonnegative definite.
• An alternative is to use a working product kernel with unknown coefficients ...

5. An application

Example: Lynx Pelt Prices (Continued).

[Figure: the lynx pelt price series again; vertical axis 0 to 140, horizontal axis 1850 to 1920.]

• Lynx populations rose and fell on a 10 year cycle.

Lynx Pelt Prices (Continued).
• The prices look stationary up to 1899.
• There is also the 10-year oscillation in the lynx population which may have influenced lynx pelt prices. This 10-year cycle of the lynx population can be explained by the predator-prey equations for the populations of lynx and its main prey, the snowshoe rabbit.
• Stationarity appears to make sense.
• However ...

Lynx Pelt Prices (Continued).
• By 1900 and after, prices increased dramatically. This is associated with reduced catches of lynx.
• "The smallpox, killing off a large fraction of the Indian population, accounts for the greatly reduced catches of the fifteen years that followed [the years 1878 to 1890]." – Elton and Nicholson (1942).

• It is always dangerous to assume stationarity for socio-historical data.
• Unlike the predator-prey relationships that govern the 10-year cycle of the lynx and the snowshoe rabbit, socio-historical data are influenced by time-irreversible historical events.

• What can we deduce without assuming stationarity?
• We propose a working covariance kernel ...

• Let $b_1(t), \ldots, b_m(t)$ be orthonormal functions.
• We fit a covariance kernel of the form
$$\Gamma(s, t) = \hat\sigma_1^2\, b_1(s)\, b_1(t) + \cdots + \hat\sigma_m^2\, b_m(s)\, b_m(t).$$
• The eigenvalues $\hat\sigma_j^2$ are estimated from the data.
• We choose the functions $b_j$ by using a mathematically tractable class of functions, e.g., trigonometric functions (which arise from the Laplacian kernel, for example).
• The class of covariance kernels so defined can be called working product kernels.

• Estimate $\mu(t)$ by some "rough" estimate $\hat\mu(t)$, such as a moving average of $X(t)$, or as $\hat\mu(t) = \mu_{\theta^*}(t)$ if this information is available.
• Fitting of Eigenvalues: Set
$$\hat\sigma_j^2 = \left\{ \int_{\mathcal{R}} [X(t) - \hat\mu(t)]\, b_j(t)\, dt \right\}^2.$$
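The two fitting steps can be sketched on a grid as follows. This is an illustration under stated assumptions, not the speaker's implementation: the data are synthetic (not the lynx pelt series), the cosine basis is one hypothetical choice of trigonometric functions, and the helper names are made up for the example.

```python
import numpy as np

def moving_average(x, window):
    """Rough mean estimate: centred moving average (odd window, edge padding)."""
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def fit_working_kernel(x, mu_hat, basis, h):
    """Fit sigma_j^2 and assemble the working product kernel on the grid.

    basis has shape (n, m); its columns are orthonormal under the
    quadrature weight h, i.e. h * basis.T @ basis is the identity.
    """
    coef = h * basis.T @ (x - mu_hat)   # integral of [X - mu_hat] b_j dt
    sigma2 = coef ** 2                  # fitted eigenvalues
    gamma = (basis * sigma2) @ basis.T  # sum_j sigma2_j b_j(s) b_j(t)
    return sigma2, gamma

# Synthetic series on [0, T]: linear trend plus a random walk.
n, T, m = 200, 1.0, 5
h = T / n
t = (np.arange(n) + 0.5) * h
rng = np.random.default_rng(1)
x = 10.0 + 3.0 * t + np.cumsum(rng.standard_normal(n)) * np.sqrt(h)
mu_hat = moving_average(x, window=41)

# Assumed cosine basis: 1/sqrt(T), sqrt(2/T) cos(j pi t / T), j = 1, ..., m-1.
j = np.arange(m)
basis = np.sqrt(2.0 / T) * np.cos(np.pi * j[None, :] * t[:, None] / T)
basis[:, 0] = 1.0 / np.sqrt(T)

sigma2, gamma = fit_working_kernel(x, mu_hat, basis, h)
```

By construction the fitted kernel is symmetric and nonnegative definite with rank at most $m$, so no extra constraints are needed to make it a valid covariance kernel.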

Lynx Pelt Prices (Continued).
• Let us perform a nonparametric fit to the covariance kernel of the lynx pelt data.
• We need to choose some sensible basis functions ...
• The trigonometric functions can form an orthonormal basis for the interval:
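As a concrete instance of such a basis, the sketch below builds the standard Fourier functions (constant, cosines, sines) rescaled to be orthonormal on a generic interval $[a, b]$ and checks orthonormality numerically. The talk does not say which trigonometric family or which endpoints were used, so the function name, the interval and the number of terms here are assumptions.

```python
import numpy as np

def fourier_basis(t, a, b, m):
    """First m orthonormal Fourier functions on [a, b], evaluated at t.

    Order: 1/sqrt(T), then sqrt(2/T) cos(2 pi k u / T) and
    sqrt(2/T) sin(2 pi k u / T) for k = 1, 2, ..., with u = t - a, T = b - a.
    """
    T = b - a
    u = np.asarray(t, dtype=float) - a
    cols = [np.full_like(u, 1.0 / np.sqrt(T))]
    k = 1
    while len(cols) < m:
        cols.append(np.sqrt(2.0 / T) * np.cos(2.0 * np.pi * k * u / T))
        if len(cols) < m:
            cols.append(np.sqrt(2.0 / T) * np.sin(2.0 * np.pi * k * u / T))
        k += 1
    return np.column_stack(cols)

# Hypothetical interval and grid (midpoints of n equal cells).
a, b, n, m = 0.0, 10.0, 420, 5
h = (b - a) / n
t = a + (np.arange(n) + 0.5) * h
basis = fourier_basis(t, a, b, m)

# Gram matrix under the midpoint rule; close to the identity if orthonormal.
gram = h * basis.T @ basis
```

Each non-constant function has amplitude $\sqrt{2/T}$, so the scale of the basis functions depends only on the length of the interval.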

[Figure: trigonometric basis functions plotted against t; horizontal axis 1860 to 1890, vertical axis -0.2 to 0.2.]
