SLIDE 1

Smoothing Applications for Irregular Time Series with Measurement Errors

Jonathan Rathjens, Eva Becker, Arthur Kolbe, Katharina Olthoff, Michael Wilhelm, Katja Ickstadt, and Jürgen Hölzer

2 December 2016

SLIDE 2

1 Introduction
2 Data
3 Model Development: Regression, Kernel Smoothing, Categorization
4 Results: Regression, Kernel Smoothing
5 Conclusions and Outlook

Rathjens et al., 2 December 2016

SLIDE 3

Epidemiological Background

PFASs

per- and polyfluoroalkyl substances
lead substances:
  perfluorooctanoic acid (PFOA)
  perfluorooctane sulphonic acid (PFOS)
ubiquitous in industrial and household products
persistent; accumulate in organisms
→ internal exposure of general population

Importance of Drinking Water

food as most important source of human exposure to PFASs
where drinking water is contaminated, it is the predominant source
→ surrogate marker for internal exposure

SLIDE 4

Occasion

Contamination in North Rhine-Westphalia (NRW)

drinking water rivers Ruhr and Möhne affected by PFAS-polluted fertilizer prior to summer 2006
(very) high concentrations measured at water supply stations downstream
motivated human biomonitoring studies

SLIDE 5

Long-Term Objectives

modelling state-wide PFASs exposure, both temporal and spatial
find regions and time periods of increased exposure
use as explanatory variable for spatio-temporal NRW birth data
explore/infer possible dependencies

Current

PFASs measurements from water supply stations and areas: irregularly sampled, non-stationary time series
find realistic estimates of the mean and variance functions
  regression on time
  interpolation, smoothing
  extrapolation

SLIDE 7

Drinking Water PFASs Data

provided by NRW state environmental agency LANUV
drinking water samples from water supply stations and network

Spatial Structure

stations (data from ca. 250 of 650 stations)
water supply areas (data from ca. 200 of 450 areas)
complex assignment
rivers

Temporal Structure

irregular time series (ca. one value per month or less)
from summer 2006; partially ongoing

SLIDE 8

Maximum PFOA 2006–2014 [ng/l]

SLIDE 9

Characteristics

irregular sampling
non-stationary
different patterns
need for extrapolation
values < LoQ (varying)
extremely high values
(possible) change points

SLIDE 10

Measurement Errors

additional variability
depending on scale
due to serial dilution in chemical analysis
coefficient of variation (cv) ca. 20%

SLIDE 12

Observations and Goal

measurements $(x_i)_{i=1,\dots,n}$, positive
at times $t_1 \le \dots \le t_n$, unequally spaced
estimate process $X = f(t)$ (or distribution / posterior predictive) for arbitrary time $t$

Time-Dependent Regressions

simple regression, e.g. $\ln(x) = b_0 + b_1 t + \varepsilon$, too restrictive
segmented according to change points (if known)
P-splines
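
The simple regression ln(x) = b0 + b1 t + ε can be fitted by ordinary least squares on the log scale, and unequal spacing of the times poses no problem for it. The following is a minimal sketch under that assumption; the function names and the example data are hypothetical and not taken from the slides.

```python
import math

def fit_log_linear(times, values):
    """Least-squares fit of ln(x) = b0 + b1 * t for unequally spaced times."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) pairs")
    logs = [math.log(x) for x in values]  # measurements are assumed positive
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1

def predict(b0, b1, t):
    """Back-transformed fitted value exp(b0 + b1 * t) at an arbitrary time t."""
    return math.exp(b0 + b1 * t)

# Hypothetical example: irregular sampling times (months) and values (ng/l)
times = [0.0, 0.7, 2.1, 3.0, 5.5, 9.2]
values = [math.exp(5.0 - 0.2 * t) for t in times]
b0, b1 = fit_log_linear(times, values)
```

A segmented variant would apply the same fit separately to the observations between known change points.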

SLIDE 13

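Conjugate Γ-Γ-Model

$X_t \mid \beta_t \sim \Gamma(\alpha, \beta_t)$ for arbitrary $t$
fixed $\alpha$ representing measurement error:

\[
\mathrm{cv} = \frac{\sqrt{\operatorname{Var}(X)}}{\operatorname{E}(X)} = \frac{\sqrt{\alpha/\beta^2}}{\alpha/\beta} = \frac{1}{\sqrt{\alpha}}
\]

e.g. $\mathrm{cv} = 0.2 \Rightarrow \alpha = 25$
prior $\beta_t \sim \Gamma(\theta_t, \eta_t)$
alternative to log-normal (log N) model (no transformation)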
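
The relation cv = 1/√α holds for a Gamma distribution with shape α and rate β, whatever the value of β. The sketch below converts between cv and α and checks the relation by simulation; the function names, sample size and seed are illustrative assumptions.

```python
import math
import random

def alpha_from_cv(cv):
    """Shape parameter implied by a coefficient of variation: alpha = 1 / cv^2."""
    return 1.0 / cv ** 2

def cv_from_alpha(alpha):
    """Coefficient of variation of Gamma(shape=alpha, rate=beta): 1 / sqrt(alpha)."""
    return 1.0 / math.sqrt(alpha)

def simulated_cv(alpha, beta, n=50_000, seed=1):
    """Empirical cv of n draws from Gamma(shape=alpha, rate=beta)."""
    rng = random.Random(seed)
    # random.gammavariate expects (shape, scale), so scale = 1 / rate
    draws = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return math.sqrt(var) / mean

alpha = alpha_from_cv(0.2)  # approximately 25, as on the slide
```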

SLIDE 14

"Weighted Posterior"

\[
\theta_t \to \theta_t + \alpha n \sum_{i=1}^{n} w_i
\]

\[
\eta_t \to \eta_t + n \sum_{i=1}^{n} x_i w_i
\]

Weights

$w_i \in [0, 1]$ with $w := \sum_{i=1}^{n} w_i \in [0, 1]$
from kernel, e.g. Gaussian: $w_i = f_{N(t,\delta^2)}(t_i)$
$w \searrow 0$: no informative data, retain prior
$w \nearrow 1$: data "almost everywhere", usual Γ-Γ-update with $\tilde{x}_i := n w_i x_i$
smoothing parameter $\delta$ depending on t-scale/resolution
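
The weighted update can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the rescaling step that enforces a weight sum of at most 1 is an added assumption, and the predictive mean uses the standard result E[1/β] = η/(θ − 1) for β ~ Γ(θ, η) with rate η.

```python
import math

def gaussian_weights(t, times, delta):
    """w_i = density of N(t, delta^2) evaluated at each sampling time t_i."""
    norm = delta * math.sqrt(2.0 * math.pi)
    w = [math.exp(-0.5 * ((ti - t) / delta) ** 2) / norm for ti in times]
    total = sum(w)
    # Assumption (not on the slides): rescale when the weights sum to more
    # than 1, so that w = sum(w_i) stays within [0, 1].
    if total > 1.0:
        w = [wi / total for wi in w]
    return w

def weighted_posterior(values, weights, alpha, theta, eta):
    """theta -> theta + alpha*n*sum(w_i), eta -> eta + n*sum(x_i*w_i)."""
    n = len(values)
    theta_post = theta + alpha * n * sum(weights)
    eta_post = eta + n * sum(x * w for x, w in zip(values, weights))
    return theta_post, eta_post

def predictive_mean(alpha, theta, eta):
    """E[X_t] = alpha * eta / (theta - 1); only defined for theta > 1."""
    if theta <= 1.0:
        raise ValueError("predictive mean requires theta > 1")
    return alpha * eta / (theta - 1.0)

# Hypothetical example: three measurements, estimate at t = 1.0 with delta = 1.0
times = [0.0, 1.0, 2.5]
values = [120.0, 90.0, 60.0]
w = gaussian_weights(1.0, times, delta=1.0)
theta_post, eta_post = weighted_posterior(values, w, alpha=25.0, theta=2.0, eta=1.0)
```

With equal weights 1/n the update reduces to the usual conjugate update θ + nα and η + Σ x_i, and with all weights near zero the prior is returned unchanged.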

SLIDE 15

Prior Choices

empirical Bayes from whole sample
vague, e.g. with prior predictive ∈ [0, 1000]
informative:
  high values prior to 2006 (for 2 to 4 years)
  afterwards decrease
sequential: adjacent time's posterior
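
One way to judge whether a candidate prior is vague in the sense of a prior predictive range such as [0, 1000] is to simulate from the prior predictive distribution and check how much mass falls inside that range. The sketch below does this for the Γ-Γ-model; the parameter values θ = 3 and η = 20 are hypothetical and only serve as an illustration.

```python
import random

def prior_predictive_draws(alpha, theta, eta, n=20_000, seed=1):
    """Draw X_t from the prior predictive of the Gamma-Gamma model."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # beta_t ~ Gamma(shape=theta, rate=eta); gammavariate takes a scale
        beta = rng.gammavariate(theta, 1.0 / eta)
        # X_t | beta_t ~ Gamma(shape=alpha, rate=beta_t)
        draws.append(rng.gammavariate(alpha, 1.0 / beta))
    return draws

def share_within(draws, lower, upper):
    """Fraction of draws inside the closed interval [lower, upper]."""
    return sum(lower <= d <= upper for d in draws) / len(draws)

# Hypothetical prior parameters; alpha = 25 corresponds to cv = 0.2
draws = prior_predictive_draws(alpha=25.0, theta=3.0, eta=20.0)
coverage = share_within(draws, 0.0, 1000.0)
```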

SLIDE 16

Categorical Data

simplify $x_i$'s to ordinal categories such as "not detected", "low", "increased", "very high"
natural incorporation of values < LoQ
magnitude less likely to be "false" than value with measurement error
useful as an epidemiological predictor
→ model risk/rate of, e.g., increased or very high values for a period of time

Future Approaches

(cumulative) probit or logit P-splines
simple case: binary regression
use known change points
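
The categorization of measurements into ordinal classes can be sketched as follows. The cut-offs of 100 and 300 ng/l, the function names and the example samples are hypothetical; the slides do not state thresholds. Each sample carries its own LoQ because the LoQ varies.

```python
def categorize(value, loq, low_max, increased_max):
    """Map one measurement (ng/l) to an ordinal category.

    loq may differ between samples; low_max and increased_max are
    hypothetical cut-offs chosen for illustration only.
    """
    if value < loq:
        return "not detected"
    if value <= low_max:
        return "low"
    if value <= increased_max:
        return "increased"
    return "very high"

def share_at_least_increased(categories):
    """Fraction of samples in a period classified 'increased' or 'very high'."""
    if not categories:
        raise ValueError("no samples in the period")
    hits = sum(c in ("increased", "very high") for c in categories)
    return hits / len(categories)

# Hypothetical samples for one period: (value, LoQ) pairs in ng/l
samples = [(4.0, 10.0), (35.0, 10.0), (180.0, 25.0), (640.0, 25.0)]
labels = [categorize(v, loq, low_max=100.0, increased_max=300.0) for v, loq in samples]
rate = share_at_least_increased(labels)
```

The resulting share per period is the kind of binary outcome that a binary regression would model.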

SLIDE 18

Simple Regression

SLIDE 19

Spline Regression

SLIDE 20

Empirical Bayes Prior

SLIDE 21

Vague Prior with Large δ

SLIDE 22

Informative Prior

SLIDE 23

Sequential Prior with Uniform Kernel

SLIDE 25

Findings

sufficient (local) fit to present data
no useful extrapolation without specific prior knowledge
difficult global modelling
individual solution for each series preferable

Evaluation

non-parametric solutions to incorporate changes and nonlinear trends
parametric regression useful if change points known
kernel smoother able to include local prior information, if available
  sequential prior weighting down important maxima
  too little variance for data ↘ 0

SLIDE 26

Model Enhancement

estimate Γ-parameter $\alpha$ (globally, restricted)
distinguish error- and process-related variability
tune smoothing parameter $\delta$
  respect "data density" in time
  locally adapted
  appropriate for chosen prediction interval (daily, monthly, ...)
asymmetric weighting, e.g. $f(t - a) < f(t + a)$
several series:

SLIDE 27

Supply Stations and Areas

supply from the Ruhr
one station may supply several areas
one area may be supplied by several stations
different periods observed
measurements from stations and network:

SLIDE 28

Network Samples

important for estimation of areas' contamination
verification of single station models
unknown water source
possibly mixture of stations' waters, e.g.: $X(t) = \pi_1 X_1(t) + \pi_2 X_2(t) + \pi_3 X_3(t)$
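
The mixture X(t) = π1 X1(t) + π2 X2(t) + π3 X3(t) is a weighted average of the stations' concentrations at time t. The sketch below evaluates it for given shares; treating the shares π_k as non-negative and summing to 1 is an assumption added here, and the function name and numbers are hypothetical.

```python
def mixture_concentration(shares, station_values):
    """Concentration of a network sample as a mixture of station waters.

    shares: mixing proportions pi_k, assumed non-negative and summing to 1.
    station_values: concentrations X_k(t) of the stations at the same time t.
    """
    if len(shares) != len(station_values):
        raise ValueError("one share per station is required")
    if any(p < 0 for p in shares) or abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must be non-negative and sum to 1")
    return sum(p * x for p, x in zip(shares, station_values))

# Hypothetical example: three stations supplying one area (ng/l)
x_network = mixture_concentration([0.5, 0.3, 0.2], [100.0, 50.0, 10.0])
```

In practice the shares are unknown, so they would have to be estimated or given a prior rather than supplied as fixed inputs.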

SLIDE 29

Spatial Dependence

water from contaminated rivers, esp. the Ruhr
other spatial processes (e.g., for groundwater)
modelling $X = f(t, s)$
two-dimensional Ruhr models (river × time)
discrete space: supply areas (use, e.g., GMRF)

SLIDE 30

Modelling Internal Exposure

extremely slow decrease after exposure
very weak effects of random fluctuations
What is important?
  find times and values of exposure peaks (short-term)
  model the sum of subsequent exposures correctly (no loss by averaging, long-term)
  background exposure (equilibrium, long-term)

SLIDE 31

Thanks to ...

the NRW state environmental agency LANUV for providing water data
Stiftung Mercator for funding our work
all co-workers and participants
