Characterizing uncertainty of the exposure point concentration based - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Characterizing uncertainty of the exposure point concentration based on left-censored data

Niloofar Shoari, PhD candidate
Jean-Sébastien Dubé, Ph.D.
Department of Construction Engineering

27 April 2016, Montreal, Canada

slide-2
SLIDE 2

What are left-censored data?

[Figure: histogram of As concentration (mg/kg) against frequency, with the maximum detection limit (1.3) marked]

As concentration data: <0.7, <0.7, <0.7, <0.9, <0.9, <1.3, <1.3, 1.8, 2.1, 2.2, …, 13, 14, 14, 14, 15, 18, 19, 21, 24
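A minimal sketch of how such data might be represented for analysis: each observation becomes a (value, censored) pair, where the value of a nondetect is its detection limit. The helper name `parse_observation` is hypothetical, and only the values actually listed on the slide are used (the elided middle of the data set is left out), so the censoring percent computed here describes this subset only, not the full arsenic data set.

```python
def parse_observation(token):
    """Return (value, censored) for a token such as '<0.7' or '2.1'.

    For a nondetect, the returned value is the detection limit.
    """
    token = token.strip()
    if token.startswith("<"):
        return float(token[1:]), True
    return float(token), False


# Values listed on the slide; the elided part of the data set is omitted.
raw = ["<0.7", "<0.7", "<0.7", "<0.9", "<0.9", "<1.3", "<1.3",
       "1.8", "2.1", "2.2", "13", "14", "14", "14", "15", "18", "19", "21", "24"]

data = [parse_observation(t) for t in raw]

# Number and share of nondetects in this subset.
n_censored = sum(1 for _, is_censored in data if is_censored)
censoring_percent = 100.0 * n_censored / len(data)

# Largest detection limit among the nondetects.
max_dl = max(value for value, is_censored in data if is_censored)

print(f"{n_censored} of {len(data)} censored ({censoring_percent:.1f}%), max DL = {max_dl}")
```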

slide-3
SLIDE 3

Left-censored observations are real data

[Figure: observations below the detection limit (<DL)]

slide-4
SLIDE 4

Data uncertainty comes from a variety of sources:

  • Sampling uncertainty: e.g., inherent heterogeneity, improper collection and handling of samples (outside scope).
  • Analytical uncertainty (outside scope).
  • Data management uncertainty: uncertainty associated with data sets, where the importance of statistics comes into play.

slide-5
SLIDE 5

Various sources of uncertainty

[Figure: sources of uncertainty (sampling, analytical, data management), with panels labelled field sample, field subsample, and laboratory analysis. Image source: www.greenskeeperlawncare.com]

slide-6
SLIDE 6

Methods for estimating the exposure point concentration (EPC) based on left-censored data

The substitution method (e.g., DL/2)

The Kaplan-Meier (K-M) method

  • Non-parametric: no need to assume a parametric distribution for concentration data.

The maximum likelihood method

  • Parametric: assumes a parametric distribution;
  • lognormal, Weibull, and gamma.

The regression on order statistics (ROS) method

  • Assumes a parametric distribution for uncensored data and predicts censored values;
  • rROS (lognormal), GROS (gamma).
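To make the first two methods concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the Kaplan-Meier mean uses the common "flip" trick (left-censored data are turned into right-censored data by subtracting from a constant), and any probability mass left below the smallest detection limit is placed at that limit (Efron's convention).

```python
def substitution_mean(values, censored, fraction=0.5):
    """Mean after replacing each nondetect by fraction * its detection limit."""
    filled = [v * fraction if c else v for v, c in zip(values, censored)]
    return sum(filled) / len(filled)


def km_mean_left_censored(values, censored):
    """Kaplan-Meier mean for left-censored data (illustrative sketch).

    values[i] is the measured concentration, or the detection limit
    when censored[i] is True.
    """
    # Flip so that "less than DL" becomes "greater than flip - DL".
    flip = max(values) + 1.0
    flipped = [(flip - v, c) for v, c in zip(values, censored)]

    # Distinct flipped values that correspond to detected concentrations.
    event_times = sorted({t for t, c in flipped if not c})

    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        # Area under the survival curve since the previous event time.
        area += surv * (t - prev_t)
        at_risk = sum(1 for u, _ in flipped if u >= t)
        events = sum(1 for u, c in flipped if u == t and not c)
        surv *= 1.0 - events / at_risk
        prev_t = t

    # Assumption: leftover mass goes to the largest flipped value,
    # i.e. to the smallest detection limit on the original scale.
    largest = max(u for u, _ in flipped)
    area += surv * (largest - prev_t)

    # The mean of the flipped variable is the area; flip back.
    return flip - area


# Toy data: one nondetect (<1) and two detected values.
toy_values = [1.0, 2.0, 3.0]
toy_censored = [True, False, False]
print(substitution_mean(toy_values, toy_censored))
print(km_mean_left_censored(toy_values, toy_censored))
```

With no censoring, the Kaplan-Meier mean in this sketch reduces to the ordinary arithmetic mean, which is a quick sanity check on the flip logic.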
slide-7
SLIDE 7

Recommendations of previous simulation studies

  • Substitution provides biased estimates (Helsel, 2006).
  • KM performs well for <50% censoring (Antweiler and Taylor, 2008).
  • MLE performs well when the sample size is >50 (Helsel, 2012).
  • MLE (lognormal) has optimization problems with highly skewed data and small to medium sample sizes (Shoari et al., 2015).
  • rROS and GROS seem to be robust to distribution misspecification (Gilliom and Helsel, 1986; Shoari et al., 2015).
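For reference, the likelihood that a censored-data MLE maximizes can be written as below. The notation is introduced here for illustration and does not appear in the slides: $f$ and $F$ are the density and cumulative distribution function of the assumed family with parameters $\theta$, $D$ indexes detected values $x_i$, and $C$ indexes nondetects with detection limits $DL_j$.

```latex
L(\theta) \;=\; \prod_{i \in D} f(x_i;\theta)\; \prod_{j \in C} F(DL_j;\theta)
```

Under the lognormal assumption with log-scale parameters $\mu_{\ln}$ and $\sigma_{\ln}$ (again, illustrative symbols), the estimated mean is

```latex
\hat{m} \;=\; \exp\!\left(\hat{\mu}_{\ln} + \tfrac{1}{2}\hat{\sigma}_{\ln}^{2}\right)
```

so the mean depends exponentially on $\hat{\sigma}_{\ln}^{2}$. This is one plausible reason why the lognormal MLE mean can be sensitive in highly skewed data.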

slide-8
SLIDE 8

Basics of bootstrap

[Figure: the real world versus the bootstrap world]

slide-9
SLIDE 9

Data-based simulation used to quantify the uncertainty in the mean estimates

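Real world: a sample $x_1, x_2, \ldots, x_n$ with sample mean $\bar{x}$ as the estimate of $\mu$.

Bootstrap world: 1000 resamples of size $n$, each giving its own mean:

$$
\begin{aligned}
x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)} &\;\rightarrow\; \bar{x}^{(1)} \\
x_1^{(2)}, x_2^{(2)}, \ldots, x_n^{(2)} &\;\rightarrow\; \bar{x}^{(2)} \\
x_1^{(3)}, x_2^{(3)}, \ldots, x_n^{(3)} &\;\rightarrow\; \bar{x}^{(3)} \\
&\;\;\vdots \\
x_1^{(1000)}, x_2^{(1000)}, \ldots, x_n^{(1000)} &\;\rightarrow\; \bar{x}^{(1000)}
\end{aligned}
$$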
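The resampling scheme on this slide can be sketched as follows. This is an illustration under assumptions, not the procedure used in the study: the function names are hypothetical, observations are (value, censored) pairs resampled with replacement, the percentile interval is one simple way to summarize the spread of the bootstrap means, and DL/2 substitution is used only as a short stand-in estimator. Any estimator that maps a list of (value, censored) pairs to a mean could be passed instead.

```python
import random


def half_dl_mean(sample):
    """Stand-in estimator: mean with each nondetect replaced by DL/2."""
    filled = [v / 2 if c else v for v, c in sample]
    return sum(filled) / len(filled)


def bootstrap_means(sample, estimator, n_boot=1000, seed=0):
    """Resample (value, censored) pairs with replacement, n_boot times."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [estimator(rng.choices(sample, k=n)) for _ in range(n_boot)]


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval from the sorted bootstrap estimates."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    lower = ordered[int((1 - level) / 2 * last)]
    upper = ordered[int((1 + level) / 2 * last)]
    return lower, upper


# Illustrative data: a handful of nondetects and detected values.
sample = ([(0.7, True)] * 3 + [(0.9, True)] * 2 + [(1.3, True)] * 2
          + [(v, False) for v in (1.8, 2.1, 2.2, 13, 14, 14, 14, 15, 18, 19, 21, 24)])

boot = bootstrap_means(sample, half_dl_mean, n_boot=1000, seed=1)
lo, hi = percentile_interval(boot)
print(f"estimate = {half_dl_mean(sample):.2f}, 95% interval = ({lo:.2f}, {hi:.2f})")
```

Passing the estimator as a function keeps the resampling loop independent of how the nondetects are handled, so the same loop could be reused to compare methods.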

slide-10
SLIDE 10

Description of data

  • Concentrations of soil samples collected for characterization of a brownfield site in Montreal.
  • Samples were collected between 1998 and 2009 from a total of 242 boreholes dispersed over the site.
  • Concentrations of 15 metals and 22 polycyclic aromatic hydrocarbons (PAHs).
  • Concentration data are characterized by left-censored observations.
slide-11
SLIDE 11

Scenario 1

  • Large sample size
  • Small censoring percent
  • Low skewness

Contaminant   n     Censoring %   CV
Cobalt        409   31%           0.6

Comparable estimates of uncertainty.

slide-12
SLIDE 12

Scenario 2

  • Large sample size
  • Medium censoring percent
  • Highly skewed

Contaminant      n     Censoring %   CV
Benzo(a)pyrene   517   51%           5.4

Still similar estimates of uncertainty.

slide-13
SLIDE 13

Scenario 3

  • Large sample size
  • Highly skewed
  • High censoring percent

Contaminant   n     Censoring %   CV
Fluorene      517   63%           5.6

Inflated uncertainty of the mean estimates obtained by MLE (lognormal).

slide-14
SLIDE 14

Scenario 4

A decrease in sample size leads to overestimation of uncertainty by MLE (lognormal).

slide-15
SLIDE 15

Some examples:

Mean ± uncertainty percent

Contaminant      MLE (lognormal)   MLE (Weibull)   MLE (gamma)   KM          rROS        GROS
Cobalt           8.23±6%           8.15±6%         8.22±7%       8.28±7%     8.32±7%     8.26±7%
Arsenic          9.30±18%          8.05±13%        7.90±13%      7.88±24%    8.53±13%    7.20±16%
Chromium         16.67±8%          17.04±10%       17.11±11%     16.77±10%   16.92±11%   16.98±11%
Benzo(a)pyrene   1.08±49%          0.88±39%        1.25±48%      1.27±47%    1.26±47%    1.24±48%
Fluorene         1.86±67%          0.93±44%        1.02±49%      1.04±48%    1.03±48%    1.01±49%
Naphthalene      0.83±51%          0.74±63%        1.27±45%      1.29±63%    1.28±63%    1.26±64%

slide-16
SLIDE 16

Lessons learned:

Some amount of uncertainty is caused by left-censored concentration data.

When the concentration data set is large, the uncertainty estimates from all methods are comparable.

Practitioners are cautioned about using the MLE method under the lognormal assumption when:

  • Concentration data are highly skewed;
  • Sample size is small;
  • Censoring percent is large.
slide-17
SLIDE 17

Our recommendation

Appropriate use of the MLE method depends on the sample size and our knowledge about the distribution of concentration data.

The rROS, GROS, and KM methods generally perform well because they are:

  • robust to data skewness;
  • robust to sample size;
  • robust to censoring percent.
slide-18
SLIDE 18

References

Antweiler, R. C. and Taylor, H. E. 2008. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics. Environmental Science & Technology 42(10): 3732-3738.

Gilliom, R. J. and Helsel, D. R. 1986. Estimation of distributional parameters for censored trace level water quality data: 1. Estimation techniques. Water Resour. Res. 22: 135-146.

Helsel, D. R. 2006. Fabricating data: How substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65: 2434-2439.

Helsel, D. R. 2012. Statistics for censored environmental data using Minitab and R. John Wiley & Sons; Vol. 77.

Shoari, N., Dubé, J.-S. and Chenouri, S. 2015. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification. Chemosphere 138: 599-608.