

SLIDE 1

Characterizing uncertainty of the exposure point concentration based on left-censored data

Niloofar Shoari, PhD candidate
Jean-Sébastien Dubé, Ph.D.
Department of Construction Engineering

27 April 2016 Montreal, Canada

SLIDE 2

What are left-censored data?

[Figure: histogram of As concentration (mg/kg) against frequency; maximum detection limit = 1.3]

As concentration data: <0.7, <0.7, <0.7, <0.9, <0.9, <1.3, <1.3, 1.8, 2.1, 2.2, …, 13, 14, 14, 14, 15, 18, 19, 21, 24
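One way to hold such data in code is to store each observation as a value plus a flag saying whether it is a nondetect. The sketch below is illustrative only: the function name `parse_concentration` and the (value, censored) pair layout are assumptions, and it uses only the first ten values visible on the slide, so the resulting censoring percent does not describe the full data set.

```python
def parse_concentration(token):
    """Turn '<0.7' into (0.7, True) and '1.8' into (1.8, False).

    The boolean marks a left-censored observation: the true value lies
    somewhere below the reported detection limit.
    """
    token = token.strip()
    if token.startswith("<"):
        return float(token[1:]), True
    return float(token), False


# First ten values shown on the slide; the elided part is NOT filled in.
raw = "<0.7, <0.7, <0.7, <0.9, <0.9, <1.3, <1.3, 1.8, 2.1, 2.2"
observations = [parse_concentration(t) for t in raw.split(",")]

# Share of nondetects in this partial list only.
censoring_percent = 100 * sum(cens for _, cens in observations) / len(observations)
```

Keeping the detection limit alongside the flag preserves the information that a nondetect carries, which is what the estimation methods discussed later rely on.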

SLIDE 3

Left-censored observations are real data

<DL

SLIDE 4

Data uncertainty comes from a variety of sources:

Sampling uncertainty: e.g., inherent heterogeneity, improper collection and handling of samples (outside scope).
Analytical uncertainty (outside scope).
Data management uncertainty: uncertainty associated with data sets, where the importance of statistics comes into play.

SLIDE 5

Various sources of uncertainty

Sampling, analytical, and data management

[Diagram labels: field sample, sampling, field subsample, laboratory analysis. Image credit: www.greenskeeperlawncare.com]

SLIDE 6

Methods for estimating the exposure point concentration (EPC) based on left-censored data

The substitution method (e.g., DL/2)

The Kaplan-Meier (K-M) method
Non-parametric: no need to assume a parametric distribution for concentration data.

The maximum likelihood (MLE) method
Parametric: assumes a parametric distribution (lognormal, Weibull, or gamma).

The regression on order statistics (ROS) method
Assumes a parametric distribution for uncensored data and predicts censored values: rROS (lognormal), GROS (gamma).
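To make the first two methods concrete, here is a minimal sketch of DL/2 substitution and a Kaplan-Meier mean for left-censored data. It is not the authors' implementation. The function names, the (value, censored) pair layout, and the toy data are assumptions, as is the convention of placing any leftover probability mass at the smallest recorded value.

```python
def substitution_mean(observations, fraction=0.5):
    """Mean after replacing each nondetect '<DL' by fraction * DL (DL/2 by default)."""
    filled = [fraction * v if cens else v for v, cens in observations]
    return sum(filled) / len(filled)


def kaplan_meier_mean(observations):
    """Kaplan-Meier mean for left-censored (value, censored) pairs.

    Walks down the distinct detected values and assigns each one the
    probability mass given by the product-limit estimator. Mass left over
    below the smallest detected value is placed at the smallest recorded
    value (an assumed convention; others exist).
    """
    values = [v for v, _ in observations]
    detected_levels = sorted({v for v, cens in observations if not cens}, reverse=True)
    cdf = 1.0   # running estimate of P(X <= current level)
    mean = 0.0
    for level in detected_levels:
        # Everything recorded at or below this level, nondetects included.
        at_risk = sum(1 for v in values if v <= level)
        detects = sum(1 for v, cens in observations if v == level and not cens)
        mass = cdf * detects / at_risk
        mean += mass * level
        cdf -= mass
    return mean + cdf * min(values)


# Hypothetical toy data: two nondetects reported as <1.0 and two detected values.
toy = [(1.0, True), (1.0, True), (2.0, False), (4.0, False)]
dl2_estimate = substitution_mean(toy)   # (0.5 + 0.5 + 2 + 4) / 4
km_estimate = kaplan_meier_mean(toy)
```

With no censored values, `kaplan_meier_mean` reduces to the ordinary sample mean, which is a quick sanity check on the implementation.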

SLIDE 7

Recommendations of previous simulation studies

Substitution provides biased estimates (Helsel, 2006).
KM performs well for <50% censoring (Antweiler, 2007).
MLE performs well when the sample size is >50 (Helsel, 2012).
MLE (lognormal) has optimization problems with highly skewed data and small to medium sample sizes (Shoari et al., 2015).
rROS and GROS seem to be robust to distribution misspecification (Helsel, 1986; Shoari et al., 2015).
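For context, maximum likelihood with left-censored data is usually set up so that each detected value contributes its density and each nondetect contributes the probability of falling below its detection limit. The expression below is a generic sketch; the symbols $f$, $F$, $\theta$, and $DL_j$ are notation introduced here, not taken from the slides.

```latex
L(\theta) \;=\; \prod_{i \in \text{detected}} f(x_i;\theta)
          \;\prod_{j \in \text{censored}} F(DL_j;\theta)
```

Under a lognormal model with log-scale parameters $\mu$ and $\sigma$, the mean follows from the fitted parameters as $\exp(\hat{\mu} + \hat{\sigma}^2/2)$.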

SLIDE 8

Basics of bootstrap

Real world Bootstrap world

SLIDE 9

Data-based simulation used to quantify the uncertainty in the mean estimates

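Real world: sample $x_1, x_2, \ldots, x_n$ with sample mean $\bar{x}$; population mean $\mu$

Bootstrap world: 1000 resamples of the sample, each with its own mean

$x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)}$ with mean $\bar{x}^{(1)}$
$x_1^{(2)}, x_2^{(2)}, \ldots, x_n^{(2)}$ with mean $\bar{x}^{(2)}$
$x_1^{(3)}, x_2^{(3)}, \ldots, x_n^{(3)}$ with mean $\bar{x}^{(3)}$
...
$x_1^{(1000)}, x_2^{(1000)}, \ldots, x_n^{(1000)}$ with mean $\bar{x}^{(1000)}$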
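The resampling scheme on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code. DL/2 substitution stands in for whichever estimator is being studied, the percentile interval is one assumed way to summarize the spread, all function names are hypothetical, and the sample uses only a few values visible on slide 2.

```python
import random


def dl2_mean(observations):
    """Stand-in estimator: mean after replacing each nondetect by DL/2."""
    filled = [0.5 * v if cens else v for v, cens in observations]
    return sum(filled) / len(filled)


def bootstrap_estimates(observations, estimator, n_boot=1000, seed=0):
    """Resample the data with replacement n_boot times and re-estimate each time."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    n = len(observations)
    return [estimator(rng.choices(observations, k=n)) for _ in range(n_boot)]


def percentile_interval(estimates, level=0.95):
    """Central interval read off the sorted bootstrap estimates."""
    ordered = sorted(estimates)
    lower = ordered[int((1 - level) / 2 * len(ordered))]
    upper = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lower, upper


# A few (value, censored) pairs taken from the visible part of slide 2.
sample = [(0.7, True), (0.9, True), (1.3, True), (1.8, False), (2.1, False),
          (2.2, False), (13.0, False), (14.0, False), (24.0, False)]

estimates = bootstrap_estimates(sample, dl2_mean)
low, high = percentile_interval(estimates)
```

Because censored and detected observations are resampled together as pairs, each bootstrap sample has its own censoring percent, so that source of variability is carried into the spread of the estimates.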

SLIDE 10

Description of data

Concentrations of soil samples collected for characterization of a brownfield site in Montreal.
Samples were collected between 1998 and 2009 from a total of 242 boreholes dispersed over the site.
Concentrations of 15 metals and 22 polycyclic aromatic hydrocarbons (PAH) were measured.
Concentration data are characterized by left-censored observations.

SLIDE 11

Scenario 1)

Large sample size
Small censoring percent
Low skewness

Contaminant  n    Censoring %  CV
Cobalt       409  31%          0.6

Comparable estimates of uncertainty.

SLIDE 12

Scenario 2)

Large sample size
Medium censoring percent
Highly skewed

Contaminant     n    Censoring %  CV
Benzo(a)pyrene  517  51%          5.4

Still similar estimates of uncertainty.

SLIDE 13

Scenario 3)

Large sample size
Highly skewed
High censoring percent

Contaminant  n    Censoring %  CV
Fluorene     517  63%          5.6

Inflated uncertainty of the mean estimates obtained by MLE (lognormal).

SLIDE 14

Scenario 4)

A decrease in sample size leads to overestimation of uncertainty by MLE (lognormal).

SLIDE 15

Some examples:

Contaminant     MLE (lognormal)  MLE (Weibull)  MLE (gamma)  KM         rROS       GROS
Cobalt          8.23±6%          8.15±6%        8.22±7%      8.28±7%    8.32±7%    8.26±7%
Arsenic         9.30±18%         8.05±13%       7.90±13%     7.88±24%   8.53±13%   7.20±16%
Chromium        16.67±8%         17.04±10%      17.11±11%    16.77±10%  16.92±11%  16.98±11%
Benzo(a)pyrene  1.08±49%         0.88±39%       1.25±48%     1.27±47%   1.26±47%   1.24±48%
Fluorene        1.86±67%         0.93±44%       1.02±49%     1.04±48%   1.03±48%   1.01±49%
Naphthalene     0.83±51%         0.74±63%       1.27±45%     1.29±63%   1.28±63%   1.26±64%

Values are mean ± uncertainty percent.

SLIDE 16

Lessons learned:

Some amount of uncertainty is caused by left-censored concentration data.
With large concentration data sets, the uncertainty of all methods is comparable.
Practitioners are cautioned about using the MLE method under the lognormal assumption when:

Concentration data are highly skewed;
Sample size is small;
Censoring percent is large.

SLIDE 17

Our recommendation

Appropriate use of the MLE method depends on the sample size and our knowledge about the distribution of concentration data.

The rROS, GROS, and KM methods generally perform well because they are:

robust to data skewness;
robust to sample size;
robust to censoring percent.

SLIDE 18

References

Antweiler, R. C.; Taylor, H. E. 2008. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics. Environmental Science & Technology 42(10): 3732-3738.

Gilliom, R. J.; Helsel, D. R. 1986. Estimation of distributional parameters for censored trace level water quality data: 1. Estimation techniques. Water Resour. Res. 22: 135-146.

Helsel, D. R. 2006. Fabricating data: How substituting values for nondetects can ruin results, and what can be done about it. Chemosphere 65: 2434-2439.

Helsel, D. R. 2012. Statistics for censored environmental data using Minitab and R. John Wiley & Sons; Vol. 77.

Shoari, N.; Dubé, J.-S.; Chenouri, S. 2015. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification. Chemosphere 138: 599-608.