
SLIDE 1

Introducing a zero-modified negative binomial regression for estimating the effect of chilling on Escherichia coli plate counts from Irish beef carcasses

Dr. Ursula Gonzales Barron
Prof. Francis Butler

Biosystems Engineering, UCD School of Agriculture, Food Science and Vet. Med., University College Dublin, Ireland

SLIDE 2

Introduction

• Traditionally, inferential statistical analysis of bacterial counts is conducted on "log10 cfu" values.

• Underlying assumption: the logarithmic transformation will induce normality of the data, which is fundamental for conventional ANOVA.

SLIDE 3

Histogram of frequencies for total viable counts on beef carcasses pre-chill (n=690)

• TVC can be approximated to a normal distribution

[Figure: histogram; x-axis: Total viable counts on beef carcasses (log cfu/cm2); y-axis: Probability density; 2.5th percentile = 0.051, 97.5th percentile = 4.846]

SLIDE 4

However…

• Depending on the detection frequency of bacteria, data normality cannot always be achieved.

• In situations where bacteria are not detected in a considerable proportion of samples (>15%), normality cannot be assumed.

SLIDE 5

Histogram of frequencies for coliforms on beef carcasses pre-chill (n=690)

• Can a normal distribution be assumed?

[Figure: histogram; x-axis: Coliforms on beef carcasses (log cfu/cm2); y-axis: Frequency]

SLIDE 6

Histogram of frequencies for Escherichia coli on beef carcasses pre-chill (n=690)

• The histogram for E. coli has an even more dramatic shape because the detection frequency is lower

[Figure: histogram; x-axis: Coliforms on beef carcasses (log cfu/cm2); y-axis: Frequency]

SLIDE 7

How to do inferential stats on this type of data?

• Can log-normality be assumed in these cases?

SLIDE 8

Ways to approach it?

• Transformations to induce normality

• Box-Cox transformation did not work for coliforms or E. coli (too skewed!)

• Categorisation of the outcome → loss of information

• Rank statistics → loss of information

SLIDE 9

Generalised Poisson models?

• Work with discrete data: log cfu/cm2 → CFU.

• Modifications are made to the baseline Poisson model to address its restrictive equi-dispersion (mean = variance)

$$f(Y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{Y_i}}{Y_i!}$$
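The equi-dispersion restriction can be checked directly on a sample of counts. The sketch below uses only the Python standard library to evaluate the Poisson probability mass function and the variance-to-mean ratio of a sample. The function names and the example counts are illustrative assumptions, not data from the study.

```python
import math
from statistics import mean, variance


def poisson_pmf(y, lam):
    """Poisson probability of observing count y when the mean is lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def dispersion_index(counts):
    """Sample variance divided by sample mean (close to 1 under a Poisson)."""
    return variance(counts) / mean(counts)


# Hypothetical plate counts with many zeros and a few large values.
example_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 8, 14, 31]

print("variance / mean:", round(dispersion_index(example_counts), 2))
```

A ratio well above 1 signals over-dispersion, which is the situation the generalised models on the following slides are meant to handle.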

SLIDE 10

Histogram of frequencies of plate counts for E. coli on beef carcasses pre-chill (CFU)

• Certainly does not look like a Poisson

[Figure: histogram; x-axis: E. coli plate counts; y-axis: Frequency; 2.5th percentile = 0, 97.5th percentile = 31]

SLIDE 11

Generalised Poisson model

• In practice, heterogeneity causes over-dispersion (variance >> mean) → clustering

• The Poisson can be generalised by a dispersion parameter ε that accommodates the unobserved heterogeneity in the count data.

• If exp(εi) follows a gamma distribution Γ(1/α, α), then GP → negative binomial

$$\lambda_i^{GP} = \lambda_i^{P}\,\exp(\varepsilon_i)$$
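The gamma-mixing argument can be illustrated by simulation. The sketch below assumes Γ(1/α, α) means shape 1/α and scale α, so that exp(ε) has mean 1. The resulting negative binomial then has variance λ + αλ². The values of λ and α, the seed and the function names are arbitrary choices for illustration, and only the Python standard library is used.

```python
import math
import random
from statistics import mean, variance


def draw_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    limit = math.exp(-rate)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def draw_gamma_poisson(lam, alpha, rng):
    """Poisson count whose mean lam is multiplied by exp(eps) ~ Gamma(1/alpha, alpha)."""
    heterogeneity = rng.gammavariate(1.0 / alpha, alpha)  # shape 1/alpha, scale alpha
    return draw_poisson(lam * heterogeneity, rng)


rng = random.Random(2024)
lam, alpha = 5.0, 0.5  # illustrative values, not estimates from the study
sample = [draw_gamma_poisson(lam, alpha, rng) for _ in range(20000)]

print("sample mean:", round(mean(sample), 2), "theory:", lam)
print("sample variance:", round(variance(sample), 2), "theory:", lam + alpha * lam**2)
```

The sample mean should stay near λ while the variance is inflated to roughly λ + αλ², which is the over-dispersion that a plain Poisson cannot reproduce.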

SLIDE 12

Histogram of frequencies for plate counts of Escherichia coli

[Figure: histogram; x-axis: E. coli plate counts; y-axis: Probability]

ε may then account for over-dispersion

SLIDE 13

Histogram of frequencies for plate counts of Escherichia coli

[Figure: histogram; x-axis: E. coli plate counts; y-axis: Probability]

ε may then account for over-dispersion. And the extra zero counts?

SLIDE 14

Zero-modified generalised Poisson

• We can hypothesise that, for each observation, there are two possible data-generating processes. The result of a Bernoulli trial determines which process is used:

  ◦ Process 1 occurs with probability "ω0" and generates only zero counts

  ◦ Process 2 occurs with probability "1 - ω0" and generates positive counts from a (zero-truncated) negative binomial

• This is the hurdle negative binomial (HNB) regression model

SLIDE 15

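Hurdle negative binomial model

$$
\Pr(Y_i = y_i) =
\begin{cases}
\omega_0, & y_i = 0 \\[6pt]
(1 - \omega_0)\,\dfrac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)}\,
\dfrac{(1 + \alpha\lambda_i)^{-1/\alpha}\left(\dfrac{\alpha\lambda_i}{1 + \alpha\lambda_i}\right)^{y_i}}{1 - (1 + \alpha\lambda_i)^{-1/\alpha}}, & y_i > 0
\end{cases}
$$

• Notice the two components → ω0 is the probability of a zero count, and is determined by a logistic model

• ω0 = f(covariates), λi = f(covariates)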
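A minimal sketch of the hurdle negative binomial probabilities, written with the Python standard library, may help in reading the two components. It assumes the common parameterisation with mean λ and dispersion α (variance λ + αλ²). The values of ω0 and λ are the pre-chill values from the Results slide, whereas α is not reported in the slides and is set to an arbitrary placeholder. The function names are hypothetical.

```python
import math


def nb_log_pmf(y, lam, alpha):
    """Log-probability of count y under a negative binomial (mean lam, dispersion alpha)."""
    r = 1.0 / alpha
    return (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        - r * math.log1p(alpha * lam)
        + y * (math.log(alpha * lam) - math.log1p(alpha * lam))
    )


def hurdle_nb_pmf(y, omega0, lam, alpha):
    """Zeros have probability omega0; positive counts follow a zero-truncated NB."""
    if y == 0:
        return omega0
    nb_zero = math.exp(nb_log_pmf(0, lam, alpha))
    return (1.0 - omega0) * math.exp(nb_log_pmf(y, lam, alpha)) / (1.0 - nb_zero)


# omega0 and lam: pre-chill values from the Results slide.
# alpha: NOT reported in the slides; 2.0 is an arbitrary placeholder.
omega0, lam, alpha = 0.417, 4.965, 2.0
total = sum(hurdle_nb_pmf(y, omega0, lam, alpha) for y in range(2000))

print("P(Y = 0):", hurdle_nb_pmf(0, omega0, lam, alpha))
print("sum of probabilities:", round(total, 6))
```

Because the zero class is modelled on its own, the probability of a zero count equals ω0 whatever the values of λ and α, and the remaining probability 1 - ω0 is spread over the positive counts.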

SLIDE 16

Methodology

• Two HNB regression models were fitted to E. coli plate counts from beef carcasses pre-chill and post-chill

• Group level

  ◦ Y = CFU pre-chill and post-chill

  ◦ X = Coded variable: pre-chill (1), post-chill (2)

• Carcass level

  ◦ Y = CFU post-chill

  ◦ X = CFU pre-chill

Both models use the same pair of link equations:

$$\lambda_i = \exp(\beta_0 + \beta_1 X_i)\,\exp(\varepsilon_i)$$

$$\log\!\left(\frac{\omega_0}{1 - \omega_0}\right) = b_0 + b_1 X_i$$

SLIDE 17

Results

Group-level regression model

Regression parameters        Estimate   St. error   Pr > |t|
Neg Bin  β0 (int)              1.151      0.381       **
Neg Bin  β1 (covariate)        0.451      0.160       **
Logit    b0 (int)             -2.131      0.131       ***
Logit    b1 (covariate)        1.797      0.089       ***

Other estimates              Estimate   St. error   Pr > |t|
OR (int)                       0.118      0.015       ***
OR (treat)                     6.034      0.542       ***
λ1 (pre-chill)                 4.965      1.606       **
λ2 (post-chill)                7.795      2.648       **
ω0 (pre-chill)                 0.417      0.013       ***
ω0 (post-chill)                0.812      0.011       ***

• Significant logit and NB

• Prob of zero count

• Chilling increases odds

• For (+) counts: pre-chill < post-chill ??
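The "other estimates" follow from the regression coefficients through the log and logit links, which can be verified with a few lines of standard-library Python. This is a sketch under one stated assumption: the logit intercept is taken as negative (b0 = -2.131), the sign implied by the reported OR (int) = 0.118 being below 1. The variable and function names are illustrative.

```python
import math


def inverse_logit(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Group-level estimates from the table. The negative sign on b0 is an
# assumption, inferred from the reported OR (int) = 0.118.
beta0, beta1 = 1.151, 0.451  # negative binomial part (log link)
b0, b1 = -2.131, 1.797       # logit part for the probability of a zero count

coded_x = {"pre-chill": 1, "post-chill": 2}

derived = {"OR (int)": math.exp(b0), "OR (treat)": math.exp(b1)}
for stage, x in coded_x.items():
    derived[f"lambda ({stage})"] = math.exp(beta0 + beta1 * x)
    derived[f"omega0 ({stage})"] = inverse_logit(b0 + b1 * x)

for name, value in derived.items():
    print(f"{name}: {value:.3f}")
```

The computed values agree with the tabulated ones to within rounding of the coefficients, which confirms that the treatment code (1 or 2) enters both linear predictors directly.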

SLIDE 18

Carcass-level regression model

Regression parameters        Estimate   St. error   Pr > |t|
Neg Bin  β0 (int)              2.042      0.701       **
Neg Bin  β1 (covariate)        0.004      0.004       ns
Logit    b0 (int)              1.564      0.078       ***
Logit    b1 (covariate)       -0.006      0.002       **

Other estimates              Estimate   St. error   Pr > |t|
OR (int)                       4.777      0.373       ***
OR (covariate)                 0.993      0.002       ***
λ (post-chill)                 6.916      4.854       ns
ω0 (post-chill)                0.798      0.013       ***

• Non-significant NB and significant logit

• The probability of a zero count post-chill decreases for a 1-colony increase in the pre-chill (+) count

• Pre-chill counts can predict ω0 in the post-chill group (actual ω0 = 0.81)

SLIDE 19

Escherichia coli plate counts as modelled by the group-level HNB regression

[Figure: fitted probabilities; x-axis: Escherichia coli plate counts from beef carcasses (CFU); y-axis: Probability; series: Pre-chill, E(Y) = 10.85 CFU]

SLIDE 20

Escherichia coli plate counts as modelled by the group-level HNB regression

• As the treatment covariate was a coded variable, the probabilities take the shape of a PMF

[Figure: fitted probabilities; x-axis: Escherichia coli plate counts from beef carcasses (CFU); y-axis: Probability; series: Pre-chill, E(Y) = 10.85 CFU; Post-chill, E(Y) = 5.05 CFU]

SLIDE 21

Conclusions

• In this application, the proportion of zero counts post-chill could be predicted from the positive counts pre-chill.

• It was the larger number of zero counts in the post-chill group (significant logit), and not a potentially lower positive count (non-significant negative binomial), that explained the decrease in E(Y) from 10.85 CFU pre-chill to 5.05 CFU post-chill.

• This zero-modified heterogeneous Poisson model proved very flexible and is promising for performing inferential stats on plate counts of infrequently recovered microorganisms.
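The second conclusion can be checked with simple arithmetic. In a hurdle model the overall mean factorises as E(Y) = (1 - ω0) · E(Y | Y > 0), so the reported E(Y) and ω0 values give the implied mean of the positive counts. The sketch below uses only figures shown on the slides; the function and variable names are illustrative.

```python
# E(Y) and omega0 as reported on the slides for the group-level model.
reported = {
    "pre-chill": {"expected_count": 10.85, "omega0": 0.417},
    "post-chill": {"expected_count": 5.05, "omega0": 0.812},
}


def mean_of_positive_counts(expected_count, omega0):
    """Invert E(Y) = (1 - omega0) * E(Y | Y > 0) for the mean of the positive counts."""
    return expected_count / (1.0 - omega0)


for stage, values in reported.items():
    positive_mean = mean_of_positive_counts(**values)
    print(
        f"{stage}: P(Y > 0) = {1 - values['omega0']:.3f}, "
        f"E(Y | Y > 0) = {positive_mean:.2f} CFU"
    )
```

The implied mean of the positive counts is about 18.6 CFU pre-chill and about 26.9 CFU post-chill, so the drop in E(Y) comes entirely from the fall in P(Y > 0) from 0.583 to 0.188.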

SLIDE 22

Additional notes

SLIDE 23

Significance of the generalised Poisson models – where to go from here?

• Distribution fitting → stochastic modelling

• Another variant: ZINB could separate true zero counts (absence of m.o.) from "false" zeros (presence of m.o. at low concentration but not detected due to dilutions)

• Effect on sampling criteria performance

• Exposure assessment modelling

• Mixed models

SLIDE 24

Distribution fitting

[Figure: histogram; x-axis: Coliform plate counts from pre-chill Irish beef carcasses; y-axis: Probability; series: Observed data]

SLIDE 25

Distribution fitting

[Figure: histogram with fitted distribution; x-axis: Coliform plate counts from pre-chill Irish beef carcasses; y-axis: Probability mass function; series: Observed data, Neg Bin]

SLIDE 26

Distribution fitting

[Figure: histogram with fitted distributions; x-axis: Coliform plate counts from pre-chill Irish beef carcasses; y-axis: Probability mass function; series: Observed data, Neg Bin, ZINB]

SLIDE 27

Distribution fitting

[Figure: histogram with fitted distributions; x-axis: Coliform plate counts from pre-chill Irish beef carcasses; y-axis: Probability mass function; series: Observed data, Neg Bin, ZINB, Hurdle NB]

SLIDE 28

Exposure assessment simulation data

[Figure: cumulative distribution curves; x-axis: Exposure to Salmonella Typhimurium (CFU) per serving of a cooked meat product; y-axis: Cumulative probability; series: Grilled, Fried]

SLIDE 29

Zero-inflated negative binomial model

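$$
\Pr(Y_i = y_i) =
\begin{cases}
p_0 + (1 - p_0)\,(1 + \alpha\lambda_i)^{-1/\alpha}, & y_i = 0 \\[6pt]
(1 - p_0)\,\dfrac{\Gamma(y_i + 1/\alpha)}{\Gamma(y_i + 1)\,\Gamma(1/\alpha)}\,
(1 + \alpha\lambda_i)^{-1/\alpha}\left(\dfrac{\alpha\lambda_i}{1 + \alpha\lambda_i}\right)^{y_i}, & y_i > 0
\end{cases}
$$

• p0: real absence of bacteria? → Prevalence = (1 - p0)

• Zero count from the NB component: λ > 0 and Yi takes the value 0 through the random process → not a real absence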
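The split between "real" and "false" zeros can be made concrete with a short standard-library sketch of the zero-inflated negative binomial probabilities. It assumes the NB is parameterised by mean λ and dispersion α. All three parameter values below are arbitrary placeholders chosen for illustration, and the function names are hypothetical.

```python
import math


def nb_pmf(y, lam, alpha):
    """Negative binomial probability of count y (mean lam, dispersion alpha)."""
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        - r * math.log1p(alpha * lam)
        + y * (math.log(alpha * lam) - math.log1p(alpha * lam))
    )
    return math.exp(log_p)


def zinb_pmf(y, p0, lam, alpha):
    """ZINB: a structural zero with probability p0, otherwise an ordinary NB count."""
    structural_zero = p0 if y == 0 else 0.0
    return structural_zero + (1.0 - p0) * nb_pmf(y, lam, alpha)


# All three values are arbitrary placeholders, not estimates from the study.
p0, lam, alpha = 0.3, 5.0, 2.0
prob_zero = zinb_pmf(0, p0, lam, alpha)
sampling_zero = (1.0 - p0) * nb_pmf(0, lam, alpha)

print("P(Y = 0):", round(prob_zero, 4))
print("share of zeros that are 'false' zeros:", round(sampling_zero / prob_zero, 4))
```

Unlike the hurdle model, where every zero belongs to the zero process, the ZINB lets the NB component contribute zeros of its own, so the observed zero frequency exceeds p0.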

SLIDE 30

Acknowledgments

• Irish Department of Agriculture, Fisheries and Food