SLIDE 16  2015-07-21
4. Bootstrapping & Bagging
- Original data: X is an alignment with h = 1 … N codon sites
- Bootstrapping: sample N sites with replacement to get B bootstrap replicates
- MLE distribution: analyze each bootstrap replicate to estimate parameters θ_i
- Posterior prob: use θ_i to compute the posterior probability P(ω_h > 1 | X, θ_i) for each site h of the original data
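A minimal sketch of the resampling step, assuming the alignment is held as a list of per-site codon columns; `bootstrap_replicates` and the stubbed `fit_M2a` are placeholder names for illustration, not part of the original method description.

```python
import random

def bootstrap_replicates(site_columns, B, seed=0):
    """Sample N codon sites with replacement to build B bootstrap alignments."""
    rng = random.Random(seed)
    N = len(site_columns)
    for _ in range(B):
        # each replicate keeps the original alignment length N
        yield [site_columns[rng.randrange(N)] for _ in range(N)]

# Hypothetical downstream use: estimate theta_i from each replicate with some
# ML routine (e.g. an external codeml run); fit_M2a is only a stand-in name.
# thetas = [fit_M2a(rep) for rep in bootstrap_replicates(alignment_sites, B=100)]
```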
We use bagging to aggregate information across the bootstrap replicates. Bagging can improve classification when there is a large amount of variability in a statistic [4].
- aggregate by averaging the posteriors: (1/B) Σ_{i=1}^{B} P(ω_h > 1 | X, θ_i)
- aggregate by using the median posterior: median over i of P(ω_h > 1 | X, θ_i)
We use bootstrapping to estimate the variability of the MLEs:
new methods (SBA): classify sites based on these statistics
Example: mixture parameters (p0 + p1 + p2 = 1.0) for codon model M2a
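A sketch of summarising the bootstrap variability of the M2a mixture weights, assuming `thetas` is a list of per-replicate parameter dictionaries with keys "p0", "p1", "p2"; this layout is illustrative and not the output format of any particular software.

```python
import numpy as np

# thetas: per-replicate MLEs, e.g. [{"p0": 0.71, "p1": 0.22, "p2": 0.07}, ...]
def mixture_summary(thetas):
    """Bootstrap variability of the M2a mixture weights (p0 + p1 + p2 = 1)."""
    out = {}
    for key in ("p0", "p1", "p2"):
        vals = np.array([t[key] for t in thetas])
        out[key] = {"mean": vals.mean(),
                    "sd": vals.std(ddof=1),
                    "q2.5": np.quantile(vals, 0.025),
                    "q97.5": np.quantile(vals, 0.975)}
    return out
```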
[Figure: histograms of the bootstrap distributions of p0 and p1.]
Example: bootstrap distribution for the mixture parameters p0 and p1 of codon model M2a. Cause: too complex a model and too little signal in the data.
Kernel smoothing is a technique for estimating the density of a random variable from “noisy” data.
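A sketch of a Gaussian kernel density estimate applied to the bootstrapped values of one parameter (e.g. p0); the bandwidth argument plays the role of the smoothing parameter (λ = 0.4 in the figure), but the exact kernel and parameterisation used by the authors are not specified here, so treat the function and its defaults as assumptions.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=0.4):
    """Kernel-smoothed density estimate of a bootstrap distribution.

    samples   : bootstrapped values of one parameter (e.g. p0)
    grid      : points at which to evaluate the smoothed density
    bandwidth : smoothing parameter (plays the role of lambda above)
    """
    samples = np.asarray(samples, dtype=float)[:, None]   # shape (B, 1)
    grid = np.asarray(grid, dtype=float)[None, :]          # shape (1, G)
    z = (grid - samples) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)     # standard normal kernel
    return kernels.mean(axis=0) / bandwidth                 # average kernel, rescaled

# e.g. smoothed density of p0 over [0, 1]:
# xs = np.linspace(0.0, 1.0, 101)
# dens = gaussian_kde(p0_bootstrap_values, xs, bandwidth=0.4)
```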
[Figure: histograms of the bootstrap distribution of p0 and p1 (showing the MLE instabilities) next to the kernel-smoothed distribution (λ = 0.4), which gives a much better approximation.]