 
              How to choose summary statistics for model selection and model checking. Sarah Filippi Imperial College London Theoretical Systems Biology Group 13/02/2012
Choice of summary statistics Sarah Filippi 1 of 22
Model selection vs model checking
• Model selection: Which mountain is this a representation of? Kilimanjaro, Uluru, Mount Everest
• Model checking: Is this a representation of Kilimanjaro?
Summary statistics for parameter inference
• Ideally, we should use a sufficient statistic S to summarize the data x*: $p(\theta \mid x^*) = p(\theta \mid S(x^*))$.
• When an ABC method is used to approximate the posterior $p(\theta \mid x^*)$, the choice of the sufficient statistic is particularly important: a statistic of small dimension makes ABC more efficient.
Information theoretical perspective
The idea of using a summary statistic instead of the whole data is to compress the information in the data into a vector of minimum size. The information content may be measured by the mutual information. If S is a sufficient statistic, then
$I(\Theta; X) = \iint p(x, \theta) \log \frac{p(x, \theta)}{p(x)\, p(\theta)}\, dx\, d\theta = I(\Theta; S(X))$.
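A small exact computation illustrates the sufficiency identity. The toy model is an assumption, not taken from the talk: three Bernoulli(θ) trials, with θ uniform on the hypothetical grid {0.2, 0.5, 0.8}. The number of successes is sufficient, so it should keep all of I(Θ; X), while the first observation alone keeps less.

```python
import itertools
import math
from collections import defaultdict

THETAS = [0.2, 0.5, 0.8]             # hypothetical parameter grid
PRIOR = {t: 1 / 3 for t in THETAS}   # uniform prior on the grid (assumption)
N = 3                                 # number of Bernoulli trials in one data set


def likelihood(x, theta):
    """p(x | theta) for a tuple x of independent Bernoulli(theta) outcomes."""
    k = sum(x)
    return theta ** k * (1 - theta) ** (len(x) - k)


def mutual_information(stat):
    """I(Theta; stat(X)) in nats, by exact enumeration of all 2**N data sets."""
    joint = defaultdict(float)  # p(theta, s) with s = stat(x)
    for x in itertools.product([0, 1], repeat=N):
        s = stat(x)
        for t in THETAS:
            joint[(t, s)] += PRIOR[t] * likelihood(x, t)
    p_s = defaultdict(float)    # marginal p(s)
    for (t, s), p in joint.items():
        p_s[s] += p
    return sum(p * math.log(p / (PRIOR[t] * p_s[s]))
               for (t, s), p in joint.items() if p > 0)


i_data = mutual_information(lambda x: x)       # the whole data X
i_sum = mutual_information(lambda x: sum(x))   # sufficient statistic
i_first = mutual_information(lambda x: x[0])   # insufficient statistic
print(f"I(Theta;X)     = {i_data:.6f}")
print(f"I(Theta;sum X) = {i_sum:.6f}")
print(f"I(Theta;X_1)   = {i_first:.6f}")
```

The first two values agree up to rounding; the third is strictly smaller, since one observation discards information about θ.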
What is the role of a summary statistic? Two distinct perspectives:
• a summary statistic that compresses a specific data set x* for the given model; ideally such that $p(\theta \mid x^*) = p(\theta \mid S(x^*))$ (Joyce and Marjoram, SAGMB (2008); Nunes and Balding, SAGMB (2010));
• a summary statistic that compresses the data for a given model, independently of the observed data; ideally such that $I(\Theta; X \mid S(X)) = 0$ (Fearnhead and Prangle, J. R. Statist. Soc. B (2012)).
Link between the two perspectives:
$I(\Theta; X \mid S(X)) = \mathbb{E}_X\{\mathrm{KL}[p(\theta \mid X); p(\theta \mid S(X))]\}$
Our approach
Construct, from a set of candidate summary statistics, a set of minimal cardinality that describes the data x* in a compact but lossless form, using the mutual information as a tool.
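The link between the two perspectives can be checked numerically on a toy model (an assumption, not from the talk: three Bernoulli(θ) trials, θ uniform on the hypothetical grid {0.2, 0.5, 0.8}). Because S is a function of X, the left-hand side equals I(Θ; X) − I(Θ; S(X)); the sketch compares this with a direct evaluation of the expected KL divergence, using the first observation as a deliberately insufficient statistic.

```python
import itertools
import math
from collections import defaultdict

THETAS = [0.2, 0.5, 0.8]             # hypothetical parameter grid
PRIOR = {t: 1 / 3 for t in THETAS}   # uniform prior (assumption)
N = 3                                 # Bernoulli trials per data set
DATA = list(itertools.product([0, 1], repeat=N))  # every possible data set


def likelihood(x, theta):
    k = sum(x)
    return theta ** k * (1 - theta) ** (len(x) - k)


def marginal(x):
    """p(x) = sum over theta of p(theta) p(x | theta)."""
    return sum(PRIOR[t] * likelihood(x, t) for t in THETAS)


def posterior(stat, value):
    """p(theta | stat(X) = value): pool all data sets that share this summary."""
    w = {t: sum(PRIOR[t] * likelihood(x, t) for x in DATA if stat(x) == value)
         for t in THETAS}
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}


def mutual_information(stat):
    """I(Theta; stat(X)) in nats."""
    joint = defaultdict(float)
    for x in DATA:
        for t in THETAS:
            joint[(t, stat(x))] += PRIOR[t] * likelihood(x, t)
    p_s = defaultdict(float)
    for (t, s), p in joint.items():
        p_s[s] += p
    return sum(p * math.log(p / (PRIOR[t] * p_s[s])) for (t, s), p in joint.items())


def expected_kl(stat):
    """E_X { KL[ p(theta | X) ; p(theta | S(X)) ] }."""
    total = 0.0
    for x in DATA:
        p_x = posterior(lambda y: y, x)    # condition on the full data
        p_s = posterior(stat, stat(x))     # condition on the summary only
        kl = sum(p * math.log(p / p_s[t]) for t, p in p_x.items() if p > 0)
        total += marginal(x) * kl
    return total


stat = lambda x: x[0]   # first observation only: not sufficient
# S is a function of X, so I(Theta; X | S(X)) = I(Theta; X) - I(Theta; S(X)).
lhs = mutual_information(lambda x: x) - mutual_information(stat)
rhs = expected_kl(stat)
print(f"I(Theta; X | S(X)) = {lhs:.6f}, E_X KL = {rhs:.6f}")
```

Replacing `stat` with the number of successes drives both sides to zero, matching the data-independent criterion $I(\Theta; X \mid S(X)) = 0$ for a sufficient statistic.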
Selection of summary statistics
• Suppose we have a set of statistics $\mathcal{S} = \{S_1, \dots, S_w\}$.
• Aim: determine the subset of $\mathcal{S}$ with minimum cardinality that contains all the information provided by $\mathcal{S}(x^*)$ about Θ.
• If $\mathcal{S}$ contains a sufficient statistic (or the data x* itself), then the constructed subset is a minimal sufficient statistic.
An impossible algorithm
• for all subsets $T \subset \mathcal{S}$, perform ABC to obtain estimates of $p_\epsilon(\theta \mid T(x^*))$;
• determine the set $Q = \{T \subset \mathcal{S} : \mathrm{KL}[p_\epsilon(\theta \mid \mathcal{S}(x^*)); p_\epsilon(\theta \mid T(x^*))] = 0\}$;
• the desired subset is $\arg\min_{T \in Q} |T|$.
This is infeasible in practice: there are $2^w$ subsets, each requiring a separate ABC run.
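The exhaustive search can be sketched as follows, mainly to show where its cost comes from. `kl_to_full` is a hypothetical callable standing in for "run ABC with T and compare with the posterior obtained from all of the statistics"; the tolerance `tol` is an added assumption, since an estimated KL divergence is essentially never exactly zero.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Sequence


def exhaustive_selection(
    stats: Sequence[str],
    kl_to_full: Callable[[FrozenSet[str]], float],
    tol: float = 0.0,
) -> Optional[FrozenSet[str]]:
    """Return a smallest subset T of `stats` with kl_to_full(T) <= tol.

    kl_to_full(T) stands for KL[p_eps(theta | S(x*)); p_eps(theta | T(x*))],
    which in practice needs one ABC run per subset. Subsets are visited by
    increasing size, so the first one that qualifies has minimum cardinality.
    """
    for size in range(len(stats) + 1):
        for subset in combinations(stats, size):
            t = frozenset(subset)
            if kl_to_full(t) <= tol:
                return t
    return None  # only reached if even the full set exceeds tol


# Hypothetical stand-in for the ABC comparison: only s1 and s2 carry information.
INFORMATIVE = {"s1", "s2"}


def toy_kl(t: FrozenSet[str]) -> float:
    return float(len(INFORMATIVE - t))


best = exhaustive_selection(["s1", "s2", "s3", "s4"], toy_kl)
print(sorted(best))  # → ['s1', 's2']
```

With w statistics the loops may call `kl_to_full` up to $2^w$ times, which is what rules this approach out once each call is an ABC run.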
An incremental algorithm
• Start with an informative statistic: $Z \leftarrow \arg\max_{1 \le k \le w} \log \mathbb{E}_\Theta[p_\epsilon(\Theta \mid S_k(x^*))]$.
• Add, step by step, statistics that contain new information compared with the already selected statistics: add to Z the statistic $\arg\max_U \mathrm{KL}[p_\epsilon(\Theta \mid Z(x^*), U(x^*)); p_\epsilon(\Theta \mid Z(x^*))]$.
• Stop the algorithm as soon as the newly added statistic does not bring enough information, i.e. when $\mathrm{KL}[p_\epsilon(\Theta \mid Z(x^*), U(x^*)); p_\epsilon(\Theta \mid Z(x^*))] \le \delta$.
Barnes et al., arXiv (2011)
Idea
Given a set of already selected statistics Z, we aim to determine a statistic U that minimizes
$I(\Theta; \mathcal{S}(X) \mid Z(X), U(X)) = I(\Theta; \mathcal{S}(X) \mid Z(X)) - I(\Theta; Z(X), U(X) \mid Z(X))$
⇒ select the statistic U that maximizes $I(\Theta; Z(X), U(X) \mid Z(X))$.
Since $I(\Theta; Z(X), U(X) \mid Z(X)) = \mathbb{E}_X\{\mathrm{KL}[p(\theta \mid Z(X), U(X)); p(\theta \mid Z(X))]\}$, the KL criterion used above is its data-specific counterpart, evaluated at x*.
In practice
• $\mathrm{KL}[p_\epsilon(\Theta \mid Z(x^*), U(x^*)); p_\epsilon(\Theta \mid Z(x^*))]$ is estimated from weighted samples $(\theta_i, w_i)_{1 \le i \le N_s}$ and $(\theta'_i, w'_i)_{1 \le i \le N_s}$, drawn from the first and second distribution respectively, by
$\sum_{i=1}^{N_s} w_i \log \frac{w_i}{\bar{w}'_i}$, where $\bar{w}'_i = \frac{\sum_{j=1}^{N_s} w'_j K_h(\theta_i; \theta'_j)}{\sum_{k=1}^{N_s} \sum_{j=1}^{N_s} w'_j K_h(\theta_k; \theta'_j)}$,
and $K_h(\cdot\,; \mu)$ is the normal probability density with mean µ and variance 1/h.
• δ is a threshold: it should reflect how small the estimated KL divergence between two similar probability distributions is expected to be.
• A stochastic version of the algorithm may be used if the set of statistics is large; a test for order dependency is then required.
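The estimator can be written in a few lines of NumPy. This is a sketch under stated assumptions: θ is scalar, the input weights are normalized inside the function, and the Gaussian samples in the usage part are synthetic stand-ins for ABC output.

```python
import numpy as np


def normal_kernel(theta, mu, h):
    """K_h(theta; mu): normal density with mean mu and variance 1/h (scalar theta)."""
    return np.sqrt(h / (2.0 * np.pi)) * np.exp(-0.5 * h * (theta - mu) ** 2)


def kl_estimate(theta, w, theta_p, w_p, h):
    """Estimator from the slide: sum_i w_i log(w_i / wbar'_i).

    (theta, w)     weighted sample for p_eps(Theta | Z(x*), U(x*))
    (theta_p, w_p) weighted sample for p_eps(Theta | Z(x*))
    Weights are normalised here (an assumption about the input).
    """
    theta = np.asarray(theta, dtype=float)
    theta_p = np.asarray(theta_p, dtype=float)
    w = np.asarray(w, dtype=float) / np.sum(w)
    w_p = np.asarray(w_p, dtype=float) / np.sum(w_p)
    # kernel[i, j] = K_h(theta_i; theta'_j)
    kernel = normal_kernel(theta[:, None], theta_p[None, :], h)
    smoothed = kernel @ w_p              # sum_j w'_j K_h(theta_i; theta'_j)
    w_bar = smoothed / np.sum(smoothed)  # divide by the double sum over i and j
    keep = w > 0                         # 0 * log 0 is taken as 0
    return float(np.sum(w[keep] * np.log(w[keep] / w_bar[keep])))


rng = np.random.default_rng(0)
n = 500
ones = np.ones(n)
a = rng.normal(0.0, 1.0, n)  # sample from the first distribution
b = rng.normal(0.0, 1.0, n)  # second sample from the same distribution
c = rng.normal(2.0, 1.0, n)  # sample from a shifted distribution
kl_same = kl_estimate(a, ones, b, ones, h=4.0)
kl_shift = kl_estimate(a, ones, c, ones, h=4.0)
print(f"similar: {kl_same:.3f}, shifted: {kl_shift:.3f}")
```

Because both $w_i$ and $\bar{w}'_i$ sum to one over i, the estimate is never negative. For two samples from the same distribution it is still positive rather than zero, which is why δ has to be chosen with the estimator's behaviour on similar distributions in mind.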
Summary statistics for model selection
• As pointed out recently, sufficiency within models is not enough to reliably perform model choice in the ABC framework (Robert et al., PNAS (2011)).
• In particular, it is not straightforward to determine a set of sufficient statistics for model selection, even if sufficient statistics for parameter inference are available for each model.
Information theory perspective
Consider q models; we require a statistic that is sufficient for the joint space $\{M, \{\Theta_i\}_{1 \le i \le q}\}$. For all statistics S,
$I(M, \Theta_1, \dots, \Theta_q; X \mid S) = I(M; X \mid \Theta_1, \dots, \Theta_q, S) + \sum_{i=1}^{q} I(\Theta_i; X \mid S)$, where $S = S(X)$.
Barnes et al., arXiv (2011)
Method
• For each model 1 ≤ m ≤ q, determine the set of statistics $S^{(m)}$ that minimizes $I(\Theta_m; X \mid S(X))$.
• Add statistics sequentially to $\bigcup_{1 \le m \le q} S^{(m)}$, using the previously described algorithm on the joint space.
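The two-stage method can be sketched by combining a union step with the same greedy rule as in the single-model case. Both `per_model` (the sets $S^{(m)}$ found beforehand) and `joint_gain` (a KL gain estimated on the joint space of M and the Θ_i) are hypothetical inputs; the toy values below are made up.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_for_model_choice(
    candidates: Sequence[str],
    per_model: Dict[int, List[str]],
    joint_gain: Callable[[Tuple[str, ...], str], float],
    delta: float,
) -> List[str]:
    """Sketch of the method on the slide.

    per_model[m]: statistics S^(m) already chosen for parameter inference in model m.
    joint_gain(Z, U): estimated KL gain from adding U to Z on the joint space
        (M, Theta_1, ..., Theta_q); a hypothetical callable.
    """
    selected: List[str] = []
    for stats in per_model.values():  # union of the S^(m), order preserved
        for s in stats:
            if s not in selected:
                selected.append(s)
    remaining = [s for s in candidates if s not in selected]
    while remaining:                  # same greedy rule as for one model
        gains = {u: joint_gain(tuple(selected), u) for u in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= delta:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy inputs (hypothetical): two models and made-up joint-space gains.
per_model = {1: ["s1", "s2"], 2: ["s1", "s3"]}
toy_gain = {"s4": 0.4, "s5": 0.01}
chosen = select_for_model_choice(
    ["s1", "s2", "s3", "s4", "s5"], per_model, lambda z, u: toy_gain[u], delta=0.05
)
print(chosen)  # → ['s1', 's2', 's3', 's4']
```

Starting from the union of the per-model selections means the joint-space search only has to find statistics that help discriminate between models.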
Examples: Normal distributions
Model 1: $y_1, \dots, y_d \sim N(\mu, \sigma_1^2)$; model 2: $y_1, \dots, y_d \sim N(\mu, \sigma_2^2)$; $\sigma_1^2 \neq \sigma_2^2$.
[Figure: two panels, "Statistics chosen for parameter inference" and "Additional statistics chosen for model selection"; x-axis: candidate statistics (mean, S2, range, max, random); y-axis: run.]
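The example names its candidate statistics but not how they are computed. The sketch below shows one way to compute them on data simulated from either model. The values of µ, d, σ₁² and σ₂² are hypothetical, and reading S2 as the unbiased sample variance and random as a draw independent of the data are assumptions.

```python
import numpy as np


def candidate_statistics(y, rng):
    """The five candidate summaries named in the figure, for one data set y.

    Reading 'S2' as the unbiased sample variance and 'random' as a value
    drawn independently of the data are assumptions.
    """
    return {
        "mean": float(np.mean(y)),
        "S2": float(np.var(y, ddof=1)),
        "range": float(np.max(y) - np.min(y)),
        "max": float(np.max(y)),
        "random": float(rng.uniform()),
    }


def simulate(model, mu, sigma2, d, rng):
    """Draw y_1, ..., y_d ~ N(mu, sigma2[model]); model is 0 or 1."""
    return rng.normal(mu, np.sqrt(sigma2[model]), size=d)


rng = np.random.default_rng(1)
sigma2 = (1.0, 4.0)  # hypothetical values with sigma_1^2 != sigma_2^2
for model in (0, 1):
    y = simulate(model, mu=0.0, sigma2=sigma2, d=50, rng=rng)
    summary = candidate_statistics(y, rng)
    print(model + 1, {k: round(v, 3) for k, v in summary.items()})
```

Since the two models share µ and differ only in variance, statistics that track spread are the natural candidates to separate them, while the random value serves as a control that carries no information.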