How to choose summary statistics for model selection and model checking
Sarah Filippi
Imperial College London, Theoretical Systems Biology Group
13/02/2012

Choice of summary statistics Sarah Filippi 1 of 22

Model selection vs model checking
• Model selection: which mountain is it a representation of? (Kilimanjaro, Uluru, Mount Everest)
• Model checking: is it a representation of Kilimanjaro?

Summary statistics for parameter inference
• Ideally, we should use a sufficient statistic $S$ to summarize the data $x^*$: $p(\theta \mid x^*) = p(\theta \mid S(x^*))$.
• When using an ABC method to approximate the posterior $p(\theta \mid x^*)$, the choice of the sufficient statistic is particularly important: a statistic with a small dimension is more efficient.

Information theoretical perspective
The idea of using a summary statistic instead of the whole data is to compress this information into a vector of minimum size. The information content may be measured by the mutual information. If $S$ is a sufficient statistic then
\[ I(\Theta, X) = \iint p(x, \theta) \log \frac{p(x, \theta)}{p(x)\, p(\theta)} \, dx \, d\theta = I(\Theta, S(X)). \]
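The equality of the two mutual informations can be checked numerically on a toy model. The sketch below is an illustration only: the Bernoulli model, the two-point prior, and the function names are assumptions made here, not part of the talk. It compares the full data, the sum of the observations (sufficient for a Bernoulli parameter), and the first observation alone (not sufficient).

```python
import itertools
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    `joint` maps (theta, value) pairs to probabilities that sum to one.
    """
    p_theta = defaultdict(float)
    p_value = defaultdict(float)
    for (theta, value), p in joint.items():
        p_theta[theta] += p
        p_value[value] += p
    return sum(
        p * math.log(p / (p_theta[theta] * p_value[value]))
        for (theta, value), p in joint.items()
        if p > 0
    )


def joint_with_statistic(thetas, prior, n, statistic):
    """Joint distribution of (theta, statistic(x)) for n Bernoulli(theta) draws."""
    joint = defaultdict(float)
    for theta, p_theta in zip(thetas, prior):
        for x in itertools.product((0, 1), repeat=n):
            k = sum(x)
            joint[(theta, statistic(x))] += p_theta * theta**k * (1 - theta) ** (n - k)
    return dict(joint)


# Hypothetical toy setting: theta is 0.2 or 0.8 with equal prior weight, n = 3 draws.
thetas, prior, n = (0.2, 0.8), (0.5, 0.5), 3
mi_full = mutual_information(joint_with_statistic(thetas, prior, n, lambda x: x))
mi_sum = mutual_information(joint_with_statistic(thetas, prior, n, sum))
mi_first = mutual_information(joint_with_statistic(thetas, prior, n, lambda x: x[0]))
print(mi_full, mi_sum, mi_first)
```

In this toy setting the sum carries the same information about the parameter as the full data, while the first observation alone carries strictly less.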

What is the role of a summary statistic?
Two distinct perspectives:
• a summary statistic to compress a specific data set $x^*$ for the given model; ideally such that $p(\theta \mid x^*) = p(\theta \mid S(x^*))$ (Joyce and Marjoram, SAGMB (2008); Nunes and Balding, SAGMB (2010))
• a summary statistic to compress the data for a given model (data-independent); ideally such that $I(\Theta, X \mid S(X)) = 0$ (Fearnhead and Prangle, J. R. Statist. Soc. B (2012))

Link between the two perspectives:
\[ I(\Theta, X \mid S(X)) = E_X \{ KL[\, p(\theta \mid X);\, p(\theta \mid S(X)) \,] \} \]

Our approach
Construct, from a set of candidate summary statistics, a set of minimal cardinality that describes the data $x^*$ in a compact but lossless form, using the mutual information as a tool.
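The link between the two perspectives follows from the definition of conditional mutual information. A short derivation, written here as a sketch and assuming only that $S$ is a deterministic function of $X$ (so that conditioning on $X$ and $S(X)$ is the same as conditioning on $X$):

```latex
\begin{align*}
I(\Theta, X \mid S(X))
  &= E_X \left[ \int p(\theta \mid X, S(X))
       \log \frac{p(\theta \mid X, S(X))}{p(\theta \mid S(X))} \, d\theta \right] \\
  &= E_X \left[ \int p(\theta \mid X)
       \log \frac{p(\theta \mid X)}{p(\theta \mid S(X))} \, d\theta \right]
     && \text{since } p(\theta \mid X, S(X)) = p(\theta \mid X) \\
  &= E_X \{ KL[\, p(\theta \mid X);\, p(\theta \mid S(X)) \,] \}.
\end{align*}
```

Because the KL divergence is non-negative, the conditional mutual information vanishes exactly when $p(\theta \mid X) = p(\theta \mid S(X))$ for almost every data set, which is the data-independent version of the first criterion.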

Selection of summary statistics
• Suppose we have a set of statistics $\mathcal{S} = \{S_1, \dots, S_w\}$.
• Aim: determine the subset of $\mathcal{S}$ with minimum cardinality which contains all the information provided by $\mathcal{S}(x^*)$ about $\Theta$.
• If $\mathcal{S}$ contains a sufficient statistic (or the data $x^*$ itself) then the constructed subset is a minimal sufficient statistic.

An impossible algorithm
• For all subsets $\mathcal{T} \subset \mathcal{S}$, perform ABC to obtain estimates of $p_\epsilon(\theta \mid \mathcal{T}(x^*))$.
• Determine the set $\mathcal{Q} = \{ \mathcal{T} \subset \mathcal{S} \text{ such that } KL[\, p_\epsilon(\theta \mid \mathcal{S}(x^*));\, p_\epsilon(\theta \mid \mathcal{T}(x^*)) \,] = 0 \}$.
• The desired subset is $\operatorname{argmin}_{\mathcal{T} \in \mathcal{Q}} |\mathcal{T}|$.
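To see why the exhaustive search is impractical, it helps to count ABC runs. The sketch below assumes one ABC run per non-empty subset for the exhaustive search, and, for comparison, at most one run per remaining candidate at each step of a greedy forward search. Both counts and the function names are illustrative assumptions, not figures from the talk.

```python
def exhaustive_abc_runs(w):
    """Number of non-empty subsets of w candidate statistics."""
    return 2**w - 1


def greedy_abc_runs_upper_bound(w):
    """Worst case for a forward search: w + (w - 1) + ... + 1 candidate evaluations."""
    return w * (w + 1) // 2


for w in (5, 10, 20):
    print(w, exhaustive_abc_runs(w), greedy_abc_runs_upper_bound(w))
```

The exhaustive count grows exponentially in the number of candidate statistics, while the greedy bound grows quadratically.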

An incremental algorithm
• Start with an informative statistic: $\mathcal{Z} \leftarrow \operatorname{argmax}_{1 \le k \le w} \log E_\Theta[\, p_\epsilon(\Theta \mid S_k(x^*)) \,]$.
• Add, step by step, statistics which contain new information compared to the already selected statistics: add to $\mathcal{Z}$ the statistic $\operatorname{argmax}_{U} KL[\, p_\epsilon(\Theta \mid \mathcal{Z}(x^*), U(x^*));\, p_\epsilon(\Theta \mid \mathcal{Z}(x^*)) \,]$.

Idea
Given a set of already selected statistics $\mathcal{Z}$, we aim to determine a statistic $U$ which minimizes
\[ I(\Theta; \mathcal{S}(X) \mid \mathcal{Z}(X), U(X)) = I(\Theta; \mathcal{S}(X) \mid \mathcal{Z}(X)) - I(\Theta; \mathcal{Z}(X), U(X) \mid \mathcal{Z}(X)) \]
$\Rightarrow$ select the statistic $U$ that maximises $I(\Theta; \mathcal{Z}(X), U(X) \mid \mathcal{Z}(X))$.

An incremental algorithm (continued)
• Stop the algorithm as soon as the newly added statistic does not bring enough information, i.e. $KL[\, p_\epsilon(\Theta \mid \mathcal{Z}(x^*), U(x^*));\, p_\epsilon(\Theta \mid \mathcal{Z}(x^*)) \,] \le \delta$.
Barnes et al., arXiv (2011)
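The control flow of the incremental algorithm can be sketched as a greedy forward selection. This is a minimal sketch under stated assumptions: `initial_score` and `kl_gain` are hypothetical callables standing in for the ABC-based quantities on the slides, and a candidate whose gain is at most `delta` is assumed to be left out rather than added before stopping. The toy numbers at the end are made up and involve no ABC run.

```python
def select_statistics(candidates, initial_score, kl_gain, delta):
    """Greedy forward selection of summary statistics.

    initial_score(s): score used to pick the first statistic (hypothetical
        stand-in for log E[p_eps(Theta | S_k(x*))]).
    kl_gain(selected, u): estimated KL divergence between the posterior using
        selected + [u] and the posterior using selected only (hypothetical).
    delta: threshold below which a candidate is judged uninformative.
    """
    remaining = list(candidates)
    first = max(remaining, key=initial_score)
    selected = [first]
    remaining.remove(first)
    while remaining:
        gains = {u: kl_gain(selected, u) for u in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= delta:
            break  # assumption: the uninformative candidate is not added
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy illustration with fixed, invented information values.
info = {"mean": 1.0, "S2": 0.8, "range": 0.05, "max": 0.02, "random": 0.0}
selected = select_statistics(
    list(info),
    initial_score=info.get,
    kl_gain=lambda already_selected, u: info[u],
    delta=0.1,
)
print(selected)
```

In a real application each call to `kl_gain` would require ABC posterior samples with and without the candidate statistic, which is where most of the computation goes.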

In practice
• Estimation of $KL[\, p_\epsilon(\Theta \mid \mathcal{Z}(x^*), U(x^*));\, p_\epsilon(\Theta \mid \mathcal{Z}(x^*)) \,]$ from weighted samples $(\theta_i, w_i)_{1 \le i \le N_s}$ and $(\theta'_i, w'_i)_{1 \le i \le N_s}$ by
\[ \sum_{i=1}^{N_s} w_i \log \frac{w_i}{\bar{w}'_i}, \quad \text{where} \quad \bar{w}'_i = \frac{\sum_{j=1}^{N_s} w'_j K_h(\theta_i; \theta'_j)}{\sum_{i,j=1}^{N_s} w'_j K_h(\theta_i; \theta'_j)}. \]
  $K_h(\cdot\,; \mu)$ is the normal probability density with mean $\mu$ and variance $1/h$.
• $\delta$ reflects how small the estimated KL divergence between two similar probability distributions should be.
• A stochastic version of the algorithm may be used if the set of statistics is large; a test for order dependency is then required.
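The estimator above can be written in a few lines for a scalar parameter. The sketch below assumes that both weight vectors are already normalised to sum to one; the function names, the choice h = 4, and the Gaussian test samples are assumptions made for illustration.

```python
import math
import random


def normal_kernel(theta, mu, h):
    """Normal density with mean mu and variance 1/h, evaluated at theta."""
    return math.sqrt(h / (2.0 * math.pi)) * math.exp(-0.5 * h * (theta - mu) ** 2)


def kl_from_weighted_samples(theta, w, theta_prime, w_prime, h=4.0):
    """Estimate sum_i w_i log(w_i / w_bar_i) from two weighted samples."""
    # Numerator of w_bar_i: kernel-smoothed weight of the second sample at theta_i.
    smoothed = [
        sum(wj * normal_kernel(ti, tj, h) for tj, wj in zip(theta_prime, w_prime))
        for ti in theta
    ]
    total = sum(smoothed)  # denominator: the double sum over i and j
    w_bar = [s / total for s in smoothed]
    return sum(wi * math.log(wi / wbi) for wi, wbi in zip(w, w_bar) if wi > 0)


rng = random.Random(0)
n = 300
uniform = [1.0 / n] * n
sample_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
sample_b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # same distribution as sample_a
sample_c = [rng.gauss(3.0, 1.0) for _ in range(n)]  # shifted distribution
kl_similar = kl_from_weighted_samples(sample_a, uniform, sample_b, uniform)
kl_shifted = kl_from_weighted_samples(sample_a, uniform, sample_c, uniform)
print(kl_similar, kl_shifted)
```

Since both $w$ and $\bar{w}'$ sum to one over $i$, the estimate is non-negative. Note that it is generally not exactly zero even for two samples from the same distribution, which is one reason a threshold $\delta > 0$ is needed.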

Summary statistics for model selection
• As pointed out recently, sufficiency within models is not enough to reliably perform model choice in the ABC framework (Robert et al., PNAS (2011)).
• In particular, it is not straightforward to determine a set of sufficient statistics for model selection even if sufficient statistics for parameter inference are available for each model.

Information theory perspective
Consider $q$ models; we require a statistic that is sufficient for the joint space $\{ M, \{\Theta_i\}_{1 \le i \le q} \}$. For all statistics $S$,
\[ I(M, \Theta_1, \dots, \Theta_q; X \mid S) = I(M; X \mid \Theta_1, \dots, \Theta_q, S) + \sum_{i=1}^{q} I(\Theta_i; X \mid S) \]
where $S = S(X)$.
Barnes et al., arXiv (2011)

Summary statistics for model selection (continued)
Method
• For each model $1 \le m \le q$, determine the set of statistics $\mathcal{S}^{(m)}$ which minimizes $I(\Theta_m; X \mid \mathcal{S}^{(m)}(X))$.
• Add statistics sequentially to $\bigcup_{1 \le m \le q} \mathcal{S}^{(m)}$ using the previously described algorithm on the joint space.

Examples: Normal distributions
$y_1, \dots, y_d \sim N(\mu, \sigma_1^2)$ and $y_1, \dots, y_d \sim N(\mu, \sigma_2^2)$; $\sigma_1^2 \neq \sigma_2^2$
[Figure: two panels, "Statistics chosen for parameter inference" and "Additional statistics chosen for model selection". Vertical axis: Run (20 to 100); horizontal axis: candidate statistics mean, S2, range, max, random.]
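For readers unfamiliar with the ABC step that produces the samples from $p_\epsilon(\theta \mid S(x^*))$ used throughout, here is a minimal rejection-ABC sketch for one of the normal models, using the sample mean as the only summary statistic. The uniform prior on $\mu$, the tolerance, the sample size $d = 10$, and the function names are assumptions for illustration. It does not reproduce the statistic-selection experiment shown in the figure.

```python
import random


def sample_mean(values):
    return sum(values) / len(values)


def abc_rejection(observed, sigma, epsilon, n_draws, rng, prior=(-5.0, 5.0)):
    """Rejection ABC for the mean mu of N(mu, sigma^2) with known sigma.

    A prior draw of mu is kept when the simulated sample mean is within
    epsilon of the observed sample mean.
    """
    d = len(observed)
    s_obs = sample_mean(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(*prior)  # assumed uniform prior on mu
        simulated = [rng.gauss(mu, sigma) for _ in range(d)]
        if abs(sample_mean(simulated) - s_obs) <= epsilon:
            accepted.append(mu)
    return accepted


rng = random.Random(1)
observed = [rng.gauss(1.0, 1.0) for _ in range(10)]  # synthetic data, true mu = 1
posterior_sample = abc_rejection(observed, sigma=1.0, epsilon=0.2, n_draws=20000, rng=rng)
print(len(posterior_sample), sample_mean(posterior_sample), sample_mean(observed))
```

Because the sample mean is sufficient for $\mu$ when the variance is known, the accepted values concentrate around the observed mean; adding an uninformative statistic to the acceptance test would mainly lower the acceptance rate.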
