Selecting explanatory variables with the modified version of Bayesian Information Criterion

Małgorzata Bogdan
Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland

In cooperation with:
J.K. Ghosh, R.W. Doerge, R. Cheng (Purdue University)
A. Baierl, F. Frommlet, A. Futschik (Vienna University)
A. Chakrabarti (Indian Statistical Institute)
P. Biecek, A. Ochman, M. Żak (Wrocław University of Technology)

Vienna, 24/07/2008

Searching large data bases

Y: the quantitative variable of interest (fruit size, survival time, process yield)
Aim: identify factors influencing Y
Properties of the data base: the number of potential factors, m, may be much larger than the number of cases, n
Assumption of sparsity: only a small proportion of potential explanatory variables influences Y

Specific application: locating Quantitative Trait Loci (QTL)

Data for QTL mapping in backcross population and recombinant inbred lines

Only two genotypes are possible at a given locus.
$X_{ij}$: dummy variable encoding the genotype of the i-th individual at locus j, with $X_{ij} \in \{-1/2, 1/2\}$
Multiple regression model:

$$Y_i = \beta_0 + \sum_{j=1}^{m} \beta_j X_{ij} + \epsilon_i, \tag{0.1}$$

where $i \in \{1, \ldots, n\}$ and $\epsilon_i \sim N(0, \sigma^2)$.
Problem: estimation of the number of influential genes
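To make model (0.1) concrete, the following is a minimal simulation sketch. The function name `simulate_backcross`, the sample sizes, and the effect sizes are illustrative assumptions, not values from the talk. Markers are drawn independently, which ignores the genetic linkage between neighbouring loci in real backcross data.

```python
import numpy as np

def simulate_backcross(n, m, effects, beta0=0.0, sigma=1.0, seed=0):
    """Draw data from Y_i = beta_0 + sum_j beta_j X_ij + eps_i.

    `effects` maps a 0-based locus index j to its coefficient beta_j;
    every other locus has beta_j = 0 (the sparsity assumption).
    Loci are sampled independently, an assumption made for simplicity.
    """
    rng = np.random.default_rng(seed)
    # Genotype dummies take the two values -1/2 and 1/2.
    X = rng.choice([-0.5, 0.5], size=(n, m))
    beta = np.zeros(m)
    for j, b in effects.items():
        beta[j] = b
    # Gaussian noise with standard deviation sigma.
    y = beta0 + X @ beta + rng.normal(0.0, sigma, size=n)
    return X, y, beta

# Hypothetical example: 200 individuals, 50 markers, 2 influential loci.
X, y, beta = simulate_backcross(n=200, m=50, effects={3: 1.0, 17: -0.8})
print(X.shape, y.shape, np.count_nonzero(beta))
```

The model selection problem is then to recover the nonzero entries of `beta` from `X` and `y` alone.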

Bayesian Information Criterion (1)

$M_i$: the i-th linear model, with $k_i < n$ regressors
$\theta_i = (\beta_0, \beta_1, \ldots, \beta_{k_i}, \sigma)$: vector of model parameters
Bayesian Information Criterion (Schwarz, 1978): choose the model maximizing

$$\mathrm{BIC} = \log L(Y \mid M_i, \hat{\theta}_i) - \frac{1}{2} k_i \log n$$

If m is fixed, $n \to \infty$ and $X'X/n \to Q$, where Q is a positive definite matrix, then BIC is consistent: the probability of choosing the proper model converges to 1.
When $n \ge 8$, BIC never chooses more regressors than AIC, and it is usually considered one of the most restrictive model selection criteria.
Surprise: Broman and Speed (JRSS, 2002) report that BIC overestimates the number of regressors when applied to QTL mapping.
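The criterion above can be sketched in a few lines for a Gaussian linear model, using the maximized log-likelihood $-\frac{n}{2}\left(\log 2\pi + \log(\mathrm{RSS}/n) + 1\right)$. The helper names and the toy data are assumptions for illustration. AIC is written on the same "maximize" scale, as $\log L - k_i$, which shows why the BIC penalty is the larger one once $\frac{1}{2}\log n > 1$, that is, for $n \ge 8$.

```python
import numpy as np

def gaussian_loglik(X, y, cols):
    """Maximized Gaussian log-likelihood of the model using columns `cols`."""
    n = len(y)
    # Design matrix: intercept plus the selected regressors.
    design = np.column_stack([np.ones(n), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)

def bic(X, y, cols):
    """BIC on the 'maximize' scale: log L - (1/2) k log n."""
    return gaussian_loglik(X, y, cols) - 0.5 * len(cols) * np.log(len(y))

def aic(X, y, cols):
    """AIC on the same scale: log L - k."""
    return gaussian_loglik(X, y, cols) - len(cols)

# Toy data (hypothetical): only the first of five regressors matters.
rng = np.random.default_rng(1)
n = 100
X = rng.choice([-0.5, 0.5], size=(n, 5))
y = 2.0 * X[:, 0] + rng.normal(size=n)

print("BIC, true model: ", bic(X, y, [0]))
print("BIC, empty model:", bic(X, y, []))
```

Only the intercept and regressors are counted through `len(cols)`; the constant contribution of $\beta_0$ and $\sigma$ is the same for every model and does not affect the comparison.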

Explanation: Bayesian roots of BIC (1)

$f(\theta_i)$: prior density of $\theta_i$; $\pi(M_i)$: prior probability of $M_i$
Integrated likelihood of the data given the model $M_i$:

$$m_i(Y) = \int L(Y \mid M_i, \theta_i)\, f(\theta_i)\, d\theta_i$$

Posterior probability of $M_i$: $P(M_i \mid Y) \propto m_i(Y)\, \pi(M_i)$
BIC neglects $\pi(M_i)$ and uses the approximation

$$\log m_i(Y) \approx \log L(Y \mid M_i, \hat{\theta}_i) - \frac{1}{2}(k_i + 2) \log n + R_i,$$

where $R_i$ is bounded in n.

Explanation: Bayesian roots of BIC (2)

Neglecting $\pi(M_i)$ ≡ assuming all the models have the same prior probability ≡ assigning a large prior probability to the event that the true model contains approximately $m/2$ regressors.
Example with m = 200: there are 200 models with one regressor, $\binom{200}{2} = 19900$ models with two regressors, and $\binom{200}{100} \approx 9 \times 10^{58}$ models with 100 regressors.
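The counts on this slide are easy to check with the standard library. The sketch below also computes what share of all $2^{200}$ models has between 90 and 110 regressors; that window is an arbitrary choice made here to illustrate how a uniform prior over models concentrates near $m/2$.

```python
from math import comb

m = 200

# Number of models with exactly k regressors out of m candidates.
counts = {k: comb(m, k) for k in (1, 2, 100)}
for k, c in counts.items():
    print(f"k = {k:3d}: {c:.3e} models")

# Under equal prior probability for every model, the prior mass of
# "the model has between 90 and 110 regressors" is the share of such models.
total = 2 ** m
share_near_half = sum(comb(m, k) for k in range(90, 111)) / total
print(f"share of models with 90..110 regressors: {share_near_half:.3f}")
```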

Modified version of BIC, mBIC (1)

M. Bogdan, J.K. Ghosh, R.W. Doerge, Genetics (2004)
Proposed solution: supplementing BIC with an informative prior distribution on the set of possible models, proposed in George and McCulloch (1993).
p: prior probability that a randomly chosen regressor influences Y

$$\pi(M_i) = p^{k_i} (1 - p)^{m - k_i}$$

$$\log \pi(M_i) = m \log(1 - p) - k_i \log\left(\frac{1 - p}{p}\right)$$

The modified version of BIC recommends choosing the model maximizing

$$\log L(Y \mid M_i, \hat{\theta}_i) - \frac{1}{2} k_i \log n - k_i \log\left(\frac{1 - p}{p}\right)$$
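A small sketch of the criterion follows, taking the maximized log-likelihood as an input so that it stays independent of how the model is fitted. The function names `log_prior` and `mbic` and the numbers in the example are assumptions for illustration. The term $m \log(1-p)$ is the same for every model, which is why it can be dropped from the criterion.

```python
import math

def log_prior(k, m, p):
    """log pi(M_i) for a model with k of m regressors, inclusion probability p."""
    return k * math.log(p) + (m - k) * math.log(1.0 - p)

def mbic(loglik, k, n, p):
    """mBIC on the 'maximize' scale: log L - (1/2) k log n - k log((1-p)/p)."""
    return loglik - 0.5 * k * math.log(n) - k * math.log((1.0 - p) / p)

# Hypothetical numbers: log-likelihood -150 for a 3-regressor model,
# n = 200 cases, m = 200 candidate regressors, p = 0.01.
n, m, k, p = 200, 200, 3, 0.01
print("mBIC:", mbic(-150.0, k, n, p))

# With p = 1/2 the extra penalty vanishes and mBIC reduces to plain BIC.
print("p = 1/2:", mbic(-150.0, k, n, 0.5), -150.0 - 0.5 * k * math.log(n))
```

For small p the extra penalty $k_i \log\frac{1-p}{p}$ is positive, so mBIC is stricter than BIC about adding regressors.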

mBIC (2)

$c = mp$: expected number of true regressors
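As a worked step, and only as an algebraic consequence of the definitions already given (not a statement of how the talk continues), substituting $p = c/m$ into the mBIC penalty expresses it through the expected number of true regressors $c$:

```latex
\log\left(\frac{1 - p}{p}\right)
  = \log\left(\frac{1 - c/m}{c/m}\right)
  = \log\left(\frac{m - c}{c}\right)
  = \log\left(\frac{m}{c} - 1\right),
\qquad\text{so the criterion becomes}\qquad
\log L(Y \mid M_i, \hat{\theta}_i) - \frac{1}{2} k_i \log n - k_i \log\left(\frac{m}{c} - 1\right).
```

For example, with $m = 200$ and an assumed $c = 2$, the additional penalty per regressor is $\log 99 \approx 4.6$.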
