SLIDE 1

Selecting explanatory variables with the modified version of Bayesian Information Criterion

Małgorzata Bogdan

Institute of Mathematics and Computer Science, Wrocław University of Technology, Poland

In cooperation with:

  • J.K. Ghosh, R.W. Doerge, R. Cheng – Purdue University
  • A. Baierl, F. Frommlet, A. Futschik – Vienna University
  • A. Chakrabarti – Indian Statistical Institute
  • P. Biecek, A. Ochman, M. Żak – Wrocław University of Technology

Vienna, 24/07/2008

Małgorzata Bogdan Modified BIC

SLIDE 5

Searching large data bases

  • Y - the quantitative variable of interest (fruit size, survival time, process yield)
  • Aim – identify factors influencing Y
  • Properties of the data base – the number of potential factors, m, may be much larger than the number of cases, n
  • Assumption of sparsity - only a small proportion of the potential explanatory variables influences Y

SLIDE 6

Specific application - Locating Quantitative Trait Loci

SLIDE 11

Data for QTL mapping in backcross population and recombinant inbred lines

  • Only two genotypes are possible at a given locus
  • $X_{ij}$ - dummy variable encoding the genotype of the i-th individual at locus j, $X_{ij} \in \{-1/2, 1/2\}$
  • Multiple regression model:

$$Y_i = \beta_0 + \sum_{j=1}^{m} \beta_j X_{ij} + \epsilon_i, \qquad (0.1)$$

where $i \in \{1, \ldots, n\}$ and $\epsilon_i \sim N(0, \sigma^2)$

  • Problem: estimation of the number of influential genes
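
To make model (0.1) concrete, the following is a minimal simulation sketch. The function name `simulate_backcross`, the chosen marker indices, and the effect sizes are illustrative assumptions, and the markers are drawn independently, which ignores the genetic linkage present in real crosses.

```python
import random

def simulate_backcross(n, m, effects, sigma=1.0, beta0=0.0, seed=0):
    """Simulate genotypes X_ij in {-1/2, 1/2} and traits Y_i from model (0.1).

    effects maps a 0-based marker index j to its effect beta_j; all other
    markers have beta_j = 0 (the sparsity assumption).
    Assumption: markers are independent, i.e. linkage is ignored.
    """
    rng = random.Random(seed)
    X = [[rng.choice((-0.5, 0.5)) for _ in range(m)] for _ in range(n)]
    Y = [
        beta0 + sum(b * row[j] for j, b in effects.items()) + rng.gauss(0.0, sigma)
        for row in X
    ]
    return X, Y

# Hypothetical example: 300 markers, only two of them influence the trait.
X, Y = simulate_backcross(n=200, m=300, effects={10: 1.5, 42: -1.0})
```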

SLIDE 17

Bayesian Information Criterion (1)

  • $M_i$ - i-th linear model with $k_i < n$ regressors
  • $\theta_i = (\beta_0, \beta_1, \ldots, \beta_{k_i}, \sigma)$ - vector of model parameters
  • Bayesian Information Criterion (Schwarz, 1978) – maximize

$$BIC = \log L(Y \mid M_i, \hat\theta_i) - \frac{1}{2} k_i \log n$$

  • If m is fixed, $n \to \infty$ and $X'X/n \to Q$, where Q is a positive definite matrix, then BIC is consistent - the probability of choosing the proper model converges to 1.
  • When $n \ge 8$, BIC never chooses more regressors than AIC, and it is usually considered one of the most restrictive model selection criteria.
  • Surprise? Broman and Speed (JRSS, 2002) report that BIC overestimates the number of regressors when applied to QTL mapping.
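
As a sketch of how the criterion can be evaluated, assume a Gaussian linear model whose residual sum of squares `rss` has already been computed, so that the maximized log-likelihood has a closed form. The helper names are illustrative. AIC is written in the matching "maximize" form so that the per-regressor penalties, (1/2) log n against 1, can be compared directly; this is where the n ≥ 8 condition comes from.

```python
import math

def gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood, using the MLE sigma^2 = rss / n."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)

def bic(rss, n, k):
    """BIC in the 'maximize' form of the slide: log L - (1/2) k log n."""
    return gaussian_loglik(rss, n) - 0.5 * k * math.log(n)

def aic(rss, n, k):
    """AIC in the matching 'maximize' form: log L - k."""
    return gaussian_loglik(rss, n) - k

# Penalty for one extra regressor: (1/2) log n for BIC, 1 for AIC.
for n in (7, 8, 200):
    print(n, 0.5 * math.log(n))
```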

SLIDE 21

Explanation - Bayesian roots of BIC (1)

  • $f(\theta_i)$ – prior density of $\theta_i$, $\pi(M_i)$ – prior probability of $M_i$
  • $m_i(Y) = \int L(Y \mid M_i, \theta_i) f(\theta_i)\, d\theta_i$ – integrated likelihood of the data given the model $M_i$
  • Posterior probability of $M_i$: $P(M_i \mid Y) \propto m_i(Y)\,\pi(M_i)$
  • BIC neglects $\pi(M_i)$ and uses the approximation

$$\log m_i(Y) \approx \log L(Y \mid M_i, \hat\theta_i) - \frac{1}{2}(k_i + 2) \log n + R_i,$$

where $R_i$ is bounded in n.

SLIDE 24

Explanation - Bayesian roots of BIC (2)

  • Neglecting $\pi(M_i)$ ≡ assuming all the models have the same prior probability ≡ assigning a large prior probability to the event that the true model contains approximately $m/2$ regressors
  • Example, m = 200: there are 200 models with one regressor, $\binom{200}{2} = 19900$ models with two regressors, and $\binom{200}{100} \approx 9 \times 10^{58}$ models with 100 regressors
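
The counts above can be reproduced with exact integer arithmetic. The sketch below also computes the share of all 2^m models whose size lies within 20 of m/2; the window of 80 to 120 regressors is an arbitrary illustrative choice, not something taken from the slides.

```python
import math

m = 200

# Number of models with exactly k regressors out of m candidates.
for k in (1, 2, 100):
    print(k, math.comb(m, k))

# Under a uniform prior over all 2**m models, this is the prior probability
# that the model size falls between 80 and 120 regressors.
share_near_half = sum(math.comb(m, k) for k in range(80, 121)) / 2**m
print(share_near_half)
```

A uniform prior over models therefore puts almost all of its mass on model sizes close to m/2, which conflicts with the sparsity assumption.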

SLIDE 28

Modified version of BIC, mBIC (1)

  • M. Bogdan, J.K. Ghosh, R.W. Doerge, Genetics (2004)
  • Proposed solution - supplementing BIC with an informative prior distribution on the set of possible models, proposed in George and McCulloch (1993)
  • p - prior probability that a randomly chosen regressor influences Y

$$\pi(M_i) = p^{k_i}(1 - p)^{m - k_i}$$

$$\log \pi(M_i) = m \log(1 - p) - k_i \log\left(\frac{1 - p}{p}\right)$$

  • The modified version of BIC recommends choosing the model maximizing

$$\log L(Y \mid M_i, \hat\theta_i) - \frac{1}{2} k_i \log n - k_i \log\left(\frac{1 - p}{p}\right)$$

SLIDE 32

mBIC (2)

  • $c = mp$ - expected number of true regressors

$$mBIC = \log L(Y \mid M_i, \hat\theta_i) - \frac{1}{2} k_i \log n - k_i \log\left(\frac{m}{c} - 1\right)$$

  • The standard version of mBIC uses c = 4 to control the overall type I error at a level below 10%
  • A similar log m penalty also appears in the RIC of Foster and George (1994)
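
A minimal sketch of the mBIC formula for a Gaussian linear model follows. It assumes the residual sum of squares `rss` of a fitted model is available; the function names are illustrative and not taken from any package.

```python
import math

def mbic(rss, n, k, m, c=4.0):
    """mBIC in the 'maximize' form: log L - (1/2) k log n - k log(m/c - 1).

    rss: residual sum of squares of a model with k regressors chosen
    out of m candidates; requires m > c.
    """
    loglik = -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)
    return loglik - 0.5 * k * math.log(n) - k * math.log(m / c - 1.0)

def extra_penalty_per_regressor(m, c=4.0):
    """Additional penalty, relative to BIC, for each regressor in the model."""
    return math.log(m / c - 1.0)

# The extra penalty grows like log m as the number of candidates increases.
for m in (50, 300, 5000):
    print(m, extra_penalty_per_regressor(m))
```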

SLIDE 37

Relationship to multiple testing (1)

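Orthogonal design:

$$X^T X = n I_{(m+1) \times (m+1)} \qquad (1)$$

  • BIC chooses those $X_j$'s for which $\frac{n \hat\beta_j^2}{\sigma^2} > \log n$
  • Under $H_{0j}: \beta_j = 0$, $Z_j = \frac{\sqrt{n}\,\hat\beta_j}{\sigma} \sim N(0, 1)$
  • Since for $c > 0$, $1 - \Phi(c) = \frac{\phi(c)}{c}(1 + o_c)$, it holds that for large values of n

$$\alpha_n = 2 P\left(Z_j > \sqrt{\log n}\right) \approx \sqrt{\frac{2}{\pi n \log n}}.$$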
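
The quality of this approximation can be checked numerically. The sketch below uses only the standard library, relying on the identity 2(1 − Φ(x)) = erfc(x/√2); the function names are illustrative.

```python
import math

def alpha_exact(n):
    """Two-sided tail probability 2 * P(Z > sqrt(log n)) for standard normal Z."""
    return math.erfc(math.sqrt(math.log(n)) / math.sqrt(2.0))

def alpha_approx(n):
    """Large-n approximation sqrt(2 / (pi * n * log n))."""
    return math.sqrt(2.0 / (math.pi * n * math.log(n)))

for n in (200, 10_000, 1_000_000):
    exact, approx = alpha_exact(n), alpha_approx(n)
    print(n, exact, approx, exact / approx)
```

The approximation always lies above the exact value, and the ratio approaches 1 slowly as n grows.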

SLIDE 39

Relationship to multiple testing (2)

  • When n and m go to infinity and the number of true signals remains fixed, the expected number of “false discoveries” is of the rate $\frac{m}{\sqrt{n \log n}}$.
  • Corollary: BIC is not consistent when $\frac{m}{\sqrt{n \log n}} \to \infty$

SLIDE 44

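  • Bonferroni correction for multiple testing: $\alpha_{n,m} = \frac{\alpha_n}{m}$
  • Probability of detecting at least one “false positive”: $FWER \le \alpha_n$
  • Bonferroni threshold $c_{Bon}$:

$$2\left(1 - \Phi\left(\sqrt{c_{Bon}}\right)\right) = \frac{\alpha_n}{m}$$

$$c_{Bon} = 2 \log\left(\frac{m}{\alpha_n}\right)(1 + o_{n,m}) = (\log n + 2 \log m)(1 + o_{n,m}),$$

where $o_{n,m}$ converges to zero when n or m tends to infinity.

  • mBIC threshold:

$$c_{mBIC} = \log n + 2 \log\left(\frac{m}{c} - 1\right) \approx \log n + 2 \log m - 2 \log c$$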
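
The thresholds can be compared numerically. This sketch assumes that α_n is given by the large-n approximation sqrt(2/(π n log n)) and solves the Bonferroni equation exactly with the standard-library `statistics.NormalDist`; the values n = 200, m = 300 are chosen only for illustration.

```python
import math
from statistics import NormalDist

def alpha_n(n):
    """Assumed per-test level of BIC: sqrt(2 / (pi * n * log n))."""
    return math.sqrt(2.0 / (math.pi * n * math.log(n)))

def c_bonferroni(n, m):
    """Solve 2 * (1 - Phi(sqrt(c))) = alpha_n / m for c."""
    z = NormalDist().inv_cdf(1.0 - alpha_n(n) / (2.0 * m))
    return z * z

def c_mbic(n, m, c=4.0):
    """mBIC threshold on n * beta_hat**2 / sigma**2."""
    return math.log(n) + 2.0 * math.log(m / c - 1.0)

n, m = 200, 300
print(c_bonferroni(n, m), math.log(n) + 2.0 * math.log(m), c_mbic(n, m))
```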

SLIDE 48

Properties of mBIC

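  • 1. $FWER \approx \sqrt{\frac{2}{\pi}} \cdot \frac{c}{\sqrt{n(\log n + 2 \log m - 2 \log c)}}$
  • 2. The power of detecting an explanatory variable with $\beta_j \neq 0$ is given by

$$1 - P\left(-\sqrt{c_{mBIC}} - \frac{\sqrt{n}\beta_j}{\sigma} < \frac{\sqrt{n}(\hat\beta_j - \beta_j)}{\sigma} < \sqrt{c_{mBIC}} - \frac{\sqrt{n}\beta_j}{\sigma}\right) > 1 - \Phi\left(\sqrt{c_{mBIC}} - \frac{\sqrt{n}|\beta_j|}{\sigma}\right) \to 1$$

  • Corollary: independently of the choice of c, mBIC is consistent
  • The standard version of mBIC uses c = 4 to control FWER at a level below 10% when n ≥ 200.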
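
The FWER approximation in property 1 is easy to evaluate. The sketch below does so for c = 4 and n = 200; the values of m are arbitrary illustrative choices and the function name is hypothetical.

```python
import math

def fwer_approx(n, m, c=4.0):
    """Approximate FWER of mBIC under the orthogonal design (property 1)."""
    denom = n * (math.log(n) + 2.0 * math.log(m) - 2.0 * math.log(c))
    return math.sqrt(2.0 / math.pi) * c / math.sqrt(denom)

for m in (50, 300, 5000):
    print(m, fwer_approx(200, m))
```

For fixed n and c the approximation decreases as m grows, which is consistent with the statement that c = 4 keeps FWER below 10% when n ≥ 200.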

SLIDE 52

Asymptotic optimality of mBIC (1)

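  • $\gamma_0$ - cost of a false discovery, $\gamma_A$ - cost of missing a true signal
  • $\beta_j \sim (1 - p)\delta_0 + p N(0, \tau^2)$
  • Expected value of the experiment cost: $R = m(\gamma_0 t_1 (1 - p) + \gamma_A t_2 p)$, where $t_1$ and $t_2$ are the probabilities of type I and type II errors
  • Optimal rule (Bayes oracle):

$$\frac{f_A(\hat\beta_j)}{f_0(\hat\beta_j)} > \frac{(1 - p)\gamma_0}{p \gamma_A},$$

where $f_A$ is the density of $N(0, \tau^2 + \sigma^2/n)$ and $f_0$ is the density of $N(0, \sigma^2/n)$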

SLIDE 56

Asymptotic optimality of mBIC (2)

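  • Bayes oracle:

$$\frac{n \hat\beta_j^2}{\sigma^2} > \frac{\sigma^2 + n\tau^2}{n\tau^2}\left[\log\left(\frac{n\tau^2 + \sigma^2}{\sigma^2}\right) + 2 \log\left(\frac{1 - p}{p}\right) + 2 \log\left(\frac{\gamma_0}{\gamma_A}\right)\right]$$

  • Asymptotic optimality: the model selection rule V is asymptotically optimal if

$$\lim_{n \to \infty,\, m \to \infty} \frac{R_V}{R_{BO}} = 1.$$

  • Theorem 1 (Bogdan, Chakrabarti, Ghosh, 2008). Under orthogonal design (1), mBIC is asymptotically optimal when $\lim_{m \to \infty} mp = s$, where $s \in \mathbb{R}$.
  • Conjecture (Frommlet, Bogdan, 2008). Theorem 1 also holds when $\beta_j \sim (1 - p)\delta_0 + p F_A$, where $F_A$ has a positive density at 0.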
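
To see how close the two rules are, the Bayes oracle threshold can be compared with the mBIC threshold log n + 2 log(m/c − 1). The sketch below assumes equal costs, τ = σ = 1, and p = c/m; these settings and the function names are illustrative choices, not taken from the slides.

```python
import math

def bayes_oracle_threshold(n, tau, sigma, p, gamma0=1.0, gamma_a=1.0):
    """Threshold on n * beta_hat**2 / sigma**2 used by the Bayes oracle."""
    ratio = n * tau**2 / sigma**2
    bracket = (
        math.log(ratio + 1.0)
        + 2.0 * math.log((1.0 - p) / p)
        + 2.0 * math.log(gamma0 / gamma_a)
    )
    return (1.0 + 1.0 / ratio) * bracket

def mbic_threshold(n, m, c=4.0):
    """Threshold on n * beta_hat**2 / sigma**2 implied by mBIC."""
    return math.log(n) + 2.0 * math.log(m / c - 1.0)

m, c = 1000, 4.0
for n in (200, 10_000, 1_000_000):
    print(n, bayes_oracle_threshold(n, 1.0, 1.0, c / m), mbic_threshold(n, m, c))
```

Under these assumptions the two thresholds differ only through the factor (σ² + nτ²)/(nτ²) and the gap between log(n + 1) and log n, both of which vanish as n grows.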

SLIDE 58

Computer simulations (1)

Setting: n = 200, m = 300, entries of X ∼ N(0, σ = 0.5), k ∼ Binomial(m, p) with p = 1/30 (mp = 10), $\beta_i$ ∼ N(0, σ = 1.5), ε ∼ N(0, 1) and Tukey's gross error model: ε ∼ Tukey(0.95, 100, 1) = 0.95 * N(0, 1) + 0.05 * N(0, 10).

Characteristics:

  • Power
  • FDR = FP/AP
  • MR = FP + FN
  • $l_2 = \sum_{j=1}^{m} (\beta_j - \hat\beta_j)^2$
  • d - mean value of the absolute prediction error based on 50 additional observations
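
For a single replication, the selection characteristics can be computed from the true and estimated coefficient vectors. This is a sketch under two assumptions not stated on the slide: AP is read as "all positives" (the number of selected regressors), and an unselected regressor is represented by an estimate of exactly zero. The prediction error d is omitted because it needs additional observations.

```python
def selection_metrics(beta_true, beta_hat):
    """Compare the support of beta_hat with the support of beta_true."""
    pairs = list(zip(beta_true, beta_hat))
    fp = sum(1 for b, bh in pairs if b == 0 and bh != 0)  # false positives
    fn = sum(1 for b, bh in pairs if b != 0 and bh == 0)  # false negatives
    tp = sum(1 for b, bh in pairs if b != 0 and bh != 0)  # true positives
    selected = fp + tp  # AP, read here as "all positives" (assumption)
    signals = tp + fn   # number of true regressors
    return {
        "FP": fp,
        "FN": fn,
        "Power": tp / signals if signals else float("nan"),
        "FDR": fp / selected if selected else 0.0,
        "MR": fp + fn,
        "l2": sum((b - bh) ** 2 for b, bh in pairs),
    }

# Hypothetical toy example with four candidate regressors.
result = selection_metrics([1.0, 0.0, 0.0, 2.0], [0.5, 0.3, 0.0, 0.0])
print(result)
```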

SLIDE 59

Computer simulations

Table: Results for 1000 replications.

noise       N(0,1)                          Tukey(0.95, 100, 1)
criterion   BIC        mBIC      rBIC       BIC        mBIC      rBIC
FP          13.3       0.073     0.08       12.5       0.08      0.1
FN          1.84       2.97      3.45       3.95       6.11      4.29
Power       0.8155     0.7030    0.6586     0.6087     0.3923    0.5806
FDR         0.5889     0.0107    0.0116     0.6487     0.0210    0.0162
MR          15.1480    3.0410    3.5310     16.4440    6.1910    4.3910
l2          2.3610     0.6025    0.8500     13.51      4.732     1.597
d           0.9460     0.8505    0.8687     1.714      1.503     1.298

E|ε1| ≈ 0.8, E|ε2| ≈ 1.16

SLIDE 61

Applications for QTL mapping

$$Y_i = \mu + \sum_{j \in I} \beta_j X_{ij} + \sum_{(u,v) \in U} \gamma_{uv} X_{iu} X_{iv} + \varepsilon_i,$$

where I is a certain subset of the set N = {1, . . . , m} and U is a certain subset of N × N.

Standard version of mBIC - minimize

$$n \log(RSS) + (p + r) \log(n) + 2p \log(m/2.2 - 1) + 2r \log(N_e/2.2 - 1),$$

where p is the number of main effects, r is the number of interactions, and $N_e = m(m - 1)/2$.
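
The criterion above can be written as a small function. This is a sketch that assumes the residual sum of squares `rss` of a fitted model with p main effects and r interactions is already available; the function names are illustrative.

```python
import math

def n_pairs(m):
    """Number of possible two-way interactions, Ne = m (m - 1) / 2."""
    return m * (m - 1) // 2

def mbic_qtl(rss, n, p, r, m, c=2.2):
    """Criterion to minimize; main effects and interactions are penalized separately."""
    ne = n_pairs(m)
    return (
        n * math.log(rss)
        + (p + r) * math.log(n)
        + 2.0 * p * math.log(m / c - 1.0)
        + 2.0 * r * math.log(ne / c - 1.0)
    )

# Ne for m = 59 and m = 193 candidate positions.
print(n_pairs(59), n_pairs(193))
```

Because Ne is much larger than m, adding an interaction term costs more than adding a main effect.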

SLIDE 62

Further applications for QTL mapping

  • 1. Extending to more complicated genetic scenarios + iterative version of mBIC: Baierl, Bogdan, Frommlet, Futschik, Genetics, 2006
  • 2. Robust versions based on M-estimates: Baierl, Futschik, Bogdan, Biecek, CSDA, 2007
  • 3. Rank version: Żak, Baierl, Bogdan, Futschik, Genetics, 2007
  • 4. Taking into account the correlations between neighboring markers: Bogdan, Frommlet, Biecek, Cheng, Ghosh, Doerge, Biometrics, 2008

SLIDE 63

Real Data Analysis (1)

Huttunen et al (2004) - data on the variation in male courtship song characters in Drosophila virilis.

SLIDE 67

Real Data Analysis (2)

Drosophila "sing" by vibrating their wings. The most common song type is pulse song, which consists of rapid transients (short-lived oscillations) of low frequency.

Quantitative trait PN - number of pulses in a pulse train.

Data - 24 markers on three chromosomes, n = 520 males.

Huttunen et al. (2004) used single marker analysis and composite interval mapping. They found one QTL on chromosome 2, five QTL on chromosome 3 (not sure if there are only 2) and another QTL on chromosome 4.

We use mBIC combined with Haley and Knott regression. We impute the genotypes inside intermarker intervals so that the distance between tested positions does not exceed 10 cM. We penalize these imputed locations as real markers. As a result, m = 59 and Ne = 1711.

SLIDE 68

Real Data Analysis (3)

SLIDE 71

Real Data Analysis (4)

Zeng et al. (2000) - data on the morphological differences between two species of Drosophila, Drosophila simulans and Drosophila mauritiana.

Trait - the size and the shape of the posterior lobe of the male genital arch, quantified by a morphometric descriptor.

n1 = 471, n2 = 491, m = 193, genotypes at neighboring positions are closely correlated, Ne = 18,528

SLIDE 72

Real Data Analysis, BM

[Figure: three panels, each with rows Forward, mBIC, and Zeng; recoverable axis ticks: 20 to 60 and 20 to 140.]

SLIDE 73

Real Data Analysis, BS

[Figure: three panels, each with rows Forward, mBIC, and Zeng; recoverable axis ticks: 20 to 60 and 20 to 140.]

SLIDE 74

Further work

  • 1. Relaxing the penalty so as to control FDR instead of FWER, expected optimality for a wider range of values of p - with F. Frommlet, J.K. Ghosh, A. Chakrabarti and M. Murawska.
  • 2. Application for association mapping - with F. Frommlet and M. Murawska.
  • 3. Application for GLM and Zero Inflated Generalized Poisson Regression - with M. Żak, C. Czado, V. Earhardt.
  • 4. Application for model selection in logic regression and comparison with Bayesian Regression Trees - with M. Malina, K. Ickstadt, H. Schwender.

SLIDE 75

References

  • 1. Baierl, A., Bogdan, M., Frommlet, F., Futschik, A., 2006. On locating multiple interacting quantitative trait loci in intercross designs. Genetics 173, 1693–1703.
  • 2. Baierl, A., Futschik, A., Bogdan, M., Biecek, P., 2007. Locating multiple interacting quantitative trait loci using robust model selection. Computational Statistics and Data Analysis 51, 6423–6434.
  • 3. Bogdan, M., Ghosh, J.K., Doerge, R.W., 2004. Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167, 989–999.
  • 4. Bogdan, M., Frommlet, F., Biecek, P., Cheng, R., Ghosh, J.K., Doerge, R.W., 2008. Extending the Modified Bayesian Information Criterion (mBIC) to dense markers and multiple interval mapping. Biometrics, doi: 10.1111/j.1541-0420.2008.00989.x.
  • 5. Broman, K.W., Speed, T.P., 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. J. Roy. Stat. Soc. B 64, 641–656.
  • 6. George, E.I., McCulloch, R.E., 1993. Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88, 881–889.
  • 7. Żak, M., Baierl, A., Bogdan, M., Futschik, A., 2007. Locating multiple interacting quantitative trait loci using rank-based model selection. Genetics 176, 1845–1854.