High Dimensional Data, Covariance Matrices and Application to Genetics


SLIDE 1

High Dimensional Data, Covariance Matrices and Application to Genetics

Samprit Banerjee, Ph.D.

Div. of Biostatistics, Department of Public Health, Weill Medical College of Cornell University

UW-Madison, 22-Apr-2010

SLIDE 2

Outline

◮ Motivation: High Dimensional Data Examples
◮ Theoretical Underpinnings: Random Matrices, Shrinkage Estimation, Decision Theory, Bayesian Estimation
◮ QTL Mapping: Background, Statistical Challenges, Bayesian Solution, Bayesian Multiple Traits

SLIDE 3

Data Deluge

“The coming century is surely the century of data.” (David Donoho, 2000)
“...industrial revolution of data.” (The Economist, 2010)

Sources of high dimensional data:

◮ Genetics and genomics
◮ Internet portals, e.g. Netflix
◮ Financial data

SLIDE 4

High Dimensional Data

In statistics,

◮ Observations: instances of a particular phenomenon
◮ Example of instances ↔ human beings
◮ Typically, n denotes the number of observations
◮ A variable (or random variable) is the vector of values the observations are measured on
◮ Examples: blood pressure, weight, height
◮ Typically, p denotes the number of variables
◮ Recent trend: explosive growth of p ↔ the dimensionality
◮ p ≫ n: classical methods of statistics fail!

SLIDE 5

Example 1: Principal Component Analysis

Let Xn×p = [X1 : X2 : · · · : Xp] hold n observations on p variables. Goal: reduce dimensionality by constructing a smaller number of “derived” variables:

w1 = arg max_{‖w‖=1} var(w′X)

Spectral decomposition: X′X = WLW′, where W collects the eigenvectors and L = diag{ℓ1, ..., ℓp} holds the eigenvalues.
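A minimal numpy sketch of the construction above (centering, spectral decomposition, projection onto the leading eigenvectors); the function name pca_eigen and the toy data are illustrative, not from the slides.

```python
import numpy as np

def pca_eigen(X, k=2):
    """PCA of an n x p data matrix via the spectral decomposition X'X = W L W'."""
    Xc = X - X.mean(axis=0)                # center each variable
    C = Xc.T @ Xc / Xc.shape[0]            # p x p sample covariance matrix
    evals, W = np.linalg.eigh(C)           # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]        # reorder so l1 >= l2 >= ... >= lp
    L, W = evals[order], W[:, order]
    scores = Xc @ W[:, :k]                 # first k "derived" variables
    return scores, L, W

# toy example: 200 observations on 5 correlated variables
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
scores, L, W = pca_eigen(X, k=2)           # w1 = W[:, 0] maximizes var(w'X)
```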

SLIDE 6

Population Structure within Europe

(Figure: principal components of genome-wide genotype data mirror the geographic map of Europe.¹)

¹J. Novembre et al., Nature 456, 98-101 (2008). doi:10.1038/nature07331

SLIDE 7

Example 2: Multivariate Regression

One of the most common uses of the covariance matrix in statistics is in multivariate regression:

Yn×p = Xn×q βq×p + En×p

where ei ∼ Np(0, Σ), i = 1, · · · , n, and Σ is p × p.

◮ Historically p < n; for high dimensional data, p ≫ n or q ≫ n
◮ Estimates can be obtained by maximizing the likelihood

L(β, Σ | X, Y) ∝ ∏_{i=1}^{n} |Σ|^{-1/2} exp{ -½ (Yi − Xiβ)′ Σ^{-1} (Yi − Xiβ) }
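To make the likelihood concrete, here is a small numpy sketch; mv_reg_mle encodes the standard closed-form maximizers (least squares for β, residual covariance R′R/n for Σ), a textbook result rather than something stated on this slide, and both function names are illustrative.

```python
import numpy as np

def mv_reg_loglik(beta, Sigma, X, Y):
    """Log-likelihood (up to an additive constant) of Y = X beta + E,
    with rows e_i ~ N_p(0, Sigma)."""
    n = Y.shape[0]
    R = Y - X @ beta                             # n x p residual matrix
    Sinv = np.linalg.inv(Sigma)
    quad = np.einsum("ij,jk,ik->", R, Sinv, R)   # sum_i r_i' Sigma^{-1} r_i
    return -0.5 * n * np.linalg.slogdet(Sigma)[1] - 0.5 * quad

def mv_reg_mle(X, Y):
    """Standard MLEs: beta_hat by least squares, Sigma_hat = R'R / n."""
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    R = Y - X @ beta_hat
    return beta_hat, R.T @ R / Y.shape[0]
```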

SLIDE 8

Seemingly Unrelated Regression

Zellner (1962) introduced the Seemingly Unrelated Regression (SUR) model:

Y*np×1 = X*np×pq β*pq×1 + e*np×1

where Y* = vec(Y), X* = diag{X1, · · · , Xp}, β* = vec(β), e* = vec(E), and vec(·) is the vectorizing operator.

◮ e* ∼ N(0, Σ ⊗ In)
◮ GLS estimates: β̂* = (X*′Ω^{-1}X*)^{-1}(X*′Ω^{-1}Y*)
◮ where Ω = Σ ⊗ In and ⊗ is the Kronecker product.
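A sketch of the GLS formula above, assuming Σ is known; in practice Σ is unknown and replaced by an estimate (feasible GLS). The function name sur_gls is illustrative.

```python
import numpy as np
from scipy.linalg import block_diag

def sur_gls(Y, X_list, Sigma):
    """GLS for Zellner's SUR model: y* = X* beta* + e*, e* ~ N(0, Sigma kron I_n).
    Y: n x p responses; X_list: p design matrices (one per equation); Sigma: p x p."""
    n, p = Y.shape
    ystar = Y.reshape(-1, order="F")                      # vec(Y): stack columns
    Xstar = block_diag(*X_list)                           # X* = diag{X_1, ..., X_p}
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))  # (Sigma kron I_n)^{-1}
    A = Xstar.T @ Omega_inv @ Xstar
    b = Xstar.T @ Omega_inv @ ystar
    return np.linalg.solve(A, b)                          # beta* = vec(beta)
```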

SLIDE 9

Random Matrix Theory

◮ The covariance matrix Σp×p is a random matrix
◮ The eigenvalues of Σ, {λ1, · · · , λp}, are random
◮ Properties of interest: the joint distribution of the eigenvalues, and the number of eigenvalues falling below a given value
◮ Beginning in the 1950s, physicists began to use random matrices to study the energy levels of quantum mechanical systems
◮ Wigner proposed a statistical description of an “ensemble” of energy levels: properties of the empirical distribution and of the distribution of spacings

SLIDE 10

Covariance Matrices

In statistics: X1, · · · , Xn ∼ Np(0, Σ) and Xn×p = [X1, · · · , Xn]′. The usual estimator is the

Sample Covariance Matrix
S = X′X/n

Bayesian Estimation
π(Σ|X) ∝ p(X|Σ) π(Σ),  Σ̂ = Eπ(·|X)(Σ)

SLIDE 11

Gaussian and Wishart Distributions

If X1, X2, · · · , Xn are n i.i.d. samples from a p-variate (p-dimensional) Gaussian distribution Np(0, Σ) with density

f(X) = |2πΣ|^{-1/2} exp{ -½ X′Σ^{-1}X }

then S = X′X follows a Wishart distribution (named after John Wishart, 1928):

f(S) = cn,p |Σ|^{-n/2} |S|^{(n-p-1)/2} etr{ -½ Σ^{-1}S }

where etr(·) = exp(tr(·)).
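The density above can be checked numerically; a sketch that writes out the normalizing constant cn,p via the multivariate gamma function and compares against scipy's parameterization Wp(df = n, scale = Σ).

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import wishart

def wishart_logpdf(S, n, Sigma):
    """log f(S) for S = X'X ~ W_p(n, Sigma):
    log c_{n,p} - (n/2) log|Sigma| + ((n-p-1)/2) log|S| - (1/2) tr(Sigma^{-1} S)."""
    p = S.shape[0]
    log_c = -(n * p / 2) * np.log(2) - multigammaln(n / 2, p)
    return (log_c
            - (n / 2) * np.linalg.slogdet(Sigma)[1]
            + ((n - p - 1) / 2) * np.linalg.slogdet(S)[1]
            - 0.5 * np.trace(np.linalg.solve(Sigma, S)))

# sanity check against scipy's implementation
rng = np.random.default_rng(1)
Sigma = np.eye(3)
S = wishart.rvs(df=10, scale=Sigma, random_state=rng)
print(wishart_logpdf(S, 10, Sigma), wishart.logpdf(S, df=10, scale=Sigma))
```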
SLIDE 12

Eigenstructure of sample covariance matrix

It is well known that the eigenvalues of the sample covariance matrix are more spread out than the eigenvalues of the population covariance matrix.

SLIDE 13

Spread of Sample Eigenvalues

(Figure: for each p = 2, ..., 100 and n = 2, ..., 100, the observed proportion of simulations, ranging roughly from 0.3 to 1.0, in which the sample eigenvalues over-spread the population eigenvalues.)

◮ Counting the number of times the sample eigenvalues are more spread out: ℓ1 > λ1 or ℓp < λp
◮ ℓ1 > ℓ2 > · · · > ℓp are the eigenvalues of the sample covariance matrix S
◮ λ1 > λ2 > · · · > λp are the eigenvalues of the population covariance matrix Σ
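A small simulation in the spirit of the figure, assuming Σ = I so that every λj = 1; it estimates how often the sample spectrum over-spreads the population spectrum.

```python
import numpy as np

rng = np.random.default_rng(2)

def spread_frequency(n, p, n_rep=500):
    """Fraction of replicates with l1 > lambda_1 and l_p < lambda_p,
    simulated under Sigma = I (all population eigenvalues equal 1)."""
    count = 0
    for _ in range(n_rep):
        X = rng.standard_normal((n, p))
        ell = np.linalg.eigvalsh(X.T @ X / n)      # sample eigenvalues
        if ell.max() > 1.0 and ell.min() < 1.0:
            count += 1
    return count / n_rep

for n, p in [(100, 2), (100, 10), (100, 50)]:
    print(f"n={n}, p={p}: {spread_frequency(n, p):.2f}")   # rises toward 1 with p
```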

SLIDE 14

Joint Distribution of Eigen Values

Fisher (Cambridge), Girshick (Columbia), Hsu (London), Mood (Princeton) and Roy (Calcutta)

Theorem

If S is Wp(n, Σ) with n ≥ p, the joint density function of the eigenvalues ℓ1, ℓ2, · · · , ℓp of S is

∝ ∏_{j=1}^{p} ℓj^{(n-p-1)/2} ∏_{j<k} (ℓj − ℓk) × ∫_{O(p)} etr{ -½ Σ^{-1} H L H′ } dH

where O(p) is the orthogonal group of p × p matrices, dH is the normalized Haar measure, and L is the diagonal matrix diag(ℓ1, ℓ2, · · · , ℓp). Assume ℓ1 > ℓ2 > · · · > ℓp. The integral over O(p) can be expanded in zonal polynomials. If Σ = I, the joint density simplifies to

∝ ∏_{j=1}^{p} ℓj^{(n-p-1)/2} ∏_{j<k} (ℓj − ℓk) exp{ -½ ∑_j ℓj }

SLIDE 15

Eigenspectrum

◮ Empirical Spectrum: how many eigenvalues fall below a given value
◮ Wigner’s Semi-Circle Law: Wigner showed that the limiting density of the “empirical spectrum” of real symmetric matrices A with i.i.d. entries is a semi-circle
◮ Marčenko-Pastur gave the limiting density of the “empirical spectrum” of the sample eigenvalues for the special case A ∼ Wp(n, I)
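A sketch of the Marčenko-Pastur limit for this special case, assuming the aspect ratio γ = p/n ≤ 1; a histogram of simulated sample eigenvalues should track the curve.

```python
import numpy as np

def marchenko_pastur_pdf(x, gamma):
    """Limiting spectral density of S = X'X/n when Sigma = I and p/n -> gamma <= 1."""
    a = (1 - np.sqrt(gamma)) ** 2                 # left edge of the support
    b = (1 + np.sqrt(gamma)) ** 2                 # right edge of the support
    pdf = np.zeros_like(x, dtype=float)
    m = (x > a) & (x < b)
    pdf[m] = np.sqrt((b - x[m]) * (x[m] - a)) / (2 * np.pi * gamma * x[m])
    return pdf

# empirical spectrum of a large null covariance matrix
rng = np.random.default_rng(3)
n, p = 2000, 500                                  # gamma = 0.25
X = rng.standard_normal((n, p))
ell = np.linalg.eigvalsh(X.T @ X / n)
hist, edges = np.histogram(ell, bins=50, density=True)
# hist should track marchenko_pastur_pdf at the bin midpoints with gamma = p / n
```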

SLIDE 16

Eigenspectrum

The study of eigenvalue distributions can be divided into

◮ Bulk: properties of the full set ℓ1, ℓ2, · · · , ℓp
◮ Extremes: the (first few) largest and smallest eigenvalues

SLIDE 17

Largest Eigenvalue

Theorem (Johnstone, 2001)

Let ℓ1 > · · · > ℓp denote the eigenvalues of the sample covariance matrix X′X ∼ Wp(n, I). Then

(ℓ1 − μnp)/σnp →D W1 ∼ F1

where μnp = (√(n−1) + √p)² and σnp = (√(n−1) + √p)(1/√(n−1) + 1/√p)^{1/3}.

F1 is the Tracy-Widom law of order 1 and has the distribution function

F1(s) = exp{ -½ ∫_s^∞ [ q(x) + (x − s) q²(x) ] dx },  s ∈ ℝ,

where q solves the (nonlinear) Painlevé II differential equation

q″(x) = x q(x) + 2 q³(x),  q(x) ∼ Ai(x) as x → +∞,

and Ai(x) denotes the Airy function.
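A simulation sketch of the centering and scaling in the theorem; evaluating F1 itself would require solving Painlevé II, which is skipped here. The quoted TW1 mean of about −1.21 is a known constant, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(4)

def scaled_largest_eigenvalue(n, p, n_rep=200):
    """Draws of (l1 - mu_np) / sigma_np for X'X ~ W_p(n, I); approximately
    Tracy-Widom of order 1 for large n and p."""
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)
    out = np.empty(n_rep)
    for r in range(n_rep):
        X = rng.standard_normal((n, p))
        out[r] = (np.linalg.eigvalsh(X.T @ X).max() - mu) / sigma  # note: X'X, not X'X/n
    return out

# the TW1 distribution has mean approximately -1.21
print(scaled_largest_eigenvalue(400, 100).mean())
```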

SLIDE 18

Lessons learned

◮ The Vandermonde determinant ∏_{j<k}(ℓj − ℓk) in the joint eigenvalue density induces repulsion
◮ The eigenstructure of the sample covariance matrix is more spread out than that of the population covariance matrix
◮ This is less pronounced when p is small
◮ Both the bulk and the extreme eigenvalue distributions are complicated to compute

SLIDE 19

Stein’s Estimator

The sample covariance matrix S can be decomposed as S = VLV′, where V is an orthogonal matrix and L = diag(ℓ1, · · · , ℓp) with ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓp. Stein (1975) considered the orthogonally invariant estimator

Σ̂ = V Φ(L) V′

where Φ(L) = diag(φ1, · · · , φp), φi = ℓi/αi, and

αi = (n − p + 1) + 2ℓi ∑_{j≠i} 1/(ℓi − ℓj)
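A direct transcription of Stein's eigenvalue shrinkage into numpy, deliberately without the isotonizing step, so the issues listed on the next slide (disordered or negative φi) can actually occur; the function name is illustrative.

```python
import numpy as np

def stein_covariance(S, n):
    """Stein's (1975) orthogonally invariant estimator V Phi(L) V' with
    phi_i = l_i / alpha_i, alpha_i = (n - p + 1) + 2 l_i sum_{j != i} 1/(l_i - l_j)."""
    p = S.shape[0]
    ell, V = np.linalg.eigh(S)
    order = np.argsort(ell)[::-1]                 # l1 >= l2 >= ... >= lp
    ell, V = ell[order], V[:, order]
    phi = np.empty(p)
    for i in range(p):
        alpha = (n - p + 1) + 2 * ell[i] * sum(
            1.0 / (ell[i] - ell[j]) for j in range(p) if j != i)
        phi[i] = ell[i] / alpha                   # may be out of order or negative
    return V @ np.diag(phi) @ V.T
```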

SLIDE 20

Stein’s Estimator contd...

Issues with Stein’s estimator:

◮ The intuitive ordering φ1 ≥ φ2 ≥ · · · ≥ φp is frequently violated
◮ Sometimes φi can be negative
◮ Stein suggested an isotonizing algorithm that avoids this problem by pooling adjacent pairs

Haff (1991) formally minimized the Bayes risk for an orthogonally invariant prior by a variational technique.
SLIDE 21

Decision Theoretic Tools

Definition (Decision Theory)

Decision theory, in philosophy, mathematics and statistics, is concerned with identifying the values, uncertainties and other issues relevant to a given decision, its rationality, and the resulting optimal decision. It is very closely related to the field of game theory. (source: Wikipedia)

Definition (Loss function)

A loss function is any function L from Θ × D to [0, +∞). We will consider the following loss functions for Σ:

◮ Stein’s Loss: L1(Σ, Σ̂) = tr(Σ̂Σ^{-1}) − log|Σ̂Σ^{-1}| − p
◮ Quadratic Loss: L2(Σ, Σ̂) = tr[(Σ̂Σ^{-1} − I)²]
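Both losses are straightforward to compute; a small sketch with illustrative function names, reusable for the risk comparison on SLIDE 28.

```python
import numpy as np

def stein_loss(Sigma, Sigma_hat):
    """L1(Sigma, Sigma_hat) = tr(Sigma_hat Sigma^{-1}) - log|Sigma_hat Sigma^{-1}| - p."""
    p = Sigma.shape[0]
    M = Sigma_hat @ np.linalg.inv(Sigma)
    return np.trace(M) - np.linalg.slogdet(M)[1] - p

def quadratic_loss(Sigma, Sigma_hat):
    """L2(Sigma, Sigma_hat) = tr((Sigma_hat Sigma^{-1} - I)^2)."""
    M = Sigma_hat @ np.linalg.inv(Sigma) - np.eye(Sigma.shape[0])
    return np.trace(M @ M)
```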

SLIDE 22

Decision Theoretic Tools contd...

Frequentist Risk
R(θ, δ) = ∫_X L(θ, δ(x)) f(x|θ) dx

Bayesian Paradigm

◮ Posterior Expected Loss: ρ(π, d|x) = ∫_Θ L(θ, δ(x)) π(θ|x) dθ
◮ Integrated Risk: r(π, δ) = ∫_Θ ∫_X L(θ, δ(x)) f(x|θ) dx π(θ) dθ
◮ The Bayes estimator δπ is the one that minimizes r(π, δ), and the corresponding risk is the Bayes risk.

SLIDE 23

...“To average over all possible values of x, when we know the observed value of x, seems to be a waste of information”... “Such an evaluation may be satisfactory for the statistician, but it is not so appealing for a client, who wants optimal results for her data x, not that of another’s”

Christian Robert, 2007 (The Bayesian Choice)

SLIDE 24

Bayesian Paradigm

π(Σ|X) ∝ p(X|Σ)π(Σ)

◮ Posterior mean, maximum a posteriori
◮ Decision theoretic approach
◮ Bayes estimator: minimize the integrated risk based on a given prior and loss function

SLIDE 25

Jeffreys Prior

Jeffreys’ invariance principle: Sir Harold Jeffreys (1961) suggested that any non-informative prior distribution should be justified on the grounds of its invariance under parameter transformations. So if θ ∼ π a priori, then for any one-to-one transformation φ = φ(θ), the prior on φ should be π(φ).

π(θ) ∝ I(θ)^{1/2}, where I(θ) = Ex|θ[ −∂²L/∂θ² ]

◮ This is easy to see, since I(φ) = I(θ)(dθ/dφ)²
◮ Jeffreys prior for the covariance matrix is π(Σ) ∝ |Σ|^{-(p+1)/2}
◮ Under Stein’s loss (L1), the Bayes estimator for the covariance matrix is the usual unbiased estimator, the sample covariance matrix S/n

SLIDE 26

Reference Prior

Reference Prior Principle (Bernardo, 1992): Let x be the result of an experiment ǫ = {X, Θ, p(x|θ)} and let C be the class of admissible priors. The reference posterior of θ after x has been observed is defined to be π(θ|x) = lim πk(θ|x), where πk(θ|x) ∝ p(x|θ) πk(θ) is the posterior density corresponding to the prior πk(θ) which maximizes

Iθ{ǫ(k), p(θ)} = ∫ p(x) ∫ p(θ|x) log[ p(θ|x)/p(θ) ] dθ dx,

the expected information (expected Kullback-Leibler divergence of the posterior with respect to the prior) about θ.

The reference prior was derived by Yang and Berger (1995). Let Σ = OΛO′, where O is an orthogonal matrix and Λ is a diagonal matrix. The reference prior formulation is

π(Λ, O)(dΛ)(dO) ∝ (1/|Λ|)(dΛ)(dH) ∝ [ 1 / ( |Σ| ∏_{i<j}(λi − λj) ) ](dΣ)

where (dH) is the conditional invariant Haar measure over the space of orthogonal matrices.
SLIDE 27

Sampling from the Reference Posterior

The posterior resulting from the reference prior is

πR(Σ|S)(dΣ) ∝ [ etr(-½ Σ^{-1}S) / ( |Σ|^{n/2+1} ∏_{i<j}(λi − λj) ) ](dΣ)

A Metropolis-Hastings Sampler:

◮ Generate Σnew ∼ Wp(n, S)
◮ Accept Σnew with probability

α = min{ 1, [ |Σold|^{(p+1)/2} ∏_{i<j}(λi^old − λj^old) ] / [ |Σnew|^{(p+1)/2} ∏_{i<j}(λi^new − λj^new) ] }
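A sketch transcribing the sampler exactly as printed above: independent Wishart proposals Σnew ∼ Wp(n, S) and the stated acceptance ratio, evaluated on the log scale for numerical stability. It takes the slide's proposal and ratio at face value; a production sampler would verify the proposal/target pairing.

```python
import numpy as np
from scipy.stats import wishart

def mh_reference_posterior(S, n, n_iter=1000, seed=0):
    """Metropolis-Hastings for the reference posterior pi_R(Sigma | S),
    transcribing the acceptance probability printed on the slide."""
    rng = np.random.default_rng(seed)

    def log_w(Sigma):
        # log of |Sigma|^{(p+1)/2} * prod_{i<j} (lambda_i - lambda_j)
        p = Sigma.shape[0]
        lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
        diffs = [lam[i] - lam[j] for i in range(p) for j in range(i + 1, p)]
        return 0.5 * (p + 1) * np.linalg.slogdet(Sigma)[1] + np.sum(np.log(diffs))

    Sigma = wishart.rvs(df=n, scale=S, random_state=rng)
    draws = []
    for _ in range(n_iter):
        proposal = wishart.rvs(df=n, scale=S, random_state=rng)
        log_alpha = log_w(Sigma) - log_w(proposal)   # old over new, as on the slide
        if np.log(rng.uniform()) < log_alpha:
            Sigma = proposal
        draws.append(Sigma)
    return draws
```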

SLIDE 28

Reference and Jeffreys comparison

Simulation

◮ n = 50, 100
◮ p = 2, 5, 10
◮ Correlation structure: correlated and independent
◮ 50 replicates

Frequentist risks of the posterior mean are approximated by the average loss under the following loss functions:

◮ Stein’s Loss: L1(Σ, Σ̂) = tr(Σ̂Σ^{-1}) − log|Σ̂Σ^{-1}| − p
◮ Quadratic Loss: L2(Σ, Σ̂) = tr[(Σ̂Σ^{-1} − I)²]

SLIDE 29

Reference and Jeffreys comparison

SLIDE 30

What is QTL Mapping?

Quantitative Trait Loci (QTL) Mapping

SLIDE 31

What is QTL Mapping?

Quantitative Trait Loci (QTL) Mapping: QT ↔ trait values y1, y2, · · · , y10

◮ Quantitative Traits, e.g. blood pressure, BMI, fat mass, complex diseases (Alzheimer’s), etc.

SLIDE 32

What is QTL Mapping?

Quantitative Trait Loci (QTL) Mapping

◮ Loci → genomic positions influencing the traits

SLIDE 33

What is QTL Mapping?

Quantitative Trait Loci (QTL) Mapping

◮ Information from quantitative traits is combined with genetic information
◮ Try to map the positions of the genome influencing the traits

SLIDE 34

Genetic Design (Backcross Experiment)

(Broman, 1997)

◮ Controlled experiments → not possible with humans
◮ Examples of traits: BMI, fat mass, obesity-related traits, etc.
◮ Big impact on public health

SLIDE 35

Importance of QTL Mapping

◮ Identifying QTL in experimental animals is critical for understanding the biochemical pathways underlying complex traits
◮ This understanding translates to drug targets and eventual clinical trials
◮ QTL mapping is also important for animal/plant breeding

SLIDE 36

Data

y1    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   AA    AA    AB    AA    AA     AB     AB
9.6   AA    AA    AB    AB    AB     AB     AB
10.6  AB    AB    AA    AA    AB     AA     AA
11.1  AB    AB    AA    AB    AB     AA     AA

SLIDE 37

Interval Mapping

(Figure: observed markers along a chromosome.)

SLIDE 38

Interval Mapping

(Figure: observed markers and pseudomarkers along a chromosome.)

◮ Insert arbitrary positions (pseudomarkers) into marker intervals

SLIDE 39

Interval Mapping

(Figure: observed markers and pseudomarkers along a chromosome.)

◮ Insert arbitrary positions (pseudomarkers) into marker intervals
◮ Enables us to detect QTL within marker intervals

SLIDE 40

Interval Mapping

(Figure: observed markers and pseudomarkers along a chromosome.)

◮ Insert arbitrary positions (pseudomarkers) into marker intervals
◮ Enables us to detect QTL within marker intervals
◮ Pseudomarkers and markers are considered as putative QTL

SLIDE 41

Interval Mapping

(Figure: observed markers and pseudomarkers along a chromosome.)

◮ Insert arbitrary positions (pseudomarkers) into marker intervals
◮ Enables us to detect QTL within marker intervals
◮ Pseudomarkers and markers are considered as putative QTL
◮ Pseudomarkers are not observed: a Hidden Markov Model is used to reconstruct genotypes

SLIDE 42

Challenges in QTL Mapping

Complex Traits

slide-43
SLIDE 43

Outline Motivation

High Dimensional Data Examples

Theoretical Underpinnings

Random Matrices Shrinkage Estimation Decision Theory Bayesian Estimation

QTL Mapping

Background Statistical Challenges Bayesian Solution Bayesian Multiple Traits

Challenges in QTL Mapping

Complex Traits

◮ interacting network of multiple genes and environmental factors

SLIDE 44

Challenges in QTL Mapping

Complex Traits

◮ interacting network of multiple genes and environmental factors
◮ small-to-moderate sized effects

SLIDE 45

Challenges in QTL Mapping

Complex Traits

◮ interacting network of multiple genes and environmental factors
◮ small-to-moderate sized effects
◮ high sample size required

SLIDE 46

Challenges in QTL Mapping

Complex Traits

◮ interacting network of multiple genes and environmental factors
◮ small-to-moderate sized effects
◮ high sample size required

Question

What combination of genes and interactions should one consider?

SLIDE 47

Challenges in QTL Mapping

Complex Traits

◮ interacting network of multiple genes and environmental factors
◮ small-to-moderate sized effects
◮ high sample size required

Question

What combination of genes and interactions should one consider?

Model Selection

SLIDE 48

Challenges in QTL Mapping

Complex Traits

◮ interacting network of multiple genes and environmental factors
◮ small-to-moderate sized effects
◮ high sample size required

Question

What combination of genes and interactions should one consider?

Model Selection

◮ For a BC (backcross) population with 40 genetic markers

SLIDE 49

Challenges in QTL Mapping

Complex Traits

◮ interacting network of multiple genes and environmental factors
◮ small-to-moderate sized effects
◮ high sample size required

Question

What combination of genes and interactions should one consider?

Model Selection

◮ For a BC (backcross) population with 40 genetic markers
◮ 2^40 ≈ 10^12 (about 1,000,000,000,000) possible models

SLIDE 50

Statistical structure

(Diagram: markers and QTL along the genome.)

Two aspects of the QTL mapping problem:

1. The missing data problem: Markers ↔ QTL
SLIDE 51

Statistical structure

(Diagram: markers, QTL and traits.)

Two aspects of the QTL mapping problem:

1. The missing data problem: Markers ↔ QTL
2. The model selection problem: QTL → Traits
SLIDE 52

Bayesian Interval Mapping Framework

(Diagram: observed data M and y; missing QTL genotypes Q; unknown parameters λ, β, H.)

SLIDE 53

Bayesian Interval Mapping Framework

(Diagram: observed M, y; missing Q; unknown λ, β, H.)

◮ Observed: y (traits) and M (marker and linkage map)

SLIDE 54

Bayesian Interval Mapping Framework

(Diagram: observed M, y; missing Q; unknown λ, β, H.)

◮ Observed: y (traits) and M (marker and linkage map)
◮ Missing: marker and QTL genotypes (Q)

SLIDE 55

Bayesian Interval Mapping Framework

(Diagram: observed M, y; missing Q; unknown λ, β, H.)

◮ Observed: y (traits) and M (marker and linkage map)
◮ Missing: marker and QTL genotypes (Q)
◮ Unknown parameters (λ, β, H, Q)

SLIDE 56

Bayesian Interval Mapping Framework

(Diagram: observed M, y; missing Q; unknown λ, β, H.)

◮ Observed: y (traits) and M (marker and linkage map); trait model p(y | Q, β, λ, H)
◮ Missing: marker and QTL genotypes (Q)
◮ Unknown parameters (λ, β, H, Q)

SLIDE 57

Bayesian Interval Mapping Framework

(Diagram: observed M, y; missing Q; unknown λ, β, H.)

◮ Observed: y (traits) and M (marker and linkage map); trait model p(y | Q, β, λ, H)
◮ Missing: marker and QTL genotypes (Q); genetic model p(Q | M, λ, H)
◮ Unknown parameters (λ, β, H, Q)

SLIDE 58

Bayesian Interval Mapping Framework

(Diagram: observed M, y; missing Q; unknown λ, β, H.)

◮ Observed: y (traits) and M (marker and linkage map); trait model p(y | Q, β, λ, H)
◮ Missing: marker and QTL genotypes (Q); genetic model p(Q | M, λ, H)
◮ Unknown parameters (λ, β, H, Q)

posterior = likelihood × prior

p(λ, β, H, Q | y, M) ∝ p(y | β, λ, Q, H) p(Q | M, λ, H) p(β, λ, H)

SLIDE 59

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.

SLIDE 60

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.
◮ Higher power to detect weak main and/or epistatic effects

SLIDE 61

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.
◮ Higher power to detect weak main and/or epistatic effects
◮ Higher precision of estimates

SLIDE 62

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.
◮ Higher power to detect weak main and/or epistatic effects
◮ Higher precision of estimates
◮ Separate close linkage from pleiotropy

SLIDE 63

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.
◮ Higher power to detect weak main and/or epistatic effects
◮ Higher precision of estimates
◮ Separate close linkage from pleiotropy
  ◮ pleiotropy

SLIDE 64

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.
◮ Higher power to detect weak main and/or epistatic effects
◮ Higher precision of estimates
◮ Separate close linkage from pleiotropy
  ◮ pleiotropy: one gene affecting both traits, indicating common biochemical pathways

SLIDE 65

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.
◮ Higher power to detect weak main and/or epistatic effects
◮ Higher precision of estimates
◮ Separate close linkage from pleiotropy
  ◮ pleiotropy: one gene affecting both traits, indicating common biochemical pathways
  ◮ close linkage

SLIDE 66

Why Multiple Traits?

y1    y2    C1M1  C1M2  C2M1  C2M2  C15M2  C16M1  C19M1
8.8   7.8   AA    AA    AB    AA    AA     AB     AB
9.6   10.1  AA    AA    AB    AB    AB     AB     AB
10.6  9.9   AB    AB    AA    AA    AB     AA     AA
11.1  10.9  AB    AB    AA    AB    AB     AA     AA

◮ Typically data on more than one (correlated) phenotype are collected, e.g. BMI, fat mass, etc.
◮ Higher power to detect weak main and/or epistatic effects
◮ Higher precision of estimates
◮ Separate close linkage from pleiotropy
  ◮ pleiotropy: one gene affecting both traits, indicating common biochemical pathways
  ◮ close linkage: two tightly linked genes resulting in collinear genotypes

SLIDE 67

QTL SUR Model

The QTL SUR model:

yti = μt + Xti βt + eti,  i = 1, · · · , n;  t = 1, · · · , T

where t indexes the phenotypes (traits, dependent variables) and i indexes the individuals. It is assumed that ei = (e1i, · · · , eTi) ∼ NT(0, Σ).
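A toy generator for the model above with T = 2 traits and backcross-style genotypes coded AA = 0, AB = 1; the marker effects, means, and residual correlation ρ are made up for illustration.

```python
import numpy as np

def simulate_qtl_sur(n=200, rho=0.5, seed=0):
    """Draws from y_ti = mu_t + X_ti beta_t + e_ti with e_i ~ N_T(0, Sigma),
    T = 2 traits, two backcross markers coded AA = 0, AB = 1."""
    rng = np.random.default_rng(seed)
    Sigma = np.array([[1.0, rho],
                      [rho, 1.0]])                     # residual covariance across traits
    G = rng.integers(0, 2, size=(n, 2)).astype(float)  # genotypes at two markers
    mu = np.array([10.0, 8.0])
    B = np.array([[0.8, 0.0],                          # marker 1 affects trait 1 only
                  [0.0, 0.5]])                         # marker 2 affects trait 2 only
    E = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    Y = mu + G @ B + E                                 # n x 2 trait matrix
    return Y, G
```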

SLIDE 68

Model Parameters

Following Godsill (2001), fix at L the total number of loci/independent variables that can be selected. Then define:

◮ Model Indicators: γ = {γt1, · · · , γtL}
◮ Locus Indices: λ = {λt1, · · · , λtL}

The following special cases of the SURd model can be obtained:

◮ SURs: λti = λi ∀ t = 1, · · · , T
◮ Traditional Multivariate Model (TMV): γti = γt ∀ t = 1, · · · , T
◮ Single Trait Analysis (STA): Σ = I reduces to univariate trait-by-trait analysis

SLIDE 69

Choice of Priors

Prior on β

◮ batches k = add, dom, add-add interaction, etc.
◮ βk ∼ N(0, σ²k) and σ²k ∼ Inv-χ²(νk, s²k)
◮ s²k controls the prior heritability per effect: s²k = (νk − 2) E(hj) Vp / (νk Vj)

SLIDE 70

Choice of Priors

Prior on β

◮ batches k = add, dom, add-add interaction, etc.
◮ βk ∼ N(0, σ²k) and σ²k ∼ Inv-χ²(νk, s²k)
◮ s²k controls the prior heritability per effect: s²k = (νk − 2) E(hj) Vp / (νk Vj)

Prior on number of QTL (ℓ)

◮ ℓ ∼ Poisson(ℓ0)
◮ Choice of L = ℓ0 + 3√ℓ0

SLIDE 71

Choice of Priors

Prior on β

◮ batches k = add, dom, add-add interaction, etc.
◮ βk ∼ N(0, σ²k) and σ²k ∼ Inv-χ²(νk, s²k)
◮ s²k controls the prior heritability per effect: s²k = (νk − 2) E(hj) Vp / (νk Vj)

Prior on number of QTL (ℓ)

◮ ℓ ∼ Poisson(ℓ0)
◮ Choice of L = ℓ0 + 3√ℓ0

Prior on λ and γ

◮ independent priors on QTL positions and indicators

SLIDE 72

Choice of Priors

Prior on β

◮ batches k = add, dom, add-add interaction, etc.
◮ βk ∼ N(0, σ²k) and σ²k ∼ Inv-χ²(νk, s²k)
◮ s²k controls the prior heritability per effect: s²k = (νk − 2) E(hj) Vp / (νk Vj)

Prior on Σ

◮ p(Σ) ∝ 1 / ( |Σ| ∏_{i<j}(di − dj) )

Prior on number of QTL (ℓ)

◮ ℓ ∼ Poisson(ℓ0)
◮ Choice of L = ℓ0 + 3√ℓ0

Prior on λ and γ

◮ independent priors on QTL positions and indicators

SLIDE 73

Composite Model Space Approach

◮ The idea is to circumvent the trans-dimensional character of the problem by modeling all parameters simultaneously.
◮ The joint posterior distribution:

p(γ, λ, θ, Σ | Y, X) ∝ p(Y | X, γ, λ, θ, Σ) p(λγ, θγ | γ, Σ) × p(λ−γ, θ−γ | γ, Σ) p(γ) p(Σ, θ)

◮ where θ = {β, σ²} and λ−γ is the collection of all λti for which γti = 0.
◮ Assume a priori independence:

p(λ−γ, θ−γ | λγ, θγ, γ, Σ) ∝ p(λ−γ, θ−γ | γ, Σ)

SLIDE 74

Real Data Set

SLIDE 75

Trait Phenotype

◮ GONFAT → right gonadal fat pad
◮ SUBFAT → subcutaneous fat pad

SLIDE 76

SLIDE 77

Pleiotropic Effect

SLIDE 78

Future Research

Pleiotropy vs. coincident linkage

◮ SURd: models the coincident linkage hypothesis
◮ TMV: models pleiotropy
◮ Bayes Factor comparison of pleiotropy vs. coincident linkage

Variety of traits

◮ Ordinal traits using a threshold model
◮ Survival traits

SLIDE 79

Future Research

eQTL (expression QTL)

◮ mRNA expression levels are considered traits
◮ Tens of thousands of traits (T)
◮ A lot of recent attention from researchers
◮ NIH RFAs, e.g. http://grants.nih.gov/grants/guide/rfa-files/RFA-RM-09-006.html

Covariance matrix modeling

◮ Current implementation breaks down for large T
◮ Investigation of different priors

SLIDE 80

Acknowledgements

◮ Stefano Monni (Weill Cornell)
◮ Nengjun Yi (University of Alabama at Birmingham)
◮ Brian Yandell (University of Wisconsin - Madison)
◮ CTSC grant