A Bayesian Method for Partially Paired High Dimensional Data
Fei Liu, Feng Liang, Woncheol Jang
Institute of Statistics and Decision Sciences, Duke University
SAMSI, Summer 2006

Outline
◮ Bayesian methods have been developed for paired high dimensional data, such as gene expression data.
◮ For partially paired data, however, excluding the unpaired observations from the analysis may lead to significant information loss.
◮ Using test statistics with FDR control is a possible solution.
◮ We provide a generalized Bayesian method for partially paired high dimensional data.

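Statistical Model
◮ The data for the $j$th gene are arranged as:
$$X_{1j}, \dots, X_{nj};\quad X^*_{1j}, \dots, X^*_{n_1 j};\qquad Y_{1j}, \dots, Y_{nj};\quad Y^*_{1j}, \dots, Y^*_{n_2 j}.$$
$(X_{ij}, Y_{ij})$: paired gene expressions. $X^*_{ij}, Y^*_{ij}$: unpaired observations.
◮ $(X_{ij}, Y_{ij})^T \sim N(\boldsymbol{\mu}_j, \Sigma_j)$, with
$$\boldsymbol{\mu}_j = \begin{pmatrix} \mu_j \\ \mu_j + \delta_j \end{pmatrix}, \qquad \Sigma_j = \begin{pmatrix} \sigma_1^2 & \rho_j \sigma_1 \sigma_2 \\ \rho_j \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}.$$
◮ For the incomplete data, $X^*_{ij} \sim N(\mu_j, \sigma_1^2)$ and $Y^*_{ij} \sim N(\mu_j + \delta_j, \sigma_2^2)$.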
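As a rough illustration of this data layout, the sketch below draws one gene's paired and unpaired observations from the model. The function name `simulate_gene` and the value `delta=3.0` are assumptions made for the example; the sample sizes, `rho` and the unit variances mirror the simulation settings quoted later in the slides.

```python
import math
import random

def simulate_gene(n, n1, n2, mu, delta, sigma1, sigma2, rho, rng):
    """Draw one gene's partially paired data (hypothetical helper)."""
    paired = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu + sigma1 * z1
        # Sharing z1 gives Corr(x, y) = rho while keeping Var(y) = sigma2^2.
        y = mu + delta + sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        paired.append((x, y))
    # Unpaired observations only use the marginal distributions.
    x_unpaired = [rng.gauss(mu, sigma1) for _ in range(n1)]
    y_unpaired = [rng.gauss(mu + delta, sigma2) for _ in range(n2)]
    return paired, x_unpaired, y_unpaired

rng = random.Random(0)
paired, x_star, y_star = simulate_gene(
    n=5, n1=2, n2=2, mu=0.0, delta=3.0,  # delta is an illustrative value
    sigma1=1.0, sigma2=1.0, rho=0.1, rng=rng,
)
```

Note that `random.gauss` takes a standard deviation, so `sigma1` and `sigma2` here are the square roots of the variances in the model.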

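Review the FDR Control

|             | Accept             | Reject             | Total     |
|-------------|--------------------|--------------------|-----------|
| True Null   | U                  | V (False positive) | $m_0$     |
| Untrue Null | T (False negative) | S                  | $m - m_0$ |
| Total       | W                  | R                  | $m$       |

$$\mathrm{FDR} = E(V/R \mid R \neq 0)$$
Benjamini and Hochberg (1995) procedure to control FDR at $q^*$:
$$\begin{pmatrix} H_1 & H_2 & \dots & H_m \\ p_1 & p_2 & \dots & p_m \end{pmatrix} \Longrightarrow \begin{pmatrix} H_{(1)} & H_{(2)} & \dots & H_{(m)} \\ p_{(1)} & p_{(2)} & \dots & p_{(m)} \end{pmatrix}$$
$$k = \max\left(i : p_{(j)} \le \frac{j}{m} q^*, \text{ for all } j \le i\right), \qquad \text{Reject } H_{(1):(k)}.$$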
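The rejection rule can be sketched in a few lines. By default the hypothetical function `fdr_reject` below follows the rule as written on the slide (the bound must hold for every $j \le i$); with `step_up=True` it instead takes the largest $i$ with $p_{(i)} \le (i/m) q^*$, which is the step-up form usually associated with Benjamini and Hochberg (1995). The p-values are made up for illustration.

```python
def fdr_reject(p_values, q, step_up=False):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Indices of the hypotheses sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
        elif not step_up:
            break  # the slide's rule needs the bound for every j <= i
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected

p_values = [0.001, 0.03, 0.031, 0.5]  # illustrative values only
as_written = fdr_reject(p_values, q=0.05)               # [True, False, False, False]
step_up = fdr_reject(p_values, q=0.05, step_up=True)    # [True, True, True, False]
```

The two variants differ here because the second ordered p-value (0.03) exceeds its bound (0.025) while the third (0.031) is below its bound (0.0375).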

Test Statistics for Partially Paired Data in One Dimensional Space
◮ Lin and Stivers (1974) use the following test statistic when $n$ is small and $|\rho| \le 0.5$:
$$T = \frac{\bar{x}_1^{(n+n_1)} - \bar{x}_2^{(n+n_2)}}{\sqrt{\dfrac{1}{n+n_1} + \dfrac{1}{n+n_2} - \dfrac{2nr}{(n+n_1)(n+n_2)}}\;\sqrt{(a^*_{11} + b_{22})/(N-2)}}$$
where $T \sim t_{N-4}$ approximately, and $N = n + n_1 + n_2$.
◮ Another test statistic is given by the mixed effect model:
$$z_{ik} = \mu + \alpha_i + \beta_k + \epsilon_{ik}, \quad \text{where } \beta_k \sim N(0, \sigma_\beta^2),\ \epsilon_{ik} \sim N(0, \sigma_\epsilon^2).$$
Perform ANOVA to test the fixed effect $\alpha_i = 0$.

Scott and Berger (2003)
Noticing the built-in penalty ("Ockham's razor effect") of the Bayesian method, Scott and Berger (2003) propose a Bayesian hierarchical model for multiple comparisons.
Observe $x = (x_1, \dots, x_M)$:
$$x_j \sim N(\mu_j, \sigma^2)$$
$$\gamma_j = 1 - \delta(\mu_j = 0)$$
$$f(x \mid \sigma^2, \gamma, \mu) = \prod_{j=1}^{M} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_j - \gamma_j \mu_j)^2}{2\sigma^2}\right)$$
$$\mu_j \sim N(0, V)$$
$$\pi(V, \sigma^2) \propto (V + \sigma^2)^{-2}$$
$$\gamma_j \sim \mathrm{Bernoulli}(p)$$
$$\pi(p) \sim \mathrm{Beta}(\alpha, \beta)$$

EBarrays Method
Kendziorski et al. (2004) propose a parametric empirical Bayes method to account for replicated arrays (and multiple conditions as well).
Observe $x = (x_1, \dots, x_J)$, where $x_j = (x_{j1}, x_{j2}, \dots, x_{jI})$.
◮ If gene $j$ is not differentially expressed ($\delta_j = 0$),
$$f_0(x_j) = \int \left( \prod_{i=1}^{I} f_{obs}(x_{ji} \mid \mu) \right) \pi(\mu)\, d\mu$$
◮ If gene $j$ is differentially expressed ($\delta_j \neq 0$),
$$f_1(x_j) = f_0(x_{j1})\, f_0(x_{j2}),$$
where $x_{j1}$ and $x_{j2}$ here denote the measurements of gene $j$ under the two conditions.
◮ The data are marginally distributed as
$$p f_1(x_j) + (1 - p) f_0(x_j)$$
◮ By Bayes' rule, the posterior probability of $\delta_j \neq 0$ is
$$\frac{p f_1(x_j)}{p f_1(x_j) + (1 - p) f_0(x_j)}$$
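The Bayes' rule step is simple enough to sketch directly. The helper `posterior_de_probability` below is a hypothetical name, and it assumes the two marginal densities have already been evaluated and are supplied on the log scale (working with logs guards against underflow when many replicates are multiplied). It says nothing about how EBarrays itself computes $f_0$ and $f_1$.

```python
import math

def posterior_de_probability(log_f0, log_f1, p):
    """P(delta_j != 0 | x_j) = p*f1 / (p*f1 + (1-p)*f0), for 0 < p < 1."""
    a = math.log(p) + log_f1          # log of p * f1
    b = math.log(1.0 - p) + log_f0    # log of (1 - p) * f0
    top = max(a, b)                   # subtract the max before exponentiating
    num = math.exp(a - top)
    return num / (num + math.exp(b - top))

# Illustrative density values, not taken from the slides.
post = posterior_de_probability(math.log(0.02), math.log(0.5), p=0.01)
```

With these made-up inputs the prior probability of 0.01 is raised to roughly 0.2, because the data are 25 times more likely under $f_1$ than under $f_0$.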

Mixture Prior
◮ Our primary interest is $H_0: \delta_j = 0$.
◮ We propose a mixture distribution for $\delta_j$, i.e.,
$$\pi(\delta_j \mid p, \tau^2) = p\, \phi(\delta_j / \tau) + (1 - p)\, I_{\{0\}}(\delta_j),$$
$p$: probability of being differentially expressed.
$\gamma_j$: latent variables, set to 1 if the $j$th gene is differentially expressed and 0 otherwise.
Quantity of interest: $P(\gamma_j = 1 \mid \text{Data})$.
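To make the mixture concrete, the sketch below draws $(\gamma_j, \delta_j)$ pairs from the prior, reading the continuous component as a $N(0, \tau^2)$ distribution. The function name `draw_delta` is hypothetical; `p=0.01` and `tau=10.0` (so $\tau^2 = 100$) match the simulation settings quoted in the slides.

```python
import random

def draw_delta(p, tau, rng):
    """Draw (gamma, delta) from the point-mass mixture prior."""
    # gamma = 1 marks a differentially expressed gene.
    gamma = 1 if rng.random() < p else 0
    # Slab: N(0, tau^2) when gamma = 1; spike: exactly 0 otherwise.
    delta = rng.gauss(0.0, tau) if gamma == 1 else 0.0
    return gamma, delta

rng = random.Random(1)
draws = [draw_delta(p=0.01, tau=10.0, rng=rng) for _ in range(1000)]
n_de = sum(gamma for gamma, _ in draws)  # number of genes drawn as DE
```

On average about 10 of the 1000 draws have a nonzero effect, and all the others sit exactly at zero.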

Priors and Posteriors
◮ Prior distributions for $(\mu, \sigma^2, p, \tau^2)$ are:
$$\pi(\mu_j) \propto 1$$
$$\pi(\tau^2 \mid \sigma^2) \propto \frac{1}{\sigma^2} \left(1 + \frac{\tau^2}{\sigma^2}\right)^{-2}$$
$$\pi(p) \propto p^{\alpha - 1} (1 - p)^{\beta - 1} \equiv \mathrm{Beta}(\alpha, \beta)$$
◮ Improper prior distributions for $\rho$:
$$\pi_1(\rho_j) \propto \frac{1}{1 - \rho_j^2}, \qquad \pi_2(\rho_j) \propto \frac{1}{(1 - \rho_j^2)^2}$$
Both $\pi_1$ and $\pi_2$ can be shown to have proper posteriors. Bayarri (1981) shows that $\pi_1$ avoids the "Jeffreys-Lindley" paradox.

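Gibbs Sampling
$$\Theta = (\mu, \delta, \gamma, \rho, \sigma^2, \tau^2, p), \qquad \text{Data} = (x, y, x^*, y^*)$$
Closed forms for sampling $\mu, \delta, \gamma, p, \sigma^2$:
$$(\mu_j \mid \Theta_{-\mu}, \text{Data}) \sim N\left(m_j^{(\mu)}, \sigma_j^{(\mu)}\right)$$
$$(\gamma_j \mid \Theta_{-(\gamma \cup \delta)}, \text{Data}) \sim \mathrm{Bernoulli}\left(p_j^{(\gamma)}\right)$$
$$(\delta_j \mid \Theta_{-\delta}, \text{Data}) \sim N\left(m_j^{(\delta)}, \sigma_j^{(\delta)}\right)$$
$$(p \mid \Theta_{-p}, \text{Data}) \sim \mathrm{Beta}\left(\alpha + \sum_{j=1}^{J} \gamma_j,\ \beta + J - \sum_{j=1}^{J} \gamma_j\right)$$
$$(\sigma^2 \mid \Theta_{-\sigma^2}, \text{Data}) \sim IG\left(J\left(n + \frac{n_1 + n_2}{2}\right),\ \eta\right)$$
No closed forms for $(\tau^2, \rho)$.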
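Of the conditionals listed, the update for $p$ is fully specified on the slide, so it can be sketched on its own. The function name `sample_p`, the uniform prior (`alpha = beta = 1`) and the indicator vector below are assumptions for illustration; the other conditionals depend on quantities ($m_j^{(\mu)}$, $p_j^{(\gamma)}$, $\eta$, and so on) that the slides do not spell out.

```python
import random

def sample_p(gamma, alpha, beta, rng):
    """One Gibbs draw of p given the current indicators gamma_1..gamma_J."""
    J = len(gamma)
    n_de = sum(gamma)  # number of genes currently flagged as DE
    # Conjugate update: Beta(alpha + sum(gamma), beta + J - sum(gamma)).
    return rng.betavariate(alpha + n_de, beta + J - n_de)

rng = random.Random(2)
gamma = [1] * 10 + [0] * 990  # illustrative state: 10 of 1000 genes flagged
p_draws = [sample_p(gamma, alpha=1.0, beta=1.0, rng=rng) for _ in range(2000)]
p_mean = sum(p_draws) / len(p_draws)
```

Under these assumptions the conditional is Beta(11, 991), whose mean is 11/1002, or about 0.011, so the draws concentrate near the fraction of flagged genes.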

Simulation Study with Normal Distributions
Simulate the data with 1000 genes, 5 paired, 2 unpaired control, 2 unpaired treatment, and $\rho = 0.1$, $\tau^2 = 100$, $\sigma^2 = 1.0$, $p = 0.01$.

| Method              | False Positive | False Negative |
|---------------------|----------------|----------------|
| FDR - T test        | 0/9            | 2/991          |
| FDR - random effect | 1/11           | 1/989          |
| Bayesian Model      | 0/10           | 1/990          |

[Figure: Posterior distribution of p (true is 0.01). Histogram of p; x-axis: p, y-axis: density.]

Simulation Study with Normal Distributions (Cont...)
[Figure: Posterior distribution for δ's. Three density panels over delta: Delta_325 (True = −12.68011; P = 1), Delta_666 (True = −3.878074; P = 0.081), Delta_84 (True = 0; P = 0).]

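Simulation Study with t Distributions
Simulate the data with 1000 genes, 9 samples (5 pairs, 2 unpaired control, 2 unpaired treatment).
$$\text{Data} = \text{Bivariate } T_4 \text{ with mean } 0 \text{ and } \Sigma = \begin{pmatrix} 1 & 0.1 \\ 0.1 & 1 \end{pmatrix} + \mu + \delta$$
$$\mu \sim U(-0.01, 0.01)$$
$$\delta_i \mid \delta_i \neq 0 \sim N(0, \tau^2 = 100); \qquad P(\delta_i \neq 0) = 0.01$$

| Method              | False Positive | False Negative |
|---------------------|----------------|----------------|
| FDR - T test        | 1/7            | 3/993          |
| FDR - random effect | 6/13           | 2/987          |
| Bayesian Model      | 6/13           | 2/987          |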

Simulation Study with t Distributions (Cont...)
[Figure: Posterior distribution for p. Histogram of p, true 0.01; x-axis: p, y-axis: density.]

Simulation Study with t Distributions (Cont...)
[Figure: Posterior distribution for δ's. Three density panels over delta: Delta_6 (True = −8.308207; P = 1), Delta_390 (True = −4.34904; P = 0.855), Delta_401 (True = 0; P = 0.178).]

Future work
◮ Apply the method to gene expression data.
◮ Use different $p$ to achieve different thresholding.
◮ EBarrays method with random effects $e_k$. Observe $X_{jk} = (X_{jk1}, X_{jk2})$ for gene $j$ and sample $k$.
- If gene $j$ is not differentially expressed,
$$f_0(X_{jk}) = \iint \prod_{i} f_{obs}(X_{jki} \mid \mu + e_k)\, \pi(\mu)\, \pi(e_k)\, de_k\, d\mu$$
- If gene $j$ is differentially expressed,
$$f_1(X_{jk}) = \iint \prod_{i} f_{obs}(X_{jki} \mid \mu_i + e_k)\, \pi(\mu_i)\, \pi(e_k)\, de_k\, d\mu_i$$
