advanced methods in applied statistics

Advanced Methods in Applied Statistics Christian Starup & Loui - PowerPoint PPT Presentation



  1. Advanced Methods in Applied Statistics Christian Starup & Loui Wentzel Niels Bohr Institute March 8, 2018

  2. Journal Article https://doi.org/10.1093/bioinformatics/btw438

  3. Problem: Given a dataset for which one needs to calculate several or many P-values, should one account for a possible correlation between the data variables?

  4. No Correlation solution: If the P-values are not correlated, then under $H_0$ each P-value is uniformly distributed, and the product of the P-values is drawn from the distribution of a product of $N$ uniform numbers:

     $$P = \int_0^{\prod_i P_i} \frac{(-1)^{N-1}}{(N-1)!}\,\ln(u)^{N-1}\,du \quad (1)$$

     This is equivalent to a $\chi^2$-test with $2k$ degrees of freedom (here $k = N$, the number of P-values), called Fisher's method:

     $$\Psi = \sum_{i=1}^{N} -2\log(P_i) \quad (2)$$

     $$P = \varphi_{2k}(\Psi) = \int_\Psi^\infty \chi^2_{2k}(x)\,dx \quad (3)$$
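Fisher's method on slide 4 can be sketched in a few lines. This is a minimal illustration, not the presenters' code; the function name `fisher_combined_p` is hypothetical. Because the degrees of freedom $2k$ are even, the $\chi^2$ tail integral in Eq. (3) has a closed form, so only the standard library is needed.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method (illustrative sketch)."""
    k = len(p_values)
    # Eq. (2): Psi = sum_i -2 log(P_i)
    psi = -2.0 * sum(math.log(p) for p in p_values)
    # Eq. (3): chi-square tail with 2k (even) degrees of freedom,
    # exp(-Psi/2) * sum_{j=0}^{k-1} (Psi/2)^j / j!
    half = psi / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return psi, math.exp(-half) * total

psi, p = fisher_combined_p([0.5, 0.5])
```

For two P-values the result equals $t(1 - \ln t)$ with $t = P_1 P_2$, which is what Eq. (1) gives for $N = 2$.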

  5. Correlation solution: However, if the data are correlated, the P-values can no longer be treated as independent uniform numbers. Brown therefore extended Fisher's method with a re-scaling factor $c$, such that $\Psi \sim c\,\chi^2_{2f}$:

     $$f = \frac{E[\Psi]^2}{\mathrm{Var}[\Psi]}, \qquad c = \frac{\mathrm{Var}[\Psi]}{2E[\Psi]} = \frac{k}{f}$$

     $$E[\Psi] = 2k, \qquad \mathrm{Var}[\Psi] = 4k + 2\sum_{i<j}\mathrm{cov}(W_i, W_j)$$

     Here $W_i = -2\log(P_i)$, $E[\Psi] = 2k$ follows from assuming a $\chi^2$ distribution, $2k$ is the number of degrees of freedom in Fisher's method and $2f$ the re-scaled number in Brown's method. The combined P-value is then $P_\text{combined} = 1 - \Phi_{2f}(\Psi/c)$, with $\Psi = \sum_i W_i$ and $\Phi_{2f}$ the cumulative distribution function of $\chi^2_{2f}$.
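The re-scaling on slide 5 can be written out as a short sketch, assuming the covariance matrix of the $W_i$ is already known. The function name `brown_combined_p` and the argument `cov_w` are hypothetical; the $\chi^2$ tail probability is taken from `scipy.stats.chi2.sf`, which accepts non-integer degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

def brown_combined_p(p_values, cov_w):
    """Brown's method, given cov_w[i, j] = cov(W_i, W_j) (illustrative sketch)."""
    p = np.asarray(p_values, dtype=float)
    cov_w = np.asarray(cov_w, dtype=float)
    k = p.size
    psi = -2.0 * np.log(p).sum()          # Psi = sum_i W_i
    expected = 2.0 * k                    # E[Psi] = 2k
    # Var[Psi] = 4k + 2 * sum_{i<j} cov(W_i, W_j): strict upper triangle only
    var = 4.0 * k + 2.0 * np.triu(cov_w, k=1).sum()
    f = expected**2 / var
    c = var / (2.0 * expected)
    # P_combined = 1 - Phi_{2f}(Psi / c)
    return chi2.sf(psi / c, 2.0 * f), c, f

p_indep, c, f = brown_combined_p([0.5, 0.5], np.zeros((2, 2)))
```

With all off-diagonal covariances equal to zero, $c = 1$ and $f = k$, so the sketch reduces to Fisher's method.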

  6. Correlation solution, continued: The article's contribution to Brown's method is to estimate the covariance matrix empirically from the data, hence the name Empirical Brown's Method (EBM):

     $$\mathrm{cov}(W_i, W_j) \approx \mathrm{cov}(w_i, w_j), \qquad w_i = -2\log\bigl(1 - F(\vec{x}_i)\bigr)$$

     The EBM is a non-parametric approach, where $F(\vec{x}_i)$ is the right-sided empirical cumulative distribution function. Kost's method instead approximates the covariance with a polynomial in the correlation $\rho_{ij}$:

     $$\mathrm{cov}(W_i, W_j) \approx 3.263\,\rho_{ij} + 0.710\,\rho_{ij}^2 + 0.027\,\rho_{ij}^3$$
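The two covariance estimates on slide 6 might look as follows in code. This is a sketch under one stated assumption: $1 - F(x)$ is taken to be the fraction of samples greater than or equal to $x$, so it never reaches zero on the observed data. The slide does not spell out this convention, and the function names are hypothetical.

```python
import numpy as np

def kost_covariance(rho):
    """Kost's polynomial approximation of cov(W_i, W_j) from the correlation rho."""
    return 3.263 * rho + 0.710 * rho**2 + 0.027 * rho**3

def ebm_transform(row):
    """w = -2 log(1 - F(x)) for one variable, with an empirical F (assumed convention)."""
    row = np.asarray(row, dtype=float)
    # tail[i] = fraction of samples >= row[i]; always at least 1/n, so log is finite
    tail = (row[None, :] >= row[:, None]).mean(axis=1)
    return -2.0 * np.log(tail)

def ebm_covariance(data):
    """Empirical covariance of the transformed variables; data has shape (variables, samples)."""
    w = np.vstack([ebm_transform(r) for r in np.asarray(data, dtype=float)])
    return np.cov(w)
```

Note that Kost's polynomial gives 4.000 at $\rho_{ij} = 1$, the variance of a single $W_i$ under a $\chi^2_2$ distribution.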

  7. Simulating data: Parameters were $\mu_i = 0$, $a = 0.8$, $n = 4$; $b_j$ was randomly sampled from $[-0.5, 0.5]$. Each sample had 200 entries. The matrix of the multivariate normal distribution was

     $$M = \begin{pmatrix} 1 & b_2 & \cdots & b_j & \cdots & b_n \\ b_2 & 1 & \cdots & a & \cdots & a \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ b_j & a & \cdots & 1 & \cdots & a \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ b_n & a & \cdots & a & \cdots & 1 \end{pmatrix} \quad (4)$$

     To any sample $\vec{y}$ drawn from this distribution, $n$-dimensional uniform noise $\vec{U}$ from $[-1, 1]$ was added:

     $$\vec{x} = \vec{y} + \xi\,\vec{U} \quad (5)$$

     They draw numbers from one axis of the multivariate normal distribution (axis 1, with correlations $b_j$ to the others) and test its correlation with the other axes using Pearson's correlation test.
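The simulation on slide 7 could be reproduced roughly as below. Two things are assumptions of this sketch rather than statements from the slides: the noise scale `xi` (the slide gives no value for $\xi$, so the default of 1.0 is arbitrary) and the handling of draws of $b_j$ for which $M$ is not positive definite, which are simply redrawn here. The function name `simulate` is hypothetical.

```python
import numpy as np

def simulate(n=4, a=0.8, n_entries=200, xi=1.0, seed=0):
    """Draw correlated normal samples and add uniform noise (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    while True:
        b = rng.uniform(-0.5, 0.5, size=n - 1)
        M = np.full((n, n), a)            # off-diagonal block entries are a
        np.fill_diagonal(M, 1.0)
        M[0, 1:] = b                      # first row and column hold the b_j
        M[1:, 0] = b
        # Assumption: redraw b until M is a valid (positive definite) covariance
        if np.linalg.eigvalsh(M).min() > 0:
            break
    y = rng.multivariate_normal(np.zeros(n), M, size=n_entries)
    x = y + xi * rng.uniform(-1.0, 1.0, size=y.shape)   # Eq. (5)
    return x, M

x, M = simulate()
```

Each row of `x` is one entry of the sample and each column one axis, so column 0 plays the role of axis 1.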

  8. Ground Truth P-values: To test the different methods on correlated data, each should yield the same result as if the data were uncorrelated. The ground truth is obtained by permutation:
     ◮ Shuffle $\vec{y}_1$
     ◮ Calculate $\Psi^*$ as earlier
     ◮ Repeat $M$ times

     The ground truth P-value is then

     $$P_\text{ground} = \frac{\sum_{m=1}^{M} I(\Psi^*_m \geq \Psi)}{M} \quad (6)$$

     Notice that this gives the ground truth P-value a resolution of $1/M$.
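The permutation procedure on slide 8 can be sketched as follows. The function names and the argument `n_perm` (the slide's $M$) are hypothetical, and the individual P-values are assumed to come from `scipy.stats.pearsonr`, in line with the Pearson correlation test mentioned on slide 7.

```python
import numpy as np
from scipy.stats import pearsonr

def fisher_psi(x1, others):
    """Psi = sum_i -2 log(P_i), with P_i from Pearson tests of x1 against each column."""
    total = 0.0
    for col in others.T:
        _, p = pearsonr(x1, col)
        total += -2.0 * np.log(p)
    return total

def ground_truth_p(x1, others, n_perm=1000, seed=0):
    """Fraction of shuffled statistics Psi* that reach the observed Psi, Eq. (6)."""
    rng = np.random.default_rng(seed)
    psi = fisher_psi(x1, others)
    hits = 0
    for _ in range(n_perm):
        # Shuffling axis 1 breaks its link to the other axes but keeps theirs intact
        psi_star = fisher_psi(rng.permutation(x1), others)
        hits += int(psi_star >= psi)
    return hits / n_perm
```

The returned value is always a multiple of `1 / n_perm`, which is the resolution noted on the slide; a larger `n_perm` is needed to resolve small P-values.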

  9. Performance results as a function of Signal to Noise ratio
