Microarrays – False Discovery Rate, Prof. Tesler, Math 186, Winter 2019



SLIDE 1

Microarrays – False Discovery Rate

Prof. Tesler
Math 186, Winter 2019

SLIDE 2

P-value histogram for Hedenfalk data

[Histogram of P-values for the Hedenfalk data: x-axis P-value (0.0 to 1.0), y-axis Frequency (200 to 800); bars labeled "Spots with H1" and "Spots with H0".]

The distribution is approximately uniform on [0.3, 1] but not on [0, 0.3].


SLIDE 3

P-value distribution

One definition of P-value: under H0, what is the probability of seeing data whose test statistic is "at least this extreme"?

Apply this definition to the P-value itself: P = 0.08 means only 8% of cases will be at least as extreme as the observed data, so Prob(P ≤ 0.08) = 0.08. In general, Prob(P ≤ α) = α, so P is uniformly distributed on [0, 1].

This assumes the data really comes from the distribution for which the "Accept H0" decision rule was designed. If the null is true (e.g., µX = µY) but the distribution is not what the decision rule was designed for (e.g., not a normal distribution, or an incorrect σ), the P-value distribution will not be uniform, because the P-values were computed incorrectly or only approximately.

Some spots follow the null while others follow the alternative. Tests should be designed so that data actually generated by the alternative has small P-values.
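The claim Prob(P ≤ α) = α under H0 can be checked with a quick simulation. This is my illustration, not code from the lecture: it uses a one-sample z-test with known σ (so the P-values are exact) on data actually drawn from the null distribution, then measures what fraction of P-values fall below several cutoffs.

```python
import random
from statistics import NormalDist

random.seed(0)
std_normal = NormalDist()

def pvalue_one_sample_z(n=25, mu0=0.0, sigma=1.0):
    """Draw n points from N(mu0, sigma) (so H0 is true) and return the
    two-sided P-value of the z-test for mean = mu0."""
    xs = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (sum(xs) / n - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - std_normal.cdf(abs(z)))

pvals = [pvalue_one_sample_z() for _ in range(10000)]

# If P is uniform on [0, 1], the fraction below alpha should be ~ alpha.
for alpha in (0.05, 0.25, 0.50):
    frac = sum(p <= alpha for p in pvals) / len(pvals)
    print(f"alpha = {alpha:.2f}: fraction of P-values <= alpha = {frac:.3f}")
```

The printed fractions come out close to 0.05, 0.25, and 0.50, matching the uniform distribution on [0, 1]. If the data were drawn with a different σ than the test assumes, the fractions would drift away from α, as the slide notes.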


SLIDE 4

Error rate for multiple hypothesis tests on an array

At significance level α = 0.05, we expect ≈ 5% of spots with no biological difference in expression levels between BRCA1 & 2 tumors to nonetheless appear to exhibit such a difference in the experiment.

In this experiment, the arrays have ≈ 6500 spots, but usable data was only available for ≈ 3200 spots (due to image defects, etc.). We don't know how many of these 3200 are truly H0 or truly H1, but most of them should be H0, so the estimated number of false positives is 0.05 × 3200 = 160.

There were 565 P-values under 0.05. Additional mathematical and/or (labor-intensive) biological tests are required to determine which of these 565 spots are false positives.
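The arithmetic above is short enough to write out as a back-of-the-envelope check (a sketch using the slide's numbers, not code from the course):

```python
# Expected false positives when most of the r usable spots are null:
# roughly alpha * r of the null spots land below the cutoff by chance.
r = 3200          # spots with usable data
alpha = 0.05      # significance level
n_positive = 565  # spots observed with P-value under 0.05

est_false_positives = alpha * r               # 0.05 * 3200 = 160
est_fdr = est_false_positives / n_positive    # fraction of the positives
                                              # estimated to be false

print(est_false_positives)  # 160.0
print(round(est_fdr, 2))    # 0.28
```

Note that alpha * r slightly overestimates the false positives, since it pretends all r spots are null; the slide makes the same simplification ("most of them should be H0").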


SLIDE 5

Multiple hypothesis tests on an array

We simultaneously do a separate hypothesis test for every spot: H0(i) vs. H1(i) at significance level αi, for i = 1, . . . , r.

Each spot has its own Type I and Type II error.

The False Discovery Rate (FDR) is the fraction of positives that are false positives. For α = 0.05, our estimated FDR is 160/565 ≈ 0.28.

Vary α and estimate the FDR in the same way for each value: for each α, determine how many positives there are and estimate what fraction of them (the FDR) are false positives, then pick how many positives we have the resources to run the additional tests on.
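The α-sweep can be sketched in a few lines. This is my illustration on simulated P-values, not the Hedenfalk data: the H0/H1 split (and the shifted test statistic modeling H1) are hypothetical, and the FDR estimate uses the same rule of thumb as above, (α · r) / (# positives).

```python
import random
from statistics import NormalDist

random.seed(1)
std_normal = NormalDist()

r = 3200
n_h1 = 600  # hypothetical number of truly differential (H1) spots

# H0 spots have uniform P-values; H1 spots get small P-values, modeled
# here by a two-sided z-test on a statistic shifted to mean 3.
pvals = [random.random() for _ in range(r - n_h1)]
for _ in range(n_h1):
    z = random.gauss(3.0, 1.0)
    pvals.append(2 * (1 - std_normal.cdf(abs(z))))

for alpha in (0.01, 0.05, 0.10, 0.25):
    n_pos = sum(p <= alpha for p in pvals)
    est_fdr = min(1.0, alpha * r / n_pos) if n_pos else 0.0
    print(f"alpha = {alpha:.2f}: positives = {n_pos:4d}, est. FDR = {est_fdr:.2f}")
```

As α grows, the count of positives rises but the estimated FDR rises too, which is exactly the trade-off the slide describes: choose the α whose positive count you can afford to follow up on.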


SLIDE 6

Estimated FDR as α varies

[Two plots: left, Estimated FDR (y-axis, 0.0 to 1.0) vs. alpha (x-axis, 0.0 to 1.0); right, Estimated FDR (y-axis, 0.0 to 1.0) vs. # positives (x-axis, 500 to 3000).]
