Using non-parametric methods in the context of multiple testing to - PowerPoint PPT Presentation

Using non-parametric methods in the context of multiple testing to determine differentially expressed genes
Greg Grant, Elisabetta Manduchi, Chris Stoeckert
Penn Center for Bioinformatics
CAMDA 2000

Outline
• Differential Expression
• Biological Variability and Replicates
• Gene Intensity Distributions
  – necessitate nonparametric methods
• Applications of
  – PaGE
  – t-statistic combined with a permutation algorithm

The Dataset
• Golub et al. (1999), Science, 286:531-537
• ALL-AML: heterogeneous groups: source (B-cells, T-cells, 4 AML types), sex, success, etc.
• Focus on B-cells (37 replicates) vs T-cells (9 replicates): combined the training and the test sets
• Affymetrix
  – single sample hybridization
  – each signal is a composite of hybridizations to probes in a set
  – absent calls

Distribution Heterogeneity

“Deterministic” differential expression
[Figure panels: B and T, B, T; log scale]
Identifier: U23852, T-lymphocyte specific protein tyrosine kinase p56lck (lck) aberrant mRNA

“Non-deterministic” differential expression
[Figure panels: B and T, B, T; log scale]
Identifier: M23323, T-cell surface glycoprotein CD3 epsilon chain precursor

Absent calls
Only by including the absent calls do we see the difference in genes we expect to be differentially expressed, such as the following: T-cell antigen CD7 precursor (Id: D007499).
[Figure panels: B and T, B-cell, T-cell; log scale]
• Consequence of including the absent calls: the introduction of bimodal distributions and non-deterministic differential expression, which complicates the problem of assigning confidence to predictions of differential expression.

t-statistics and adjusted p-values
Use the method described in Dudoit et al. (2000)
• Assigns a t-statistic to each gene.
• p-values are obtained by permuting the columns instead of assuming t-distributions.
• Corrects for multiple testing by the Westfall and Young step-down approach.
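
The three steps above can be sketched as follows. This is a minimal illustration, not the authors' or Dudoit et al.'s implementation: it assumes a Welch two-sample t-statistic, random (not exhaustive) label permutations, and the step-down maxT form of the Westfall and Young adjustment. The function names `welch_t` and `maxt_adjusted_pvalues` are hypothetical, and genes with zero variance in a group are not handled.

```python
import numpy as np

def welch_t(x, labels):
    """Welch t-statistic per gene (rows = genes, columns = samples)."""
    a, b = x[:, labels == 0], x[:, labels == 1]
    va = a.var(axis=1, ddof=1) / a.shape[1]
    vb = b.var(axis=1, ddof=1) / b.shape[1]
    return (b.mean(axis=1) - a.mean(axis=1)) / np.sqrt(va + vb)

def maxt_adjusted_pvalues(x, labels, n_perm=1000, seed=0):
    """Permutation-based step-down maxT adjusted p-values (assumed variant)."""
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(x, labels))
    order = np.argsort(-t_obs)               # most extreme gene first
    counts = np.zeros(x.shape[0])
    for _ in range(n_perm):
        # Permuting the labels is equivalent to permuting the columns.
        t_perm = np.abs(welch_t(x, rng.permutation(labels)))[order]
        # Successive maxima: u[k] = max of t_perm over ranks k, k+1, ...
        u = np.maximum.accumulate(t_perm[::-1])[::-1]
        counts += u >= t_obs[order]
    p_sorted = np.maximum.accumulate(counts / n_perm)  # enforce monotonicity
    p_adj = np.empty_like(p_sorted)
    p_adj[order] = p_sorted                  # back to the original gene order
    return p_adj
```

With a genes-by-samples matrix `x` and a 0/1 label vector (for example 0 for B-cell and 1 for T-cell columns), `maxt_adjusted_pvalues(x, labels)` returns one adjusted p-value per gene; no t-distribution is assumed at any point.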

B-cell vs. T-cell using t-statistic
Column n,9 = fraction of times a gene is called up-regulated in T-cells out of 100 comparisons between n randomly chosen B-cell experiments and all 9 T-cell experiments.
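
The subsampling behind each "n,9" column can be sketched as below. This is an assumed reading of the caption, not the authors' code: `upregulation_fraction` is a hypothetical helper, and `call_up` stands for whichever decision rule is being evaluated (a hypothetical callable that takes a B-cell submatrix and the T-cell matrix and returns one boolean per gene).

```python
import numpy as np

def upregulation_fraction(b, t, n, call_up, n_trials=100, seed=0):
    """Fraction of trials in which each gene is called up-regulated in T-cells.

    b, t     : genes x replicates matrices for B-cells and T-cells
    n        : number of B-cell replicates drawn per trial (without replacement)
    call_up  : hypothetical rule, (b_subset, t) -> boolean array, one per gene
    """
    rng = np.random.default_rng(seed)
    hits = np.zeros(b.shape[0])
    for _ in range(n_trials):
        cols = rng.choice(b.shape[1], size=n, replace=False)
        hits += call_up(b[:, cols], t)       # all T-cell replicates every time
    return hits / n_trials
```

A fraction near 1 for small n means the gene is called reliably even with few B-cell replicates; a fraction that only rises as n grows indicates a call that depends on replication.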

PaGE: Patterns from Gene Expression
• PaGE assigns confidence measures to predictions of differential expression. Handles multiple testing in a nonparametric (and non-standard) way.
  – Does not use t-statistic.
• Patterns are generated by comparison of groups of replicates to a reference group.
• See Manduchi et al. (2000), Bioinformatics, 16:685-698.

PaGE: outline
• Find C (the upper cutratio) such that, if a gene i is chosen at random from the set of genes which are true negatives, then the probability that $X_{2,i} / X_{1,i} > C$ is small.
• This C gives a cutoff for making predictions about up-regulation.
• Similarly for down-regulation (find an appropriate c [lower cutratio], reverse the above inequality).

PaGE: approximations
The false positive rate is approximated by

$$\mathrm{Prob}\left( \frac{X_{2,i} / \mu_{2,i}}{X_{1,i} / \mu_{1,i}} > C \right)$$

After having shifted all intensities by an appropriate numerical constant, we approximate the unknown distribution of $X_{1,i} / \mu_{1,i}$ by that of

$$\frac{X_{1,i,j}}{\frac{1}{n_1 - 1} \sum_{k \neq j} X_{1,i,k}}$$

where i varies over the gene tags and j varies over the replicates for group 1. Similarly for group 2.
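
A rough sketch of how an upper cutratio could be estimated from such an approximation follows. It is not PaGE's actual algorithm. It assumes that the stand-in for X/μ is the ratio of each replicate to the mean of the remaining replicates of the same gene, that these ratios are pooled over genes, and that the two groups are independent, so that Prob(R2/R1 > C) can be estimated by Monte Carlo resampling. The names `loo_ratios`, `upper_cutratio`, `alpha`, and `n_draws` are hypothetical, and all shifted intensities are assumed positive.

```python
import numpy as np

def loo_ratios(x, shift=0.0):
    """Ratio of each replicate to the mean of the other replicates, per gene."""
    x = np.asarray(x, dtype=float) + shift
    n = x.shape[1]
    loo_mean = (x.sum(axis=1, keepdims=True) - x) / (n - 1)
    return (x / loo_mean).ravel()            # pooled over genes and replicates

def upper_cutratio(g1, g2, alpha=0.05, shift=0.0, n_draws=100_000, seed=0):
    """C such that the estimated Prob(R2 / R1 > C) is about alpha."""
    rng = np.random.default_rng(seed)
    r1 = rng.choice(loo_ratios(g1, shift), size=n_draws)
    r2 = rng.choice(loo_ratios(g2, shift), size=n_draws)
    return np.quantile(r2 / r1, 1.0 - alpha)
```

A smaller `alpha` yields a larger C, so fewer genes pass the cutoff. This sketch controls a per-gene rate only and does not address multiple testing; the lower cutratio would be the `alpha` quantile of the same simulated ratios.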

The effect of shifting
hypothetical data: assuming variance proportional to magnitude. No shift necessary.
real data:
• variance greater for low intensities.
• absent calls increase this effect.
• a moderate shift compensates and reduces false positives at low and high intensity.
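
The low-intensity part of this effect can be illustrated with simulated data. Everything here is an assumption made for illustration only: the noise model (multiplicative noise plus additive background noise), the parameter values, the clipping at 1 to keep logarithms defined, and the helper name `log_ratio_sd`. None of it comes from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_ratio_sd(mean, shift, n=10_000, add_sd=10.0, mult_sd=0.1):
    """SD of log(x / y) for two replicate measurements of one true intensity."""
    def measure():
        # Assumed model: multiplicative noise plus additive background noise.
        raw = mean * np.exp(rng.normal(0, mult_sd, n)) + rng.normal(0, add_sd, n)
        return np.clip(raw + shift, 1.0, None)   # keep values positive for the log
    return np.std(np.log(measure() / measure()))

for mean in (20.0, 2000.0):
    print(mean, log_ratio_sd(mean, shift=0.0), log_ratio_sd(mean, shift=100.0))
```

Under this model the spread of replicate ratios at low intensity shrinks sharply once a constant is added, while the spread at high intensity is nearly unchanged, which is the kind of behavior that lets a single cutratio be applied across the intensity range.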

The effect of shifting (cont.)
[Figure: 37 B-cell replicates vs 9 T-cell replicates]

B-cell vs. T-cell using PaGE
Column n,9 = fraction of times a gene is called up-regulated in T-cells out of 100 comparisons between n randomly chosen B-cell experiments and all 9 T-cell experiments.

Effect of Number of Replicates on False Positives Due to Biological Subclassing
• Comparisons of B-cell to B-cell.
  – Any predictions are false positives.
• Table entries are empirical likelihoods of observing any false positives.
• False positives are due to noise and/or biological subclassing, with the latter effect diminishing as the number of replicates increases.
• Confidence was 90%. If PaGE was exact instead of conservative, the numbers in each column would converge to 0.1.
• Tripling the number of independent genes does not dramatically worsen the multiple testing problem of subclassing.

        No. of Indep. Genes
Reps    1000    3000
5       0.39    0.44
10      0.15    0.19
20      0.10    0.11
30      0.06    0.07
40      0.06    0.02
50      0.03    0.04

Summary
• How many replicates are needed?
  – Gene intensity distributions can be very irregular
  – Noise and multiple testing (false negatives)
    • t-statistic: adding replicates continues to reduce false negatives even at 25 replicates
    • PaGE: much less conservative
  – Biological variability and multiple testing (false positives)
    • PaGE: confidence measures assume that the variability of each class is fully represented in the replicates. If a class is very heterogeneous (e.g. B-cells), then many replicates might be needed to avoid over-representing a subclass by chance and thereby introducing false positives.
    • The more homogeneous the group, the fewer replicates are needed.
• How do findings generalize to other platforms?

Acknowledgements
PCBI: Brian Brunk, Joan Mazzarelli, Eugen Buehler, Shannon McWeeney, Jonathan Crabtree, Colleen Petrelli, Sue Davidson, Debbie Pinney, Sharon Diskin, Angel Pizarro, Georgi Kostov, Jonathan Schug, Phillip Le, Jim Wolff

URLs
• http://www.cbil.upenn.edu/
• http://www.cbil.upenn.edu/PaGE
• http://www.stat.berkeley.edu/users/terry/zarray/html/matt.html (Dudoit et al.)
• http://www.cbil.upenn.edu/tpWY (implementation of Dudoit et al.)
