Gene Set Enrichment Analysis
Genome 559: Introduction to Statistical and Computational Genomics
Elhanan Borenstein

A quick review
- Gene expression profiling
  - Which molecular processes/functions are involved in a certain phenotype (e.g., disease, stress response, etc.)
- The Gene Ontology (GO) Project
  - Provides shared vocabulary/annotation
  - GO terms are linked in a complex structure
- Enrichment analysis:
  - Find the "most" differentially expressed genes
  - Identify functional annotations that are over-represented
  - Modified Fisher's exact test

A quick review: Modified Fisher's exact test
[Figure: all genes (balls), 10 out of 50 blue; differentially expressed (DE) genes, 4 out of 8 blue.]
Do I have a surprisingly high number of blue genes?
Null model: the 8 genes/balls are selected randomly …
[Figure: repeated random draws of 8 balls, yielding 2, 1, 2, 5, 3, 4, and 2 blue balls out of 8.]
So, if you have 50 balls, 10 of them are blue, and you pick 8 balls randomly, what is the probability that k of them are blue?

A quick review: Modified Fisher's exact test
[Figure: hypergeometric distribution of k, the number of blue genes among the selected genes, with $m = 50$, $m_t = 10$, $n = 8$; y-axis: probability.]
So … do I have a surprisingly high number of blue genes? Can such high numbers (4 or above) occur by chance?
What is the probability $P(\sigma_t \geq 4)$ of getting at least 4 blue genes in the null model?
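The tail probability asked about above can be computed directly from the hypergeometric distribution. This is a minimal sketch, assuming the "modified" Fisher's exact test means the one-sided hypergeometric tail $P(\sigma_t \geq k)$ under random selection of n genes; the function names `hypergeom_pmf` and `enrichment_p_value` are hypothetical, while m, m_t, and n follow the slide's notation.

```python
from math import comb

def hypergeom_pmf(k: int, m: int, m_t: int, n: int) -> float:
    """P(sigma_t = k): exactly k blue balls when n of m balls are drawn and m_t are blue."""
    return comb(m_t, k) * comb(m - m_t, n - k) / comb(m, n)

def enrichment_p_value(k_obs: int, m: int, m_t: int, n: int) -> float:
    """One-sided tail P(sigma_t >= k_obs) under the random-selection null model."""
    k_max = min(m_t, n)  # cannot draw more blue balls than exist or than are drawn
    return sum(hypergeom_pmf(k, m, m_t, n) for k in range(k_obs, k_max + 1))

# Slide example: m = 50 balls, m_t = 10 blue, n = 8 drawn, 4 of the 8 are blue
p = enrichment_p_value(4, m=50, m_t=10, n=8)
print(f"P(sigma_t >= 4) = {p:.4f}")  # prints P(sigma_t >= 4) = 0.0407
```

If SciPy is available, `scipy.stats.hypergeom.sf(3, 50, 10, 8)` should give the same tail: SciPy orders the arguments as total, successes, draws, and `sf(k)` is P(X > k), hence the 3.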

Enrichment Analysis
[Figure: expression profiles of Class A and Class B samples; genes ranked by expression correlation to Class A, with a cutoff.]
Biological function?

Enrichment Analysis
[Figure: genes ranked by expression correlation to Class A (Class A vs. Class B), with a cutoff.]
Biological function?
- Function 1 (e.g., metabolism): 2 / 10
- Function 2 (e.g., signaling): 5 / 11
- Function 3 (e.g., regulation): 3 / 10

Problems with cutoff-based analysis
- After correcting for multiple-hypothesis testing, no individual gene may meet the threshold due to noise.
- Alternatively, one may be left with a long list of significant genes without any unifying biological theme.
- The cutoff value is often arbitrary!
- We are really examining only a handful of genes, ignoring much of the data.

Gene Set Enrichment Analysis
- MIT, Broad Institute
- v2.0 available since Jan 2007
(Subramanian et al., PNAS, 2005)

GSEA key features
- Calculates a score for the enrichment of an entire set of genes rather than single genes!
- Does not require setting a cutoff!
- Identifies the set of relevant genes as part of the analysis!
- Provides a more robust statistical framework!

Gene Set Enrichment Analysis
[Figure: genes ranked by expression correlation to Class A (Class A vs. Class B), with a cutoff.]
Biological function?
- Function 1 (e.g., metabolism): 2 / 10
- Function 2 (e.g., signaling): 5 / 11
- Function 3 (e.g., regulation): 3 / 10

Gene Set Enrichment Analysis
[Figure: genes ranked by expression correlation to Class A (Class A vs. Class B), with the members of Function 1 (e.g., metabolism), Function 2 (e.g., signaling), and Function 3 (e.g., regulation) marked along the list.]
Running sum:
- Increase when gene is in set
- Decrease otherwise

Gene Set Enrichment Analysis
What would you expect if the hits were randomly distributed?
What would you expect if most of the hits cluster at the top of the list?

Gene Set Enrichment Analysis
Enrichment score (ES) = maximum deviation of the running sum from 0
[Figure: running sum along the ranked list, with the genes within the functional set (hits) and the leading-edge genes marked.]
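The running sum, ES, and leading edge described on these slides can be sketched as follows. This is a minimal sketch, assuming the unweighted form of the statistic: each hit adds 1/N_H and each miss subtracts 1/(N - N_H), so the walk starts and ends at 0. The published GSEA statistic (Subramanian et al. 2005) weights each hit by its correlation by default, so this is a simplification; the function names and the ranked gene list are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def running_sum(ranked_genes: Sequence[str], gene_set: Set[str]) -> List[float]:
    """Unweighted running sum: +1/N_H at each hit, -1/(N - N_H) at each miss."""
    n = len(ranked_genes)
    n_hits = sum(1 for g in ranked_genes if g in gene_set)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    up, down = 1.0 / n_hits, 1.0 / (n - n_hits)
    total, walk = 0.0, []
    for g in ranked_genes:
        total += up if g in gene_set else -down
        walk.append(total)
    return walk

def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> Tuple[float, List[str]]:
    """Return (ES, leading-edge genes); ES is the signed maximum deviation from 0."""
    walk = running_sum(ranked_genes, gene_set)
    peak = max(range(len(walk)), key=lambda i: abs(walk[i]))
    es = walk[peak]
    if es >= 0:
        # Positive ES: leading edge = hits ranked at or before the peak.
        leading = [g for g in ranked_genes[: peak + 1] if g in gene_set]
    else:
        # Negative ES: leading edge = hits ranked at or after the trough.
        leading = [g for g in ranked_genes[peak:] if g in gene_set]
    return es, leading

# Hypothetical ranked list: g0 is most correlated with Class A, g19 least.
genes = [f"g{i}" for i in range(20)]
examples = [
    ("clustered at top", {"g0", "g1", "g2", "g4"}),
    ("evenly spread", {"g0", "g5", "g10", "g15"}),
    ("clustered at bottom", {"g16", "g17", "g18", "g19"}),
]
for name, gene_set in examples:
    es, leading = enrichment_score(genes, gene_set)
    print(f"{name}: ES = {es:.2f}, leading edge = {leading}")
```

Hits clustered near the top give an ES close to 1, hits clustered at the bottom give a negative ES, and evenly spread hits keep the walk near 0, matching the questions posed on the earlier slide.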

Gene Set Enrichment Analysis
[Figure: example running sums with ES = 0.43, ES = -0.45, and a low ES (hits evenly distributed).]

Gene Set Enrichment Analysis
[Figure from Ducray et al., Molecular Cancer 2008, 7:41.]

GSEA Steps
1. Calculation of an enrichment score (ES) for each functional category
2. Estimation of the significance level of the ES
   - An empirical permutation test
   - Phenotype labels are shuffled and the ES for this functional set is recomputed. Repeat 1000 times.
   - Generates a null distribution
3. Adjustment for multiple-hypothesis testing
   - Necessary if comparing multiple gene sets (i.e., functions)
   - Computes FDR (false discovery rate)
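Steps 2 and 3 can be sketched on toy data. This is a minimal sketch under several assumptions: genes are ranked by Pearson correlation with a 0/1 class label, the ES is the unweighted running-sum version, and the nominal p-value compares the observed ES only with permuted ES values of the same sign. Published GSEA also normalizes each ES (NES) and reports an NES-based FDR; the `benjamini_hochberg` function below is a standard, simpler FDR procedure swapped in for illustration, not GSEA's method. All function names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def rank_genes(expr: Dict[str, List[float]], labels: Sequence[int]) -> List[str]:
    """Rank genes by correlation with the class label (1 = Class A), highest first."""
    corr = {g: pearson(values, labels) for g, values in expr.items()}
    return sorted(corr, key=corr.get, reverse=True)

def es_unweighted(ranked: Sequence[str], gene_set: Set[str]) -> float:
    """Signed maximum deviation of the unweighted running sum from 0."""
    n_hits = sum(g in gene_set for g in ranked)
    up, down = 1.0 / n_hits, 1.0 / (len(ranked) - n_hits)
    total, best = 0.0, 0.0
    for g in ranked:
        total += up if g in gene_set else -down
        if abs(total) > abs(best):
            best = total
    return best

def permutation_p_value(expr: Dict[str, List[float]], labels: Sequence[int],
                        gene_set: Set[str], n_perm: int = 1000,
                        seed: int = 0) -> Tuple[float, float]:
    """Step 2: observed ES and a nominal p-value from phenotype-label permutations."""
    rng = random.Random(seed)
    observed = es_unweighted(rank_genes(expr, labels), gene_set)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffle phenotype labels; gene-gene correlations are kept
        null.append(es_unweighted(rank_genes(expr, shuffled), gene_set))
    # Compare only against null ES values with the same sign as the observed ES.
    same_sign = [e for e in null if (e >= 0) == (observed >= 0)]
    extreme = sum(abs(e) >= abs(observed) for e in same_sign)
    # +1 in numerator and denominator (our choice) keeps the estimate away from 0.
    return observed, (extreme + 1) / (len(same_sign) + 1)

def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Step 3 stand-in: Benjamini-Hochberg q-values (not GSEA's NES-based FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Toy data (hypothetical): 30 genes, 10 Class A and 10 Class B samples;
# genes g0-g4 form the gene set and are shifted up in Class A.
rng = random.Random(42)
labels = [1] * 10 + [0] * 10
expr = {f"g{j}": [rng.gauss(0.0, 1.0) + (4.0 if j < 5 else 0.0) * lab for lab in labels]
        for j in range(30)}
es, p = permutation_p_value(expr, labels, {f"g{j}" for j in range(5)})
print(f"ES = {es:.2f}, nominal p = {p:.4f}")
```

Shuffling sample labels rather than genes keeps the correlation structure among genes intact, which is why the slides permute phenotype labels. With one p-value per gene set, `benjamini_hochberg` would then turn the list of nominal p-values into q-values.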
