Pathway Analysis Jenny Wu Outline Introduction to NGS data - - PowerPoint PPT Presentation
Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis
Jenny Wu
Outline
- Introduction to NGS data analysis in Cancer Genomics
– NGS applications in cancer research
– Typical NGS workflows and pipeline
– Open source software with GUI
- Pathway Analysis and Software
- Pathway Analysis goals and concepts
- Commercial and open source pathway analysis software
- Data analysis resources
- Summary
Next Generation Sequencing
Massively Parallel Sequencing: One can generate hundreds of millions of short sequences (up to 250bp) in a single run in a short period of time with low per base cost.
- Illumina/Solexa GA II, HiSeq 2500, 3000, X
- Roche/454 FLX, Titanium
- Life Technologies/Applied Biosystems SOLiD
Reviews:
Michael Metzker (2010) Nature Reviews Genetics 11:31
Quail et al. (2012) BMC Genomics Jul 24;13:341
NGS in Cancer Genomics
Shyr et al.2013
Data Analysis is the Bottleneck
(wall.hms.harvard.edu)
Informatics
Basic NGS Workflow
Olson et al.
Figure labels: QC and pipeline analysis; Data interpretation; Isolation of material; PCR amplification; End repair, size selection; Library QC; Cluster generation; Instrument operation
High Throughput Data Analysis Overview
Olson et al.
http://www.broadinstitute.org/gsa/wiki/images/7/7a/Overall_flow.jpg
http://www.broadinstitute.org/gatk/guide/topic?name=intro
Many Analysis Pipelines Start with Read Mapping
http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html
Genotyping (GATK); RNA-seq (Tuxedo)
Typical Data Analysis Pipelines
Cancer NGS Data Analysis Pipeline-Software
Data: Raw reads, Analysis-ready reads, Mapped reads
Software:
- Raw reads to analysis-ready reads: FastQC, FASTX-Toolkit, Trimmomatic
- Analysis-ready reads to mapped reads: BWA, STAR
- Visualization: IGV, IGB, UCSC GB……
……
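To make the step from raw reads to analysis-ready reads concrete, here is a minimal Python sketch of 3'-end quality trimming on a single FASTQ record. It is a simplified illustration of the kind of operation trimming tools perform, not how FastQC, FASTX-Toolkit, or Trimmomatic are implemented. The function names, the Phred+33 encoding, and the quality cutoff of 20 are assumptions made for this example.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(sequence, quality_line, min_quality=20, offset=33):
    """Drop bases from the 3' end while their quality is below min_quality."""
    scores = phred_scores(quality_line, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


def mean_quality(quality_line, offset=33):
    """Mean Phred score of a read; 0.0 for an empty read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical read: 'I' encodes Phred 40, '#' encodes Phred 2.
seq, qual = trim_3prime("ACGTACGT", "IIIIII##")
print(seq, qual, mean_quality(qual))
```

In practice a read that becomes too short after trimming is usually discarded before mapping; that filter is omitted here to keep the sketch short.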
Cancer NGS Application Specific Software
Mapped reads: Cufflinks, MISO, DESeq2, GATK, MACS2, SISSRs, Bismark, BS Seeker, SomaticSniper, VarScan2, MuTect, FreeBayes, Pindel, CNVnator
……
Open Source Software with GUI
Galaxy: web-based platform for analysis of large datasets
http://hpc-galaxy.oit.uci.edu/root
https://main.g2.bx.psu.edu/
https://usegalaxy.org/
GENE-E: Java-based matrix visualization and analysis platform; includes heatmap, clustering, filtering, etc.
http://www.broadinstitute.org/cancer/software/GENE-E
Commercial software for NGS analysis
- Easy to use, no command line skills required
- Usually platform independent
- Little to no learning curve
- Limited flexibility
- Harder to publish
Why Pathway Analysis
- Logical next step in any high-throughput experiment
- Goal: to characterize the biological meaning of the joint changes in gene expression
- Why? Often groups of genes with related functions change together
Pathway and Network Analysis
Pathway Analysis Methods:
- Functional category over-representation: discrete test for significance (BiNGO, DAVID, IPA, etc.)
- Continuous test (GSEA, PAGE)
- Signaling Pathway Impact Analysis (iPathwayGuide)
Network Analysis: WGCNA, Cytoscape, etc.
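As a rough illustration of what a continuous test does, the sketch below computes an unweighted running-sum enrichment score over a ranked gene list, in the spirit of GSEA. It is a simplification and not the GSEA software's implementation: it ignores the correlation weighting and the permutation-based significance step. The function name and the gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (simplified, GSEA-like).

    Walk down the ranked list; step up by 1/n_hits when a gene is in the
    set and down by 1/n_miss otherwise. The score is the running sum at
    its largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, from most up-regulated to most down-regulated.
ranked = ["G1", "G2", "G3", "G4"]
print(enrichment_score(ranked, {"G1", "G2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"G3", "G4"}))  # set concentrated at the bottom
```

Unlike the discrete approach, no cutoff is applied to the gene list: every gene contributes through its rank.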
Functional Category Enrichment
- Discrete tests: enrichment for groups in gene lists
– Select gene list at some predefined cutoff
– For each gene list and functional category, cross-tabulate to get a 2x2 contingency table
– Test for significance using Fisher’s exact test
– FDR correction for multiple hypothesis testing

                      Differentially expressed   Not differentially expressed   Total
In the pathway        a                          b                              a+b
Not in the pathway    c                          d                              c+d
Total                 a+c                        b+d                            n
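The discrete test described above can be sketched in plain Python. The first function computes a one-sided Fisher's exact test p-value for over-representation from the cells a, b, c, d of the 2x2 table (equivalently, the upper tail of the hypergeometric distribution). The second applies a Benjamini-Hochberg adjustment, which is one common choice of FDR correction; the slides do not say which procedure is meant, so treat that choice as an assumption. The function names and example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test p-value for over-representation.

    a: differentially expressed (DE) genes in the pathway
    b: non-DE genes in the pathway
    c: DE genes not in the pathway
    d: non-DE genes not in the pathway
    """
    n = a + b + c + d
    in_pathway = a + b
    de_total = a + c
    upper = min(in_pathway, de_total)
    # Probability of observing a or more DE genes in the pathway by chance.
    tail = sum(
        comb(in_pathway, k) * comb(n - in_pathway, de_total - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, de_total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts (a, b, c, d) for three pathways tested on one gene list.
tables = [(12, 38, 88, 862), (5, 45, 95, 855), (2, 48, 98, 852)]
pvals = [fisher_enrichment_p(*t) for t in tables]
print(benjamini_hochberg(pvals))
```

The same cutoff-based gene list is tested against every functional category, which is why the multiple-testing correction in the last step matters.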
Functional Categories in Pathway Analysis
- Gene Ontology
– Biological Process
– Molecular Function
– Cellular Localization
- Pathway Databases
– KEGG
– BioCarta
– Broad Institute (MSigDB)
– Commercial knowledge bases such as IPA
- Other
– Transcription factor targets
– Protein complexes
– Self-defined
Commercial and Open Source Pathway Analysis Software
Ingenuity Pathway Analysis Tool
IPA Input file
IPA results page
Resources in NGS data analysis
Public forums:
Computational resources available at UCI:
- HPC: open source software
- CLCbio, IPA, JMP Genomics…
Summary
- NGS technologies are transforming cancer research.
- Data analysis is a crucial part of NGS applications.
- Pathway analysis concepts and software
- Data analysis resources
Thank you!