Methods to analyze transcriptome data in view of gene regulation and - - PowerPoint PPT Presentation

SLIDE 1

Methods to analyze transcriptome data in view of gene regulation and signaling pathways

  • Prof. Dr. Tim Beißbarth

Institute of Medical Statistics, Statistical Bioinformatics Group

SLIDE 2

We want to understand the molecular workings of a living cell

(Figure: gene regulation, apoptosis, proliferation.)

SLIDE 3

We want to understand the molecular workings of a living cell

SLIDE 4

Most of the time, we measure only transcriptome levels:

  • Microarrays since about the 1990s, RNA-Seq since about the 2010s
  • Almost all gene transcripts, under different cellular conditions
  • Result: a matrix of gene expression levels

SLIDE 5

(Figure: genomics, transcriptomics and proteomics layers of differing complexity.) The layers show only modest correlation; regulatory control acts on different cellular layers:

  • protein layer
  • protein activation layer
  • transcription factor layer
  • miRNA layer
  • transcript/mRNA layer
  • gene layer
  • ...

Wachter A and Beissbarth T, Front. Genet. (2015)

Can we learn about the workings of a cell based on transcriptomics data?

SLIDE 6
  • Gene Expression
  • miRNA Expression


Can we estimate miRNA activity from gene expression data?

  • miRNAs are important regulators of gene expression
  • Gene expression microarrays and miRNA microarrays are often performed in parallel
SLIDE 7
  • Expression of miRNAs
  • Expression of mRNAs
  • Target prediction: which miRNA influences which mRNA?

e.g. MicroCosm (Griffiths-Jones et al., Nucleic Acids Res, 2008)

Different sources of information

SLIDE 8

(Figure: workflow. Inputs are a database of miRNA-regulated gene sets, miRNA expression data and mRNA expression data. The mRNA expression data, grouped by gene sets, are tested for differential gene sets; the miRNA expression data are tested for differential miRNAs; both results are merged by p-value combination.)

Artmann S, Jung K, Bleckmann A, Beißbarth T. Detection of simultaneous group effects in microRNA expression and related target gene sets. PLoS One. 2012;7(6):e38365.

Combination of Test Results in order to find differential miRNAs

R package: mirTest

SLIDE 9

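(Figure: miRNA expression of miR-1, miR-2 and miR-3 is tested with LIMMA (Smyth et al. 2004), comparing the mean in group 1 with the mean in group 2; mRNA expression of the corresponding gene sets GS-1, GS-2 and GS-3 is tested with gene set enrichment tests or Globaltest.)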

Use Gene-Set Enrichment Tests
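LIMMA (Smyth et al. 2004) tests each miRNA with a moderated t-statistic, in which the feature-wise variance is shrunk towards a prior variance shared by all features. The Python sketch below computes that statistic for a two-group comparison of one feature. It is a minimal sketch, assuming the prior degrees of freedom `d0` and prior variance `s0_sq` are already known; LIMMA estimates them from all features by empirical Bayes, which is not reproduced here, and no p-value is computed. The function name `moderated_t` is hypothetical.

```python
import math
from statistics import mean, variance


def moderated_t(group1, group2, d0, s0_sq):
    """Two-group moderated t-statistic for one feature (e.g. one miRNA).

    Hypothetical helper: d0 (prior degrees of freedom) and s0_sq (prior
    variance) are given as inputs here; LIMMA estimates them from all features.
    Returns the statistic and its degrees of freedom (d0 + residual df).
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2  # residual degrees of freedom
    # pooled within-group variance of this feature
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    # shrink the feature-wise variance towards the prior variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    diff = mean(group2) - mean(group1)  # mean in group 2 minus mean in group 1
    t = diff / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return t, d0 + d
```

With `d0 = 0` the statistic reduces to the ordinary pooled two-sample t-statistic; a larger `d0` pulls noisy variance estimates towards `s0_sq`, which stabilizes the test when there are only a few replicates per group.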

SLIDE 10
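  • Enrichment tests: competitive null hypothesis, i.e. the genes in the set are no more associated with the phenotype than the other genes (e.g. Fisher's exact test, Wilcoxon test, Kolmogorov-Smirnov test)
  • Global tests: self-contained null hypothesis, i.e. no gene in the set is associated with the phenotype (e.g. Globaltest, GlobalAncova, RepeatedHighDim)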
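A competitive test compares the genes in a set with the remaining genes. The sketch below shows the classic one-sided Fisher's exact test for overrepresentation of significant genes in a gene set, written as a hypergeometric upper tail using only the Python standard library. It assumes genes have already been called significant or not; the function and parameter names are hypothetical, and GOstat's own implementation may differ in details.

```python
from math import comb


def enrichment_p_value(n_total, n_significant, n_in_set, n_sig_in_set):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Hypothetical helper.
    n_total:       all genes measured (the gene universe)
    n_significant: genes called differentially expressed
    n_in_set:      genes belonging to the gene set
    n_sig_in_set:  significant genes inside the gene set
    Returns P(X >= n_sig_in_set) for X ~ Hypergeometric(n_total, n_significant, n_in_set).
    """
    denom = comb(n_total, n_in_set)
    upper = min(n_significant, n_in_set)
    # sum the probabilities of observing k or more significant genes in the set;
    # math.comb returns 0 when k exceeds n, so impossible terms vanish
    tail = sum(
        comb(n_significant, k) * comb(n_total - n_significant, n_in_set - k)
        for k in range(n_sig_in_set, upper + 1)
    )
    return tail / denom
```

Because the null distribution is built from the other genes, the result depends on how the gene universe is defined, which is characteristic of competitive tests.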

(Figure: gene-wise test statistics t computed from the mRNA expression data, with the genes of one gene set highlighted.)

Global vs. Enrichment tests

Beissbarth T, Speed TP. GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004 Jun 12;20(9):1464-5.

Jung K, Becker B, Brunner E, Beissbarth T. Comparison of global tests for functional gene sets in two-group designs and selection of potentially effect-causing genes. Bioinformatics. 2011 May 15;27(10):1377-83.

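A self-contained test, by contrast, uses only the genes of the set and asks whether any of them is associated with the group labels. As a simplified illustration (not the Globaltest score statistic), the sketch below sums squared group-mean differences over the genes of one set and obtains a p-value by permuting the sample labels; permuting samples rather than genes keeps the correlation between genes intact. Function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def set_statistic(expr, labels):
    """Sum over genes of squared differences between group means (group 1 vs group 0)."""
    total = 0.0
    for row in expr:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        total += (mean(g1) - mean(g0)) ** 2
    return total


def self_contained_test(expr, labels, n_perm=1000, seed=0):
    """Permutation p-value for H0: no gene in the set differs between the groups.

    Hypothetical helper. expr is a list of rows (one per gene in the set),
    columns are samples; labels holds a 0/1 group label per sample.
    """
    rng = random.Random(seed)
    observed = set_statistic(expr, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute sample labels, not genes
        if set_statistic(expr, perm) >= observed:
            hits += 1
    # add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)
```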

SLIDE 11

(Figure: the p-values of miR-1, miR-2 and miR-3 are each combined with the p-value of the corresponding gene set GS-1, GS-2 or GS-3.)

P-values are combined with Fisher's or Stouffer's method.

Combination of P-values using a meta-analytic approach
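Fisher's method uses X = -2 * sum(ln p_i), which under the null hypothesis follows a chi-square distribution with 2k degrees of freedom for k p-values; Stouffer's method uses Z = sum(w_i * z_i) / sqrt(sum(w_i^2)) with z_i = Phi^-1(1 - p_i). A minimal standard-library sketch of both follows, assuming independent one-sided p-values strictly between 0 and 1. How mirTest orients and weights the miRNA and gene-set p-values is not shown here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combination(p_values):
    """Fisher's method; p-values must lie in (0, 1)."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def stouffer_combination(p_values, weights=None):
    """(Weighted) Stouffer's method; p-values must lie in (0, 1)."""
    nd = NormalDist()  # standard normal
    if weights is None:
        weights = [1.0] * len(p_values)
    z = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, p_values))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(z)
```

For two p-values of 0.05, Fisher's method gives 0.0025 * (1 - 2 ln 0.05), about 0.0175, and Stouffer's method about 0.010.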

SLIDE 12

(Table: simulation results, FDR and power per test. Tests compared: enrichment tests (Fisher, Kolmogorov-Smirnov, Wilcoxon), rotation tests (ROAST, Romer) and global tests (Globaltest, GlobalAncova, RepeatedHighDim). FDR entries range from << 0.05 over ± 0.05 and ≥ 0.05 to >> 0.05. Power is ranked Limma < GST < Combi. for most tests and Limma < Combi. < GST for two of them.)

Results of simulation

SLIDE 13

(Figure: a eukaryotic cell, e.g. a Drosophila SL2 cell, with nucleus and DNA. An external stimulus, e.g. LPS, a principal cell wall component of bacteria, acts on a surface receptor protein, e.g. the LPS receptor. Inside the cell, tak signals into the Rel pathway (rel) and the JNK pathway (mkk/hep), each acting through transcriptional regulation. The Rel pathway activates antimicrobial response genes; the JNK pathway activates pro-apoptotic response genes.)

Signal network through protein activation

Boutros 2002

Can we learn signaling pathways based on transcriptome data?

SLIDE 14

Microarray measurements are used to measure the expression of response genes. Genes are selectively silenced using siRNA. Data on intervention effects can be used to reconstruct the signaling network.

(Figure: microarray experiments for controls, LPS treatment and silencing of tak, mkk/hep and rel; selected differential genes grouped into tak targets, mkk/hep targets and rel targets.)

  Experimental data

SLIDE 15

(Figure: signals among rel, tak and mkk/hep, and the observations, i.e. the resulting patterns of affected genes. An equation, not recoverable from the slide, relates an effect pattern F to the data D.)

What is a nested effects model?

Markowetz 2005
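In the NEM literature the model is commonly written as F = Φ · Θ: Φ is the silencing scheme among S-genes (Φ[i][j] = 1 if silencing S-gene i also disrupts the signal at S-gene j, with ones on the diagonal), Θ attaches each E-gene to one S-gene, and F is the expected effect pattern of each knock-down. The sketch below computes this Boolean matrix product. The S-gene order (tak, mkk/hep, rel) follows the slides, but the E-gene attachments used with it are hypothetical toy values, as is the function name.

```python
def expected_effects(phi, theta):
    """Boolean product F = Phi * Theta (hypothetical helper).

    phi[i][j]   = 1 if silencing S-gene i disrupts S-gene j (phi[i][i] = 1)
    theta[j][e] = 1 if E-gene e is attached to S-gene j
    F[i][e]     = 1 if silencing S-gene i is expected to change E-gene e
    """
    n_s, n_e = len(phi), len(theta[0])
    return [
        [int(any(phi[i][j] and theta[j][e] for j in range(n_s))) for e in range(n_e)]
        for i in range(n_s)
    ]
```

With tak upstream of both mkk/hep and rel, every E-gene affected by silencing mkk/hep or rel is also affected by silencing tak, so the effect sets are nested.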

SLIDE 16

(Figure: S-genes S1 to S4 in a network, each with attached E-genes.)

  • Distinguish between:
    • S-genes (silenced genes)
    • E-genes (affected genes)
  • Perform a gene expression study (microarray) for each knock-down experiment.
  • Network reconstruction is based on the effects seen at E-genes when specific S-genes are knocked down.

(Figure: affected (E) genes versus silencing (S) experiments S1 to S4.)

Idea of Nested Effects Models
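The "nested" in the name refers to a subset relation: if S-gene a lies upstream of S-gene b, silencing a should affect every E-gene that silencing b affects, and possibly more. The sketch below reads candidate upstream relations directly off noise-free binary knock-down effects. It is a toy illustration of the idea only; real data are noisy, which is why the method scores networks with a likelihood instead. Names and data are hypothetical.

```python
def nested_order(effects):
    """Candidate 'a is upstream of b' pairs from noise-free effects (hypothetical helper).

    effects maps each silenced S-gene to the set of E-genes changed in its
    knock-down. If the effects of b are a subset of the effects of a, a is a
    candidate upstream regulator of b. Identical effect sets yield both
    directions, because such S-genes cannot be told apart from the data.
    """
    edges = set()
    for a, eff_a in effects.items():
        for b, eff_b in effects.items():
            if a != b and eff_b <= eff_a:
                edges.add((a, b))
    return edges
```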

SLIDE 17

  • Choose a candidate network topology of the silenced genes (S-genes).
  • Calculate a score using Bayesian statistics (averaging over E-gene positions).
  • Propose a different topology and repeat.

(Figure: candidate network of S-genes S1 to S4 with attached E-genes.)

Statistical network inference with a likelihood model

Review/Method comparison: Fröhlich H, Tresch A, Beißbarth T. Biometrical Journal. 51(2):304-321.

R package: nem
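The scoring step can be sketched as follows, assuming binary data (1 = E-gene changed in a knock-down, one row per silenced S-gene), known false positive rate alpha and false negative rate beta, and a uniform prior over the S-gene each E-gene is attached to. Each E-gene's position is summed out, and the logs of these marginal likelihoods are added over E-genes. This is a minimal sketch in the spirit of the marginal likelihood of Markowetz 2005, not the nem package's code; the function name and default error rates are hypothetical.

```python
import math


def nem_log_score(phi, data, alpha=0.05, beta=0.2):
    """Marginal log-likelihood of a silencing scheme phi (hypothetical helper).

    phi[i][s]  = 1 if silencing S-gene i affects S-gene s (phi[i][i] = 1)
    data[i][e] = 1 if E-gene e responded in knock-down experiment i
    alpha: false positive rate, beta: false negative rate (assumed known)
    """
    n_s, n_e = len(phi), len(data[0])
    score = 0.0
    for e in range(n_e):
        marginal = 0.0
        for s in range(n_s):  # candidate attachment of E-gene e to S-gene s
            lik = 1.0
            for i in range(n_s):
                predicted = phi[i][s]  # effect expected if e hangs below s
                observed = data[i][e]
                if predicted:
                    lik *= (1 - beta) if observed else beta
                else:
                    lik *= alpha if observed else (1 - alpha)
            marginal += lik / n_s  # uniform prior over positions
        score += math.log(marginal)
    return score
```

A search then proposes other topologies and keeps the one with the highest score; for a handful of S-genes, all candidate topologies can be enumerated exhaustively.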

SLIDE 18

Example from Colorectal Cancer data-set

Knock-down of 5 genes in the colorectal cancer cell line SW480: 2 siRNAs per gene, 3 microarray replicates, 2 control siRNAs.

Reference: A genomic strategy for the functional validation of colorectal cancer genes identifies potential therapeutic targets. Grade M, Hummon AB, Camps J, Emons G, Spitzner M, Gaedcke J, Hoermann P, Ebner R, Becker H, Difilippantonio MJ, Ghadimi BM, Beißbarth T, Caplen NJ, Ried T. Int J Cancer, 2011, 128(5):1069-79.

SLIDE 19

(Figure: genomics, transcriptomics and proteomics layers of differing complexity, linked through pathway databases for data integration.)

  • Dissolve regulation complexity
  • Compare data from different platforms in a layer-specific way

Pathway-based integration using prior pathway knowledge

(Figure: time course after stimulation with phosphorylation, translation and transcription; unknown links are marked with question marks.)

SLIDE 20

Simplifying assumption: protein phosphorylation corresponds to downstream pathway activation

Knowledge-based integrative data analysis approach

Wachter A, Beißbarth T. pwOmics: an R package for pathway-based integration of time-series omics data using public database knowledge. Bioinformatics, 2015, 31(18):3072-4.

SLIDE 21

Based on public databases:

  • Pathway databases: KEGG, Reactome, PID, BioCarta
  • TF-target databases: ChEA, Pazar, user-specified (e.g. TRANSFAC)
  • PPI database: STRING

(Figure: pathway databases, protein-protein interactions, TF-target relations, phosphoprotein information.)

Biological databases

Knowledge-based integrative data analysis approach

SLIDE 22

Wachter A, Beissbarth T, Bioinformatics (2015); Wachter A, Beissbarth T, Front. Genet. (2015)

R package 'pwOmics'

Knowledge-based integrative data analysis approach

SLIDE 23

  • Tracking signaling propagation routes
  • Intersection-based analyses

Integrative analysis
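One way to picture these analyses: follow signaling from significantly regulated phosphoproteins through pathway and PPI edges to downstream transcription factors, then keep only those TF-target relations whose targets are also significantly regulated transcripts. The sketch below implements that idea with a breadth-first search. It is a hypothetical illustration under the simplifying assumption that phosphorylation means downstream activation, not the pwOmics algorithm; all names and the toy graph are made up.

```python
from collections import deque


def downstream(graph, sources):
    """All nodes reachable from `sources` in a directed graph (breadth-first search)."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def intersect_layers(graph, tf_targets, sig_proteins, sig_transcripts):
    """Keep TF -> target relations supported by both data layers (hypothetical helper).

    graph:      protein -> downstream proteins (e.g. from pathway/PPI databases)
    tf_targets: TF -> set of target genes (e.g. from a TF-target database)
    Returns TF -> targets where the TF is reachable from a significant
    phosphoprotein and the target is a significant transcript.
    """
    reachable = downstream(graph, sig_proteins)
    sig = set(sig_transcripts)
    result = {}
    for tf, targets in tf_targets.items():
        if tf in reachable:
            hits = targets & sig
            if hits:
                result[tf] = hits
    return result
```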

SLIDE 24
  • Human cell line DG75
  • Identification of BCR-induced processes and druggable signaling pathways

BCR stimulation for 2, 5, 10, 20, 60 and 120 min, measured by RNA sequencing and phosphoproteomics (SILAC). Identification of BCR downstream effectors: 12 % transcriptional, 10 % cytoskeleton regulators, 9 % kinases. (Collaboration with Thomas Oellerich, Henning Urlaub)

Characterization of BCR signaling in Burkitt lymphoma

SLIDE 25

BCR stimulation of DG75 Burkitt's lymphoma cells: time course data

(Figure: two panels, phosphoproteome data (phosphosites) and transcriptome data (transcripts); x-axis: BCR stimulation time in min, log scale, from 2 to 240 min.)

Number of significantly regulated sites/transcripts at the corresponding BCR stimulation durations. Bars above the zero level indicate upregulation numbers, bars below the zero level downregulation numbers.

SLIDE 26
  • Influence of phosphorylation processes on transcriptome dynamics:

Consensus TF→target gene relations at each time point

SLIDE 27

Niiro and Clark, Nat Rev Immunol, 2002 Pauls et al., J Immunol, 2016 Niiro et al., Blood, 2012 Su et al., J Biol Chem, 1999 Yin et al., J Biol Chem, 2007 Ingham et al., J Biol Chem, 1996 Castello et al., Nat Immunol, 2013 Goldfeld et al., Proc. Natl. Acad. Sci USA, 1992 Wen et al., J Biol Chem, 2003 Franke et al., Plos One, 2011 Tabrizi et al., J Immunol, 2009 Dörner et al., Autoimmun Rev, 2015 Krzysiek et al., J Immunol, 1999

(Figure legend: protein-protein dependencies, TF-target relations, consensus proteins, consensus TFs, consensus target genes.)

→ High concordance with the literature
→ So far, mostly level-specific or axis-specific investigation

→ Pooling of the 2, 5 and 10 min phosphoproteome time points and the 60 and 120 min transcriptome time points

Static consensus graph
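The pooling behind the static consensus graph can be sketched as set operations: take the union of regulated TFs over the early phosphoproteome time points (2, 5, 10 min) and the union of regulated transcripts over the later transcriptome time points (60, 120 min), then keep the database TF→target pairs supported by both. This minimal sketch assumes a TF counts as active when it carries a regulated phosphosite; the function names and data are hypothetical.

```python
def pool(sets_by_time, time_points):
    """Union of the significant features over the selected time points."""
    pooled = set()
    for t in time_points:
        pooled |= sets_by_time.get(t, set())
    return pooled


def static_consensus(tf_target_pairs, tfs_by_time, transcripts_by_time,
                     phospho_times=(2, 5, 10), transcript_times=(60, 120)):
    """TF -> target pairs supported by both pooled layers (hypothetical helper).

    tfs_by_time:         time (min) -> TFs with a regulated phosphosite (assumed active)
    transcripts_by_time: time (min) -> significantly regulated transcripts
    """
    active_tfs = pool(tfs_by_time, phospho_times)
    regulated = pool(transcripts_by_time, transcript_times)
    return {(tf, gene) for tf, gene in tf_target_pairs
            if tf in active_tfs and gene in regulated}
```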

SLIDE 28

Computationally encoding pathways has several advantages:

  • Separate data and visualization
  • Ease knowledge exchange
  • Store and curate large amounts of data

Main pathway encoding standards:

  • Ontologies: BioPAX / SBML
  • Graph representations: KGML / GPML / SBGN

(Figure: the BioPAX ontology in <XML>:
Pathway = <Interactions>*
Interaction = <Entity> activates/inhibits <Conversion>
Conversions = <Entity>* → <Entity>*)

Encoding Pathway Knowledge
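The simplified grammar on this slide (a pathway is a list of interactions; an interaction is an entity that activates or inhibits a conversion; a conversion turns entities into entities) maps naturally onto nested data classes, and flattening them yields a graph representation. The Python sketch below does this. The class names mirror the slide's grammar rather than the actual BioPAX class hierarchy, and drawing edges from the controller to the conversion's outputs is an assumption; the entity names in the usage are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Conversion:
    inputs: List[str]   # entities on the left of the arrow
    outputs: List[str]  # entities on the right of the arrow


@dataclass
class Interaction:
    controller: str     # the acting entity
    effect: str         # "activates" or "inhibits"
    conversion: Conversion


@dataclass
class Pathway:
    interactions: List[Interaction] = field(default_factory=list)

    def to_edges(self) -> List[Tuple[str, str, int]]:
        """Signed edge list: (controller, output entity, +1 for activation or -1 for inhibition)."""
        signs = {"activates": 1, "inhibits": -1}
        edges = []
        for it in self.interactions:
            if it.effect not in signs:
                raise ValueError(f"unknown effect: {it.effect}")
            for target in it.conversion.outputs:
                edges.append((it.controller, target, signs[it.effect]))
        return edges
```

Keeping the nested objects as the primary store and deriving edges on demand is one way to separate data from visualization, as listed among the advantages above.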

SLIDE 29
  • Open-source software package for R
  • Parses BioPAX-encoded pathway databases
  • Internally keeps the original data structure
  • Offers programmatic modification and merging
  • Delivers pathways as graphs, matrices, BioPAX

(Figure: BioPAX data (<XML>/<OWL>) is read into R and exported again as <XML>/<OWL>.) rBiopaxParser is available on Bioconductor.

Kramer F, Bayerlová M, Klemm F, Bleckmann A, Beissbarth T. rBiopaxParser--an R package to parse, modify and visualize BioPAX data. Bioinformatics. 2013 Feb 15;29(4):520-2.

rBiopaxParser
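BioPAX is serialized as OWL in RDF/XML, so parsing it amounts to walking instances and their properties. The sketch below flattens such a document into (class, id, property, value) rows with Python's standard `xml.etree.ElementTree`, in the spirit of keeping the original data structure. It is not rBiopaxParser's API, it handles only the simple case of one level of properties, and the helper names and the example snippet are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[-1]


def parse_instances(xml_text):
    """Flatten RDF/XML into (class, id, property, value) rows (hypothetical helper).

    Each top-level element is treated as an instance; each child is a property
    whose value is either an rdf:resource reference or literal text.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for inst in root:
        inst_id = inst.get(f"{{{RDF}}}ID") or inst.get(f"{{{RDF}}}about")
        for prop in inst:
            value = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            rows.append((local(inst.tag), inst_id, local(prop.tag), value))
    return rows
```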

SLIDE 30

MyPathSem: Generating individualised pathways

Consortium leader: Tim Beißbarth. Partners: UMG (F Kramer, U Sax, E Wingender, A Bleckmann, J Gaedcke), GeneXplain (A Kel)

(Figure: MyPathSem generates patient-specific, disease-specific and cohort-specific pathways for clinicians and researchers, shown for patients A, B and C. It combines multi-omics data (epigenome: DNA methylation, histone modifications, non-coding RNA; genome: DNA; transcriptome: mRNA; proteome: proteins; metabolome: small molecules) with pathway knowledge from KEGG, Reactome, PID, WikiPathways, Pathguide, EBI and Pathway Commons.)

SLIDE 31

(Figure: an integration platform linking omics data and pathway knowledge, built from Angular.js, CerebralWeb, rApache, rBiopaxParser, ndexr, clinical data and an ontology mapping service (NCI Thesaurus, OBA Service, MeSH, ICD), deployed as a Docker container.)

MyPathSem: data integration infrastructure

SLIDE 32

Statistical Bioinformatics Group, UMG

  • Dr. Andreas Leha
  • Dr. Frank Kramer
  • Dr. Manuel Nietert
  • Dr. med. Annalen Bleckmann
  • Dr. Michaela Bayerlová

Astrid Wachter, Julia Perera Bel, Alexander Wolf, Maren Sitte, Florian Auer, Zaynab Hammoud, Hryhorii Chereda, Felix Reinhardt

Consortium projects: MetastaSys, HER2LOW, MMML-Demonstrators, MyPathSem

Acknowledgements