Loredana Martignetti, Laurence Calzone, Eric Bonnet, Emmanuel - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Loredana Martignetti, Laurence Calzone, Eric Bonnet, Emmanuel Barillot and Andrei Zinovyev. ROMA: representation and quantification of module activity from target expression data (Highlight talk)

institut Curie - INSERM U900 - PSL Research University / Mines ParisTech Computational Systems Biology of Cancer

Genome Informatics Workshop (GIW 2016), Shanghai, 5 October 2016

slide-2
SLIDE 2

Outline

ROMA: Representation and Quantification of Module Activity

  • Why ROMA?
  • Principles of ROMA
  • Examples of application

    – Involvement of Notch and P53 pathways in colon cancer invasiveness
    – Integrating transcriptome data into a mathematical model of metastasis network
    – Analysing expression time series of oncogenesis in Ewing's sarcoma
    – Transcriptional signature of p53 in bladder carcinoma

slide-3
SLIDE 3

Quantifying Network Module Activity

  • Genome
  • Epigenome
  • Transcriptome
  • Proteome
slide-4
SLIDE 4

In cancer the same biological process can be affected by alterations in different individual genes

slide-5
SLIDE 5

Reasoning in terms of active/inactive gene-sets rather than single differentially expressed genes

Gene set = target genes co-regulated by the same TFs
Gene set = genes involved in a common signalling pathway

Question: which gene sets (modules) are playing a role in my set of samples?

slide-6
SLIDE 6

Quantification of gene-set activity by PCA (Principal Component Analysis)

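[Figure: a single factor F regulates Gene 1 ... Gene n with response coefficients α1, α2, α3, ..., αn]

The uni-factor linear model of gene expression regulation:

$$\mathrm{Expr}(g, s) \sim \alpha_g^{(F)} A_s^{(F)}$$

$\alpha_g^{(F)}$ = response coefficient of gene g to factor F
$A_s^{(F)}$ = activity of factor F in sample s

The values $\alpha_g^{(F)}$ and $A_s^{(F)}$ are given by the PC1 (first principal component) of the gene set:

$$\min_{\alpha^{(F)},\, A^{(F)}} \sum_{g,s} \left( \mathrm{Expr}(g, s) - \alpha_g^{(F)} A_s^{(F)} \right)^2$$

[Figure: PC1 and PC2 of the gene set, drawn on axes Gene 1, Gene 2, Gene 3]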
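The uni-factor model can be illustrated with a minimal NumPy sketch. It assumes the gene-set expression matrix is arranged as genes x samples and centered per gene, and it reports L1 as a fraction of total variance; the function name `pc1_rank_one_fit` and the toy data are hypothetical and are not taken from the ROMA software.

```python
import numpy as np

def pc1_rank_one_fit(expr):
    """Fit Expr(g, s) ~ alpha_g * A_s by PCA.

    expr: genes x samples matrix for one gene set (assumed layout).
    Returns (alpha, activity, l1): gene response coefficients, per-sample
    factor activities, and the fraction of variance captured by PC1.
    """
    # Center each gene across samples (standard PCA centering).
    centered = expr - expr.mean(axis=1, keepdims=True)
    # The leading singular triplet is the best rank-one least-squares fit.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    alpha = u[:, 0]            # response coefficients, unit norm
    activity = s[0] * vt[0]    # activity of the factor in each sample
    l1 = s[0] ** 2 / np.sum(s ** 2)
    return alpha, activity, l1

# Toy data: 4 genes driven by one hidden factor plus small noise.
rng = np.random.default_rng(0)
true_alpha = np.array([1.0, 0.8, -0.5, 0.3])
true_activity = rng.normal(size=20)
expr = np.outer(true_alpha, true_activity) + 0.05 * rng.normal(size=(4, 20))

alpha, activity, l1 = pc1_rank_one_fit(expr)
print("L1 (fraction of variance on PC1):", round(float(l1), 3))
```

The sign of `alpha` and `activity` returned by an SVD is arbitrary, so only their product is determined by the fit.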

slide-7
SLIDE 7

Identification of active/inactive gene-sets by PCA

Overdispersion (high L1) reflects activation (differential across the samples).

Testing whether the PC1 variance L1 of a gene set significantly exceeds the genome-wide background expectation (overdispersion).

L1 = amount of variance captured by PC1

[Figure: PC1/PC2 scatter plot]

Fan J et al, Nature Methods 2016
Tomfohr et al, BMC Bioinformatics 2005

slide-8
SLIDE 8

ROMA features: assessing the statistical significance of gene-set activity (overdispersion)

Statistical significance of L1 is assessed by estimating the null distribution from random sets of genes having representative sizes.

The L1 distribution strongly depends on the size of the gene set.

[Figure: PC1/PC2 scatter plots for a small gene set and a large gene set]
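The random-gene-set test can be sketched as follows. This is an illustration under assumptions, not the ROMA implementation: the expression matrix is genes x samples, L1 is taken as the fraction of variance on PC1, the null sets are drawn uniformly without replacement at the same size as the tested set, and the helper names (`l1_fraction`, `null_l1`, `overdispersion_pvalue`) are hypothetical.

```python
import numpy as np

def l1_fraction(expr):
    """Fraction of variance captured by PC1 for a genes x samples matrix."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

def null_l1(expr_all, size, n_random, rng):
    """L1 values of random gene sets of a given size."""
    n_genes = expr_all.shape[0]
    return np.array([
        l1_fraction(expr_all[rng.choice(n_genes, size=size, replace=False)])
        for _ in range(n_random)
    ])

def overdispersion_pvalue(expr_all, gene_idx, n_random=200, seed=0):
    """Empirical p-value of L1 against same-size random gene sets."""
    rng = np.random.default_rng(seed)
    observed = l1_fraction(expr_all[gene_idx])
    null = null_l1(expr_all, len(gene_idx), n_random, rng)
    # Add-one correction keeps the empirical p-value strictly positive.
    pvalue = (1 + np.sum(null >= observed)) / (1 + n_random)
    return observed, pvalue

# Toy data: 500 genes, 30 samples; the first 20 genes share one factor.
rng = np.random.default_rng(1)
expr_all = rng.normal(size=(500, 30))
module = np.arange(20)
expr_all[module] += 2.0 * rng.normal(size=30)

observed, pvalue = overdispersion_pvalue(expr_all, module)

# Size dependence of the null: small random sets give larger L1 fractions.
small_null = null_l1(expr_all, 5, 100, rng)
large_null = null_l1(expr_all, 50, 100, rng)
print(observed, pvalue, small_null.mean(), large_null.mean())
```

Because the null L1 shifts with gene-set size, the null has to be sampled at a size matching the set under test, which is why the sketch draws random sets of `len(gene_idx)` genes.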

slide-9
SLIDE 9

Identification of coordinated gene-sets by PCA

High L1/L2 reflects coordination across the samples.

Testing whether the spectral gap L1/L2 of a gene set significantly exceeds the expectation (coordination): significance is assessed by estimating the null distribution from random sets of genes having representative sizes.

L1 = amount of variance captured by PC1
L2 = amount of variance captured by PC2

[Figure: two PC1/PC2 scatter plots annotated with L1 and L2]
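A small sketch of the spectral gap, assuming a genes x samples layout and variance fractions for L1 and L2. The toy data contrast a gene set driven by one factor with a gene set split between two independent factors; the function name `spectral_gap` and all data are hypothetical.

```python
import numpy as np

def spectral_gap(expr):
    """Return (L1, L2, L1/L2) for a genes x samples matrix."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    variances = s ** 2 / np.sum(s ** 2)
    return variances[0], variances[1], variances[0] / variances[1]

rng = np.random.default_rng(3)
n_samples = 30
factor_a = rng.normal(size=n_samples)
factor_b = rng.normal(size=n_samples)

# Ten genes all following the same factor: one dominant component.
one_factor = 2.0 * np.tile(factor_a, (10, 1)) + 0.5 * rng.normal(size=(10, n_samples))

# Ten genes split between two independent factors: two strong components.
two_factor = 0.5 * rng.normal(size=(10, n_samples))
two_factor[:5] += 2.0 * factor_a
two_factor[5:] += 2.0 * factor_b

l1_one, l2_one, gap_one = spectral_gap(one_factor)
l1_two, l2_two, gap_two = spectral_gap(two_factor)
print("one factor  L1/L2:", round(float(gap_one), 2))
print("two factors L1/L2:", round(float(gap_two), 2))
```

In this toy setting the second gene set still varies strongly across samples, but its variance is shared between PC1 and PC2, so its L1/L2 ratio is much lower than that of the single-factor set.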

slide-10
SLIDE 10

Significances of overdispersion and coordination might differ

E.g.: E2F1 targets are usually overdispersed but not coordinated, unlike E2F3 targets

[Figure: PC1/PC2 scatter panels annotated with L1 and L2, contrasting overdispersed vs not overdispersed and coordinated vs not coordinated gene sets]

L1 = amount of variance captured by PC1
L2 = amount of variance captured by PC2

slide-11
SLIDE 11

The ROMA algorithm

INPUT
  • Expression data matrix
  • Gene sets (gmt format)

STEPS
  • Extract expression submatrices
  • Compute PCA-based module activities
  • Statistical significance of overdispersion and coordination

OUTPUT
  • Module activity score in each condition
  • Gene projections on PC1
  • Overdispersion estimation: L1 + p-value
  • Coordinated expression estimation: L1/L2 + p-value

https://github.com/sysbio-curie/Roma
Martignetti et al, Front Genet. 2016
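The first two steps (reading gene sets and extracting expression submatrices) can be sketched in Python. The sketch assumes the common tab-separated GMT layout (set name, description, then gene identifiers) and a genes x samples matrix; the helper names, gene names and set names are hypothetical, and this is not how the ROMA Java code is organized.

```python
import numpy as np

def parse_gmt(text):
    """Parse GMT text: one gene set per line, tab-separated fields
    (name, description, gene1, gene2, ...)."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets

def extract_submatrix(expr, gene_names, gene_set):
    """Rows of expr (genes x samples) for the genes of one set that are
    present in the data, together with the names actually kept."""
    index = {g: i for i, g in enumerate(gene_names)}
    rows = [index[g] for g in gene_set if g in index]
    return expr[rows], [gene_names[i] for i in rows]

# Toy input: two gene sets, one of which mentions a gene absent from the data.
gmt_text = "NOTCH_TOY\tna\tG1\tG2\tG9\nP53_TOY\tna\tG3\tG4\n"
gene_names = ["G1", "G2", "G3", "G4", "G5"]
expr = np.arange(15.0).reshape(5, 3)

gene_sets = parse_gmt(gmt_text)
sub, kept = extract_submatrix(expr, gene_names, gene_sets["NOTCH_TOY"])
print(kept, sub.shape)
```

Genes listed in a set but missing from the expression matrix are silently dropped here; a real pipeline would probably report them.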

slide-12
SLIDE 12

ROMA features: different patterns of overdispersion. Symmetric overdispersion vs displacement with respect to the data center

An example of an active pathway in single-cell transcriptomics

Computed using PCA with fixed center:

[Figure: example pathway activity computed using PCA with fixed center]

slide-13
SLIDE 13

Modification of PCA: PCA with fixed center

[Figure: two panels comparing the standard PC1 with the PC1 computed with a fixed center]
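One way to picture PCA with a fixed center is the sketch below. It assumes that the fixed center is the origin of globally centered data and that the points are displaced from it as a group; the function name `first_pc` and the toy data are hypothetical, and the ROMA implementation may differ in detail.

```python
import numpy as np

def first_pc(points, center=None):
    """First principal direction of points (observations x dimensions).

    center=None subtracts the mean of the points (standard PCA);
    otherwise the given fixed center is subtracted instead.
    Returns the unit direction and the fraction of variation it captures.
    """
    if center is None:
        center = points.mean(axis=0)
    shifted = points - center
    _, s, vt = np.linalg.svd(shifted, full_matrices=False)
    return vt[0], s[0] ** 2 / np.sum(s ** 2)

# Toy data: a tight cloud displaced from the origin, no internal structure.
rng = np.random.default_rng(4)
displacement = np.full(5, 3.0)
points = displacement + 0.5 * rng.normal(size=(50, 5))

standard_pc1, standard_l1 = first_pc(points)
fixed_pc1, fixed_l1 = first_pc(points, center=np.zeros(5))

direction = displacement / np.linalg.norm(displacement)
alignment = abs(fixed_pc1 @ direction)
print(round(float(standard_l1), 2), round(float(fixed_l1), 2), round(float(alignment), 3))
```

With the mean subtracted, the displacement is invisible and PC1 only describes noise; with the center fixed at the origin, PC1 points along the displacement and captures most of the variation.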

slide-14
SLIDE 14

ROMA uses weighted genes to include a priori knowledge

In ROMA, weights w_g per gene can be assigned by the user, e.g.:

  • Positive weights for “positively regulated genes” and negative for “inhibited genes”
  • Weights reflecting confidence in gene expression quantification (e.g. dropout problem in single cell data, Fan et al 2016)
  • Weights reflecting transcription factor binding strength

Weighted PCA is then computed. These weights are also used for orienting the principal component and defining activated/repressed samples and activating/repressing genes.

[Figure: genes projected on the PC, colored by weight: green w_g > 0; red w_g < 0; blue unknown]
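The orientation step can be sketched on its own. A principal component is only defined up to sign, so known gene weights can be used to pick the direction. The rule below (flip when the weighted sum of loadings is negative) and the encoding +1 / -1 / 0 are assumptions made for illustration, and `orient_pc1` is a hypothetical helper, not a ROMA function.

```python
import numpy as np

def orient_pc1(alpha, activity, weights):
    """Flip PC1 so that gene loadings agree in sign with known weights.

    alpha:    gene loadings on PC1
    activity: per-sample scores on PC1
    weights:  +1 for activated genes, -1 for inhibited genes, 0 if unknown
              (hypothetical encoding)
    """
    agreement = np.sum(weights * alpha)
    if agreement < 0:
        # Loadings and scores must be flipped together to keep their product.
        return -alpha, -activity
    return alpha, activity

alpha = np.array([-0.6, -0.5, 0.4, 0.1])
activity = np.array([2.0, -1.0, 0.5])
weights = np.array([1.0, 1.0, -1.0, 0.0])

oriented_alpha, oriented_activity = orient_pc1(alpha, activity, weights)
print(oriented_alpha, oriented_activity)
```

After orientation, samples with positive scores can be read as having the module activated, and genes with positive loadings as activating, under the stated sign convention.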

slide-15
SLIDE 15

ROMA computes robust PCA

Outlier genes abnormally affecting PC1 are identified by a “leave one out” procedure and removed from the gene set.

[Figure: PC1/PC2 scatter plot in which the L1 estimate is affected by one single gene]

In ROMA, outlier genes are identified by a leave-one-out procedure:
  → computing L1 n times (n = gene set size), removing one gene from the gene set each time
  → outliers are identified as those genes that dramatically increase L1
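The leave-one-out idea can be sketched as follows. The flagging rule (a z-score on the leave-one-out L1 values with a cutoff of 2) is an assumption chosen for illustration, as are the function names and the toy data; the slides only state that genes dramatically increasing L1 are treated as outliers.

```python
import numpy as np

def l1_fraction(expr):
    """Fraction of variance captured by PC1 for a genes x samples matrix."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

def leave_one_out_outliers(expr, z_threshold=2.0):
    """Recompute L1 with each gene removed in turn and flag genes whose
    removal makes L1 drop far below the other leave-one-out values."""
    n_genes = expr.shape[0]
    loo = np.array([
        l1_fraction(np.delete(expr, g, axis=0)) for g in range(n_genes)
    ])
    z = (loo - loo.mean()) / loo.std()
    # A strongly negative z means L1 collapses without this gene,
    # i.e. the gene alone was inflating L1.
    outliers = [g for g in range(n_genes) if z[g] < -z_threshold]
    return outliers, loo

# Toy data: 15 unstructured genes, one of them with a much larger variance.
rng = np.random.default_rng(2)
expr = rng.normal(size=(15, 20))
expr[7] *= 10.0

outliers, loo = leave_one_out_outliers(expr)
print("flagged genes:", outliers)
```

In this toy example the high-variance gene dominates PC1, so L1 stays high when any other gene is removed and drops sharply only when that gene is left out.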

slide-16
SLIDE 16

Application 1: Involvement of Notch and P53 pathways in colon cancer invasiveness

slide-17
SLIDE 17

Using ROMA to understand control of Epithelial-Mesenchymal Transition (EMT)

  • Network modelling of Notch, p53 and Wnt pathways and their control of EMT phenotype
  • Prediction: p53 inhibition + Notch intracellular domain activation are required for EMT and metastasis
  • Experimental validation in mouse
  • What about human tumors?

Chanrion*, Kuperstein* et al, Nat. Comm 2014

slide-18
SLIDE 18
  • Two groups: metastatic patients (M1) / non metastatic patients (M0)
  • Prediction: Notch activation is associated with EMT and tumour invasiveness

Single NOTCH signalling genes do not show differential expression between invasive and non invasive tumours

TCGA data (colorectal cancer)

slide-19
SLIDE 19

Module activity analysis on TCGA colon cancer data

Differential module activity analysis gives statistically significant signals

Module activity matrix

                 S1    S2    S3    ...
Status           M1    M1    M0    ...
Notch pathway    a11   a12   a13   ...
Wnt pathway      a21   a22   a23   ...
P53 pathway      a31   a32   a33   ...

slide-20
SLIDE 20

Application 2: Consistency between transcriptome data and a mathematical model of metastasis network

slide-21
SLIDE 21

Master model of metastasis (boolean framework)

Cohen D et al, PLoS Comp Bio 2015

slide-22
SLIDE 22

Pathways included in the module analysis

Module activity is a valuable input for network simulation of a single sample

slide-23
SLIDE 23

Application 3: EWS/FLI1 target time series expression in Ewing's sarcoma

slide-24
SLIDE 24

A malignant genomic translocation gives rise to a chimeric gene, EWS/FLI-1, whose activity leads to uncontrolled cell growth.

EWS/FLI1 transcription factor in Ewing's sarcoma

[Figure: number of cells (×10^6) at days D10, D13, D17 and D21, with EWS/FLI1 expressed vs EWS/FLI1 silenced]

slide-25
SLIDE 25

Putative mechanisms of EWS/FLI1 mediated gene regulation

Time series of transcriptome 1 to 21 days after chimeric oncogene activation shows EWS-FLI1 target regulation: positively and negatively regulated targets

slide-26
SLIDE 26

MODULE                    L1     L1_pv   L1/L2   L1/L2_pv   NUMBER_OF_GENES
EWS/FLI_Down_signature    0.57   0.005   3.02    0.142      280
EWS/FLI_Up_signature      0.47   0.087   2.88    0.12       492
EWS/FLI_All_signature     0.52   0.001   2.96    0.06       769

Activity score of EWS-FLI1 downstream targets

One single test combining down- and up-regulated genes.
Outperforms other approaches where down and up sets cannot be combined.

slide-27
SLIDE 27

Application 4: Transcriptional signature of p53 in bladder cancer

slide-28
SLIDE 28

P53 activity vs P53 mutation status in bladder cancer

PCA analysis of the p53 signature (49 genes) from MSigDB

[Figure: tumors plotted on PC1 vs PC2. Legend: P53 mutated tumors, P53 non-mutated, unknown. Annotated groups: non-functional p53 (various mechanisms) and functional p53 (non-effective mutations)]

[Figure: number of samples by grade (low grade, high grade), grouped by P53 activity, P53 mutation and total]

P53 activity predicts tumor progression much better than P53 mutation status

Data from F. Radvanyi et al, I. Curie (198 tumors)

slide-29
SLIDE 29

Take-home message

  • ROMA: sample-unsupervised method for quantifying gene module activity
  • Characterize each module by overdispersion, coordination, activity/sample
  • Can quantify transcription factor activity, protein activity …
  • Use transcriptome or proteome data
  • Application to single-cell data
  • Much more sensitive than usual single gene approaches
  • Available as Java software at github.com/sysbio-curie/Roma
  • Check also Martignetti et al, 2016

Loredana Martignetti, Laurence Calzone, Eric Bonnet, Emmanuel Barillot, Andrei Zinovyev (2016) ROMA: Representation and Quantification of Module Activity from Target Expression Data. Frontiers in genetics

slide-30
SLIDE 30

Acknowledgements

U900 institut Curie - INSERM – PSL Research University / Mines ParisTech Computational Systems Biology of Cancer team

Luca Albergante, Emmanuel Barillot, Eric Bonnet (now at CEA / CNG), Laurence Calzone, Laura Cantini, Urszula Czerwinska, Paul Deveau, Maria Kondratova, Mihaly Koltai, Inna Kuperstein, Marine Le Morvan, Gaëlle Letort, Loredana Martignetti *, Arnau Montagud, Muneeza Patel, Yuvia Perez-Rico, Jean-Marie Ravel, Daniel Rovera, Pauline Traynard, Andrei Zinovyev *

* Credit for slides

Topics of interest of the group:
  • Modeling biological networks
  • High dimensional statistics
  • High-throughput data analysis

u900.curie.fr
sysbio.curie.fr
www.cancer-systems-biology.net

Funding

Cell fate decision - deciphering molecular mechanisms Finding therapeutic targets and combination treatment Enabling precision medicine

Postdoc positions open: Emmanuel.Barillot@curie.fr

slide-31
SLIDE 31
slide-32
SLIDE 32

Quantification of gene-set activity by PCA

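[Figure: a single factor F regulates Gene 1 ... Gene n with response coefficients α1, α2, α3, ..., αn]

The uni-factor linear model of gene expression regulation:

$$\mathrm{Expr}(g, s) \sim \alpha_g^{(F)} A_s^{(F)}$$

The values $\alpha_g^{(F)}$ and $A_s^{(F)}$ are given by the PC1 metagene of the gene set and the level of this metagene in each sample.

In matrix form, with rows indexed by genes from the gene set and columns by samples:

$$\Big( \mathrm{Expr}(g, s) \Big) \sim \begin{pmatrix} \alpha_1^{(F)} \\ \vdots \\ \alpha_g^{(F)} \\ \vdots \end{pmatrix} \times \begin{pmatrix} A_1^{(F)} & \dots & A_S^{(F)} \end{pmatrix}$$

[Figure: PC1 and PC2 of the gene set, drawn on axes Gene 1, Gene 2, Gene 3]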

slide-33
SLIDE 33
slide-34
SLIDE 34

In progress: Cytoscape plugin for ROMA graphical interface

  1. Load a gmt file with gene sets that will be visualized as a network of meta-nodes
  2. Load expression data matrix
  3. Run ROMA analysis
  4. Color meta-nodes according to the activity scores

[Figure: example network of meta-nodes whose gene sets share genes (ga, gb, gc, gd, ge, gj, gk)]

Edge width is proportional to the number of shared genes

slide-35
SLIDE 35

How to use ROMA in practice

Visit github.com/sysbio-curie/Roma

Command line usage:

    java -jar roma_v1.0.jar [required options] [other options]

slide-36
SLIDE 36

ROMA working group

Andrei Zinovyev
Loredana Martignetti
Laurence Calzone
Laura Cantini
Eric Bonnet (now at CEA / CNG Evry)
Emmanuel Barillot

> Other examples of application can be explored
> We are looking for high-quality collections of TF target gene sets

slide-37
SLIDE 37

Problem 2: Choosing zero

  • Zero as the mean value (what PCA gives by default). Variants: median
  • Zero for module activities based on reference samples (e.g. normal samples)
  • Computing module activities for PCA with fixed center

slide-38
SLIDE 38

Problem 3: Choosing the right scale

  • Variance?
  • Consistency in signs between module «activators» and «inhibitors»
  • Biological «significance» (ability to separate clinical sample groups)
  • Idea: associate scale to the statistical significance of the «spectral gap» between the first and the second eigenvalue in PCA (λ1 / λ2): do random sampling of gene sets of the same size (more conservative: maybe with approximately the same variance distribution), compute the p-value, and use -log p-value for scaling

slide-39
SLIDE 39

ROMA features: assessing the statistical significance of gene-set activity (overdispersion and coordination)

Statistical significance of L1 and L1/L2 is assessed by estimating the null distribution of L1 and L1/L2 from random sets of genes having representative sizes.

Significance of L1 and L1/L2 might differ in a gene set (e.g. E2F1 targets are usually overdispersed but not coordinated).

L1 and L1/L2 strongly depend on the size of the gene set.

[Figure: PC1/PC2 scatter plots for a small gene set and a large gene set]

slide-40
SLIDE 40

NETWORK BASED APPROACHES TO DEFEAT CANCER Quantifying Network Module Activity

Emmanuel Barillot institut Curie - INSERM U900 - PSL Research University / Mines ParisTech Computational Systems Biology of Cancer

BioNetVisA workshop, ECCB 2016 – The Hague, 4 September 2016