Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering (PowerPoint presentation)

SLIDE 1

Introduction Methods Experimental Results

Inferring Cancer Subnetwork Markers using Density-Constrained Biclustering

Phuong Dao∗,1, Recep Colak∗,3, Raheleh Salari1, Flavia Moser4, Elai Davicioni5, Alexander Schönhuth†,2, Martin Ester1,†

1 School of Computing Science, Simon Fraser University, Canada
2 Centrum Wiskunde & Informatica, Amsterdam, Netherlands
3 Department of Computing Science, University of Toronto, Canada
4 Center for Disease Control, University of British Columbia
5 GenomeDX Biosciences Inc.

∗ Joint first authors; † joint corresponding and last authors

SLIDE 3

Introduction

Personalized Medicine

  • Determination of disease status based on patient genetics/genomics

  • Goal: Specific, individual choice of treatment
  • Necessary: Reliable disease markers
  • Monogenic: Each marker is a single gene
  • Multigenic: Each marker is a set of genes
SLIDE 4

Single Gene Markers

[Figure: expression of Genes 1 to 6 in Cases 1 to 3 and Controls 1 to 3, separated into differentially expressed and non-differentially expressed genes]

Caveat: single-gene markers vary considerably across different studies.

SLIDE 5

Marker Selection

Multigenic Traits

[Figure: gene expression profiles (Genes 1 to 4; Cases 1 to 3, Controls 1 to 3) alongside an interaction/association network over G1 to G4 with confidence-weighted edges]

Solution: use differentially expressed genes that participate in the same pathway [Chuang et al., 2007; Chowdhury et al., 2010]

SLIDE 6

Our Approach

Each of our subnetwork markers:

  • is a densely connected subnetwork ☞ disease-related genes have more protein-protein interactions (PPIs) than expected [Goh et al., PNAS (2007)]
  • contains genes that are differentially expressed in a subset of samples ☞ cancer tumors vary greatly in phenotype, even within the same (sub)type [Hampton et al., GR (2009)]

SLIDE 7

Density-Constrained Biclusters

Definition: G is called α-dense if

$$\frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} \ge \alpha \ge 0.5.$$

[Figure: weighted interaction network over genes G1 to G7 and binary differential-expression matrices over samples S1 to S3]

Our markers are α-densely connected subnetworks whose genes are all differentially expressed in a common subset of at least k patients (here, k = 2).
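The marker definition can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical representation: edge weights in a dict keyed by frozenset gene pairs, and for each gene the set of samples in which it is differentially expressed. The function names and the toy data are illustrative only, not the authors' implementation.

```python
from itertools import combinations

def density(genes, weights):
    """Sum of edge weights inside `genes` divided by C(|V|, 2), the number of possible edges."""
    n = len(genes)
    total = sum(weights.get(frozenset(pair), 0.0) for pair in combinations(genes, 2))
    return total / (n * (n - 1) / 2)

def is_connected(genes, weights):
    """Graph search over the positive-weight edges among `genes`."""
    genes = set(genes)
    start = next(iter(genes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in genes - seen:
            if weights.get(frozenset((u, v)), 0.0) > 0:
                seen.add(v)
                stack.append(v)
    return seen == genes

def is_marker(genes, de_samples, weights, alpha=0.5, k=2):
    """True if `genes` is alpha-densely connected and all genes share >= k DE samples."""
    if alpha < 0.5:
        raise ValueError("the definition requires alpha >= 0.5")
    if len(genes) < 2:
        return False  # assumption: single genes are not treated as subnetwork markers
    shared = set.intersection(*(de_samples[g] for g in genes))
    return (density(genes, weights) >= alpha
            and is_connected(genes, weights)
            and len(shared) >= k)

# Hypothetical toy data: edge confidences, and per gene the samples where it is DE.
WEIGHTS = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G1", "G3")): 0.8,
    frozenset(("G2", "G3")): 0.7,
    frozenset(("G3", "G4")): 0.3,
}
DE_SAMPLES = {
    "G1": {"S1", "S2", "S3"},
    "G2": {"S1", "S2"},
    "G3": {"S1", "S2", "S3"},
    "G4": {"S3"},
}

print(is_marker(["G1", "G2", "G3"], DE_SAMPLES, WEIGHTS))        # True
print(is_marker(["G1", "G2", "G3", "G4"], DE_SAMPLES, WEIGHTS))  # False
```

In the toy data, {G1, G2, G3} has density 0.8 and is jointly differentially expressed in S1 and S2, so it qualifies for k = 2; adding G4 drops the density to 2.7 / 6 = 0.45, below α.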

SLIDE 8

Methods

SLIDE 9

Density Constrained Biclustering

Search Strategy

Theorem: Every α-densely connected network of size n contains an α-densely connected subnetwork of size n − 1.

[Figure (maximal wDCB): candidate subnetworks of an example network on nodes A, B, C, D with edge weights w(A,B) = 0.4, w(A,C) = 0.6, w(A,D) = 0.9, w(B,D) = 0.8; some candidates are labelled "Not Connected" or "Not Dense", and the maximal wDCB is highlighted]

Density of {A, B, C, D}: (0.8 + 0.9 + 0.6 + 0.4) / 6 = 0.45

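Search strategy: breadth-first search. By the theorem, every α-densely connected subnetwork of size n can be obtained from an α-densely connected subnetwork of size n − 1 by adding one node, so the search only needs to extend subnetworks that are already α-densely connected.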
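Below is a minimal Python sketch of such a level-wise (breadth-first) search on the slide's four-node example, assuming the edge weights w(A,B) = 0.4, w(A,C) = 0.6, w(A,D) = 0.9 and w(B,D) = 0.8 as read from the figure. The helper names are hypothetical, single nodes are treated as trivially dense, and the sample (bicluster) constraint is ignored; it illustrates the pruning idea rather than reproducing the authors' implementation.

```python
from itertools import combinations

# Edge weights of the example network (pairs not listed have weight 0).
WEIGHTS = {
    frozenset("AB"): 0.4,
    frozenset("AC"): 0.6,
    frozenset("AD"): 0.9,
    frozenset("BD"): 0.8,
}

def density(nodes, weights):
    """2 * (sum of internal edge weights) / (|V| * (|V| - 1))."""
    n = len(nodes)
    if n < 2:
        return 1.0  # assumption: single nodes count as trivially dense
    total = sum(weights.get(frozenset(p), 0.0) for p in combinations(nodes, 2))
    return 2.0 * total / (n * (n - 1))

def neighbours(node, weights):
    return {v for edge in weights if node in edge for v in edge if v != node}

def dense_subnetworks(weights, alpha=0.5, eps=1e-12):
    """Enumerate all alpha-densely connected subnetworks, one size level at a time."""
    nodes = {v for edge in weights for v in edge}
    level = {frozenset([v]) for v in nodes}  # size-1 subnetworks
    found = set()
    while level:
        found |= level
        next_level = set()
        for sub in level:
            # Adding a neighbour of a connected set keeps it connected.
            frontier = set().union(*(neighbours(v, weights) for v in sub)) - sub
            for v in frontier:
                candidate = sub | {v}
                # Only dense candidates are kept and extended further (pruning).
                if density(candidate, weights) >= alpha - eps:
                    next_level.add(candidate)
        level = next_level
    return found

def maximal(subnetworks):
    """Subnetworks not strictly contained in another found subnetwork."""
    return {s for s in subnetworks if not any(s < t for t in subnetworks)}

print(sorted("".join(sorted(s)) for s in maximal(dense_subnetworks(WEIGHTS))))
# ['ABD', 'ACD']
```

Here the full network {A, B, C, D} (density 0.45) is rejected, and the two maximal subnetworks are {A, B, D} (density 0.7) and {A, C, D} (density exactly 0.5; the small `eps` tolerance only guards against floating-point rounding). Exhaustive enumeration of this kind can grow exponentially with network size, so a practical implementation would need further pruning.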

SLIDE 10

Classification

  • 1. Marker computation: feature space creation (marker = dimension)
  • 2. Construct classifier using training data
  • 3. Perform classification on test data

Cross-platform study: the data used for marker computation and the test data come from different platforms.

SLIDE 11

Experimental Results

SLIDE 12

Network Data

Confidence-scored PPI network

[STRING, von Mering et al., NAR 2009]

  • Edges reflect physical protein-protein interactions
  • Confidence scores reflect the probability that the interaction is associated with a cellular phenomenon (and not an experimental artifact)
  • Scoring system based on KEGG pathways


SLIDE 13

Gene Expression Data

Colon cancer

  • GSE8671, 32 patients / tissue pairs
  • GSE10950, 24 patients / tissue pairs
  • GSE6988, 123 samples across several cancer subtypes

Breast cancer

  • GSE3494, 251 patients with different TP53 mutation status (wildtype vs. mutant)

SLIDE 14

Colon Cancer

Prediction

[Figure: AUC vs. number of subnetworks/genes (1 to 50), GSE8671 → GSE6988; methods: SGM, GMI, NetCover, wDCB]

SLIDE 15

Colon Cancer

Prognosis

[Figure: AUC vs. number of subnetworks/genes (up to 50), GSE8671 → GSE6988, prognosis; methods: SGM, GMI, NetCover, wDCB]

SLIDE 16

Colon Cancer: Prognosis

Accuracy

K = number of subnetworks/genes

       GSE8671 → GSE6988 (prognosis)     GSE10950 → GSE6988 (prognosis)
K      SGM   GMI   NC    wDCB            SGM   GMI   NC    wDCB
1      0.57  0.57  0.51  0.56            0.57  0.68  N/A   0.47
5      0.74  0.62  0.74  0.6             0.63  0.81  N/A   0.68
10     0.76  0.77  0.74  0.88            0.57  0.77  N/A   0.74
20     0.72  0.62  0.77  0.83            0.61  0.79  N/A   0.85
30     0.65  0.74  0.83  0.88            0.63  0.81  N/A   0.85
40     0.67  0.79  0.83  0.90            0.78  0.85  N/A   0.89
50     0.74  0.77  0.81  0.92            0.76  0.85  N/A   0.91

SLIDE 17

Breast Cancer

TP53 Wildtype vs. Mutant

[Figure: accuracy vs. number of subnetworks/genes (5 to 25) on GSE3494 (Miller et al.); methods: SGM (mappable), GMI (mappable), wDCB (mappable), SPM (not mappable)]

SLIDE 18

Subnetwork Marker Statistics

         GSE8671 subnetworks            GSE10950 subnetworks
Method   # Subnetworks   Enrichment     # Subnetworks   Enrichment
GMI      806             0.38           755             0.34
NC       923             0.12           N/A             N/A
wDCB     282             0.76           216             0.74

GMI = Greedy Mutual Information (Chuang et al.)
NC = NetCover (Chowdhury et al.)
wDCB = weighted Density-Constrained Biclustering
# Subnetworks = total number of subnetworks computed
Enrichment = enrichment rate of the top-50 markers

SLIDE 19

Top Markers in GSE8671

  • Enriched for DNA replication initiation (p = 6.39e-14) and DNA metabolic process (p = 6.15e-12)
  • TP53, BRCA1: tumor suppressor genes
  • Minichromosome maintenance (MCM) complex
  • MCM2, MCM5: early markers for colon cancer (Burger et al., 2008)

SLIDE 20

Outlook / Acknowledgments

Outlook:

  • Analyze subnetwork signatures
  • ncRNA-protein interaction data

Acknowledgments:

  • Mehmet Koyutürk
  • David DesJardins, Google Inc.
  • Lab for Mathematical and Computational Biology, UC Berkeley
SLIDE 21

Thank you for your attention!

SLIDE 22

Densely Connected Subnetworks

Properties

Let G = (V, E) be a network with edge weights w_e, e ∈ E.

  • The density θ(G) of G is

$$\theta(G) := \frac{\sum_{e \in E} w_e}{\binom{|V|}{2}} = \frac{2 \sum_{e \in E} w_e}{|V|\,(|V| - 1)},$$

where $\binom{|V|}{2}$ is the number of possible edges in G.
  • G is called α-dense if θ(G) ≥ α ≥ 0.5.
  • An α-dense, connected network G is called α-densely connected.
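The density half of the search-strategy theorem can be checked with a short averaging argument. The sketch below uses the notation of this slide plus two symbols introduced here, the weighted degree d(v) (sum of the weights of the edges incident to v) and W = Σ_{e∈E} w_e, for n = |V| ≥ 3, and lets u be a node of minimum weighted degree. It only shows that deleting such a node never lowers the density; it does not cover connectivity, where the condition α ≥ 0.5 presumably matters, and it is not necessarily the authors' proof.

$$
\begin{aligned}
\sum_{v \in V} d(v) = 2W \quad &\Rightarrow \quad d(u) \le \frac{2W}{n},\\
\theta(G - u) = \frac{W - d(u)}{\binom{n-1}{2}} \;&\ge\; \frac{W\,(n-2)/n}{(n-1)(n-2)/2} \;=\; \frac{2W}{n(n-1)} \;=\; \theta(G).
\end{aligned}
$$

For the four-node example on the search-strategy slide (w(A,B) = 0.4, w(A,C) = 0.6, w(A,D) = 0.9, w(B,D) = 0.8), the weighted degrees are d(A) = 1.9, d(B) = 1.2, d(C) = 0.6, d(D) = 1.7; removing C raises the density from 0.45 to 2 · (0.4 + 0.9 + 0.8) / (3 · 2) = 0.7.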

SLIDE 23

Classifier Construction

  • 1. Rank density-constrained biclusters by density significance
  • 2. Keep only high-ranked subnetworks with little overlap
  • 3. Feature space dimension = number of markers
  • 4. SVM classification

[Figure: average gene expression profile. Expression values of Genes 1 to 7: 1.25, 1.5, 1.0, 1.25, 0.5, 0.0, 0.25; Marker 1 (G1, G2, G3, G4) = 1.25 and Marker 2 (G4, G5, G6, G7) = 0.5, i.e. each marker's value is the average of its genes' values]
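Steps 2 and 3 can be sketched in Python as follows, assuming the markers are already ranked (the density-significance ranking is not shown). "Little overlap" is not quantified on the slide, so the Jaccard measure, the threshold `max_jaccard=0.5` and the cap `limit=50` are assumptions; the expression values are the ones the figure appears to show. The resulting per-patient feature vectors would then go to an SVM (for example scikit-learn's `sklearn.svm.SVC` with `fit`/`predict`), which is not reproduced here.

```python
from statistics import mean

def marker_features(expression, markers):
    """Feature vector of one sample: the average expression of each marker's genes."""
    return [mean(expression[g] for g in genes) for genes in markers]

def select_low_overlap(ranked_markers, max_jaccard=0.5, limit=50):
    """Greedily keep markers, best-ranked first, whose Jaccard overlap
    with every already kept marker is at most max_jaccard (assumed criterion)."""
    kept = []
    for genes in ranked_markers:
        if len(kept) == limit:
            break
        if all(len(genes & other) / len(genes | other) <= max_jaccard for other in kept):
            kept.append(genes)
    return kept

# Values as they appear in the slide's figure (one sample).
EXPRESSION = {"G1": 1.25, "G2": 1.5, "G3": 1.0, "G4": 1.25,
              "G5": 0.5, "G6": 0.0, "G7": 0.25}
MARKERS = [frozenset({"G1", "G2", "G3", "G4"}),  # Marker 1
           frozenset({"G4", "G5", "G6", "G7"})]  # Marker 2 (shares G4)

selected = select_low_overlap(MARKERS)
print(marker_features(EXPRESSION, selected))  # [1.25, 0.5]
```

The two markers share only G4 (Jaccard overlap 1/7), so both are kept, and each becomes one dimension of the feature space.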

SLIDE 24

Colon Cancer: Prediction

Accuracy

K = number of subnetworks/genes

       GSE8671 → GSE6988                 GSE10950 → GSE6988
K      SGM   GMI   NC    wDCB            SGM   GMI   NC    wDCB
1      0.56  0.84  0.72  0.84            0.63  0.37  N/A   0.77
5      0.73  0.72  0.72  0.82            0.82  0.68  N/A   0.86
10     0.76  0.76  0.83  0.85            0.82  0.81  N/A   0.88
20     0.80  0.84  0.86  0.89            0.84  0.83  N/A   0.89
30     0.80  0.83  0.84  0.91            0.83  0.85  N/A   0.85
40     0.85  0.85  0.87  0.90            0.84  0.84  N/A   0.89
50     0.85  0.84  0.85  0.93            0.81  0.82  N/A   0.89