Using Graph Theory to Analyze Gene Network Coherence

SLIDE 1

Using Graph Theory to Analyze Gene Network Coherence

Francisco A. Gómez-Vela (fgomez@upo.es), Norberto Díaz-Díaz (ndiaz@upo.es), Jesús S. Aguilar, José A. Lagares, José A. Sánchez

SLIDE 2

Outline

• Introduction
• Proposed Methodology
• Experiments
• Conclusions

SLIDE 4

Introduction

Gene Network

• Patterns of expression and behavioral influences between genes need to be extracted from microarray data.
• Gene networks (GNs) arise as a visual and intuitive way to represent gene-gene interactions.
• They are represented as a graph:
  ◦ Nodes: the genes.
  ◦ Edges: the relationships among these genes.
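
As a small illustration of this graph view, a gene network can be held as an adjacency structure. The sketch below is only one possible representation: the gene names are hypothetical, and treating relationships as undirected is an assumption (it matches the symmetric distance matrices shown later in the slides).

```python
# Hypothetical gene names, used only for illustration.
edges = [("geneA", "geneB"), ("geneB", "geneC")]

# Adjacency sets: each gene maps to the genes it is directly related to.
network = {}
for u, v in edges:
    network.setdefault(u, set()).add(v)
    network.setdefault(v, set()).add(u)  # assumption: relationships are undirected

print(sorted(network["geneB"]))
```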

SLIDE 6

Introduction

Gene Network

• Many GN inference algorithms have been developed as techniques for extracting biological knowledge:
  ◦ Ponzoni et al., 2007.
  ◦ Gallo et al., 2011.
• They can be broadly classified as (Hecker M, 2009):
  ◦ Boolean networks
  ◦ Information theory models
  ◦ Bayesian networks

SLIDE 7

Introduction

Gene Network Validation in Bioinformatics

• Once the network has been generated, it is very important to assess its reliability in order to establish the quality of the generated model.
  ◦ Validation based on synthetic data
    ▪ This approach is normally used to validate new methodologies or algorithms.
  ◦ Validation based on well-known data
    ▪ Prior knowledge from the literature is used to validate gene networks.

SLIDE 8

Introduction

Validation Based on Well-Known Biological Data

• The quality of a GN can be measured by a direct comparison between the obtained GN and prior biological knowledge (Wei and Li, 2007; Zhou and Wong, 2011).
• However, these approaches are not entirely accurate, as they only take direct gene-gene interactions into account for the validation task, leaving aside the weak (indirect) relationships (Poyatos, 2011).

SLIDE 10

Proposed Methodology

• The main features of our method:
  ◦ Evaluate the similarities and differences between gene networks and biological databases.
  ◦ Take the indirect gene-gene relationships into account in the validation process.
  ◦ Use graph theory to evaluate gene networks and obtain different measures.
SLIDE 11

Proposed Methodology

[Diagram: the input network (genes A, B, C, D) and the biological database (genes A, B, C, E, F) are each converted into a distance matrix, DM_IN and DM_DB respectively, using the Floyd-Warshall algorithm.]

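SLIDE 12

Proposed Methodology

[Diagram: the distance matrices DM_IN (input network, genes A, B, C, D) and DM_DB (biological database, genes A, B, C, E, F) are combined into the coherence matrix CM.]

Coherence Matrix

CM = |DM_IN - DM_DB|

In general, for two distance matrices DM_i and DM_j: CM = |DM_i - DM_j|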

SLIDE 13

Proposed Methodology

Floyd-Warshall Algorithm

• This graph analysis algorithm computes the shortest path between every pair of nodes.

[Figure: example network with nodes A, B, C, E and F]

Distance Matrix

     A  B  C  E  F
A    -  2  1  1  2
B    2  -  1  1  2
C    1  1  -  2  1
E    1  1  2  -  1
F    2  2  1  1  -
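
As a rough illustration of how such a distance matrix could be computed, the following Python sketch runs Floyd-Warshall on the five-node example. The edge list is an assumption: it is inferred from the pairs at distance 1 in the distance matrix, because the network drawing itself is not reproduced in this transcript. The function name `floyd_warshall` is illustrative, and this is not presented as the authors' implementation.

```python
import math

def floyd_warshall(nodes, edges):
    """All-pairs shortest path lengths for an undirected, unweighted graph."""
    dist = {u: {v: (0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1
    # Allow each node k in turn as an intermediate stop on the path i -> j.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

nodes = ["A", "B", "C", "E", "F"]
# Assumed edges: the pairs at distance 1 in the slide's distance matrix.
edges = [("A", "C"), ("A", "E"), ("B", "C"), ("B", "E"), ("C", "F"), ("E", "F")]

dm = floyd_warshall(nodes, edges)
for u in nodes:
    print(u, [dm[u][v] for v in nodes])
```

Pairs that remain unreachable keep the value `math.inf`, which lines up with the ∞ entries used in the later slides.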

SLIDE 14

Proposed Methodology

Distance Threshold

• Distance threshold (δ)
  ◦ It is used to exclude relationships that lack biological meaning.
  ◦ This threshold denotes the maximum distance considered relevant in the distance matrix generation process.
  ◦ If the minimum distance between two genes is greater than δ, then no path between the genes is assumed.

SLIDE 16

Proposed Methodology

Distance Threshold

[Figure: example network with nodes A, B, C, E and F]

Network Distance Matrix

     A  B  C  E  F
A    -  2  1  1  2
B    2  -  1  1  2
C    1  1  -  2  1
E    1  1  2  -  1
F    2  2  1  1  -

Distance Matrix after applying δ = 1

     A  B  C  E  F
A    -  ∞  1  1  ∞
B    ∞  -  1  1  ∞
C    1  1  -  ∞  1
E    1  1  ∞  -  1
F    ∞  ∞  1  1  -
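
The thresholding step can be sketched in a few lines of Python. This is a minimal illustration under two assumptions: the helper name `apply_distance_threshold` is made up here, and the diagonal (left blank on this slide) is filled with 0.

```python
import math

def apply_distance_threshold(dm, delta):
    """Replace every distance greater than delta with infinity (no path assumed)."""
    return [[d if d <= delta else math.inf for d in row] for row in dm]

# Distance matrix of the example network, node order A, B, C, E, F.
# Assumption: diagonal entries are set to 0.
dm = [
    [0, 2, 1, 1, 2],
    [2, 0, 1, 1, 2],
    [1, 1, 0, 2, 1],
    [1, 1, 2, 0, 1],
    [2, 2, 1, 1, 0],
]

thresholded = apply_distance_threshold(dm, delta=1)
for row in thresholded:
    print(row)
```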

SLIDE 17

Proposed Methodology

Coherence Matrix (CM)

DM_IN (input network)

     A  B  C  D
A    -  1  3  2
B    1  -  2  1
C    3  2  -  1
D    2  1  1  -

DM_IN after applying the distance threshold (the distance of 3 between A and C becomes ∞)

     A  B  C  D
A    -  1  ∞  2
B    1  -  2  1
C    ∞  2  -  1
D    2  1  1  -

DM_DB (biological database)

     A  B  C  E  F
A    0  2  1  2  2
B    2  0  1  1  2
C    1  1  0  1  1
E    2  1  1  0  1
F    2  2  1  1  0

CM = |DM_i - DM_j|, computed over the genes present in both matrices (A, B, C)

     A  B  C
A    -  1  ∞
B    1  -  1
C    ∞  1  -
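
The element-wise difference can be checked with a short Python sketch using the matrices from this slide. Two points are assumptions rather than statements from the slides: the comparison is restricted to the genes present in both matrices, and the undefined case |∞ - ∞| is stored as NaN (the following slides label that case β). The function name `coherence_matrix` is illustrative.

```python
import math

INF = math.inf

# DM_IN with the distance threshold already applied, and DM_DB, as nested dicts.
dm_in = {
    "A": {"B": 1, "C": INF, "D": 2},
    "B": {"A": 1, "C": 2, "D": 1},
    "C": {"A": INF, "B": 2, "D": 1},
    "D": {"A": 2, "B": 1, "C": 1},
}
dm_db = {
    "A": {"B": 2, "C": 1, "E": 2, "F": 2},
    "B": {"A": 2, "C": 1, "E": 1, "F": 2},
    "C": {"A": 1, "B": 1, "E": 1, "F": 1},
    "E": {"A": 2, "B": 1, "C": 1, "F": 1},
    "F": {"A": 2, "B": 2, "C": 1, "E": 1},
}

def coherence_matrix(dm_a, dm_b):
    """|dm_a - dm_b| for every pair of genes that appears in both matrices."""
    shared = sorted(set(dm_a) & set(dm_b))  # assumption: only shared genes are compared
    cm = {}
    for g in shared:
        cm[g] = {}
        for h in shared:
            if g == h:
                continue
            x, y = dm_a[g][h], dm_b[g][h]
            # inf - inf is undefined; keep it as NaN instead of raising.
            cm[g][h] = math.nan if (math.isinf(x) and math.isinf(y)) else abs(x - y)
    return cm

cm = coherence_matrix(dm_in, dm_db)
for gene, row in cm.items():
    print(gene, row)
```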

SLIDE 18

Proposed Methodology

Obtaining Measures

• Coherence level threshold (θ)
  ◦ This threshold denotes the maximum coherence level considered relevant in the coherence matrix.
  ◦ It is used to obtain well-known indices from the elements CM_i,j of the coherence matrix:

CM_i,j             Classification
|v - y| <= θ       TP
|v - y| > θ        FP
|∞ - y| (α)        FN
|∞ - ∞| (β)        TN

where 0 < v, y < ∞.
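
The classification rule can be written as a small Python function. This is a sketch under stated assumptions: coherence matrix entries are either a finite number or one of the markers "α" and "β" used on the slides, and the function name `classify_entry` is hypothetical.

```python
def classify_entry(entry, theta):
    """Map one coherence matrix entry to TP, FP, FN or TN."""
    if entry == "α":   # |inf - y|: one distance is infinite
        return "FN"
    if entry == "β":   # |inf - inf|: both distances are infinite
        return "TN"
    # Finite entry |v - y|: compare it against the coherence level threshold.
    return "TP" if entry <= theta else "FP"

theta = 3
for entry in [1, 3, 4, "α", "β"]:
    print(entry, classify_entry(entry, theta))
```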

SLIDE 19

Proposed Methodology

Coherence Matrix (θ = 3)

     A  B  C  D  E
A    -  1  α  4  7
B    1  -  β  2  5
C    α  β  -  1  8
D    4  2  1  -  1
E    7  5  8  1  -

SLIDE 20

Proposed Methodology

Coherence Matrix (θ = 3)

     A  B  C  D  E
A    -  1  α  4  7
B    1  -  β  2  5
C    α  β  -  1  8
D    4  2  1  -  1
E    7  5  8  1  -

The entries are classified step by step: values <= θ become TP, values > θ become FP, α becomes FN and β becomes TN.

     A   B   C   D   E
A    -   TP  FN  FP  FP
B    TP  -   TN  TP  FP
C    FN  TN  -   TP  FP
D    FP  TP  TP  -   TP
E    FP  FP  FP  TP  -
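
From the classified matrix, indices such as the accuracy and F-measure reported in the results can be derived. The slides do not give the formulas, so the sketch below assumes the standard definitions, and it also assumes that each unordered gene pair is counted once (upper triangle of the symmetric matrix).

```python
from collections import Counter

# Final labels of the theta = 3 example, upper triangle only.
labels = {
    ("A", "B"): "TP", ("A", "C"): "FN", ("A", "D"): "FP", ("A", "E"): "FP",
    ("B", "C"): "TN", ("B", "D"): "TP", ("B", "E"): "FP",
    ("C", "D"): "TP", ("C", "E"): "FP",
    ("D", "E"): "TP",
}

counts = Counter(labels.values())
tp, fp, fn, tn = (counts[k] for k in ("TP", "FP", "FN", "TN"))

# Standard definitions (an assumption; the slides only name the measures).
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)
print(accuracy, round(f_measure, 3))
```

Counting both triangles instead would double every count and leave both ratios unchanged.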

SLIDE 22

Results

Real data experiment

• Input networks were obtained by applying four inference network techniques on the well-known yeast cell cycle expression data set (Spellman et al., 1998).
  ◦ Soinov et al., 2003.
  ◦ Bulashevska et al., 2005.
  ◦ Ponzoni et al. (GRNCOP), 2007.
• Comparison with well-known data:
  ◦ BioGrid
  ◦ KEGG
  ◦ SGD
  ◦ YeastNet
SLIDE 23

Results

Real data experiment

• Several studies were carried out using different threshold value combinations:
  ◦ The distance threshold (δ) and the coherence level threshold (θ) were each varied from one to five, generating 25 different combinations.
• The results show that the higher the δ and θ values, the greater the noise introduced.
  ◦ The most representative result was obtained for δ=4 and θ=1.

SLIDE 24

Results

Method        Measure     BioGrid  KEGG  SGD   YeastNet
Soinov        Accuracy    0.27     0.58  0.31  0.29
Soinov        F-measure   0.42     0.48  0.47  0.45
Bulashevska   Accuracy    0.65     0.34  0.53  0.50
Bulashevska   F-measure   0.79     0.50  0.69  0.66
Ponzoni       Accuracy    0.82     0.28  1     1
Ponzoni       F-measure   0.90     0.43  1     1

SLIDE 25

Results

• These results are consistent with the experiment carried out in Ponzoni et al., 2007.
• In that work, the Ponzoni approach compared favorably with the Soinov and Bulashevska approaches.

SLIDE 27

Conclusions

• A new gene network validation framework is presented:
  ◦ The methodology takes into account not only the direct relationships but also the indirect ones.
  ◦ Graph theory has been used to perform the validation task.

SLIDE 28

Conclusions

• Experiments with real data:
  ◦ The results are consistent with the experiment carried out in Ponzoni et al., 2007.
  ◦ In that work, the Ponzoni approach compared favorably with the Soinov and Bulashevska approaches.
  ◦ The same behaviour is found in the obtained results: Ponzoni presents better coherence values than Soinov and Bulashevska in BioGrid, SGD and YeastNet.

SLIDE 29

Future Work

• The methodology will be improved:
  ◦ The elements of the coherence matrix will be weighted based on the gene-gene relationship distance.
  ◦ A new measure based on different databases will be generated.
• Moreover, a Cytoscape plugin will be implemented.

SLIDE 30

Some References

Pavlopoulos GA, et al. (2011) Using graph theory to analyze biological networks. BioData Mining 4:10.
Asghar A, et al. (2012) Speeding up the Floyd-Warshall algorithm for the cycled shortest path problem. Applied Mathematics Letters 25(1):1.
Bulashevska S and Eils R (2005) Inferring genetic regulatory logic from expression data. Bioinformatics 21(11):2706.
Ponzoni I, et al. (2007) Inferring adaptive regulation thresholds and association rules from gene expression data through combinatorial optimization learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 4(4):624.
Poyatos JF (2011) The balance of weak and strong interactions in genetic networks. PLoS One 6(2):e14598.

SLIDE 31

Using Graph Theory to Analyze Gene Network Coherence

Thanks for your attention