
GraphBlast: multi-feature graphs database searching. Alfredo Ferro, Rosalba Giugno, Misael Mongioví, Alfredo Pulvirenti, Dmitry Skripin, Dennis Shasha. University of Catania, New York University. June 12-15, 2007, University of Pisa, Italy.


SLIDE 1

June 12-15, 2007, University of Pisa, Italy

GraphBlast: multi-feature graphs database searching

Alfredo Ferro, Rosalba Giugno, Misael Mongioví, Alfredo Pulvirenti, Dmitry Skripin, Dennis Shasha
University of Catania, New York University

SLIDE 2

Graphs in Bio-Chemistry

  • Gene Ontologies
  • Pathways
  • Biological Networks
  • Collection of Molecules

SLIDE 3

Motivation for Searching in Graphs (molecules, networks)

  • Predict the functionality of new natural or synthesized compounds
  • Make a compound Q more active
  • Find fragments with the same function among different species
  • Predict protein function
  • Predict protein interaction

SLIDE 4

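Graph Searching Steps

[Figure: a database of three labeled graphs g1, g2 and g3 (node labels A to E), feeding the pipeline below.]

  • Index construction: build the index from the database
  • Query processing
    – Filtering: find candidates using the index
    – Load the candidates from the DB
    – Candidate verification

Indexing is crucial to reduce the search space and make the problem affordable!

Graph searching is an NP-complete problem (it requires subgraph isomorphism tests).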
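To illustrate why candidate verification is the expensive step, here is a minimal sketch of a label-preserving subgraph match by plain backtracking. It is not the verifier used by GraphBlast; the function name `find_subgraph_match`, the toy graph and the query are assumptions made for illustration only.

```python
def find_subgraph_match(q_labels, q_edges, g_labels, g_edges):
    """Return one mapping {query node: graph node}, or None if no match exists.

    Graphs are undirected; a match must preserve node labels and map every
    query edge onto a graph edge (non-induced subgraph matching).
    """
    g_adj = {v: set() for v in g_labels}
    for u, v in g_edges:
        g_adj[u].add(v)
        g_adj[v].add(u)
    q_adj = {v: set() for v in q_labels}
    for u, v in q_edges:
        q_adj[u].add(v)
        q_adj[v].add(u)
    order = list(q_labels)  # fixed matching order of the query nodes

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        q = order[len(mapping)]
        for g in g_labels:
            if g in mapping.values() or g_labels[g] != q_labels[q]:
                continue
            # Every already-mapped neighbor of q must be adjacent to g.
            if all(mapping[n] in g_adj[g] for n in q_adj[q] if n in mapping):
                mapping[q] = g
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack
        return None

    return extend({})


# Hypothetical toy data: a 4-node graph and a triangle query A-B-C.
g_labels = {0: "B", 1: "A", 2: "B", 3: "C"}
g_edges = [(1, 0), (0, 3), (3, 1), (1, 2), (2, 3)]
q_labels = {"a": "A", "b": "B", "c": "C"}
q_edges = [("a", "b"), ("b", "c"), ("c", "a")]
match = find_subgraph_match(q_labels, q_edges, g_labels, g_edges)
```

In the worst case the search tries exponentially many partial mappings, which is why the filtering step should hand it as few candidates as possible.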

SLIDE 5

GraphBlast Index

  • For each graph in DB
    – Find all paths of length from 1 to L (4, 10)
    – Count how many occurrences of each path in each graph

Key        g1    g2    g3
h(CB)      2     2     2
…          …     …     …
h(ABCA)    2

[Figure: the database graphs g1, g2 and g3.]
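The index construction described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not GraphBlast's code: path length is counted in nodes, a path may not reuse an edge, labels are single characters, and the label string itself is used as the key instead of a hash h(...). The function name `path_fingerprint` and the toy graph are hypothetical.

```python
from collections import Counter


def path_fingerprint(labels, edges, max_nodes):
    """Count label paths with 1..max_nodes nodes in an undirected graph."""
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    counts = Counter()

    def walk(node, key, used_edges, n_nodes):
        counts[key] += 1  # record the label path that ends at this node
        if n_nodes == max_nodes:
            return
        for nxt in adj[node]:
            edge = frozenset((node, nxt))
            if edge not in used_edges:  # assumption: no edge is reused
                walk(nxt, key + labels[nxt], used_edges | {edge}, n_nodes + 1)

    for start in labels:
        walk(start, labels[start], frozenset(), 1)
    return counts


# Hypothetical toy graph with labels A, B, B, C.
labels = {0: "B", 1: "A", 2: "B", 3: "C"}
edges = [(1, 0), (0, 3), (3, 1), (1, 2), (2, 3)]
fp = path_fingerprint(labels, edges, max_nodes=4)
```

A database index would then hold one such counter per graph, one column per graph and one row per path key.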

SLIDE 6

GraphBlast Filtering

Query Specification

[Figure: the query graph, with node labels A, B, B and C.]

Query Processing

Query decomposition in patterns gives the query fingerprint:

Key        Query
h(CB)      2
…          …
h(ABCA)    1
…          …

Database Fingerprint

Key        g1    g2    g3
h(CB)      2     2     2
…          …     …     …
h(ABCA)    2

Filtering Step 1: Select all candidate graphs

Filtered Database (1): g1

Filtering Step 2: Select all candidate subgraphs

Path occurrences in g1:

A       (1)
CB      (3,0) (3,2)
BA      (0,1) (2,1)
ABC     (1,3) (3,0) (3,2)
ABCA    (1,0) (0,3) (3,1) (1,2) (2,3)
…

Filtered Database (2): g1, keeping only the occurrences of the query patterns

CB      (3,0) (3,2)
ABCA    (1,0) (0,3) (3,1) (1,2) (2,3)
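Filtering Step 1 can be sketched as a comparison of fingerprints. The rule assumed here, common for count-based path indexes, is that a graph stays a candidate only if it contains every query path at least as often as the query does. The slide does not state the rule explicitly, so treat it as an assumption. The function name `filter_candidates` is hypothetical, and the h(ABCA) counts for g2 and g3 are assumed to be zero.

```python
def filter_candidates(query_fp, db_fps):
    """Keep the graphs whose path counts dominate the query's path counts."""
    return [
        graph_id
        for graph_id, fp in db_fps.items()
        if all(fp.get(key, 0) >= count for key, count in query_fp.items())
    ]


# Fingerprints keyed by label path; missing keys mean a count of zero.
db_fps = {
    "g1": {"CB": 2, "ABCA": 2},
    "g2": {"CB": 2},  # assumption: no ABCA path in g2
    "g3": {"CB": 2},  # assumption: no ABCA path in g3
}
query_fp = {"CB": 2, "ABCA": 1}
candidates = filter_candidates(query_fp, db_fps)
```

The test is only a necessary condition: a surviving graph still has to pass candidate verification.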

SLIDE 7

Approximate Searches

  • Run GraphBlast to find the occurrences of each fully specified query subgraph
  • Check (DFS) whether the approximate connections exist
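The DFS check can be sketched as a depth-limited search between two already matched nodes. This sketch assumes an approximate connection means "some path of at most a given number of edges". The slide does not define it, so the bound `max_edges` and the function name `connected_within` are assumptions for illustration.

```python
def connected_within(adj, src, dst, max_edges):
    """Depth-limited DFS: is there a simple path src -> dst with 1..max_edges edges?"""

    def dfs(node, depth, visited):
        if node == dst and depth > 0:
            return True
        if depth == max_edges:
            return False
        for nxt in adj.get(node, ()):
            if nxt not in visited and dfs(nxt, depth + 1, visited | {nxt}):
                return True
        return False

    return dfs(src, 0, {src})


# Hypothetical chain 0 - 1 - 2 - 3 stored as adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
reachable = connected_within(adj, 0, 3, max_edges=3)
too_far = connected_within(adj, 0, 3, max_edges=2)
```

Each pair of exact occurrences that should be linked by an approximate connection would be tested this way.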

SLIDE 8

Improving Indexing using Graph Data Mining

[Figure: the graph searching pipeline (database of graphs g1, g2 and g3; index construction; index; query processing with filtering to find candidates, loading from DB and candidate verification), extended with a data mining step (low support).]

SLIDE 9

Min hashing: Low Support Data Mining Technique for Indexing Size Reduction

[Figure panels: gIndex (TODS 2005) and GraphBlast.]
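For readers unfamiliar with min-hashing, the sketch below shows the basic technique: a short signature per set whose agreement rate estimates Jaccard similarity. How GraphBlast applies it to shrink the index is not spelled out on the slide. One plausible reading, assumed here, is that each path key is treated as the set of graphs containing it, so that keys with near-identical sets can be detected. The function names and data are hypothetical.

```python
import random


def minhash_signature(items, universe, num_perm=64, seed=0):
    """Signature = minimum rank of the set's items under num_perm random permutations."""
    rng = random.Random(seed)  # fixed seed so all sets share the same permutations
    signature = []
    for _ in range(num_perm):
        perm = list(universe)
        rng.shuffle(perm)
        rank = {x: i for i, x in enumerate(perm)}
        signature.append(min(rank[x] for x in items))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical data: which graphs contain each path key.
universe = ["g1", "g2", "g3", "g4", "g5", "g6"]
sig_x = minhash_signature({"g1", "g2", "g3"}, universe)
sig_y = minhash_signature({"g1", "g2", "g3"}, universe)
sig_z = minhash_signature({"g4", "g5"}, universe)
```

Identical sets always get identical signatures and disjoint sets never agree at any position. Partially overlapping sets agree at a rate close to their Jaccard similarity.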

SLIDE 10

Performance

Compression ratio % (Min-Hashing)

Database        GraphBlast    gIndex
Molecular       62.5          30.4
Regular 2D      56.4          76.6
Irregular 2D    56.9          52.9
Valence         48.0          9.4
Irr. Valence    48.0          7.9
Random          43.4          70.5

Preprocessing Time

Database        GraphBlast    gIndex
Molecular       1314.0        13750.0
Regular 2D      6.4           746.0
Irregular 2D    11.6          4587.3
Valence         6.2           7.2
Irr. Valence    6.3           7.6
Random          3511.0        > 3 days

Query Time, Molecular DB

Query Size      GraphBlast    gIndex
11              0.00          0.04
19              0.01          0.08
43              0.00          0.05
58              0.01          0.42
148             0.01          0.31
239             0.00          0.60

SLIDE 11

Distributed GraphBlast for searching in a Large Network

[Figure: a network split into SUBNetwork (1), SUBNetwork (2) and SUBNetwork (3) joined by cut-edges, together with a query and the query occurrences in the network.]
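The bookkeeping behind such a split can be sketched as follows: given an assignment of nodes to subnetworks, separate the edges that stay inside one subnetwork from the cut-edges that cross between two. The slide does not say how the partition is computed, so the assignment `part_of` is simply given here. The function name `split_network` and the data are hypothetical.

```python
def split_network(edges, part_of):
    """Group edges by subnetwork and collect the cut-edges between subnetworks."""
    inner = {}
    cut = []
    for u, v in edges:
        if part_of[u] == part_of[v]:
            inner.setdefault(part_of[u], []).append((u, v))
        else:
            cut.append((u, v))
    return inner, cut


# Hypothetical 6-node chain split into three subnetworks.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
part_of = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
inner, cut = split_network(edges, part_of)
```

Query occurrences that lie inside one subnetwork can be found independently per subnetwork. Occurrences that span several subnetworks must cross at least one cut-edge.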

SLIDE 12

References

  • D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. Proceedings of the ACM Symposium on Principles of Database Systems (PODS), pages 39–52, 2002.
  • L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (Sub)Graph Isomorphism Algorithm for Matching Large Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10):1367–1372, 2004.
  • R. Giugno and D. Shasha. GraphGrep: A Fast and Universal Method for Querying Graphs. Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), Quebec, Canada, August 2002.
  • X. Yan, P. S. Yu, and J. Han. Graph Indexing Based on Discriminative Frequent Structure Analysis. ACM Transactions on Database Systems, 30(4):960–993, 2005.
  • E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. IEEE Transactions on Knowledge and Data Engineering, 13:64–78, 2001.
  • M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, 99(12):7821–7826, June 11, 2002.