GraphBlast: multi-feature graphs database searching

GraphBlast: multi-feature graphs database searching. Alfredo Ferro, Rosalba Giugno, Misael Mongioví, Alfredo Pulvirenti, Dmitry Skripin, Dennis Shasha. University of Catania, New York University. June 12-15, 2007, University of Pisa, Italy.


  1. GraphBlast: multi-feature graphs database searching. Alfredo Ferro, Rosalba Giugno, Misael Mongioví, Alfredo Pulvirenti, Dmitry Skripin, Dennis Shasha. University of Catania, New York University. June 12-15, 2007, University of Pisa, Italy.

  2. Graphs in Bio-Chemistry: pathways, gene ontologies, collections of molecules, biological networks.

  3. Motivation for Searching in Graphs (molecules, networks)
     • Predict the functionality of new natural or synthesized compounds
     • Make a compound Q more active
     • Find fragments with the same function among different species
     • Predict protein function
     • Predict protein interaction

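  4. Graph Searching Steps. Graph searching rests on subgraph isomorphism, which is NP-complete, so the search is split into stages:
     • Index construction: build an index over the database graphs (g1, g2, g3 in the figure).
     • Query processing: prepare the query for lookup in the index.
     • Filtering: use the index to find the candidate graphs.
     • Load from DB: fetch only the candidates.
     • Verification: check each candidate against the query.
     Indexing is crucial to reduce the search space and make the problem affordable.
     [Figure: database graphs g1, g2, g3 with labeled nodes (A, B, C, D, E) and the pipeline above.]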

  5. GraphBlast Index
     • For each graph in the DB:
       • Find all paths of length from 1 to L (4,10)
       • Count how many occurrences of each path there are in each graph
     Index fragment shown on the slide (h is the hash of a label path):

     Key        g1   g2   g3
     h(CB)       2    2    2
     ...       ...  ...  ...
     h(ABCA)     2    0    0
     ...       ...  ...  ...

     [Figure: database graphs g1, g2, g3 with labeled nodes next to the index table.]
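The indexing step on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the GraphBlast implementation: path length is counted in edges, paths never revisit a node, and the label string itself stands in for the hash h(...). The names `path_fingerprint` and `build_index` and the toy graph are hypothetical.

```python
from collections import Counter


def path_fingerprint(labels, adj, max_len):
    """Count label paths with 1..max_len edges, starting from every node.

    labels: dict node_id -> label; adj: dict node_id -> list of neighbours.
    Assumption: paths are simple (no node is visited twice).
    """
    counts = Counter()

    def extend(path):
        if len(path) > 1:
            # The label string plays the role of the hashed key h(...).
            counts["".join(labels[n] for n in path)] += 1
        if len(path) - 1 == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in labels:
        extend([start])
    return counts


def build_index(graphs, max_len):
    """Map each path key to {graph_id: occurrence count}."""
    index = {}
    for gid, (labels, adj) in graphs.items():
        for key, count in path_fingerprint(labels, adj, max_len).items():
            index.setdefault(key, {})[gid] = count
    return index


# Hypothetical toy graph: the chain A - B - C.
toy_labels = {0: "A", 1: "B", 2: "C"}
toy_adj = {0: [1], 1: [0, 2], 2: [1]}
toy_fp = path_fingerprint(toy_labels, toy_adj, max_len=2)
toy_index = build_index({"g1": (toy_labels, toy_adj)}, max_len=2)
```

Each undirected path is counted once per direction here (both "AB" and "BA"), which is a choice of this sketch rather than something the slide states.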

  6. GraphBlast Query Processing
     • Query specification: the query graph with labeled nodes.
     • Query decomposition into patterns (label paths such as CB and ABCA).
     • Query fingerprint: one count per key, e.g. h(CB), h(ABCA).
     • Filtering step 1: compare the query fingerprint with the database fingerprint and select all candidate graphs, giving Filtered Database (1).
     • Filtering step 2: select all candidate subgraphs by matching path occurrences (node-id sequences such as (3,0), (3,2), (1,0)) inside the remaining graphs, giving Filtered Database (2).
     [Figure: query graph, query and database fingerprint tables, and the path occurrences of g1 that survive each filtering step.]
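Filtering step 1 can be sketched as a count comparison: a graph stays a candidate only if it holds at least as many occurrences of every query path as the query itself. This is an assumption about the comparison rule, since the slide only shows the tables. The database counts below mirror the slide's index fragment; the query counts and the name `candidate_graphs` are hypothetical.

```python
def candidate_graphs(query_fp, db_index, graph_ids):
    """Keep graphs whose path counts dominate the query's path counts."""
    survivors = []
    for gid in graph_ids:
        if all(db_index.get(key, {}).get(gid, 0) >= need
               for key, need in query_fp.items()):
            survivors.append(gid)
    return survivors


# Database fingerprint fragment in the shape used on the index slide.
db_index = {
    "h(CB)": {"g1": 2, "g2": 2, "g3": 2},
    "h(ABCA)": {"g1": 2, "g2": 0, "g3": 0},
}
# Hypothetical query fingerprint (illustrative values only).
query_fp = {"h(CB)": 1, "h(ABCA)": 1}

candidates = candidate_graphs(query_fp, db_index, ["g1", "g2", "g3"])
```

With these values only g1 survives, because g2 and g3 contain no occurrence of the ABCA path.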

  7. Approximate Searches
     • Run GraphBlast to find the occurrences of each fully specified query subgraph.
     • Check, with a depth-first search (DFS), whether the approximate connections between those occurrences exist.
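The DFS check can be sketched as a plain reachability test between two matched nodes of a data graph. This assumes an "approximate connection" means "some path of any length exists", which the slide does not spell out; the function name `reachable` and the toy adjacency list are hypothetical.

```python
def reachable(adj, src, dst):
    """Iterative depth-first search: is there any path from src to dst?"""
    stack = [src]
    seen = {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Hypothetical data graph: 0 - 1 - 2 is connected, node 3 is isolated.
toy_graph = {0: [1], 1: [0, 2], 2: [1], 3: []}
```

Here `reachable(toy_graph, 0, 2)` holds while `reachable(toy_graph, 0, 3)` does not. A bound on path length, if the query specifies one, would need a path-by-path search rather than a single shared `seen` set.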

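  8. Improving Indexing using Graph Data Mining. The pipeline is the same as in the graph searching steps (index construction, query processing, filtering to find candidates, load from DB, verification), with a low-support data mining step added between the database and index construction.
     [Figure: database graphs g1, g2, g3 feeding a "Data Mining (low support)" box ahead of index construction.]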

  9. Min hashing: a low-support data mining technique for index size reduction. Comparison: GraphBlast vs. gIndex (TODS 2005).
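For readers unfamiliar with min hashing, the generic idea (as in Cohen et al., cited in the references) can be sketched as below. How GraphBlast applies it to its path index is not described on the slide, so this only illustrates signatures and similarity estimation on plain sets; the names `minhash_signature` and `estimate_jaccard`, the seeded SHA-1 hash family, and the example sets are assumptions.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=64):
    """One minimum per hash function; items must be non-empty."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical sets of graph ids in which two path keys occur.
graphs_with_key1 = {"g1", "g2", "g3", "g4"}
graphs_with_key2 = {"g1", "g2", "g3", "g4"}
graphs_with_key3 = {"g7", "g8"}

sig1 = minhash_signature(graphs_with_key1)
sig2 = minhash_signature(graphs_with_key2)
sig3 = minhash_signature(graphs_with_key3)
```

Identical sets give identical signatures, so their estimated similarity is 1.0, while disjoint sets agree only on a hash collision. Fixed-length signatures like these are what makes it possible to compare many index columns without scanning them in full.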

  10. Performance

      Compression ratio % (Min-Hashing)
      Database        GraphBlast   gIndex
      Molecular             62.5     30.4
      Regular 2D            56.4     76.6
      Irregular 2D          56.9     52.9
      Valence               48.0      9.4
      Irr. Valence          48.0      7.9
      Random                43.4     70.5

      Preprocessing Time
      Database        GraphBlast     gIndex
      Molecular           1314.0    13750.0
      Regular 2D             6.4      746.0
      Irregular 2D          11.6     4587.3
      Valence                6.2        7.2
      Irr. Valence           6.3        7.6
      Random              3511.0   > 3 days

      Query Time, Molecular DB
      Query Size      GraphBlast   gIndex
      11                    0.00     0.04
      19                    0.01     0.08
      43                    0.00     0.05
      58                    0.01     0.42
      148                   0.01     0.31
      239                   0.00     0.60

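  11. Distributed GraphBlast for searching in a Large Network. The network is split into subnetworks (SubNetwork 1, 2, 3) joined by cut-edges; the query is run against the subnetworks and the query occurrences are reported on the whole network.
      [Figure: network partitioned into three subnetworks with cut-edges, the query, and its occurrences on the network.]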
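The partitioning picture can be made concrete with a small sketch that separates the edges inside each subnetwork from the cut-edges between them. The slide does not say how the partition is chosen or how matches spanning cut-edges are assembled, so the node-to-subnetwork assignment is simply given as input here; `split_edges` and the toy network are hypothetical.

```python
def split_edges(edges, part_of):
    """Return ({subnetwork: [edges inside it]}, [cut-edges between subnetworks])."""
    internal = {}
    cut = []
    for u, v in edges:
        if part_of[u] == part_of[v]:
            internal.setdefault(part_of[u], []).append((u, v))
        else:
            cut.append((u, v))
    return internal, cut


# Hypothetical network: a chain 0 - 1 - 2 - 3 split into two subnetworks.
toy_edges = [(0, 1), (1, 2), (2, 3)]
toy_parts = {0: 1, 1: 1, 2: 2, 3: 2}
inside, cut_edges = split_edges(toy_edges, toy_parts)
```

In this toy case each subnetwork keeps one edge and (1, 2) is the only cut-edge. A query occurrence lying entirely inside one subnetwork can be found locally, while occurrences crossing a cut-edge need results from both sides.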

  12. References
      • D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. Proceedings of the ACM Symposium on Principles of Database Systems (PODS), pages 39-52, 2002.
      • L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (Sub)Graph Isomorphism Algorithm for Matching Large Graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26(10): 1367-1372, 2004.
      • R. Giugno and D. Shasha. GraphGrep: A Fast and Universal Method for Querying Graphs. Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), Quebec, Canada, August 2002.
      • X. Yan, P. S. Yu, and J. Han. Graph Indexing Based on Discriminative Frequent Structure Analysis. ACM Transactions on Database Systems 30(4): 960-993, 2005.
      • E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. Finding interesting associations without support pruning. IEEE Transactions on Knowledge and Data Engineering 13: 64-78, 2001.
      • M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS 99(12): 7821-7826, June 11, 2002.
