Mining, Indexing, and Similarity Search in Graphs and Complex Structures
Jiawei Han Xifeng Yan
Department of Computer Science University of Illinois at Urbana-Champaign
Philip S. Yu
IBM T. J. Watson Research Center
[Figure: graphs (1), (2), and (3) over the nodes 1: makepat, 2: esc, 3: addstr, 4: getccl, 5: dodash, 6: in_set_2, 7: stclose]
[Figure: plots with legend entries "From scratch", "Incremental", and "greedy"; axis label k]
[Figure: six-step workflow (Step 1 to Step 6) starting from matrices with rows g1, g2, … and columns c1, c2, …, cm, and graphs G1 to G6 over nodes a to k. Panels: "summary graph", "Sub(G)", "edge occurrence profiles" (edges E such as c-e, c-f, c-h, c-i, e-f against G1 to G6), "second-order graph S", and "Sub(S)"]
[Figure: network over the genes ATP17, ATP12, MRPL38, MRPL37, MRPL39, FMC1, MRPS18, MRPL32, ACN9, MRPL51, MRP49, YDR115W, PHB1, PET100]
Red: PHB1, ATP17, MRPL51, MRPL39, MRPL49, MRPL51, PET100. GO:0006091 (generation of precursor metabolites and energy; p-value = 0.001339)