Finding representatives in a heterogeneous network, Laura Langohr

SLIDE 1

Outline Introduction K-medoids Experiments Future Work Conclusion

Finding representatives in a heterogeneous network

Laura Langohr

Department of Computer Science University of Helsinki

May 19, 2009

Laura Langohr Finding representatives in a heterogeneous network

SLIDE 2

  • Introduction
  • K-medoids
  • Experiments
  • Future Work
  • Conclusion

SLIDE 3

Motivation

  • Finding representative vertices
  • Given a list of 100 vertices
  • But resources to study only 10 of them
  • Cluster the 100 vertices into 10 clusters
  • For each cluster, suggest one vertex as its representative

SLIDE 4

Example graph

[Figure: example graph with vertices A, B, C, D, E and F; each edge is labelled with a weight between 0.51 and 0.9]
SLIDE 5

K-medoids

  • Clustering method
  • Objects are partitioned into k clusters
  • First, an initial partitioning is created
  • The partition is then iteratively improved
  • Cluster centers are actual objects, called medoids

SLIDE 6

Algorithm

  • 1. Randomly choose k objects as the initial medoids
  • 2. Assign each remaining object to its nearest medoid
  • 3. Calculate a new medoid for each cluster
  • Repeat steps 2 and 3 until the clustering converges
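The steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: it assumes a precomputed symmetric distance matrix `dist`, and it assumes the usual criterion for step 3 (the new medoid is the member with the smallest total distance to the rest of its cluster), which the slide does not spell out. The function name `k_medoids`, its parameters and the toy points are hypothetical.

```python
import random


def k_medoids(dist, k, seed=0, max_iter=100):
    """Partition objects 0..n-1 into k clusters given a symmetric distance
    matrix `dist`. Returns (medoids, clusters), clusters keyed by medoid."""
    n = len(dist)
    rng = random.Random(seed)
    # Step 1: choose k objects at random as the initial medoids
    medoids = rng.sample(range(n), k)

    def assign(meds):
        # Step 2: every medoid keeps itself; each other object joins its nearest medoid
        clusters = {m: [m] for m in meds}
        for o in range(n):
            if o not in clusters:
                nearest = min(meds, key=lambda m: dist[o][m])
                clusters[nearest].append(o)
        return clusters

    for _ in range(max_iter):
        clusters = assign(medoids)
        # Step 3 (assumed criterion): the new medoid is the member with the
        # smallest total distance to the other members of its cluster
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][o] for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):  # no medoid changed: converged
            return medoids, clusters
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical 1-D example: two well-separated groups of points
points = [1, 2, 3, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, clusters = k_medoids(dist, k=2)
print(sorted(points[m] for m in medoids))  # [2, 11]
```

The loop stops as soon as the set of medoids no longer changes, so in general the result depends on the random initial medoids and can be a local optimum; on this toy input every random start converges to the same two clusters.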

SLIDE 7

K-means

  • K-medoids is similar to k-means
  • K-means uses the mean of a cluster's objects as the cluster center, which in general is not one of the objects
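A small numeric illustration of the difference, using hypothetical values that are not from the talk: with one outlier, the mean is pulled far away and is not one of the data points, while the medoid (here taken as the point with the smallest total absolute distance to all points) is always an actual object. For vertices of a graph no mean is defined at all, which is one reason a medoid suits the task of picking representative vertices.

```python
def mean_center(xs):
    # k-means style center: the arithmetic mean, usually not a data point
    return sum(xs) / len(xs)


def medoid_center(xs):
    # k-medoids style center: the data point with the smallest total
    # distance to all points (ties go to the first such point)
    return min(xs, key=lambda c: sum(abs(c - x) for x in xs))


values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with one outlier
print(mean_center(values))    # 22.0
print(medoid_center(values))  # 3.0
```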

SLIDE 8

K-medoids vs k-means

SLIDE 9

K-medoids in a heterogeneous network

  • Select a few representatives from a large set of vertices
  • Representatives should be independent of each other
  • A relation between two vertices in the graph is called a link
  • Links also cover undiscovered relations
  • Undiscovered relations manifest as one or more paths

SLIDE 10

Measure for link strength

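  • The probability of a path p = (e_1, ..., e_k) is the product of the probabilities of the edges along the path:

$$g(p) = \prod_{i=1}^{k} w(e_i)$$

  • The probability of the best path between two vertices o and o′, where P_a(G, o, o′) is the set of paths between o and o′ in G:

$$P_{bp}(G, o, o') = \max_{p \in P_a(G, o, o')} g(p)$$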

[Figure: the example graph with vertices A to F and edge probabilities]
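The slides define P_bp but do not say how it is computed. One standard approach, sketched here under the assumption that every edge weight w(e) lies in (0, 1], is a Dijkstra-style search that maximizes the product of edge probabilities instead of minimizing a sum of lengths (equivalent to a shortest-path search on the lengths -log w(e)). Because multiplying by a weight of at most 1 never increases a path's probability, the first time the target leaves the priority queue its value is final. The adjacency-dictionary format, the function name `best_path_probability` and the small graph are hypothetical, not Biomine's implementation.

```python
import heapq


def best_path_probability(graph, source, target):
    """P_bp: maximum over paths from source to target of the product of edge
    probabilities. `graph` maps each vertex to {neighbour: probability},
    with all probabilities assumed to lie in (0, 1]."""
    best = {source: 1.0}
    heap = [(-1.0, source)]  # heapq is a min-heap, so store negated probabilities
    done = set()
    while heap:
        neg_p, v = heapq.heappop(heap)
        if v in done:
            continue  # stale queue entry: v was already finalised
        done.add(v)
        if v == target:
            return -neg_p
        for u, w in graph.get(v, {}).items():
            p = -neg_p * w  # probability of the path extended by edge (v, u)
            if p > best.get(u, 0.0):
                best[u] = p
                heapq.heappush(heap, (-p, u))
    return 0.0  # no path from source to target


# Hypothetical undirected graph: each edge is listed in both directions
graph = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"A": 0.5, "B": 0.8},
    "D": {},  # isolated vertex
}
print(best_path_probability(graph, "A", "C"))  # about 0.72: A-B-C beats the direct edge (0.5)
```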
SLIDE 11

Algorithm

  • 1. Calculate the similarity matrix (P_bp for every pair of objects)
  • 2. Choose k objects randomly as initial medoids
  • 3. Assign each remaining object to the most similar medoid
  • 4. Calculate a new medoid for each cluster:

$$\mathrm{medoid}(C_j) = \arg\max_{o \in C_j} \sum_{o' \in C_j,\ o' \neq o} P_{bp}(G, o, o')$$

  • Repeat steps 3 and 4 until the clustering converges
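The four steps and the medoid formula can be sketched in Python as below. This is an illustration under stated assumptions, not the author's code: `sim` is a precomputed similarity matrix (in the talk's setting, sim[o][o′] would be P_bp(G, o, o′)), the similarity values in the usage example are made up, ties are broken by list order, and the names `k_medoids_similarity` and `assign` are hypothetical.

```python
import random


def k_medoids_similarity(sim, k, seed=0, max_iter=100):
    """k-medoids on a similarity matrix `sim` (higher means more similar).
    Step 1, computing `sim`, is assumed to have been done already."""
    n = len(sim)
    rng = random.Random(seed)
    # Step 2: choose k objects at random as the initial medoids
    medoids = rng.sample(range(n), k)

    def assign(meds):
        # Step 3: each medoid keeps itself; every other object joins
        # the medoid it is most similar to
        clusters = {m: [m] for m in meds}
        for o in range(n):
            if o not in clusters:
                best = max(meds, key=lambda m: sim[o][m])
                clusters[best].append(o)
        return clusters

    for _ in range(max_iter):
        clusters = assign(medoids)
        # Step 4: medoid(C_j) = argmax over o in C_j of the summed
        # similarity to the other members o' != o
        new_medoids = [
            max(members, key=lambda o: sum(sim[o][q] for q in members if q != o))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):  # converged
            return medoids, clusters
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical similarities: objects 0-2 and 3-5 form two tightly linked groups
sim = [[1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1)
        for j in range(6)] for i in range(6)]
medoids, clusters = k_medoids_similarity(sim, k=2)
print(sorted(sorted(c) for c in clusters.values()))  # [[0, 1, 2], [3, 4, 5]]
```

Because similarity is maximized rather than distance minimized, both the assignment step and the medoid update use `max`, matching the argmax in the formula for medoid(C_j).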

SLIDE 12

Biomine

  • 12 biological databases are integrated
  • Over 1 million vertices
  • Over 9 million edges

[Figure: example Biomine subgraph with vertices Gene:7299, Pathway:04916, Gene:4157, Gene:434, Phenotype:203200 and Gene:4948; edges labelled with probabilities]

  • http://biomine.cs.helsinki.fi

SLIDE 13

Artificial example

  • Three phenotypes, each with three genes
  • k-medoids applied to the nine genes, with k = 3

SLIDE 14

Result

SLIDE 15

Future Work

  • Hierarchical clustering
  • Statistical evaluation
  • Comparison to an existing method

SLIDE 16

Conclusion

  • Finding representative vertices, e.g. genes
  • K-medoids applied to Biomine
  • The example with nine genes is promising
