Network Alignment Using Isomorphic Graphlets Harrison Lee - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Network Alignment Using Isomorphic Graphlets

Harrison Lee 21Nov2019

slide-2
SLIDE 2

Problem Introduction:

  • It is useful to be able to align similar networks to evaluate their similarity

○ If two networks are very similar, we could roughly overlay one on the other, since their topologies would match.

  • To compare larger networks, we need to define a cost matrix on which to build our alignment.

○ How do we objectively and quickly define how similar two nodes are?
○ Degree? Neighbors’ degrees? How can we capture local network topologies?

slide-3
SLIDE 3

More on Network Alignment Applications

  • Network alignment has strong uses in bioinformatics

○ A good network alignment implies strong topological similarity between the two networks.

  • For example, aligning protein-protein interaction (PPI) networks

○ Similarities between PPI networks could help highlight analogous metabolic pathways.
○ These similarities can help us infer protein functions, or direct research on proteins, using knowledge of better-studied, similar analogues.
■ Model organisms, such as yeast or rats, are much easier to study than others, such as humans.
■ Alternatively, an alignment can also highlight differences in proteins and metabolic pathways, suggesting possible explanations for differences in function.

slide-4
SLIDE 4

The Networks We Initially Test On

  • In this project, as in existing papers, we compare protein-protein interaction networks, though these algorithms are not limited to them.

○ In this case, we use PPI networks such as those for H. pylori (the bacterium behind stomach ulcers; n=700 nodes, m=1425 edges), Saccharomyces (yeast; n=5113, m=22315), and human (n=9141, m=41456).

  • For testing, we look at both largest-component and regular high-confidence PPI networks, each noised for comparison.

  • For automated, larger-scale testing, we can also generate networks and compare each one to a modified (noised) version with a defined similarity, obtained by randomly rewiring edges.
○ These generated networks can be replaced with other network types to test their corresponding topologies.
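A minimal sketch of this generate-and-noise setup, assuming a uniform random G(n, m) graph and a rewiring step that swaps a chosen fraction of edges for new random ones. The helper names `random_graph` and `rewire` are hypothetical, and the deck does not say which generator or rewiring scheme the project actually used.

```python
import random

def random_graph(n, m, seed=None):
    """Uniform random simple graph with n nodes and m edges, stored as (u, v) with u < v."""
    if m > n * (n - 1) // 2:
        raise ValueError("too many edges for a simple graph")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges

def rewire(edges, n, fraction, seed=None):
    """Replace round(fraction * |E|) edges with new random edges absent from the original.

    The result keeps the same edge count and shares exactly |E| - k edges with the
    input, so its defined similarity is 1 - fraction (up to rounding).
    Assumes the graph has at least k non-edges available.
    """
    rng = random.Random(seed)
    original = set(edges)
    k = round(fraction * len(original))
    removed = set(rng.sample(sorted(original), k))
    noised = original - removed
    while len(noised) < len(original):
        u, v = rng.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        if e not in original:  # only genuinely new edges count as noise
            noised.add(e)
    return noised

g = random_graph(700, 1425, seed=1)       # sized like the H. pylori network
h = rewire(g, 700, fraction=0.10, seed=2)
print(len(g & h) / len(g))                # fraction of shared edges, about 0.90
```

Removing edges only from the original set and adding only edges that were never present keeps the overlap exact, which is what makes the "defined similarity" controllable.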

slide-5
SLIDE 5

Current Alignment Methods Are Slow

  • Finding subgraph isomorphisms, which underlies exact network alignment, is NP-complete
  • For networks such as PPI networks, with hundreds to thousands of nodes, exact alignment would be infeasibly slow without heuristics

○ This does not even consider possible applications to huge data sets such as social networks, which can have billions of nodes.

  • Heuristics like graphlet orbits can reduce those run times with little loss of accuracy relative to exact algorithms whose run times are untenable

  • Here we look at graphlet orbits as the basis for a network alignment mechanism and, subsequently, a similarity-scoring mechanism.

○ While we initially test with PPI networks, this approach works on patterns in graph topology, so it could be applied to any other type of network, constrained primarily by network size.

slide-6
SLIDE 6

[Figure annotation: nodes should be independent of rotation and rearrangement]

Graphlets, Orbits

  • To build the cost matrix, we can begin by looking at isomorphic graphlets: small, connected, induced sub-graphs
  • Within each graphlet, nodes that are symmetric to one another (related by an automorphism) form a node orbit

○ We begin set-up by constructing all the possible shapes (edge configurations) of small graphlets with fewer than 5 nodes

[Figure: Small graphlet examples, 2-node (1), 3-node (2), and 4-node (5), with node orbits labelled]

slide-7
SLIDE 7

Larger Graphlet Examples

  • These larger graphlets better capture local neighborhoods around nodes

○ With larger graphlets, we can look further out from each source node, so nodes that share the same large orbits are more likely to be genuinely similar.
○ We will use the number of times each node touches each orbit to construct our cost matrix.

  • Here are all the 5-node graphlets, taken from earlier work

○ The number of orbits grows rapidly with graphlet size: graphlets of up to 5 nodes have 73 node orbits in total

Malod-Dognin, N., & Pržulj, N. (2015). L-GRAAL: Lagrangian graphlet-based network aligner. Bioinformatics. doi:10.1093/bioinformatics/btv130

# Nodes | # Orbits
2 | 1
3 | 3
4 | 11
5 | 57

slide-8
SLIDE 8

Treelets

  • In this project, we build on the existing work by using treelets instead of graphlets. Treelets are a subset of graphlets.

○ Treelets still use isomorphic orbits, but also require a tree structure
○ Trees: connected and acyclic, so every pair of nodes is joined by exactly one path.

  • As with graphlets, we begin by constructing all the possible small treelets

○ The 2-node graphlet and treelet are the same, but at larger sizes the difference becomes more noticeable.

  • The number of orbits increases less rapidly, allowing us to feasibly use larger treelets

[Figure: valid treelets vs. invalid shapes (these are graphlets, not treelets), with node orbits labelled]

# Nodes | # Treelet Orbits
2 | 1
3 | 2
4 | 4
5 | 9
6 | 20
7 | 48
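The validity rule in the figure (a treelet must be a tree) is easy to check in code. A small sketch, assuming a candidate subgraph is given as a node list plus an edge list of a simple graph; `is_tree` is a hypothetical helper, not taken from the project. It uses the fact that a graph on |V| nodes is a tree exactly when it has |V| - 1 edges and is connected.

```python
def is_tree(nodes, edges):
    """True if the simple graph (nodes, edges) is connected and acyclic."""
    nodes = set(nodes)
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Depth-first search from an arbitrary node; a tree reaches every node.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen == nodes

print(is_tree([1, 2, 3, 4], [(1, 2), (2, 3), (2, 4)]))  # star: True
print(is_tree([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))     # triangle: False
```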

slide-9
SLIDE 9

Edge Orbits

  • We can also define orbits by edge isomorphisms

○ This can be done with either graphlets or treelets

[Figure: example treelet edge orbits and graphlet edge orbits; edges in the same orbit share a letter (a, b, c, d)]

# Nodes | # Node Graphlet Orbits | # Edge Graphlet Orbits | # Node Treelet Orbits | # Edge Treelet Orbits
3 | 3 | 2 | 2 | 1
4 | 11 | 10 | 4 | 3
5 | 57 | 57 | 9 | 6
6 | n/a | n/a | 20 | 16
7 | n/a | n/a | 48 | 37
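The two treelet columns of this table can be cross-checked by brute force. Each node orbit of a tree corresponds to an isomorphism class of rooted trees, and each edge orbit to an unordered pair of the rooted subtrees on either side of the edge. The sketch below (hypothetical helper names) enumerates every labelled tree via Prüfer sequences and canonicalises rooted subtrees with the AHU string encoding. It only covers trees, not the graphlet columns, and it visits all n^(n-2) labelled trees, so it is a check for small n rather than a counting method for real networks.

```python
from itertools import product

def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence into the n - 1 edges of a labelled tree on nodes 0..n-1."""
    degree = [1] * n
    for x in seq:
        degree[x] += 1
    edges = []
    for x in seq:
        leaf = min(i for i in range(n) if degree[i] == 1)
        edges.append((leaf, x))
        degree[leaf] -= 1
        degree[x] -= 1
    u, v = [i for i in range(n) if degree[i] == 1]
    edges.append((u, v))
    return edges

def rooted_form(adj, node, parent):
    """AHU canonical string of the subtree at `node`, pointing away from `parent`."""
    kids = sorted(rooted_form(adj, c, node) for c in adj[node] if c != parent)
    return "(" + "".join(kids) + ")"

def treelet_orbit_counts(n):
    """(# node orbits, # edge orbits) over all trees with n >= 2 nodes."""
    node_orbits, edge_orbits = set(), set()
    for seq in product(range(n), repeat=n - 2):
        edges = prufer_to_edges(seq, n)
        adj = {i: [] for i in range(n)}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        # Node orbit = the whole tree seen as rooted at that node.
        for root in range(n):
            node_orbits.add(rooted_form(adj, root, None))
        # Edge orbit = unordered pair of the two rooted sides of the edge.
        for u, v in edges:
            edge_orbits.add(tuple(sorted((rooted_form(adj, u, v), rooted_form(adj, v, u)))))
    return len(node_orbits), len(edge_orbits)

print([treelet_orbit_counts(n) for n in range(2, 7)])
# [(1, 1), (2, 1), (4, 3), (9, 6), (20, 16)]
```

The node-orbit sequence 1, 2, 4, 9, 20, 48 is the number of rooted trees on n nodes, which is why treelet orbits grow so much more slowly than graphlet orbits.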

slide-10
SLIDE 10

Collect all the orbit counts

  • With any of those orbits, we can begin to construct our cost matrix from the number of times each node touches each orbit.

○ Each node (row) has some number of each orbit (columns)

  • Let’s look at one of the networks on the first slide, and focus on one node
  • Larger orbits are less common and provide better accuracy, but are more difficult to calculate

[Figure: an example node x and its orbit counts]
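For the smallest orbits, the per-node counts have closed forms, which makes the "node × orbit" table concrete. A sketch for the four orbits of the 2- and 3-node graphlets, assuming the standard GDV orbit numbering used in the GRAAL line of work (the labels in this deck's figures may differ); `small_orbit_counts` is a hypothetical name, and larger graphlets or treelets need dedicated counting tools rather than these formulas.

```python
from itertools import combinations

def small_orbit_counts(adj):
    """Counts of orbits 0-3 for every node of a simple undirected graph.

    adj maps each node to the set of its neighbours. Orbit numbering:
      0 = edge endpoint (degree), 1 = end of an induced 3-node path,
      2 = middle of an induced 3-node path, 3 = triangle member.
    """
    counts = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # Triangles through v: neighbour pairs that are themselves adjacent.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Middle of a path a-v-b: neighbour pairs that are NOT adjacent.
        mid = deg * (deg - 1) // 2 - tri
        # End of a path v-u-w with w not adjacent to v; each triangle is
        # reached twice (once via each of its other two nodes), so subtract 2*tri.
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        counts[v] = [deg, end, mid, tri]
    return counts

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for node, row in small_orbit_counts(adj).items():
    print(node, row)
```

Each printed row is one row of the orbit-count matrix: node 2, for example, has degree 3, sits in the middle of two induced paths (0-2-3 and 1-2-3), and belongs to one triangle.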

slide-11
SLIDE 11

Alignment: Cost Matrix Construction

  • Now that we have orbit counts, how do we turn them into a cost matrix?

○ point_A, (point_B), num_orbit1, num_orbit2, ...

  • For node orbits, we use node graphlet degree vector (GDV) similarity

○ Let u and v be the two nodes we are comparing, and k the total number of orbits we consider
○ Let Di and wi be the GDV distance and the weight for orbit i

  • The signature similarity between the two GDVs is then defined as S(u, v)
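The formulas on this slide did not survive extraction. As a hedged sketch, the code below uses the GDV distance from the GRAAL papers (Milenković et al.): D_i(u, v) = w_i · |log(u_i + 1) − log(v_i + 1)| / log(max(u_i, v_i) + 2), D(u, v) = Σ D_i / Σ w_i, and S(u, v) = 1 − D(u, v). It then takes the cost of pairing u with v to be simply 1 − S(u, v); GRAAL itself also mixes in a degree term. GRAAL weights orbit i by w_i = 1 − log(o_i)/log(73), where o_i counts the orbits that affect orbit i; here the weights default to uniform. All function names are hypothetical.

```python
import math

def orbit_distance(ui, vi, wi):
    """Weighted, log-scaled distance between two counts of the same orbit (GRAAL-style D_i)."""
    return wi * abs(math.log(ui + 1) - math.log(vi + 1)) / math.log(max(ui, vi) + 2)

def signature_similarity(gdv_u, gdv_v, weights=None):
    """S(u, v) = 1 - sum_i D_i / sum_i w_i; lies in (0, 1] for positive weights."""
    if weights is None:
        weights = [1.0] * len(gdv_u)   # assumption: uniform orbit weights
    total = sum(orbit_distance(a, b, w) for a, b, w in zip(gdv_u, gdv_v, weights))
    return 1.0 - total / sum(weights)

def cost_matrix(gdvs_a, gdvs_b, weights=None):
    """nA x nB matrix; entry [i][j] = 1 - S(node i of A, node j of B). O(k * nA * nB)."""
    return [[1.0 - signature_similarity(gu, gv, weights) for gv in gdvs_b]
            for gu in gdvs_a]

gdvs_a = [[2, 1, 0, 1], [1, 2, 0, 0]]
gdvs_b = [[2, 1, 0, 1], [3, 0, 2, 1], [1, 2, 0, 0]]
for row in cost_matrix(gdvs_a, gdvs_b):
    print([round(c, 3) for c in row])
```

The log scaling keeps a node with thousands of graphlets from dominating the comparison, and dividing by log(max + 2) keeps each per-orbit term below its weight.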

slide-12
SLIDE 12

Edge-Based Costs

  • For edges, we also use edge graphlet degree centrality (GDC)

○ GDV signature similarity is still used:

  • Edge graphlet degree centrality measures the complexity of the extended neighborhood, favoring denser neighborhoods.

○ With edge e, and ei referring to the count of orbit i

  • The final edge cost function (ECF) is:

○ alpha scales the weighting of GDV vs GDC

  • The costs in this matrix can then be minimized to create our alignment.
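The GDC and ECF formulas were also lost in extraction. The sketch below is a hypothetical stand-in modelled on GRAAL's node cost, not the exact GREAT definitions: GDC(e) is taken as a weighted sum of log(e_i + 1) over edge orbits, so edges in denser neighbourhoods score higher, and the edge cost mixes a precomputed GDV signature similarity with the normalised GDCs of the two edges, with alpha setting the balance. Both function names and the normalisation are assumptions.

```python
import math

def edge_gdc(edge_orbit_counts, weights=None):
    """Hypothetical edge GDC: weighted sum of log(e_i + 1); grows with neighbourhood density."""
    if weights is None:
        weights = [1.0] * len(edge_orbit_counts)
    return sum(w * math.log(c + 1) for c, w in zip(edge_orbit_counts, weights))

def edge_cost(s_gdv, gdc_e, gdc_f, max_gdc_a, max_gdc_b, alpha=0.5):
    """Hypothetical ECF for aligning edge e (network A) with edge f (network B).

    s_gdv: GDV signature similarity of e and f, in [0, 1]
    gdc_e, gdc_f: GDC of each edge; max_gdc_a, max_gdc_b: largest GDC in each network
    alpha: weight of GDV similarity versus GDC (1.0 means GDV only)
    """
    denom = max_gdc_a + max_gdc_b
    density = (gdc_e + gdc_f) / denom if denom > 0 else 0.0
    # Low cost for similar signatures and for pairs of dense neighbourhoods.
    return 1.0 - (alpha * s_gdv + (1.0 - alpha) * density)

e_counts, f_counts = [3, 1, 0], [2, 1, 1]
print(edge_cost(0.8, edge_gdc(e_counts), edge_gdc(f_counts), 2.5, 2.5, alpha=0.7))
```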

slide-13
SLIDE 13

Alignment: Aligning according to our Cost Matrix

  • With this cost matrix, we now have a map of the similarities between the nA nodes of network A and the nB nodes of network B (an nA × nB matrix).

○ The only constraint on large networks is that the two should probably be reasonably well connected

  • We can use this cost matrix to run one of several possible alignment algorithms that map the nodes of one network onto the other

○ Greedy
○ Hungarian
○ Either way, we just need to minimize the total cost in the matrix

[Figure: Example Cost Matrix, with rows a to d and columns 1 to 5]
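One simple reading of the greedy option, sketched under assumptions: repeatedly take the cheapest remaining cell whose row and column are both still free. This is a global-greedy assignment, not the seed-and-extend procedure of the original GRAAL, and `greedy_assignment` and `total_cost` are hypothetical names.

```python
def greedy_assignment(cost):
    """Pick the cheapest free (row, column) cells until every row (or column) is used.

    cost is an nA x nB list of lists; returns a sorted list of (row, column) pairs.
    """
    cells = sorted((c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row))
    used_rows, used_cols, pairs = set(), set(), []
    for c, i, j in cells:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            pairs.append((i, j))
    return sorted(pairs)

def total_cost(cost, pairs):
    return sum(cost[i][j] for i, j in pairs)

cost = [[1, 2],
        [2, 100]]
pairs = greedy_assignment(cost)
print(pairs, total_cost(cost, pairs))  # [(0, 0), (1, 1)] 101
```

Taking the cheap cell (0, 0) first forces the expensive (1, 1), while the pairing (0, 1), (1, 0) costs only 4. That gap is what an optimal assignment method closes.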

slide-14
SLIDE 14

Assignment Algorithm Examples

  • Further improvement is possible in this direction by using more effective assignment algorithms

[Figure: greedy vs. Hungarian assignment on an example cost matrix, with total costs; the Hungarian method repeatedly subtracts the smallest uncovered value]
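For the Hungarian option, a compact shortest-augmenting-path form of the Hungarian (Kuhn-Munkres) method is sketched below, assuming a cost matrix with no more rows than columns (transpose it otherwise). It runs in O(n^2 m), which is O(n^3) for a square matrix. Instead of literally crossing out lines, it keeps row and column potentials, but each potential shift plays the role of "subtract the smallest uncovered value" and the result is the same minimum-cost assignment. In practice, `scipy.optimize.linear_sum_assignment` solves the same problem; `hungarian` here is a hypothetical standalone helper.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a sorted list of (row, column) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("transpose the matrix so rows <= columns")
    INF = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (m + 1)      # column potentials; column 0 is a dummy
    match = [0] * (m + 1)    # match[j] = row assigned to column j (0 = free)
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]   # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials by the smallest reduced cost (the "uncovered" minimum).
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:     # reached a free column: augment
                break
        while j0:                  # flip assignments back along the path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sorted((match[j] - 1, j - 1) for j in range(1, m + 1) if match[j])

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
pairs = hungarian(cost)
print(pairs, sum(cost[i][j] for i, j in pairs))  # [(0, 1), (1, 0), (2, 2)] 5
```

Unlike the greedy pass, this never commits to a cheap cell that forces an expensive one later: on [[1, 2], [2, 100]] it returns (0, 1), (1, 0) with total cost 4.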

slide-15
SLIDE 15
  • Milenković et al. (2010), in “Optimal Network Alignment with Graphlet Degree Vectors”, show significant improvement from using Hungarian alignment (H-GRAAL) over the original greedy GRAAL, which also uses graphlets.

  • Edge correctness (EC) is our alignment quality measure: the percentage of edges in one graph that are aligned to edges in the other.

  • Here, the network is a high-confidence yeast PPI network, aligned against the same network augmented with interactions from a lower-confidence version at different noise levels.

Improvement from using Hungarian Alignment
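Edge correctness is straightforward to compute once an alignment is fixed. A sketch, assuming the alignment is a dict from the nodes of network A to nodes of network B and edges are given as pairs; `edge_correctness` is a hypothetical helper.

```python
def edge_correctness(edges_a, edges_b, mapping):
    """Percentage of A's edges whose mapped endpoints are also adjacent in B."""
    b_edges = {frozenset(e) for e in edges_b}
    conserved = sum(1 for u, v in edges_a
                    if frozenset((mapping[u], mapping[v])) in b_edges)
    return 100.0 * conserved / len(edges_a)

edges_a = [(0, 1), (1, 2)]
edges_b = [("a", "b"), ("b", "c")]
print(edge_correctness(edges_a, edges_b, {0: "a", 1: "b", 2: "c"}))  # 100.0
print(edge_correctness(edges_a, edges_b, {0: "a", 1: "c", 2: "b"}))  # 50.0
```

Using frozensets makes the check direction-independent, which matches undirected PPI interactions.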

slide-16
SLIDE 16
  • We also use node correctness and interaction correctness

○ Both require knowing the true alignment, i.e., the correct node or edge mappings.

Additional Alignment Quality Measures
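When the true alignment is known (for example, when aligning a network against a noised copy of itself), node correctness is the share of nodes mapped to their true partner. A sketch with a hypothetical helper name; interaction correctness is analogous over edges, but its exact definition is not given here, so only node correctness is shown.

```python
def node_correctness(mapping, true_mapping):
    """Percentage of nodes whose aligned partner matches the known true partner."""
    correct = sum(1 for u, v in true_mapping.items() if mapping.get(u) == v)
    return 100.0 * correct / len(true_mapping)

true_map = {0: "a", 1: "b", 2: "c"}
found = {0: "a", 1: "c", 2: "b"}
print(node_correctness(found, true_map))
```

This alignment swaps nodes 1 and 2, so only one of three nodes is correct even though, on a path a-b-c, it still conserves one of the two edges.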

slide-17
SLIDE 17

Yeast-Human Alignment Analysis and Run-time Analysis

  • In addition to comparing against the results of a random mapping, we also count the Gene Ontology (GO) terms shared between aligned protein pairs.

  • Run-time:

○ The Hungarian algorithm runs in O(n^3), where n is the number of nodes
○ Collecting orbits and their counts takes at least O(nk), where k is the number of orbits
○ Generating the cost matrix over n^2 node pairs, with k orbits each, takes O(kn^2)

# Common GO Terms | 1 | 2 | 3 | 4
H-GRAAL | 45.38% | 14.54% | 4.55% | 1.3%
GRAAL | 45.10% | 15.60% | 5.06% | 2.02%
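A sketch of the GO-term check, assuming each protein's annotations are available as a set of GO term IDs and that the table reports the percentage of aligned pairs sharing at least k terms (the slide does not say whether "at least" or "exactly" k is meant). The function name and the toy term IDs are hypothetical.

```python
def shared_go_percentage(pairs, go_terms, k):
    """Percentage of aligned pairs (u, v) whose GO annotation sets share >= k terms."""
    hits = sum(1 for u, v in pairs
               if len(go_terms.get(u, set()) & go_terms.get(v, set())) >= k)
    return 100.0 * hits / len(pairs)

go_terms = {"p1": {"t1", "t2"}, "q1": {"t1", "t2", "t3"},
            "p2": {"t3"},       "q2": {"t4"}}
pairs = [("p1", "q1"), ("p2", "q2")]
for k in (1, 2, 3):
    print(k, shared_go_percentage(pairs, go_terms, k))
```

Running the same function on randomly shuffled pairs gives the random-mapping baseline mentioned above.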

slide-18
SLIDE 18

Bibliography, questions

Milenković, T., Ng, W. L., Hayes, W., & Pržulj, N. (2010). Optimal network alignment with graphlet degree vectors. Cancer Informatics, 9, CIN-S4744.

Crawford, J., & Milenković, T. (2015, November). GREAT: Graphlet edge-based network alignment. In 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 220-227). IEEE.

Slota, G. M. (2016). Irregular graph algorithms on modern multicore, manycore, and distributed processing systems.