Network Alignment Using Isomorphic Graphlets Harrison Lee - PowerPoint PPT Presentation

Network Alignment Using Isomorphic Graphlets
Harrison Lee
21 Nov 2019

Problem Introduction
● It is useful to be able to align similar networks in order to evaluate how similar they are.
  ○ If two networks are very similar, we could roughly overlay one on the other, because their topologies match.
● To compare larger networks, we need to define a cost matrix on which to build our alignment.
  ○ How do we objectively and quickly define how similar two nodes are?
  ○ Degree? Neighbors' degrees? How can we capture local network topologies?

More on Network Alignment Applications
● Network alignment has strong uses in bioinformatics.
  ○ A good network alignment implies strong topological similarity between the two networks.
● For example, aligning protein-protein interaction (PPI) networks:
  ○ Similarities between PPI networks could help highlight analogous metabolic pathways.
  ○ Seeing these similarities can help us make inferences about, or direct research on, proteins and their functions, based on knowledge of better-studied, similar analogues.
    ■ Model organisms, such as yeast or rats, are much easier to study than others, such as humans.
    ■ Alternatively, alignment can also highlight differences in proteins and metabolic pathways, suggesting ways to explain potential differences in function.

The Networks We Initially Test On
● In this project and in existing papers, we compare protein-protein interaction networks, though these algorithms are not limited to them.
  ○ We use PPI networks such as those for H. pylori (stomach ulcers; n = 700 nodes, m = 1425 edges), Saccharomyces (yeast; n = 5113, m = 22315) and human (n = 9141, m = 41456).
● For testing, we use both the largest connected component and the full high-confidence PPI networks, each compared against a noised copy.
● For automated, larger-scale testing, we can also generate networks and compare each one to a modified (noised) version with a defined similarity, produced by randomly rewiring edges.
  ○ The generated networks can be swapped for other random-graph types to test different kinds of networks and their characteristic topologies.
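A minimal sketch of the noising procedure, assuming a uniform random G(n, m) generator and a fixed rewiring fraction. The function names `random_graph` and `rewire` are hypothetical, not taken from the project. Because node IDs are kept, the true alignment between a network and its rewired copy is simply the identity map.

```python
import random

def random_graph(n, m, seed=None):
    """Uniform random graph G(n, m): m distinct undirected edges on nodes 0..n-1."""
    if m > n * (n - 1) // 2:
        raise ValueError("too many edges for a simple graph")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)       # two distinct endpoints
        edges.add((min(u, v), max(u, v)))    # store each edge in canonical order
    return edges

def rewire(edges, n, fraction, seed=None):
    """Remove round(fraction * m) edges and add the same number of brand-new edges.

    New edges are never drawn from the original edge set, so the noised copy
    shares exactly m - round(fraction * m) edges with the original.
    Assumes the graph is sparse enough that enough non-edges exist.
    """
    rng = random.Random(seed)
    original = set(edges)
    k = round(fraction * len(original))
    removed = set(rng.sample(sorted(original), k))
    noised = original - removed
    while len(noised) < len(original):
        u, v = rng.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        if e not in original:                # only genuinely new edges count as noise
            noised.add(e)
    return noised

g = random_graph(100, 300, seed=1)
h = rewire(g, 100, 0.1, seed=2)
print(len(h), len(g & h))  # 300 270
```

Excluding all original edges (not just the kept ones) is a design choice: it makes the similarity between the two networks exactly controllable, which is what the automated tests need.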

Current Alignment Methods Are Slow
● Exact network alignment generalizes subgraph isomorphism, which is NP-complete; no polynomial-time algorithm is known even for deciding whether two whole graphs are isomorphic.
● PPI networks, with hundreds to thousands of nodes, would be infeasibly slow to align without heuristics.
  ○ This does not even consider applications to huge data sets such as social networks, which reach billions of nodes.
● Heuristics such as graphlet orbits can cut these run times without a large loss of accuracy relative to exact algorithms whose run times are untenable.
● Here we look at graphlet orbits as the basis for a network alignment mechanism and, subsequently, a similarity-scoring mechanism.
  ○ While we initially test with PPI networks, the method works on patterns in graph topology and could be applied to any other type of network, constrained primarily by network size.
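To make the scaling problem concrete, here is a brute-force isomorphism test. It is a hypothetical illustration, not part of any graphlet-based aligner: it tries every one of the n! relabelings, which is fine for a 4-node graph and hopeless long before PPI-network sizes.

```python
from itertools import permutations
from math import factorial

def are_isomorphic(n, edges_a, edges_b):
    """Brute-force isomorphism test for two simple graphs on nodes 0..n-1."""
    a = [tuple(e) for e in edges_a]
    b = {frozenset(e) for e in edges_b}
    if len(a) != len(b):
        return False
    for perm in permutations(range(n)):
        # perm maps node i of A to node perm[i] of B; with equal edge counts,
        # mapping every A-edge onto a B-edge makes perm an isomorphism.
        if all(frozenset((perm[u], perm[v])) in b for u, v in a):
            return True
    return False

path = [(0, 1), (1, 2), (2, 3)]
relabeled_path = [(1, 0), (0, 3), (3, 2)]
star = [(0, 1), (0, 2), (0, 3)]
print(are_isomorphic(4, path, relabeled_path))  # True
print(are_isomorphic(4, path, star))            # False
print(factorial(20))  # 2432902008176640000 relabelings for just 20 nodes
```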

Graphlets, Orbits
● To build the cost matrix, we can begin by looking at isomorphic graphlets: small, connected, induced subgraphs.
● Within each graphlet, node positions that are equivalent under symmetry (automorphism) form node orbits.
  ○ We begin set-up by constructing all possible shapes (edge configurations) for small graphlets with fewer than 5 nodes.
[Figure: small graphlet examples, 2-node (1), 3-node (2) and 4-node (5), with each node labeled by its orbit number. Orbits are independent of rotation and rearrangement of the nodes.]
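Node orbits can be computed for any small graphlet by brute force: find every relabeling that maps the edge set onto itself (an automorphism), then group nodes that some automorphism swaps. This is a minimal sketch; `node_orbits` is a hypothetical helper. The 3-node path and triangle give 2 + 1 = 3 orbits, matching the 3-node orbit count listed for graphlets.

```python
from itertools import permutations

def node_orbits(n, edges):
    """Group the nodes of a small graphlet into automorphism orbits (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    # An automorphism is a relabeling that maps the edge set onto itself.
    autos = [p for p in permutations(range(n))
             if all(frozenset((p[u], p[v])) in edge_set for u, v in edges)]
    # Two nodes share an orbit if some automorphism carries one onto the other.
    orbits = {frozenset(p[i] for p in autos) for i in range(n)}
    return sorted(sorted(o) for o in orbits)

print(node_orbits(3, [(0, 1), (1, 2)]))                  # path: [[0, 2], [1]]
print(node_orbits(3, [(0, 1), (1, 2), (0, 2)]))          # triangle: [[0, 1, 2]]
print(node_orbits(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))  # tailed triangle: [[0, 1], [2], [3]]
```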

Larger Graphlet Examples
● These larger graphlets better capture the local neighborhood around each node.
  ○ With larger graphlets we reach further from each source node, which sharpens the similarity to other nodes that share the same large orbits.
  ○ We use the number of orbits each node has (its orbit counts) to construct our cost matrix.
● [Figure: all the 5-node graphlets, as presented in earlier work.]
  ○ The number of orbits grows rapidly with graphlet size: the graphlets with 2 to 5 nodes together have 73 orbits.
# Nodes | # Orbits
2 | 1
3 | 3
4 | 11
5 | 58
Malod-Dognin, N. & Przulj, N. (2015). L-GRAAL: Lagrangian graphlet-based network aligner. Bioinformatics. doi:10.1093/bioinformatics/btv130

Treelets
● In this project, we build on the existing work by using treelets instead of graphlets. Treelets are a subset of graphlets.
  ○ They keep the idea of isomorphic orbits but additionally require a tree structure.
  ○ Trees: connected and acyclic, so every pair of nodes is joined by exactly one path.
● As with graphlets, we begin by constructing all possible small treelets.
  ○ 2-node graphlets and treelets are the same, but the difference becomes more noticeable as the number of nodes grows.
● The number of orbits increases less rapidly, allowing us to feasibly use larger treelets.
# Nodes | # Treelet Orbits
2 | 1
3 | 2
4 | 4
5 | 9
6 | 20
7 | 48
[Figure: valid treelets, with nodes labeled by orbit, next to invalid examples that contain cycles and are therefore graphlets but not treelets.]
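The treelet orbit counts on this slide can be sanity-checked for small sizes by exhaustive enumeration: list every labeled tree on n nodes, keep one representative per isomorphism class, and count each shape's automorphism orbits. This is a hypothetical brute-force check, only practical up to about n = 5; it is not how one would enumerate 7-node treelets.

```python
from itertools import combinations, permutations

def is_tree(n, edges):
    """A graph on n nodes is a tree iff it has n - 1 edges and is connected."""
    if len(edges) != n - 1:
        return False
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:                      # depth-first search from node 0
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def canonical_form(n, edges):
    """Smallest relabeled edge list; equal for isomorphic graphs (brute force)."""
    return min(tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
               for p in permutations(range(n)))

def orbit_count(n, edges):
    """Number of node automorphism orbits of a small graph."""
    edge_set = {frozenset(e) for e in edges}
    autos = [p for p in permutations(range(n))
             if all(frozenset((p[u], p[v])) in edge_set for u, v in edges)]
    return len({frozenset(p[i] for p in autos) for i in range(n)})

def treelet_census(n):
    """Return (# distinct n-node treelets, # node orbits across them)."""
    pairs = list(combinations(range(n), 2))
    shapes = {}
    for edges in combinations(pairs, n - 1):
        if is_tree(n, edges):
            shapes.setdefault(canonical_form(n, edges), edges)
    return len(shapes), sum(orbit_count(n, e) for e in shapes.values())

for n in range(2, 6):
    print(n, treelet_census(n))  # orbit totals 1, 2, 4, 9
```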

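Edge Orbits
● We can also define orbits by edge isomorphisms: edges in equivalent positions within a graphlet form an edge orbit.
  ○ This can be done with both graphlets and treelets.
[Figure: example treelet edge orbits and graphlet edge orbits, with edges labeled a to d by orbit.]
# Nodes | # Node Graphlet Orbits | # Edge Graphlet Orbits | # Node Treelet Orbits | # Edge Treelet Orbits
3 | 3 | 2 | 2 | 1
4 | 11 | 10 | 4 | 3
5 | 58 | 57 | 9 | 6
6 | n/a | n/a | 20 | 16
7 | n/a | n/a | 48 | 37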
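Edge orbits follow from the same automorphisms as node orbits: two edges share an orbit if some automorphism maps one onto the other. A minimal brute-force sketch (the helper name is hypothetical); the two 4-node treelets, the path and the star, give 2 + 1 = 3 edge orbits, matching the 4-node edge treelet count.

```python
from itertools import permutations

def edge_orbit_count(n, edges):
    """Number of edge orbits of a small graphlet or treelet (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    autos = [p for p in permutations(range(n))
             if all(frozenset((p[u], p[v])) in edge_set for u, v in edges)]
    # Edges e and f share an orbit if some automorphism maps e onto f.
    orbits = {frozenset(frozenset((p[u], p[v])) for p in autos) for u, v in edges}
    return len(orbits)

path4 = [(0, 1), (1, 2), (2, 3)]
star4 = [(0, 1), (0, 2), (0, 3)]
tailed_triangle = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(edge_orbit_count(4, path4) + edge_orbit_count(4, star4))  # 3
print(edge_orbit_count(4, tailed_triangle))                     # 3
```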

Collect all the orbit counts
● With any of these orbit types, we can begin to construct our cost matrix from the count of each orbit that each node touches.
  ○ Each node (row) has some count for each orbit (column).
● Let's look at one of the networks from the first slide and focus on one node, x.
[Figure: the example network with node x highlighted, alongside its orbit counts.]
● Larger orbits are less common and give better accuracy, but are more difficult to calculate.
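For the 2- and 3-node orbits, the per-node counts can be derived directly from adjacency sets. This sketch assumes the usual graphlet-degree-vector numbering (orbit 0: degree; 1: end of an induced path; 2: middle of an induced path; 3: triangle corner) and uses a small hypothetical network rather than the one from the first slide. Larger graphlet or treelet orbits need more case analysis or a dedicated orbit counter.

```python
from itertools import combinations

def gdv_orbits_0_to_3(adj):
    """Per-node counts for orbits 0-3 of an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    counts = {}
    for v, nbrs in adj.items():
        o0 = len(nbrs)
        o3 = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])  # triangles
        o2 = o0 * (o0 - 1) // 2 - o3          # neighbour pairs that are not adjacent
        o1 = sum(1 for u in nbrs for w in adj[u]
                 if w != v and w not in nbrs)  # paths v-u-w with v, w non-adjacent
        counts[v] = [o0, o1, o2, o3]
    return counts

# Tailed triangle: triangle 0-1-2 with an extra node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for node, gdv in gdv_orbits_0_to_3(adj).items():
    print(node, gdv)
# 0 [2, 1, 0, 1]
# 1 [2, 1, 0, 1]
# 2 [3, 0, 2, 1]
# 3 [1, 2, 0, 0]
```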

Alignment: Cost Matrix Construction
● Now that we have orbit counts, how do we turn them into a cost matrix?
  ○ Each row holds point_A, (point_B), num_orbit1, num_orbit2, ...
● For node orbits, we use node graphlet degree vector (GDV) similarity.
  ○ Let u and v be the two nodes being compared, and k the total number of orbits considered.
  ○ D_i and w_i denote the GDV distance and the weight for orbit i.
● The signature similarity between the two GDVs, S(u, v), is then defined from the weighted sum of the per-orbit distances D_i.
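In the GRAAL family of aligners, the per-orbit distance is usually D_i(u, v) = w_i · |log(u_i + 1) − log(v_i + 1)| / log(max(u_i, v_i) + 2), and S(u, v) = 1 − Σ D_i / Σ w_i, with graphlet weights w_i = 1 − log(o_i) / log(73), where o_i is the number of orbits that affect orbit i. Assuming this slide follows that definition, a minimal sketch is below; it defaults to uniform weights, which is an assumption (no weight table is given for treelet orbits).

```python
import math

def signature_similarity(gdv_u, gdv_v, weights=None):
    """GDV signature similarity S(u, v) in (0, 1]; 1 means identical orbit counts.

    Assumes the GRAAL-style per-orbit distance
        D_i = w_i * |log(u_i + 1) - log(v_i + 1)| / log(max(u_i, v_i) + 2).
    Weights default to 1 for every orbit (an assumption, not the project's values).
    """
    if len(gdv_u) != len(gdv_v):
        raise ValueError("GDVs must cover the same k orbits")
    if weights is None:
        weights = [1.0] * len(gdv_u)
    distance = sum(
        w * abs(math.log(a + 1) - math.log(b + 1)) / math.log(max(a, b) + 2)
        for a, b, w in zip(gdv_u, gdv_v, weights)
    )
    return 1.0 - distance / sum(weights)

# GDVs (orbits 0-3) for two nodes of a tailed triangle.
print(signature_similarity([2, 1, 0, 1], [2, 1, 0, 1]))  # 1.0
print(signature_similarity([2, 1, 0, 1], [3, 0, 2, 1]))
```

The log scaling keeps a single very common orbit from dominating, and dividing by log(max + 2) keeps each term below 1, so S stays in (0, 1].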

Edge-Based Costs
● For edges, we use edge graphlet degree centrality (GDC) in addition to GDV signature similarity.
  ○ GDV signature similarity is still used, computed over edge orbit counts.
● Edge graphlet degree centrality measures the complexity of an edge's extended neighborhood, favoring denser neighborhoods.
  ○ For an edge e, e_i denotes its count for orbit i.
● The final edge cost function (ECF) combines the two terms.
  ○ alpha scales the weighting of GDV similarity versus GDC.
● The costs in the resulting matrix are then minimized to create our alignment.
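One plausible shape for the ECF, loosely modeled on GRAAL's node cost (which mixes a degree term with S(u, v) through α), is a convex combination of a normalized centrality term and the signature similarity. Both the GDC formula and the ECF below are hypothetical, chosen only so that denser neighborhoods and more similar orbit counts lower the cost; the edge names and counts are made up for illustration.

```python
import math

def signature_similarity(counts_e, counts_f):
    """GRAAL-style signature similarity with uniform orbit weights (assumed)."""
    distance = sum(abs(math.log(a + 1) - math.log(b + 1)) / math.log(max(a, b) + 2)
                   for a, b in zip(counts_e, counts_f))
    return 1.0 - distance / len(counts_e)

def graphlet_degree_centrality(counts):
    """Hypothetical GDC: mean of log(e_i + 1), larger for denser neighbourhoods."""
    return sum(math.log(c + 1) for c in counts) / len(counts)

def edge_cost(counts_e, counts_f, max_gdc_a, max_gdc_b, alpha=0.5):
    """Hypothetical ECF in [0, 1]; lower means a better match.

    alpha weights signature similarity against centrality. max_gdc_a and
    max_gdc_b are the largest GDC values in networks A and B (not both 0).
    """
    centrality = ((graphlet_degree_centrality(counts_e) + graphlet_degree_centrality(counts_f))
                  / (max_gdc_a + max_gdc_b))
    return 1.0 - ((1 - alpha) * centrality + alpha * signature_similarity(counts_e, counts_f))

# Hypothetical edge orbit counts for two tiny networks.
edges_a = {"a1": [3, 1, 2], "a2": [0, 0, 1]}
edges_b = {"b1": [3, 1, 2], "b2": [1, 0, 0]}
max_a = max(graphlet_degree_centrality(c) for c in edges_a.values())
max_b = max(graphlet_degree_centrality(c) for c in edges_b.values())
costs = {(e, f): edge_cost(ce, cf, max_a, max_b)
         for e, ce in edges_a.items() for f, cf in edges_b.items()}
print(min(costs, key=costs.get))  # ('a1', 'b1')
```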

Alignment: Aligning according to our Cost Matrix
● The cost matrix now maps the similarity of each of the n_A nodes of network A to each of the n_B nodes of network B (an n_A × n_B matrix).
  ○ For large networks, the main constraint is that the two should probably be reasonably well connected.
● We can feed this cost matrix to several alignment algorithms to map the nodes of each network:
  ○ Greedy
  ○ Hungarian
  ○ Either way, we just need to minimize the total cost in the matrix.
Example cost matrix:
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5
d1 d2 d3 d4 d5
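A minimal greedy sketch, assuming the "cheapest remaining pair first" variant (an assumption about which greedy rule the slides use). On the 4 × 5 example from the assignment-algorithm slide it reproduces that slide's greedy total of 43.

```python
def greedy_assignment(cost):
    """Greedy matching: repeatedly take the cheapest entry whose row and column are free."""
    entries = sorted((c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row))
    used_rows, used_cols, mapping = set(), set(), {}
    for c, i, j in entries:
        if i not in used_rows and j not in used_cols:
            mapping[i] = j
            used_rows.add(i)
            used_cols.add(j)
    return mapping

cost = [[7, 9, 8, 6, 8],
        [4, 3, 7, 6, 9],
        [5, 8, 3, 1, 4],
        [60, 43, 32, 10, 80]]
mapping = greedy_assignment(cost)
print(sum(cost[i][j] for i, j in mapping.items()))  # 43
```

Greedy takes the single cheapest entry (the 1) first, which steals the column the expensive last row needed (its 10), forcing that row to pay 32 instead.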

Assignment Algorithm Examples
● Further improvement is possible in this direction by using more effective assignment algorithms.
Example cost matrix (4 × 5), used for both methods:
7 9 8 6 8
4 3 7 6 9
5 8 3 1 4
60 43 32 10 80
● Greedy: total cost 43.
● Hungarian (subtract the smallest values, then repeatedly the smallest uncovered values, until an optimal set of zeros can be chosen): total cost 23. The first step, subtracting each row's smallest value, gives:
1 3 2 0 2
1 0 4 3 6
4 7 2 0 3
50 33 22 0 70
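A compact Hungarian-method sketch (the shortest-augmenting-path formulation with row and column potentials), written from scratch so it runs without extra libraries; SciPy's `scipy.optimize.linear_sum_assignment` solves the same problem. It assumes rows ≤ columns, which fits the 4 × 5 example on this slide, and reproduces the total cost of 23.

```python
import math

def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column (rows <= columns).

    Shortest-augmenting-path Hungarian method with potentials u (rows) and
    v (columns); runs in O(rows^2 * columns). Returns {row: column}.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("transpose the matrix so that rows <= columns")
    u = [0.0] * (n + 1)          # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (m + 1)          # column potentials
    match = [0] * (m + 1)        # match[j] = row assigned to column j (0 = free)
    way = [0] * (m + 1)          # previous column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        min_slack = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:              # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = match[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    slack = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if slack < min_slack[j]:
                        min_slack[j], way[j] = slack, j0
                    if min_slack[j] < delta:
                        delta, j1 = min_slack[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    min_slack[j] -= delta
            j0 = j1
            if match[j0] == 0:   # reached a free column
                break
        while j0:                # flip matches along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return {match[j] - 1: j - 1 for j in range(1, m + 1) if match[j]}

cost = [[7, 9, 8, 6, 8],
        [4, 3, 7, 6, 9],
        [5, 8, 3, 1, 4],
        [60, 43, 32, 10, 80]]
mapping = hungarian(cost)
print(mapping, sum(cost[i][j] for i, j in mapping.items()))  # {0: 0, 1: 1, 2: 2, 3: 3} 23
```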

Improvement from using Hungarian Alignment
● Optimal Network Alignment with Graphlet Degree Vectors (Milenković, T., 2010) shows significant improvement from Hungarian alignment (H-GRAAL) over the original greedy GRAAL, which also uses graphlets.
● Edge correctness (EC) is our alignment quality measure: the percentage of edges in one network that are aligned to edges in the other.
● Here, a high-confidence yeast PPI network is aligned against the same network augmented with interactions from a lower-confidence version, at different noise levels.
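Edge correctness follows directly from its definition. This sketch assumes network A is the one whose edges are counted (typically the smaller network) and that the alignment is a dict from A's nodes to B's nodes; the function name is hypothetical.

```python
def edge_correctness(edges_a, edges_b, mapping):
    """Percentage of A's edges whose endpoints map onto an edge of B."""
    b = {frozenset(e) for e in edges_b}
    conserved = sum(
        1 for u, v in edges_a
        if u in mapping and v in mapping and frozenset((mapping[u], mapping[v])) in b
    )
    return 100.0 * conserved / len(edges_a)

# A triangle aligned to a path: two of the three edges are conserved.
ec = edge_correctness([(0, 1), (1, 2), (0, 2)], [(0, 1), (1, 2)], {0: 0, 1: 1, 2: 2})
print(round(ec, 2))  # 66.67
```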

Additional Alignment Quality Measures
● We also use node correctness and interaction correctness.
  ○ Both require knowing the true alignment, that is, the correct node or edge mappings.
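Node correctness is the simplest of these to compute: the share of nodes mapped to their true counterpart. A minimal sketch with a hypothetical helper; for a randomly rewired copy that keeps node IDs, the true alignment is the identity map.

```python
def node_correctness(mapping, true_mapping):
    """Percentage of nodes mapped to their true counterpart."""
    correct = sum(1 for node, target in true_mapping.items() if mapping.get(node) == target)
    return 100.0 * correct / len(true_mapping)

truth = {n: n for n in range(4)}   # identity: the rewired copy kept node IDs
found = {0: 0, 1: 1, 2: 3, 3: 2}   # nodes 2 and 3 swapped by the aligner
print(node_correctness(found, truth))  # 50.0
```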

Yeast-Human Alignment Analysis and Run-time Analysis
● In addition to comparing against the results of a random mapping, we count the Gene Ontology (GO) terms shared between aligned protein pairs.
# Common GO Terms | 1 | 2 | 3 | 4
H-GRAAL | 45.38% | 14.54% | 4.55% | 1.3%
GRAAL | 45.10% | 15.60% | 5.06% | 2.02%
● Run-time: the Hungarian algorithm runs in O(n^3), where n is the number of nodes.
  ○ Collecting orbits and their counts takes at least O(nk), where k is the number of orbits.
  ○ Generating the cost matrix over n^2 node pairs, with k orbits each, takes O(kn^2).
