G3NA-V: Global GPU-based Gene Network Alignment Visualization

SLIDE 1

G3NA-V

Global GPU-based Gene Network Alignment Visualization

AN NVIDIA POWERED BIO-GRAPH ALIGNMENT AND VISUALIZATION TOOL

Karan Sapra

Melissa C. Smith, Alex Feltus, Joshua Levine

ACCELERATING DISCOVERY

SLIDE 2

Problem Statement

  • Crop Yield
  • World Population Estimates
  • Cancer and other Infectious Diseases

“Human Prosperity depends on our ability to understand genomes of organisms and change them for the better (or worse).”
SLIDE 3

Overview of Genomic Discoveries

  • Construct Gene Expression
  • Identify Modules
  • Relate Modules using External Information
  • Sequencing and Sampling
  • Collecting multiple Samples from tissues, from organisms, etc.
  • Correlation, Topological Analysis, thresholding, clustering
  • Network Alignment (G3NA)
  • Utilize GO Ontology, Evolutionary Tree, Molecular Structure
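
The correlation and thresholding step can be sketched as follows. This is a minimal illustration, assuming Pearson correlation and a fixed cutoff; the slides do not say which correlation measure or threshold the authors use, and the gene names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sample vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.9):
    """Return undirected edges (g1, g2) whose |r| meets the threshold."""
    genes = sorted(expr)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if abs(pearson(expr[g1], expr[g2])) >= threshold:
                edges.append((g1, g2))
    return edges


# Hypothetical expression profiles: one intensity per sample.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(expr, threshold=0.9)
```

Only geneA and geneB are strongly correlated here, so the resulting network has a single edge.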

SLIDE 4

G3NA-V Workflow

  • Conserved Graph Alignment (Compute)
  • Clustering and GUI
  • Ontologies
  • Molecular Visualization
  • Multi Network Alignment
  • Gene Expression Network
  • Gene Expression Matrix
  • Sample Curve Distribution
  • Evolutionary Tree

SLIDE 5

Complex biological systems can be modeled as graphs…

Rice graph mapped to genome

Genenet Engine: sysbio.genome.clemson.edu

Higher Yield!!!!

Edge (Gene Interaction) Node (Gene)

Alzheimer’s (Plaque in Brain)

SLIDE 6

Aligned Graphs

Paleogenomics: Conserved subgraphs can be detected by graph alignment…

Conserved Subgraphs

Evidence: the maize-rice ancestor shared similar gene interaction patterns 50-70 million years ago.

Ficklin & Feltus, "Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice." Plant Physiology 156:3 (2011)

Maize Rice
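
One way to read "conserved subgraph" is: an edge in one network whose two endpoints are mapped by the alignment onto an edge in the other network. The sketch below illustrates that idea only; it is not the detection method from the cited paper, and the node names and mapping are hypothetical.

```python
def conserved_edges(edges_a, edges_b, mapping):
    """Edges of graph A whose mapped endpoints also form an edge in graph B."""
    b_set = {frozenset(edge) for edge in edges_b}  # undirected lookup
    kept = []
    for u, v in edges_a:
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in b_set:
                kept.append((u, v))
    return kept


# Hypothetical maize and rice edges plus a node-to-node alignment.
maize = [("m1", "m2"), ("m2", "m3"), ("m3", "m4")]
rice = [("r2", "r1"), ("r2", "r3"), ("r4", "r5")]
mapping = {"m1": "r1", "m2": "r2", "m3": "r3", "m4": "r9"}
conserved = conserved_edges(maize, rice, mapping)
```

The edge m3-m4 is dropped because its image r3-r9 does not exist in the rice graph.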

SLIDE 7

G3NA-V Workflow

  • Conserved Graph Alignment (Compute)
  • Clustering and GUI
  • Ontologies
  • Molecular Visualization
  • Multi Network Alignment
  • Gene Expression Network
  • Gene Expression Matrix
  • Sample Curve Distribution
  • Evolutionary Tree

SLIDE 8

G3NA-V Overview

Components: Compute Engine, Visualization Engine

Diagram labels: Preprocessing, Network Alignment, Postprocessing, Graph Algorithm, Edge and Cluster Matching, Node Matching, Visualization (Graph, Molecule, Matrix, etc.), Update Visualization, Apply Filtering

Communication: tcp, mpi, osg, shared-memory

Daemon (Message Passing Control Unit)

SLIDE 9

Daemon (Message Passing Control Unit)

  • User activity offloads computation tasks such as multiple alignment, clustering, data reduction, ray-casting, parsing, etc.
  • Uses Shared Memory / TCP / UDP
  • Fast Offloading to Daemon
  • Daemon Offloads using MPI / OSG
  • MPI is used by obtaining multiple nodes during the initial launch
  • Can launch on Multiple GPUs/CPUs
  • Daemon Monitors resources
  • Working on Integration with Open Science Grid (OSG)

Diagram: three nodes with two GPUs each; one node runs the Super Daemon and the other two run Daemons.
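
The slides do not describe the wire format used between the GUI and the daemon. Purely as an illustration of how offloaded tasks could be sent over a TCP stream, the sketch below assumes a hypothetical framing: a 4-byte big-endian length followed by a JSON payload. The function names and task fields are invented for this example.

```python
import json
import struct


def encode_task(task):
    """Serialize a task dict as a length-prefixed JSON message (assumed format)."""
    payload = json.dumps(task).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def decode_tasks(buffer):
    """Split a byte buffer into complete tasks plus any leftover bytes."""
    tasks = []
    offset = 0
    while len(buffer) - offset >= 4:
        (size,) = struct.unpack_from("!I", buffer, offset)
        if len(buffer) - offset - 4 < size:
            break  # the message is not fully received yet
        start = offset + 4
        tasks.append(json.loads(buffer[start:start + size].decode("utf-8")))
        offset = start + size
    return tasks, buffer[offset:]


# Two hypothetical tasks followed by the first bytes of a third message.
stream = encode_task({"op": "cluster", "graph": 1})
stream += encode_task({"op": "align", "graphs": [1, 2]})
tasks, rest = decode_tasks(stream + b"\x00\x00")
```

Keeping the leftover bytes lets the receiver append the next chunk from the socket and call `decode_tasks` again.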

SLIDE 10

Compute Engine

  • CUDA 7-enabled global pairwise aligner: GPU-enabled Global Gene Network Aligner (G3NA)
  • CUDA-enabled graph processing libraries
      • Thrust, MapGraph, etc.
  • Use Multiple GPUs for alignment of multiple graphs
  • Utilize various algorithms
      • Clustering, Page Ranking, Filtering, Max-flow min-cut, etc.
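
As a reference point for the page ranking step named above, here is a plain CPU sketch of PageRank by power iteration on an undirected edge list. It is not the CUDA implementation used by the compute engine; the damping factor, iteration count, and node names are assumptions for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            new_rank[n] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical star graph: "a" is connected to every other node.
ranks = pagerank([("a", "b"), ("a", "c"), ("a", "d")])
```

In the star graph the hub "a" ends up with the highest rank, and the ranks sum to 1.
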
SLIDE 11

Visualization Engine

  • Orientation and Visual Flexibility
  • GPU-enabled OpenGL- and GLUI-based visualization
  • Support for Multiple Viewports and Data Types
  • CUDA-based Layout algorithms for Graphs and Trees
  • Dual/Multi-GPU Support for Compute and Visualization
SLIDE 12

INPUT DATA FORMAT

  • Input Data: Tab-Separated Data for each Graph
      • Undirected Edge List pair
      • Size: 2,000 Nodes / Graph
      • 40,000 Edges / Graph
  • Alignment Graph: Tab-Separated, one for each pair of graphs
      • Undirected Edge List pair
      • Size: ~700 Nodes
      • ~1,000 Edges

Figure labels: Maize, Rice, Alignment Graph (Maize - Rice)
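
A tab-separated undirected edge list like the one described above can be read with a few lines of Python. This is a sketch under assumptions: the first two columns hold node identifiers, extra columns are ignored, and lines starting with "#" are treated as comments. None of these details are confirmed by the slides.

```python
import csv
import io


def read_edge_list(handle):
    """Read tab-separated node pairs into an undirected adjacency map."""
    adjacency = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        u, v = row[0].strip(), row[1].strip()
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def edge_count(adjacency):
    """Count each undirected edge once (a self-loop also counts once)."""
    loops = sum(1 for node, nbrs in adjacency.items() if node in nbrs)
    total = sum(len(nbrs) for nbrs in adjacency.values())
    return (total - loops) // 2 + loops


# Hypothetical file content; the last line repeats an edge in reverse order.
sample = io.StringIO("g1\tg2\ng2\tg3\ng3\tg1\ng2\tg1\n")
adjacency = read_edge_list(sample)
```

Storing neighbors in sets makes the reversed duplicate harmless, so the example yields three nodes and three edges.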

SLIDE 13

SUPPORTING DATA

  • Cluster File: Tab-Separated, for each graph
      • File per graph containing node and clusterID
  • Network Information File: Tab-Separated, for each graph
      • File per graph containing information about the species, including extra non-utilized information
      • Used to get Ontology information

Figure labels: Cluster File, Network Information File
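
Grouping nodes by cluster is a natural first step after reading a cluster file. The sketch below assumes each line holds a node name and a clusterID separated by a tab, in that order; the column order is an assumption, and the sample lines are made up.

```python
def read_clusters(lines):
    """Group node names by clusterID from 'node<TAB>clusterID' lines."""
    clusters = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank or malformed lines
        node, cluster_id = parts[0], parts[1]
        clusters.setdefault(cluster_id, []).append(node)
    return clusters


# Hypothetical cluster file content.
clusters = read_clusters(["g1\t0\n", "g2\t0\n", "g3\t1\n", "\n"])
```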

SLIDE 14

ONTOLOGY FORMAT

  • Gene Ontology (GO) Basic File
      • Id: GO ID (GO:xxxxxxx)
      • Name: Gene Ontology information
      • NameSpace: Gene Ontology Classification
      • Definition: Description of the GO term
      • is_a, consider, synonym, obsolete, etc.
  • Available at: http://geneontology.org/ontology/go-basic.obo
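
The fields listed above appear in the OBO file as `tag: value` lines inside `[Term]` stanzas. The following is a minimal reader for those stanzas, written as a sketch rather than a full OBO parser: it keeps only id, name, namespace, def, and is_a, and the embedded sample is a short excerpt-style illustration that should be checked against the real file.

```python
def parse_obo_terms(lines):
    """Collect id, name, namespace, def, and is_a from [Term] stanzas."""
    terms = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue  # header lines, blank lines, or other stanza types
        tag, value = line.split(": ", 1)
        if tag == "id":
            terms[value] = current
        elif tag == "is_a":
            # Drop the trailing "! parent name" comment.
            current["is_a"].append(value.split(" ! ")[0])
        elif tag in ("name", "namespace", "def"):
            current[tag] = value
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Typedef]
id: part_of
name: part of
"""
terms = parse_obo_terms(SAMPLE.splitlines())
```
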
SLIDE 15

JSON FILE

  • Used for user-directed layout and input of graph and supporting data
  • Contains Position in 3D space
  • Contains Initial Size
  • Contains Alignment Information between graphs

{
  "graph": {
    "graph1": {
      "id": 1,
      "name": "Maize",
      "fileLocation": "M.tab",
      "clusterLocation": "M.tab.cluster",
      "Ontology": "Maize_info2.txt",
      "x": -300, "y": 0, "z": 0,
      "w": 200, "h": 200
    },
    "graph2": {
      "id": 2,
      "name": "Rice",
      "fileLocation": "R.tab",
      "clusterLocation": "R.tab.cluster",
      "Ontology": "Rice_info2.txt",
      "x": 0, "y": 0, "z": 0,
      "w": 200, "h": 200
    }
  },
  "alignment": {
    "alignment1": { "graphID1": 1, "graphID2": 2, "filelocation": "output.gna" }
  }
}
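
To show how such a configuration might be consumed, the sketch below loads it with Python's `json` module and resolves each alignment entry to the names of the graphs it connects. It assumes the file is valid JSON (straight quotes, no stray characters) and uses the key names exactly as they appear on the slide, including the lowercase `filelocation` in the alignment entry. The helper name `describe_alignments` is hypothetical.

```python
import json

CONFIG = """
{
  "graph": {
    "graph1": {"id": 1, "name": "Maize", "fileLocation": "M.tab",
               "clusterLocation": "M.tab.cluster",
               "Ontology": "Maize_info2.txt",
               "x": -300, "y": 0, "z": 0, "w": 200, "h": 200},
    "graph2": {"id": 2, "name": "Rice", "fileLocation": "R.tab",
               "clusterLocation": "R.tab.cluster",
               "Ontology": "Rice_info2.txt",
               "x": 0, "y": 0, "z": 0, "w": 200, "h": 200}
  },
  "alignment": {
    "alignment1": {"graphID1": 1, "graphID2": 2,
                   "filelocation": "output.gna"}
  }
}
"""


def describe_alignments(config):
    """Return (first graph name, second graph name, alignment file) tuples."""
    by_id = {graph["id"]: graph for graph in config["graph"].values()}
    result = []
    for entry in config["alignment"].values():
        first = by_id[entry["graphID1"]]
        second = by_id[entry["graphID2"]]
        result.append((first["name"], second["name"], entry["filelocation"]))
    return result


config = json.loads(CONFIG)
alignments = describe_alignments(config)
```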

SLIDE 16

http://network.genome.clemson.edu

Enabling Systems Genetics using HPC

  • Enabling anonymous pairwise alignment using the Palmetto Supercomputer at Clemson University
  • Snappy overview visualization of alignment using WebGL

Palmetto Supercomputer:

  • 721.9 Teraflops
  • 2,021 Compute Nodes and 22,336 cores
  • 374 nodes with dual K20/K40 for acceleration and visualization
  • 56 Gbit interconnect

SLIDE 17

G3NA-V Workflow

  • Gene Expression Matrix
  • Sample Curve Distribution

SLIDE 18

Gene Expression Matrix

  • Raw genomic data is a list of genes associated with a species and a number of Samples
  • Each Sample is an intensity value expressed by the gene
  • Raw data matrix is visualized as a heatmap

SLIDE 19

Gene Sample Distribution

  • Sample curve distribution identifies outliers in the raw genomic expression data.
  • Normalized histogram of intensities within the range [-1, 1].
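
A normalized histogram over [-1, 1] can be computed as sketched below. The slides do not say how intensities are brought into that range, so min-max rescaling is an assumption here, as are the bin count and the sample values.

```python
def rescale(values):
    """Min-max rescale values into [-1, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a flat sample maps to the center
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def normalized_histogram(values, bins=4):
    """Fraction of values in each equal-width bin over [-1, 1]."""
    counts = [0] * bins
    for v in values:
        index = int((v + 1.0) / 2.0 * bins)
        index = max(0, min(index, bins - 1))  # clamp v == 1.0 and stray values
        counts[index] += 1
    return [c / len(values) for c in counts]


# Hypothetical intensities for one sample.
sample = [2.0, 4.0, 6.0, 10.0]
scaled = rescale(sample)
hist = normalized_histogram(scaled, bins=4)
```

Because every histogram sums to 1, curves from samples of different sizes can be overlaid, and a sample whose curve departs from the rest stands out as a candidate outlier.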

SLIDE 20

G3NA-V Workflow

  • Multi Network Alignment
  • Gene Expression Network
  • Gene Expression Matrix
  • Sample Curve Distribution

SLIDE 21

G3NA Result

  • Performance with IsoRankN
  • G3NA Scalability

All performance data are from a single K40 node

SLIDE 22

G3NA Profiling Overview

All performance data are from a single K40 node

SLIDE 29

Ontology Visualization

  • A gene may be associated with multiple GO terms
  • Every GO term is part of an ontology
  • We navigate through the ontologies via the GO terms (and descriptions)
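
Navigating an ontology from a GO term usually means following `is_a` links upward. The sketch below collects all ancestors of a term from a parent map; the term IDs are placeholders, not real GO identifiers, and the traversal is a generic graph walk rather than the tool's own navigation code.

```python
def ancestors(term_id, parents):
    """Return every term reachable from term_id by following is_a links."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Placeholder term IDs: each term maps to its is_a parents.
parents = {
    "GO:C": ["GO:B"],
    "GO:B": ["GO:A"],
    "GO:D": ["GO:B", "GO:A"],
}
found = ancestors("GO:D", parents)
```

Tracking visited terms keeps the walk finite even when a term reaches the same ancestor along several paths.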

SLIDE 30

Genomic Molecular Vis

  • We can visualize the protein structure for each gene node
  • Obtain files from the Protein Data Bank archive (PDB)
  • Crystal Structure of Protective Ebola Virus Antibody 114

SLIDE 31

Conclusion

  • NVIDIA-powered tool for alignment and visualization of graphs and network-related information
  • Support for various formats and visualizations
  • Accelerate discovery by incorporating tools.