SLIDE 1

ELECTRICAL ENGINEERING & COMPUTER SCIENCE UNIVERSITY OF TENNESSEE

NZIMA Napier 2008

Analysis of High-Throughput Biological Data Part I: Scalable High Performance Algorithms and Implementations

Mike Langston

Professor, Department of Electrical Engineering and Computer Science, University of Tennessee

and

Collaborating Scientist, Biological Sciences Division, Oak Ridge National Laboratory, USA

21 February 2008

SLIDES 2-3

Outline of Talk

  • Sample Application
  • Tools and Technologies
    – Complexity Theory
    – Graph Algorithms
    – High Performance Computation
    – Reconfigurable Computation
  • Compute Engine
  • Problem Variants

SLIDES 4-6

Technology Mapping

Biological Knowledge

  • Protein Structure
  • Gene Regulatory Networks
  • Sequence Homology
  • Protein Function
  • Cell Physiology
  • …

Analysis Tools

  • Ontology
  • Cis-Regulatory Elements
  • Quantitative Trait Loci
  • Combinatorial Algorithms
  • Bayesian Networks
  • …

SLIDE 7

Gene Regulatory Networks

cis regulation

  • regulation via cis-regulatory elements (CREs): promoter, TATA box, motifs, modules
  • 8-15 bp in length, action often at the ends
  • central dogma: one gene, one protein

[Figure: Gene1, Gene2, Gene3 and Gene4, each shown with its CREs]

SLIDE 8

Gene Regulatory Networks

trans regulation (direct) via gene products

[Figure: Gene1, Gene2, Gene3 and Gene4. Labels: transcription, mRNA, translation, transcription factor protein, up or down regulate mRNA expression]

SLIDE 9

Gene Regulatory Networks

trans regulation (indirect) via post-translational modification

[Figure: Gene1, Gene2, Gene3 and Gene4, shown for the direct and the indirect case. Labels: transcription, mRNA, translation, kinase protein, phosphorylation, transcription factor protein, up or down regulate mRNA expression]

SLIDE 10

Gene Regulatory Networks

many other network actions

  • post-transcriptional regulation (e.g., alternative splicing)
  • μRNA (e.g., functional RNA, RNAi and gene silencing)
  • but all are forms of co-regulation

[Figure: the direct and indirect trans regulation diagrams of Gene1, Gene2, Gene3 and Gene4, with the same labels as before]

SLIDE 11

Currently Awash in a Sea of Transcriptomic Data

An organism’s mRNA transcripts:

  • link between the genome, the proteome and the cellular phenotype
  • data quality and richness increasing
  • noise reduction
  • more conditions
  • correlation, putative coregulation, regulatory networks
  • cannot see post-translational modifications (e.g., phosphorylation)
  • huge range of prokaryotic and eukaryotic data coming on line
  • timely confluence of technologies
  • proteomics, metabolomics data not far behind

SLIDE 12

A Major Computational Bottleneck: Clique

Data transformation:

  • representing biological networks with graphs is well understood
  • genes (via transcripts, probesets) are denoted by vertices
  • edges denote significant gene-gene correlations
  • we seek genesets with common regulatory mechanisms
  • thus we want to identify dense subgraphs, in particular cliques
    – complete subgraphs
    – special case of subgraph isomorphism
    – NP-complete to decide
    – NP-hard even to approximate

[Figure: K4, the complete graph on four vertices]
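A minimal Python sketch of the data transformation described on this slide. The correlation values and the 0.85 cutoff are made up for illustration and do not come from the talk. Vertices are probesets, an edge joins two probesets whose absolute correlation meets the cutoff, and `is_clique` checks whether a candidate geneset forms a complete subgraph.

```python
from itertools import combinations

# Hypothetical pairwise correlations between five probesets (illustrative only).
CORR = {
    ("g1", "g2"): 0.91, ("g1", "g3"): 0.88, ("g1", "g4"): 0.93,
    ("g2", "g3"): 0.90, ("g2", "g4"): 0.87, ("g3", "g4"): 0.92,
    ("g1", "g5"): 0.12, ("g2", "g5"): 0.86, ("g3", "g5"): -0.05,
    ("g4", "g5"): 0.30,
}


def threshold_graph(corr, cutoff):
    """Return adjacency sets; an edge means |correlation| >= cutoff."""
    adj = {}
    for (u, v), r in corr.items():
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if abs(r) >= cutoff:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_clique(adj, vertices):
    """True when every pair of the given vertices is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


adj = threshold_graph(CORR, 0.85)  # 0.85 is an assumed cutoff
print(is_clique(adj, ["g1", "g2", "g3", "g4"]))  # a K4, as in the figure
print(is_clique(adj, ["g1", "g2", "g5"]))
```

Checking a given geneset is cheap; the bottleneck named in the slide title is finding the largest such set without being told where to look.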


SLIDE 14

Tools and Technologies

Clique

  • COMPLEXITY THEORY: Problem Classification, Algorithm Selection
  • GRAPH ALGORITHMS: Modeling, Optimization
  • PARALLELISM AND GRIDS: Speedup, Collaboration
  • RECONFIGURATION: Hardware Acceleration, Fast Prototyping

Intellectual Property, Available Technologies

SLIDE 15

Tools and Technologies

Clique

  • COMPLEXITY THEORY: FIXED-PARAMETER TRACTABILITY
  • GRAPH ALGORITHMS
  • PARALLELISM AND GRIDS
  • RECONFIGURATION

Intellectual Property, Available Technologies

SLIDES 16-18

A Little Complexity Theory

The Classic View:

Nested classes: $P \subseteq NP \subseteq \Sigma_2^P \subseteq \cdots \subseteq PSPACE$

  • P: “easy”
  • NP: “hard”
  • beyond NP: “fuggettaboutit”

SLIDES 19-20

Fixed-Parameter Tractability

Pioneering approach going back twenty years

– Well-Quasi-Order theory
– nonuniform measure of complexity

Exploit knowledge of the solution space

– Consider an algorithm with a time bound such as $O(2^{kn})$.
– And now one with a time bound more like $O(2^k n)$.
– Both are exponential in parameter value(s).
– But what happens when k is fixed?
– Fixed-Parameter Tractable (FPT) iff $O(f(k)\,n^c)$
– Confines superpolynomial behavior to the parameter
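To see what fixing k buys, the following Python sketch compares the two bounds numerically, reading them as 2^(kn) and 2^k · n. That reading, and the sample values k = 10 and n = 1000, are assumptions chosen for illustration.

```python
import math

k, n = 10, 1000  # assumed sample values, not from the talk

# Work with base-10 logarithms so the huge first bound stays representable.
log10_exponent_in_n = k * n * math.log10(2)            # log10 of 2^(k*n)
log10_fpt_style = k * math.log10(2) + math.log10(n)    # log10 of 2^k * n

print(f"2^(kn)  is about 10^{log10_exponent_in_n:.0f}")
print(f"2^k * n is about 10^{log10_fpt_style:.1f}")
```

With k held fixed, the second bound grows only linearly in n, which is the behavior the FPT definition captures with f(k) times a polynomial in n.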

SLIDES 21-23

Complexity Theory, Refined

Hence, the Parameterized View:

Nested classes: $FPT \subseteq W[1] \subseteq W[2] \subseteq \cdots \subseteq XP$

  • FPT: “solvable” (even if NP-hard!)
  • W[1], W[2], …: “heuristics only”
  • XP: “fuggettaboutit”

SLIDE 24

Tools and Technologies

Clique

  • COMPLEXITY THEORY: FIXED-PARAMETER TRACTABILITY
  • GRAPH ALGORITHMS: VERTEX COVER
  • PARALLELISM AND GRIDS
  • RECONFIGURATION

Intellectual Property, Available Technologies

SLIDE 25

The Vertex Cover Project

Duality

– We solve vertex cover, clique’s complementary dual
– $O(1.2759^k k^{1.5} + kn)$ time

Key features

– Kernelization, branching and interleaving

[Figure: a graph $G$ and its complement $\overline{G}$]
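The duality can be checked on a toy instance. The Python sketch below uses a hypothetical five-vertex graph (not from the talk) and verifies that a vertex set is a clique in G exactly when the remaining vertices form a vertex cover of the complement graph.

```python
from itertools import combinations


def complement(n, edges):
    """Edges of the complement of a graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


def is_clique(edges, vertices):
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(vertices, 2))


def is_vertex_cover(edges, cover):
    return all(u in cover or v in cover for u, v in edges)


# Hypothetical graph: triangle {0, 1, 2} with a path 2-3-4 attached.
n = 5
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
co_edges = complement(n, edges)

for candidate in ({0, 1, 2}, {1, 2, 3}):
    rest = set(range(n)) - candidate
    # The two answers always agree: clique in G <=> cover in the complement.
    print(candidate, is_clique(edges, candidate), is_vertex_cover(co_edges, rest))
```

So a clique of size n - k in G corresponds to a vertex cover of size k in the complement, which is why a fast vertex cover code also answers clique questions.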

SLIDES 26-29

The Vertex Cover Project

  • use preprocessing via degree structures
    – low degree rules
    – high degree rule
    – resultant graph has size $O(k^2)$ [at most $k(1+k/3)$ vertices]
  • then kernelize to reduce to a computational core
    – suite of codes
    – LP variants
    – crown rule

LP formulation:

minimize: $\sum_{i \in V(G)} X_i$
subject to: $X_u + X_v \ge 1$ for all $uv \in E(G)$
where: $X_i \ge 0$ for all $i \in V(G)$

[Figure: a crown of width one and a crown of width three, each attached to the rest of the graph]
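The slides name the degree rules without spelling them out. The Python sketch below uses three standard textbook rules as a stand-in, which is an assumption about what the project's codes do: drop isolated vertices, take the neighbor of any degree-one vertex, and take any vertex whose degree exceeds the remaining budget k. The function name and the example graph are hypothetical.

```python
def degree_preprocess(adj, k):
    """Apply degree-0, degree-1 and high-degree rules until none fires.

    Returns (reduced adjacency, remaining budget, forced cover vertices),
    or None when the budget is exhausted and no cover of size k can exist.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    cover = set()

    def take(v):
        """Put v in the cover and delete it from the graph."""
        nonlocal k
        for u in adj.pop(v):
            adj[u].discard(v)
        cover.add(v)
        k -= 1

    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if k < 0:
                return None
            if v not in adj:
                continue
            degree = len(adj[v])
            if degree == 0:        # isolated vertices never help a cover
                del adj[v]
                changed = True
            elif degree == 1:      # covering the neighbor is never worse
                (u,) = adj[v]
                take(u)
                changed = True
            elif degree > k:       # too many edges to cover any other way
                take(v)
                changed = True
    return None if k < 0 else (adj, k, cover)


# Hypothetical example: a star centered at 0 plus a separate triangle.
star_plus_triangle = {
    0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0},
    5: {6, 7}, 6: {5, 7}, 7: {5, 6},
}
print(degree_preprocess(star_plus_triangle, 3))
```

On this input the star collapses immediately and only the triangle is left for kernelization and branching.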

SLIDES 30-31

Representative Kernelization Results

Algorithm       Run Time   Kernel (n’)   Parameter (k’)
LP              69.49      616           389
Network Flow    40.53      622           392
Crown Rule      0.07       630           392

Preprocessing completed first. All times in seconds.

Some conclusions:

  • Perform preprocessing, then the crown rule.
  • If dense, stop trying to kernelize.
  • If sparse, try LP or network flow before stopping.

SLIDES 32-33

The Vertex Cover Project

  • use preprocessing via degree structures
  • then kernelize to reduce to a computational core
  • employ branching to explore the core
    – exhaustive search
    – highly parallel
  • finally, interleave all three
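Branching can be illustrated with the classic bounded search tree for vertex cover: pick any uncovered edge, and try each endpoint in turn with the budget reduced by one. This Python sketch is the textbook 2^k-leaf version, not the refined branching behind the running time quoted earlier, and the path graph is a made-up example.

```python
def vc_branch(edges, k):
    """Return a vertex cover of size at most k, or None if none exists."""
    if not edges:
        return set()          # nothing left to cover
    if k == 0:
        return None           # edges remain but the budget is spent
    u, v = edges[0]
    for pick in (u, v):       # one endpoint of this edge must be in the cover
        remaining = [e for e in edges if pick not in e]
        sub = vc_branch(remaining, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


# Hypothetical example: the path 0-1-2-3-4 needs two cover vertices.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(vc_branch(path, 1))
print(vc_branch(path, 2))
```

The two recursive calls are independent of each other, which is what makes the search tree a natural fit for parallel exploration.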

SLIDE 34

Tools and Technologies

Clique

  • COMPLEXITY THEORY: FIXED-PARAMETER TRACTABILITY
  • GRAPH ALGORITHMS: VERTEX COVER
  • PARALLELISM AND GRIDS: SSH, CONDOR, NETSOLVE, BIG IRON
  • RECONFIGURATION

Intellectual Property, Available Technologies

SLIDE 35

Sample Grid Architecture

[Figure: layered grid architecture. Layers: Middleware (NetSolve); Foundational Fabric (Switches and Depots); Compute Resources (Grid Service Clusters). Components: NetSolve Client, NetSolve Agent, NetSolve Servers, Distributed Storage]

Key: NetSolve’s program description file facility

SLIDE 36

Supercomputer Platforms

High Performance Implementations

  • suites of maximum/maximal/bi/para clique methods
  • have processed graphs with over 3M vertices
  • memory often a limiting factor
  • currently working on out-of-core methods

SGI Altix supercomputer at ORNL: 256 dual-CPU processors, two terabytes of shared memory

SLIDE 37

Tools and Technologies

Clique

  • COMPLEXITY THEORY: FIXED-PARAMETER TRACTABILITY
  • GRAPH ALGORITHMS: VERTEX COVER
  • PARALLELISM AND GRIDS: SSH, CONDOR, NETSOLVE, BIG IRON
  • RECONFIGURATION: FIELD PROGRAMMABLE GATE ARRAYS

Intellectual Property, Available Technologies

SLIDE 38

Hardware Acceleration

With current implementations, we are able to solve sub-instances:

  • of size 512 or less,
  • with speedups north of about 125.

Algorithms are very different. VHDL versus C. I/O is often the most critical resource.

[Figure: Sample FPGA]

SLIDE 39

Put the Pieces Together

Clique

  • COMPLEXITY THEORY: FIXED-PARAMETER TRACTABILITY
  • GRAPH ALGORITHMS: VERTEX COVER
  • PARALLELISM AND GRIDS: SSH, CONDOR, NETSOLVE, BIG IRON
  • RECONFIGURATION: FIELD PROGRAMMABLE GATE ARRAYS

Intellectual Property, Available Technologies


SLIDE 41

A Clique Compute Engine

[Figure: block diagram of the engine. Labels: Transcriptomic Context; Input Graph; Preprocessing and Kernelization; Branching and Interleaving; Parametric Tuning, Decomposition and Refinement; Highly Parallel Computation (an array of PEs); Recalcitrant Subproblem; Reconfigurable Technology (FPGAs); Cliques for Post-Processing, Prioritized by GO, CREs, pathways, literature, etc.; Distilled Genesets, Models and Testable Hypotheses]

Works well with synthetic data. But with real data, dynamic workload balancing is required. And that can be very tricky!

SLIDE 42

Workload Balancing: A Vertex Cover Driver

[Figure: driver diagram. Labels: Splitter; Job Scheduler; Initialize; Job List; Handle Machine (one per processor); ssh; Open Socket; Branching on Processor 1, Processor 2, …, Processor N]

A simple mechanism. (Sometimes too simple.)

SLIDE 43

Workload Balancing: Distributed Subtree Splitting

[Figure: search trees on processors 1, 2, 3 and 4]

  • Pruning is needed at processor 4.
  • Processor 1 is still active.
  • Processor 2 is still active.
  • Processor 3 is still active.
  • Send a subtree to the job queue.
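The idea of sending subtrees to a job queue can be sketched in a single process. In the Python below, every queue entry is the root of an unexplored subtree of the vertex cover search tree, and one loop stands in for all of the workers. This is a simplified illustration under that assumption; the talk's system distributes the queue across machines, and the function name is hypothetical.

```python
from collections import deque


def vc_decide_with_queue(edges, k):
    """Decide whether a vertex cover of size at most k exists.

    Each queue entry (remaining edges, remaining budget) is a subtree root
    that any idle worker could pick up; here a single loop plays every worker.
    """
    queue = deque([(tuple(edges), k)])
    while queue:
        remaining, budget = queue.popleft()
        if not remaining:
            return True                  # this subtree found a cover
        if budget == 0:
            continue                     # dead end, subtree is pruned
        u, v = remaining[0]
        for pick in (u, v):
            # Instead of recursing, hand each child subtree back to the queue.
            child = tuple(e for e in remaining if pick not in e)
            queue.append((child, budget - 1))
    return False


path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # hypothetical example graph
print(vc_decide_with_queue(path, 1), vc_decide_with_queue(path, 2))
```

A "no" answer is only known once the queue is completely empty, whereas a "yes" can stop the search as soon as any one subtree succeeds.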

SLIDES 44-50

Sample Results on Protein Sequence Data

Graph Name                  SH2-5        SH2-5         SH3-10        SH3-10
Graph Size                  839          839           2466          2466
Instance Type               Yes          No            Yes           No
Cover Size                  399          398           2044          2043
Sequential Kernelization    34 seconds   34 seconds    203 minutes   203 minutes
Dynamic Decomposition       Not needed   20 minutes    140 minutes   620 minutes
Sequential Branching        7 seconds    141 minutes   ~ 5 days      6+ days
Parallel Branching          Not needed   82 minutes    ~ 5 days      6+ days

  • So clique size is 422 (2466 − 2044). A direct assault ~ $2466^{422}$.
  • The hardest computations.
  • 32 PEs @ 500MHz.
  • Load balancing is critical.
  • “No” is harder than “yes.”

We now routinely solve these sorts of instances in seconds. But these are not genome scale problems!
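The arithmetic behind the "direct assault" remark can be checked in a few lines of Python. The graph size 2466 and the cover size 2044 are taken from the SH3-10 results; the comparison with the binomial coefficient is an added illustration, not something stated in the talk.

```python
import math

n, cover_size = 2466, 2044
clique_size = n - cover_size              # complement duality: n - k

# Magnitude of the crude n^k estimate for a direct search.
log10_direct = clique_size * math.log10(n)

# Number of distinct vertex subsets of that size, for comparison.
log10_subsets = math.log10(math.comb(n, clique_size))

print(clique_size)
print(f"n^k is about 10^{log10_direct:.0f}")
print(f"C(n, k) is about 10^{log10_subsets:.0f}")
```

Either count is far beyond exhaustive enumeration, which is why the parameterized route through vertex cover matters here.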

SLIDES 51-57

A Bit of Blasphemy: The Real Power of FPT

  • Guides our thinking, steering us to exploit parameters
  • Kernelization sets the stage for efficiency
  • Branching still requires serious computation
  • Interleaving is indispensable in practice
  • Solve problems directly (clique not vertex cover)
  • Better subtree pruning via iterated preprocessing
    – Examples: Common Neighbor Preprocessing (CNP), Color Preprocessing (CP)
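The slide names Common Neighbor Preprocessing without defining it. One plausible reading, offered here only as an assumption, rests on a simple fact: two adjacent vertices of a k-clique share at least k − 2 common neighbors, and every vertex of a k-clique has degree at least k − 1. The Python sketch below iterates those two tests until nothing changes; the function name and example graph are hypothetical.

```python
def common_neighbor_prune(adj, k):
    """Remove edges and vertices that cannot belong to any clique of size k."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        # An edge inside a k-clique has at least k - 2 common neighbors.
        for u in list(adj):
            for v in list(adj[u]):
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
        # A vertex inside a k-clique has degree at least k - 1.
        for v in [v for v, ns in adj.items() if len(ns) < k - 1]:
            for u in adj.pop(v):
                adj[u].discard(v)
            changed = True
    return adj


# Hypothetical example: a K4 on {0, 1, 2, 3} with a tail 3-4-5.
k4_with_tail = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
print(sorted(common_neighbor_prune(k4_with_tail, 4)))
print(common_neighbor_prune(k4_with_tail, 5))
```

Because each removal can expose further removals, the loop repeats until a full pass makes no change, which matches the "iterated preprocessing" wording on the slide.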

SLIDES 58-62

A Bit of Blasphemy: The Real Power of FPT

Representative Computational Results

Common Neighbor Preprocessing versus Color Preprocessing

            Kernel Size            Time
Method      Vertices   Edges       Preprocess   Branch   Total
CNP         5896       2785k       2:24         51:54    54:18
CNP+CP      1692       576k        3:46         3:46     6:58
CP          1700       585k        1:22         4:04     5:26

Data Source: Gerling Affymetrix 430A; read time 0:40, probe sets 22690, threshold edges 7,534,598, maximum clique size 248


SLIDES 64-66

Maximal Clique

Biological Fidelity

  • Genes are Pleiotropic
  • Maximal Cliques May Overlap

Results

  • Efficiency
  • Predictable Range of Outputs

Keys

  • Global Shared Memory Map
  • Bitmapped Implementations
  • Synchronization and Load Balancing
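A bitmapped implementation can be sketched with the standard Bron-Kerbosch enumeration, using Python integers as bit sets. Bron-Kerbosch is named here as a well-known stand-in; the talk does not say which enumeration scheme its codes use, and the example graph is made up to show two overlapping maximal cliques.

```python
def build_bitmaps(n, edges):
    """nbr[v] has bit u set exactly when u is adjacent to v."""
    nbr = [0] * n
    for u, v in edges:
        nbr[u] |= 1 << v
        nbr[v] |= 1 << u
    return nbr


def maximal_cliques(nbr):
    """Yield every maximal clique as a bitmask (basic Bron-Kerbosch, no pivot)."""
    def expand(r, p, x):
        if p == 0 and x == 0:
            yield r                          # nothing can extend r: maximal
            return
        while p:
            v_bit = p & -p                   # lowest set bit = next candidate
            v = v_bit.bit_length() - 1
            yield from expand(r | v_bit, p & nbr[v], x & nbr[v])
            p &= ~v_bit                      # v is done as a candidate
            x |= v_bit                       # and excluded from later cliques
    yield from expand(0, (1 << len(nbr)) - 1, 0)


# Hypothetical example: triangles {0, 1, 2} and {2, 3, 4} sharing vertex 2.
nbr = build_bitmaps(5, [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)])
cliques = sorted(maximal_cliques(nbr))
print([bin(c) for c in cliques])
```

Set intersection becomes a single bitwise AND, and vertex 2 appears in both reported cliques, the kind of overlap the slide associates with pleiotropic genes.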

SLIDE 67

Maximal Clique

[Figure: performance plotted against Number of Processors (8 to 64), with an Ideal line and curves for clique-size ranges [20, 28], [19, 28], [18, 28] and [3, 28]]

[Figure: Memory usage (GBytes) plotted against Clique Size (3 to 28)]

  • Near Linear Speedup
  • Significant Memory Requirements

SLIDE 68

Biclique

  • Concentrate on Bipartite Graphs
  • Previous Algorithms Make Unwarranted Assumptions
  • Bookkeeping
  • Branch & Bound
  • Ontological Discovery

[Figure: bipartite graph joining Genes (gene1 through gene7) to Phenotypes (trait1 through trait5)]

SLIDE 69

Biclique

Observed Biclique Runtimes

[Figure: log of wallclock time in seconds (1.E-02 to 1.E+05) plotted against p-value (0.01 to 0.20), for MICA and MBEA]

2 to 3 orders of magnitude faster than the best previous alternative

Time Complexity: $O(d n^2 B)$, where d is maximum degree and B is the number of maximal bicliques. Keys: preprocess and exploit structure. Sound familiar?

SLIDE 70

Biclique

Discretionary Power

We can now explore much denser graphs, as shown by edge weights.

[Figure: number of times that fundamental operations are performed (0.E+00 to 6.E+08) plotted against p-value (0.01 to 0.16), for MICA and MBEA]

SLIDE 71

Paraclique

  • Clique gloms onto highly connected vertices.
  • Here a 280-clique is transformed into a 466-paraclique.
  • Edge density remains north of about 95%.
  • Lift and separate.

[Figure: a 280-clique and the resulting 466-paraclique, with connections labeled 279]
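A minimal Python sketch of the glomming step, under the assumption that a vertex is absorbed when it is adjacent to all but at most `glom` current members. The parameter name, the exact admission test and the example graph are illustrative; the slide does not give the rule used in practice.

```python
def paraclique(adj, clique, glom):
    """Grow a clique by absorbing vertices that miss at most `glom` members."""
    members = set(clique)
    grown = True
    while grown:
        grown = False
        for v in sorted(set(adj) - members):
            missing = members - adj[v]       # members v is not adjacent to
            if len(missing) <= glom:
                members.add(v)
                grown = True
    return members


def edge_density(adj, members):
    """Fraction of possible edges among `members` that are present."""
    m = len(members)
    inside = sum(len(adj[v] & members) for v in members) // 2
    return inside / (m * (m - 1) / 2)


# Hypothetical example: K4 on {0, 1, 2, 3}; vertex 4 misses only vertex 3;
# vertex 5 is attached to vertex 0 alone.
adj = {
    0: {1, 2, 3, 4, 5}, 1: {0, 2, 3, 4}, 2: {0, 1, 3, 4},
    3: {0, 1, 2}, 4: {0, 1, 2}, 5: {0},
}
grown_set = paraclique(adj, {0, 1, 2, 3}, glom=1)
print(sorted(grown_set), edge_density(adj, grown_set))
```

A small `glom` keeps the result close to a clique, which is how the density can stay high even as the vertex count grows well past the original clique size.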

SLIDE 72

Collaborators

Research Scientists (Incomplete!): Mikael Benson, Elissa Chesler, Frank Dehne, Mike Fellows, Ivan Gerling, Dan Goldowitz, Malak Kotb, Mark Ragan, Arnold Saxton, Brynn Voy, Rob Williams, Bing Zhang

Current Students: Bhavesh Borate, Patricia Carey, John Eblen, Jeremy Jay, Zuopan Li, Sudhir Naswa, Andy Perkins, Vivek Philip, Charles Phillips, Gary Rogers, Jon Scharff, Yun Zhang

SLIDE 73

Geeks Я Us