bioKepler: A Comprehensive Bioinformatics Scientific Workflow - - PowerPoint PPT Presentation

SLIDE 1

bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data

Ilkay Altintas¹, Daniel Crawl¹, Weizhong Li², Shulei Sun², Jianwu Wang¹, Sitao Wu²

¹ San Diego Supercomputer Center, UCSD
² Center for Research in Biological Systems, UCSD

Project Website: http://www.biokepler.org

SLIDE 2

Kepler: a Scientific Workflow System

http://www.biokepler.org/ 07/14/12 2

Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflow
KEPLER = “Ptolemy II + X” for Scientific Workflows

  • A cross-project collaboration initiated August 2003
  • Download times > 40,000
  • Kepler 2.3 released on 20 Jan 2012
  • Builds upon the open-source Ptolemy II framework

SLIDE 3

bioKepler: a Module Being Built in Kepler

  • Use Distributed Data-Parallel (DDP) frameworks, e.g., MapReduce, to accelerate bioinformatics tool execution
  • Create configurable, reusable, and executable DDP components in a scientific workflow system
  • Support different execution engines and computational environments
  • Optimize workflow execution
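The DDP goals above can be illustrated with a minimal Python sketch. It is not bioKepler or Kepler code: the names `partition`, `run_ddp`, `ENGINES`, and `total_length` are hypothetical, and the two "engines" (a serial loop and a thread pool) are only stand-ins for real DDP engines such as Hadoop. The point it shows is that the same map and reduce functions can be reused unchanged while the execution engine is selected by a parameter.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add


def partition(records, n_parts):
    """Split records into at most n_parts contiguous, near-equal chunks."""
    size, extra = divmod(len(records), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when n_parts > len(records)
            chunks.append(records[start:end])
        start = end
    return chunks


def run_serial(map_fn, chunks):
    """Hypothetical engine 1: process chunks one after another."""
    return [map_fn(chunk) for chunk in chunks]


def run_threads(map_fn, chunks):
    """Hypothetical engine 2: process chunks concurrently in a thread pool."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(map_fn, chunks))


# Assumption: engines are chosen by name; a real system would plug in
# cluster-backed engines here instead of these local stand-ins.
ENGINES = {"serial": run_serial, "threads": run_threads}


def run_ddp(records, map_fn, reduce_fn, n_parts=4, engine="serial"):
    """Partition the input, map each chunk, then reduce the partial results."""
    if not records:
        raise ValueError("records must not be empty")
    chunks = partition(records, n_parts)
    partials = ENGINES[engine](map_fn, chunks)
    return reduce(reduce_fn, partials)


def total_length(chunk):
    """Example map function: total number of bases in one chunk."""
    return sum(len(seq) for seq in chunk)


if __name__ == "__main__":
    seqs = ["ACGT", "AC", "G", "TTTT", "A"]
    print(run_ddp(seqs, total_length, add, n_parts=2, engine="threads"))
```

Switching `engine="threads"` to `engine="serial"` changes how the chunks are executed but not the result, which is the property a reusable DDP component needs.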

SLIDE 4

Conceptual Framework

SLIDE 5

Software Architecture

SLIDE 6

Sample bioActors

  • Alignment: BLAST, BLAT
  • Profile-Sequence Alignment: PSI-BLAST
  • Hidden Markov Model: HMMER
  • Mapping: Bowtie, BWA, SAMtools
  • Multiple Alignment: ClustalW, MUSCLE
  • Clustering: CD-HIT, BLASTClust
  • Gene Prediction: Glimmer, GENSCAN, FragGeneScan
  • tRNA prediction: tRNAscan-SE, Meta-RNA
  • Phylogeny: FastTree, RAxML

SLIDE 7

DDP BLAST Workflow via Splitting Query Sequences

Switch the director to work with other DDP engines, such as Hadoop.

Execute with data partitioning.
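The query-splitting idea behind this workflow can be sketched in plain Python. This is an illustration under stated assumptions, not the bioKepler workflow itself: `parse_fasta`, `split_queries`, `search_partition`, and `ddp_search` are hypothetical names, the thread pool stands in for a DDP engine, and `search_partition` is a toy exact-substring match used in place of BLAST so the example runs without external tools.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def split_queries(records, n_parts):
    """Deal query records round-robin into partitions, dropping empty ones."""
    parts = [records[i::n_parts] for i in range(n_parts)]
    return [part for part in parts if part]


def search_partition(queries, database):
    """Toy stand-in for an alignment tool (NOT BLAST).

    Reports (query, subject) pairs where the query is an exact substring
    of the subject sequence.
    """
    hits = []
    for header, seq in queries:
        for db_name, db_seq in database.items():
            if seq in db_seq:
                hits.append((header, db_name))
    return hits


def ddp_search(fasta_text, database, n_parts):
    """Split the queries, search each partition in parallel, merge the hits."""
    partitions = split_queries(parse_fasta(fasta_text), n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        per_partition = list(
            pool.map(lambda part: search_partition(part, database), partitions)
        )
    merged = [hit for hits in per_partition for hit in hits]
    return sorted(merged)  # sort so the result is independent of the split


if __name__ == "__main__":
    queries = ">q1\nACG\n>q2\nTTT\n>q3\nGGA\nTC\n"
    database = {"s1": "AACGT", "s2": "CGGATCA"}
    print(ddp_search(queries, database, n_parts=2))
```

Because each query is searched independently against the whole database, the merged result does not depend on how many partitions are used; only the degree of parallelism changes.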

SLIDE 8

DDP BLAST Workflow Experiments

[Figure: total execution time (hours; axis ticks 1.0 to 4.0) versus number of slave CPU cores (16, 32, 48, 64)]

SLIDE 9

Questions?

  • More Information

jianwu@sdsc.edu
http://www.biokepler.org
http://www.kepler-project.org

  • Acknowledgements
