bioKepler: A Comprehensive Bioinformatics Scientific Workflow



  1. bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data
     Project website: http://www.biokepler.org
     Ilkay Altintas (1), Daniel Crawl (1), Weizhong Li (2), Shulei Sun (2), Jianwu Wang (1), Sitao Wu (2)
     (1) San Diego Supercomputer Center, UCSD
     (2) Center for Research in Biological Systems, UCSD

  2. Kepler: a Scientific Workflow System
     • A cross-project collaboration initiated August 2003
     • More than 40,000 downloads
     • Version 2.3 released on 20 Jan 2012
     • Builds upon the open-source Ptolemy II framework
     Ptolemy II: a laboratory for investigating design
     KEPLER: a problem-solving environment for scientific workflows
     KEPLER = "Ptolemy II + X" for scientific workflows
     07/14/12, http://www.biokepler.org/

  3. bioKepler: a Module Being Built in Kepler
     • Use Distributed Data-Parallel (DDP) frameworks, e.g., MapReduce, to accelerate bioinformatics tool execution
     • Create configurable, reusable, and executable DDP components in the scientific workflow system
     • Support different execution engines and computational environments, and optimize workflow execution

  4. Conceptual Framework [figure]

  5. Software Architecture [figure]

  6. Sample bioActors
     • Alignment: BLAST, BLAT
     • Profile-sequence alignment: PSI-BLAST
     • Hidden Markov model: HMMER
     • Mapping: Bowtie, BWA, SAMtools
     • Multiple alignment: ClustalW, MUSCLE
     • Clustering: CD-HIT, BLASTClust
     • Gene prediction: Glimmer, Genescan, FragGeneScan
     • tRNA prediction: tRNAscan-SE, Meta-RNA
     • Phylogeny: FastTree, RAxML

  7. DDP BLAST Workflow via Splitting Query Sequences [workflow figure]
     • Switch the director to work with other DDP engines, such as Hadoop
     • Execute with data partitioning

  8. DDP BLAST Workflow Experiments
     [Plot: total execution time (hours; axis from 1.0 to 4.0) versus number of slave CPU cores (16, 32, 48, 64)]

  9. Questions?
     • More information: jianwu@sdsc.edu, http://www.biokepler.org, http://www.kepler-project.org
     • Acknowledgements
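
Slides 3 and 7 describe running a bioinformatics tool in a distributed data-parallel fashion by splitting the query sequences, processing each partition independently, and combining the results. The following is a minimal Python sketch of that pattern only; it is not bioKepler or Kepler code, and none of these names come from the slides. `parse_fasta`, `partition`, `stand_in_tool`, and `run_data_parallel` are hypothetical helpers, a local thread pool stands in for a DDP engine such as Hadoop, and `stand_in_tool` merely reports sequence lengths where a real workflow would invoke a tool such as BLAST.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parse_fasta(text: str) -> List[str]:
    """Split FASTA text into records (header line plus its sequence lines)."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def partition(records: List[str], n_parts: int) -> List[List[str]]:
    """Distribute records round-robin into at most n_parts non-empty partitions."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts = [records[i::n_parts] for i in range(n_parts)]
    return [p for p in parts if p]


def stand_in_tool(records: List[str]) -> List[str]:
    """Hypothetical placeholder for the per-partition tool run.

    A real workflow would run an alignment tool on the partition here;
    this stand-in only emits "<sequence id>\t<sequence length>" lines.
    """
    lines = []
    for record in records:
        header, *seq_lines = record.split("\n")
        lines.append(f"{header[1:].strip()}\t{len(''.join(seq_lines))}")
    return lines


def run_data_parallel(
    fasta_text: str,
    n_parts: int,
    tool: Callable[[List[str]], List[str]] = stand_in_tool,
) -> List[str]:
    """Split the queries, run the tool on each partition, merge the outputs."""
    parts = partition(parse_fasta(fasta_text), n_parts)
    # "Map" step: each partition is processed independently.
    with ThreadPoolExecutor(max_workers=max(1, len(parts))) as pool:
        per_partition = list(pool.map(tool, parts))
    # "Reduce" step: concatenate the outputs and sort for a stable order.
    return sorted(line for output in per_partition for line in output)


if __name__ == "__main__":
    query = ">q1\nACGT\n>q2\nACGTAC\nGT\n>q3\nAC\n"
    for line in run_data_parallel(query, n_parts=2):
        print(line)
```

Because each partition is handled by a function that depends only on its own input, the same split, process, and merge structure could in principle be handed to a different execution engine without changing the per-partition logic, which is the property the slides attribute to switching the director.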
