Improving HPC resource utilization in the genome assembly of a biofuel producing green algae. Project by Dan Browne, PhD Candidate, Devarenne Lab, Biochemistry & Biophysics. Presented by Michael Dickens, High Performance Research Computing.


SLIDE 1

Improving HPC resource utilization in the genome assembly of a biofuel producing green algae

Project by Dan Browne, PhD Candidate, Devarenne Lab, Biochemistry & Biophysics

Presented by Michael Dickens, High Performance Research Computing

Texas A&M University

SLIDE 2

Basic model of Botryococcus braunii cell biology

Weiss et al (2012) Eukaryotic Cell 11:1424-1440

Improving HPC resource utilization in the genome assembly of a biofuel producing green algae

Dan Browne, PhD Candidate, Devarenne Lab, Biochemistry & Biophysics Department, Texas A&M University

SLIDE 3

Why sequence the B. braunii genome?

Main project organizers:

  • Andy Koppisch, Northern Arizona University
  • Joe Chappell, University of Kentucky
  • Shigeru Okada, Tokyo University
  • Tim Devarenne, Texas A&M University

  • B. braunii is a potential source of renewable fuels and chemicals
  • B. braunii is found worldwide, most notably in oil and coal shale deposits
  • B. braunii has a very high oil content, ~40% of dry weight
  • B. braunii oils can be processed with conventional petroleum technology

SLIDE 4

High rates of drop-in biofuel recovery

  • Paraffins: 68.5%
  • Naphthenes: 30.0%
  • Olefins: <0.2%
  • Aromatics: 1.4%

  • Gasoline (C5-C12, 40-205˚C): 67%
  • Kerosene (C10-C16, 175-325˚C): 15%
  • Diesel (C14-C20, 250-350˚C): 15%
  • Residuals (>C70, >600˚C): 3%

60-70% of crude B. braunii hydrocarbons converted to gasoline

Comparable to petroleum

Hillen et al. (1982) Biotechnol Bioeng 24:193

Hydrocracking (497˚C/high pressure/catalyst)

Distillation

30-40% of B. braunii dry weight

Liquid hydrocarbons are easily recovered from colony

SLIDE 5
B. braunii whole-genome sequencing with Illumina

  • Library Name: SXPX
  • Library Type: Paired End
  • Insert Size: 800 bp
  • Total Sequence Reads: 499,073,402
  • Read Length: 250 bp
  • Genome Size: 166 Mb
  • Coverage: ~750x
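
The ~750x coverage figure can be reproduced from the other numbers in the sequencing summary. The sketch below assumes the usual definition of average coverage (total sequenced bases divided by genome size) and assumes every read is the full 250 bp; the function and variable names are illustrative only.

```python
# Values taken from the sequencing summary on the slide.
total_reads = 499_073_402       # total sequence reads
read_length_bp = 250            # read length in bp
genome_size_bp = 166_000_000    # genome size, 166 Mb


def sequencing_coverage(reads: int, read_len: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    return reads * read_len / genome_size


coverage = sequencing_coverage(total_reads, read_length_bp, genome_size_bp)
print(f"Estimated coverage: {coverage:.0f}x")
```

Under these assumptions the estimate comes out at about 752x, consistent with the ~750x on the slide.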

Genomic DNA → Computational Assembly → Reconstructed DNA Sequence

Genome sequence can be used to identify genes involved in hydrocarbon production

Genomic DNA: AATAATGTCAATTTGGTAGATATCAGAGAGTTTTATGTTGACAAAGATGG

Sequence reads: AATAATGTCA, GATATCAGAGA, ATGTTGACAAA, AATAATGTCAA, GATATCAGAGAG, ATGTTGACAAA, ATAATGTCAAT, TATCAGAGAGT, GTTGACAAAG, ATAATGTCAAT, TATCAGAGAGT, GTTGACAAAG, TAATGTCAATT, TCAGAGAGT, TTGACAAAGAT, TGTCAATTTGG, CAGAGAGT, TTGACAAAGAT, TGTCAATTTGGT, CAGAGAGT, TGACAAAGATG, AATTTGGTAGAT, GAGAGT, GACAAAGATGG, TTGGTAGATAT, CAAAGATGG, TGGTAGATATC, AAAGATGG

Reconstructed DNA sequence: AATAATGTCAATTTGGTAGATATCAGAGAGTNNNATGTTGACAAAGATGG
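
The read pile-up above can be mimicked with a small sketch that shows why the reconstructed sequence contains NNN. It is a simplification for illustration only: each read is placed on the known genomic sequence by exact string matching, which a de novo assembler such as ABySS does not do because it has no reference to match against. The function name `tile_reads` is hypothetical.

```python
genome = "AATAATGTCAATTTGGTAGATATCAGAGAGTTTTATGTTGACAAAGATGG"

# Distinct reads from the slide (repeated reads are listed once).
reads = [
    "AATAATGTCA", "AATAATGTCAA", "ATAATGTCAAT", "TAATGTCAATT",
    "TGTCAATTTGG", "TGTCAATTTGGT", "AATTTGGTAGAT", "TTGGTAGATAT",
    "TGGTAGATATC", "GATATCAGAGA", "GATATCAGAGAG", "TATCAGAGAGT",
    "TCAGAGAGT", "CAGAGAGT", "GAGAGT", "ATGTTGACAAA", "GTTGACAAAG",
    "TTGACAAAGAT", "TGACAAAGATG", "GACAAAGATGG", "CAAAGATGG", "AAAGATGG",
]


def tile_reads(reference: str, reads: list[str]) -> str:
    """Mark reference positions covered by at least one read; uncovered -> N."""
    covered = [False] * len(reference)
    for read in reads:
        start = reference.find(read)  # first exact match, -1 if absent
        if start == -1:
            continue
        for i in range(start, start + len(read)):
            covered[i] = True
    return "".join(b if seen else "N" for b, seen in zip(reference, covered))


reconstructed = tile_reads(genome, reads)
print(reconstructed)
```

No read spans the three bases between the middle and right-hand groups of reads, so those positions stay unresolved and are reported as N.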

SLIDE 7

Workflow of Assembly By Short Sequences (ABySS): A parallel de novo genome assembler with MPI support

  • http://www.bcgsc.ca/platform/bioinfo/software/abyss

Slide Material From: Shaun D. Jackman (http://sjackman.github.io/)

(1) ABYSS-P (MPI): k-mer De Bruijn graph
(2) AdjList
(3) Prune tips
(4) Pop bubbles
(5) Generate contigs

map 1, map 2, scaffold
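
As a rough illustration of the k-mer De Bruijn graph named in the workflow, the sketch below builds the common textbook form of the graph, in which nodes are (k-1)-mers and every k-mer found in a read contributes one edge. This is an assumption made for illustration, not ABySS's internal distributed data structure; the function name, the three example reads, and the choice of k=6 are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some k-mer."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


reads = ["AATAATGTCAA", "TGTCAATTTGG", "AATTTGGTAGAT"]
graph = build_de_bruijn(reads, k=6)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Overlapping reads share k-mers, so their paths merge in the graph. Steps such as pruning tips and popping bubbles then operate on this structure before contigs are generated.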

SLIDE 8

Default and modified ABySS execution pipelines

Default

Input
  • ABYSS-P: 1 MPI job, n cores, multi-node
  • AdjList: 1 node job
  • split input (n files): 1 node job
  • map 1: n serial jobs, 1 node
  • todot: 1 node job
  • map 2: n serial jobs, 1 node
  • scaffold: 1 node job
Assembly

In the default pipeline, all commands of each mapping step run in serial and are limited to one compute node.

SLIDE 9

Default and modified ABySS execution pipelines

Default

Input
  • ABYSS-P: 1 MPI job, n cores, multi-node
  • AdjList: 1 node job
  • split input (n files): 1 node job
  • map 1: n serial jobs, 1 node
  • todot: 1 node job
  • map 2: n serial jobs, 1 node
  • scaffold: 1 node job
Assembly

Modified

Input
  • ABYSS-P: 1 MPI job, n cores, multi-node
  • AdjList: 1 node job
  • split input (n files): 1 node job
  • map 1: n parallel jobs, HpcGridRunner, multi-node
  • todot: 1 node job
  • map 2: n parallel jobs, HpcGridRunner, multi-node
  • scaffold: 1 node job
Assembly
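
The difference between the two pipelines at the mapping steps can be sketched as follows. This is only a local analogy under stated assumptions: a thread pool stands in for HpcGridRunner dispatching independent jobs across nodes, `map_chunk` is a hypothetical placeholder for one mapping command on one split file, and no HpcGridRunner or ABySS interface is used.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(records: list[str], n: int) -> list[list[str]]:
    """Split the input into n roughly equal chunks (the 'n files')."""
    return [records[i::n] for i in range(n)]


def map_chunk(chunk: list[str]) -> int:
    """Hypothetical stand-in for one mapping job; returns reads processed."""
    return len(chunk)


reads = [f"read_{i}" for i in range(10)]
chunks = split_round_robin(reads, n=4)

# Default pipeline: the n jobs run one after another on a single node.
serial_results = [map_chunk(c) for c in chunks]

# Modified pipeline: the n jobs are independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(map_chunk, chunks))

print(serial_results, parallel_results)
```

Because each split file is processed independently, both orderings give the same results. Only the wall-clock time and the number of busy cores differ.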

SLIDE 10

Assembly times of default and modified pipelines

[Stacked bar chart: run time (hh:mm:ss, axis from 0:00:00 to 14:24:00) of the Default and Modified pipelines, broken down by step: ABYSS-P, AdjList, map 1, todot, map 2, scaffold]

  • HPC resource utilization: 50 cores (5 cores/node * 10 nodes)
  • Assembly time reduced by 46% using modified ABySS pipeline.
  • The modified pipeline eliminated almost 6 hours during which 45 cores sat idle.
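
The idle-core figure in the bullets above can be checked with simple arithmetic. The sketch assumes that the 45 idle cores are the 50 reserved cores minus the 5 cores of the single node that stays busy during the serial mapping steps, and it treats "almost 6 hours" as an upper bound of 6 hours; both are assumptions, not values stated on the slide.

```python
cores_per_node = 5
nodes = 10
total_cores = cores_per_node * nodes        # 50 cores reserved for the job

# Assumption: during a single-node step only one node's cores are busy.
idle_cores = total_cores - cores_per_node   # cores with nothing to do

idle_hours_upper_bound = 6                  # "almost 6 hours" on the slide
idle_core_hours = idle_cores * idle_hours_upper_bound

print(f"{idle_cores} idle cores, up to {idle_core_hours} idle core-hours")
```

Under these assumptions the default pipeline leaves up to 270 core-hours of reserved capacity unused per assembly.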

SLIDE 11

Devarenne Lab 2015

  • Dongyin Su, Grad Student
  • Hem Thapa, Grad Student
  • Mehmet Tatli, Grad Student
  • Dan Browne, Grad Student
  • Tim Devarenne, PhD, Associate Professor
  • Incheol Yeo, Grad Student
  • Victoria Yell, Undergrad Student

Department of Biochemistry & Biophysics

http://devarennelab.tamu.edu

Botryococcus braunii