Parallel Recipes, Yves Vandriessche, Sept. 08, 2015

CnC as workflow coordination language for scientific computing
Parallel Recipes
Yves Vandriessche
Sept. 08, 2015

Scripts deal with the complexity of gluing together applications
script + applications (GATK, BWA, Picard, TopHat, samtools, …) = Broad Institute best practices seq. pipeline, ~200 SLoC

Distribution and parallelisation explode the accidental complexity of scripts
( script + applications (GATK, BWA, Picard, TopHat, samtools, …) ) x distribution and parallelisation = distributed seq. pipeline, ~2000 SLoC
eHive exome pipeline: 28,066 SLoC (1) (Perl) => $898,255 est.
(1) generated using David A. Wheeler's 'SLOCCount'

parallel recipe:
What is the essential difference between a sequential and a parallel script? Ordering dependencies!
In a sequential world: one single ordering of operations
In a parallel world: more orderings => more parallelism => more performance
Sources of ordering:
• data dependencies: produce/consume, consistency
• control dependencies: iteration, branching, recursion, …
• concurrency (shared resources)
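The link between ordering dependencies and the number of valid orderings can be illustrated with a small Python sketch. The function name `count_orderings` and the three-operation examples are assumptions made for illustration, not part of the talk. It brute-forces every permutation, so it only suits tiny examples.

```python
from itertools import permutations


def count_orderings(ops, before):
    """Count the orderings of `ops` that respect every (a, b) pair in
    `before`, where a pair means "a must run before b"."""
    count = 0
    for perm in permutations(ops):
        position = {op: i for i, op in enumerate(perm)}
        if all(position[a] < position[b] for a, b in before):
            count += 1
    return count


# A fully ordered chain a -> b -> c leaves exactly one valid ordering.
chain = count_orderings("abc", [("a", "b"), ("b", "c")])
# Dropping dependencies leaves more valid orderings to choose from.
one_dependency = count_orderings("abc", [("a", "c")])
independent = count_orderings("abc", [])
print(chain, one_dependency, independent)
```

Each dependency that is removed enlarges the set of schedules a parallel runtime is free to pick from.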

Parallel Recipes: precipes
complex glue x complex coordination
• reuse scripting
• ordering dependencies
Intel Concurrent Collections (CnC) inside, as Coordination Language
CnC offers:
• cluster-level and node-level parallelism
• determinate execution
• flexible parallel execution model
• stable & practical implementation (CnC++)

parallel hello world recipe:
Each stage has three fields:
• command: what needs to happen when I start?
• in: what dependencies need to be satisfied before I can start?
• out: what dependencies are satisfied after I finished successfully?
Stages shown:
B
  command: $ echo 'B'
  out: B_done (the diagram labels this item B_finished)
Bbis
  command: $ echo 'another thing for B'
  in: B_done
  out: B_or_C_done
finish
  command: $ echo 'finished'
  in: { A_done, B_or_C_done }
Diagram: A -> A_done; B -> B_finished -> Bbis -> B_or_C_done; C -> B_or_C_done; { A_done, B_or_C_done } -> finish
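The in/out rules of this recipe can be sketched in Python as follows. This is a minimal sketch and not the precipes implementation: the `RECIPE` dictionary and the `waves` function are hypothetical, the item names follow the JSON version of the recipe, and it assumes that a dependency counts as satisfied as soon as any stage that lists it under "out" has finished.

```python
# Hypothetical in-memory form of the hello world recipe.
RECIPE = {
    "A": {"in": [], "out": ["A_done"]},
    "B": {"in": [], "out": ["B_finished"]},
    "Bbis": {"in": ["B_finished"], "out": ["B_or_C_done"]},
    "C": {"in": [], "out": ["B_or_C_done"]},
    "finish": {"in": ["A_done", "B_or_C_done"], "out": []},
}


def waves(recipe):
    """Group stages into waves. Stages in the same wave have no ordering
    between them, so they could run in parallel."""
    satisfied = set()
    pending = dict(recipe)
    result = []
    while pending:
        # A stage is ready once all of its "in" dependencies are satisfied.
        ready = sorted(name for name, stage in pending.items()
                       if set(stage["in"]) <= satisfied)
        if not ready:
            raise ValueError("unsatisfiable: " + ", ".join(sorted(pending)))
        for name in ready:
            satisfied.update(pending.pop(name)["out"])
        result.append(ready)
    return result


print(waves(RECIPE))
```

Under the stated assumption, `finish` lands in the same wave as `Bbis`, because `C` alone already satisfies `B_or_C_done`.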

parallel hello world recipe bis:
practical consideration: parallel scripts rarely run only once
fetch dossier
  command: $ wget ftp://citizenfiles.gov/dosiers/yves.txt .
  out: dossier
extract gross income
  command: $ grep 'gross' yves.txt > yves_gross.txt
  out: income
report income
  command: $ echo -n citizen yves is making; cat yves_gross.txt ; echo a year.

parallel hello world recipe bis:
practical considerations:
• parallel scripts rarely run only once
• parallel scripts typically run data-parallel
tags: tom, roel, yves
fetch dossier
  command: $ wget ftp://citizenfiles.gov/dosiers/{}.txt .
  out: dossier
extract gross income
  command: $ grep 'gross' {}.txt > {}_gross.txt
  out: income
report income
  command: $ echo -n citizen {} is making ; cat {}_gross.txt ; echo a year.
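The `{}` placeholder can be illustrated with a short Python sketch. It assumes that every `{}` in a command is replaced by the tag of the current run (here a citizen name); the slides suggest this but do not spell out the mechanism, and the `instantiate` helper is hypothetical. Plain string replacement is used instead of `str.format` so that other braces in a shell command are left alone.

```python
# Command templates copied from the recipe; "{}" stands for the run's tag.
TEMPLATES = [
    "wget ftp://citizenfiles.gov/dosiers/{}.txt .",
    "grep 'gross' {}.txt > {}_gross.txt",
    "echo -n citizen {} is making ; cat {}_gross.txt ; echo a year.",
]


def instantiate(template, tag):
    """Return the command with every "{}" replaced by `tag`."""
    return template.replace("{}", tag)


# One independent set of commands per tag: this is the data-parallel part.
for tag in ["tom", "roel", "yves"]:
    for template in TEMPLATES:
        print(instantiate(template, tag))
```

The commands are only printed here, not executed.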

parallel hello world recipe bis:
out of the box: data-parallel runs
yves: fetch dossier -> dossier -> extract gross income -> income -> report income
tom: fetch dossier -> dossier -> extract gross income -> income -> report income
roel: fetch dossier -> dossier -> extract gross income -> income -> report income

{
  "stages" : {
    "A" : {
      "command" : "echo A for {}.",
      "out" : "A_done"
    },
    "B" : {
      "command" : "echo B for {}.",
      "out" : "B_finished"
    },
    "Bbis" : {
      "command" : "echo One more thing for B and {}.",
      "in" : "B_finished",
      "out" : "B_or_C_done"
    },
    "C" : {
      "command" : "echo C for {}.",
      "out" : "B_or_C_done"
    },
    "finish" : {
      "command" : "echo Done with A and B for {}.",
      "in" : ["A_done", "B_or_C_done"]
    }
  }
}
Diagram: A -> A_done; B -> B_finished -> Bbis -> B_or_C_done; C -> B_or_C_done; { A_done, B_or_C_done } -> finish
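A recipe in this JSON form could be read with Python's standard `json` module, as in the sketch below. The `load_stages` and `as_list` helpers are hypothetical and do not describe how precipes parses recipes. The only point illustrated is that "in" and "out" may be absent, a single string, or a list, so a reader has to normalise them.

```python
import json

RECIPE_JSON = """
{ "stages" : {
    "A"      : { "command" : "echo A for {}.", "out" : "A_done" },
    "B"      : { "command" : "echo B for {}.", "out" : "B_finished" },
    "Bbis"   : { "command" : "echo One more thing for B and {}.",
                 "in" : "B_finished", "out" : "B_or_C_done" },
    "C"      : { "command" : "echo C for {}.", "out" : "B_or_C_done" },
    "finish" : { "command" : "echo Done with A and B for {}.",
                 "in" : ["A_done", "B_or_C_done"] }
} }
"""


def as_list(value):
    """Normalise a missing value, a single string, or a list to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def load_stages(text):
    """Parse a recipe and return {stage name: {command, in, out}}."""
    stages = json.loads(text)["stages"]
    return {
        name: {
            "command": stage["command"],
            "in": as_list(stage.get("in")),
            "out": as_list(stage.get("out")),
        }
        for name, stage in stages.items()
    }


stages = load_stages(RECIPE_JSON)
for name, stage in stages.items():
    print(name, stage["in"], "->", stage["out"])
```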

JSON parallel recipe:
$ ./precipes -p bpp.dot exome_best_practices_pipeline.json
{
  "stages" : {
    "check_paired" : {
      "command" : "$CHECK_EXISTS $READS/{}_1.filt.fastq.gz",
      "out" : "has_paired_end_reads"
    },
    "fetch_unpaired" : {
      "command" : "$FETCH $READS/{}.filt.fastq.gz $LOCAL_DIR/{}.unpaired.fastq.gz",
      "out" : "unpaired.fastq.gz"
    },
    "fetch_paired_1" : {
      "command" : "$FETCH $READS/{}_1.filt.fastq.gz $LOCAL_DIR/{}.paired_1.fastq.gz",
      "in" : "has_paired_end_reads",
      "out" : "paired_1.fastq.gz"
    },
    "fetch_paired_2" : {
      "command" : "$FETCH $READS/{}_2.filt.fastq.gz $LOCAL_DIR/{}.paired_2.fastq.gz",
      "in" : "has_paired_end_reads",
      "out" : "paired_2.fastq.gz"
    },
    "alignment_paired" : {
      "command" : "\
        $BWA mem -R '@RG\\tID:Group1\\tLB:lib1\\tPL:illumina\\tSM:sample1' \
        -t $NUM_THREADS $REF/ucsc.hg19.fasta \
        $LOCAL_DIR/{}.paired_1.fastq.gz $LOCAL_DIR/{}.paired_2.fastq.gz \
        > $LOCAL_DIR/{}.paired.sam &&
        rm $LOCAL_DIR/{}.paired_1.fastq.gz $LOCAL_DIR/{}.paired_2.fastq.gz",
      "in" : ["paired_1.fastq.gz", "paired_2.fastq.gz"],
      "out" : "paired.sam"
    },
    …
Pipeline graph (bpp.dot), stages and items in order of appearance: check_paired, has_paired_end_reads, fetch_paired_1, fetch_paired_2, fetch_unpaired, paired_1.fastq.gz, paired_2.fastq.gz, unpaired.fastq.gz, alignment_paired, alignment_unpaired, paired.sam, unpaired.sam, check_no_unpaired, sort_for_coordinate_order_paired, sort_for_coordinate_order_unpaired, check_no_paired, no_unpaired_end_reads, sorted_paired.bam, sorted_unpaired.bam, no_paired_end_reads, merge_bams_paired, merge_bams_paired_unpaired, merge_bams_unpaired, sorted.bam, remove_duplicates, dedup.bam, build_bam_index_1, dedup.bai, realign_around_indels_1, intervals, realign_around_indels_2, 7.bam, build_bam_index_2, 7.bai, base_recalibrate_1, recal, base_recalibrate_2, 8.bam, 8.bai, call_variants, vcf
[1] G. A. Van der Auwera, M. O. Carneiro, C. Hartl, et al., "From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline," Curr. Protoc. Bioinform. 11.10.1-11.10.33, October 2013.

Execution
bash$ ./precipes exome_best_practices_pipeline.json sample_{00..07}
.json recipe -> ./precipes core -> one of:
• workstation
• cluster
• Amazon EC2

Execution
bash$ ./precipes exome_best_practices_pipeline.json sample_{00..07}
.json recipe -> ./precipes core:
add_stage( "fetch_paired_1", "$FETCH $READS/…", { "has_paired_end_reads" }, { "paired_1.fastq.gz" } );
add_stage( "check_paired", "test -f …", { }, { "has_paired_end_reads" } );
add_stage( … );

Execution
bash$ ./precipes exome_best_practices_pipeline.json sample_{00..07}
// start running samples in parallel
for( int i = 2; i < argc; ++i )
    pipeline.run( argv[i], i-2 );
> pipeline.tags.put( "sample_00" )
> pipeline.tags.put( "sample_01" )
> …
pipeline.wait()
Diagram: tags sample_00 … sample_07 enter the pipeline; items shown include sai, sam and 1.bam.

parallel scaling experiment: 32 samples from g1k NA12878
Exome Best Practices Scaling Experiment (runtime vs. # compute nodes)
# compute nodes | 1 worker thread | 2 worker threads
1               | 158h 7m         | 80h 31m
2               | 79h 21m         | 41h 21m
4               | 40h 21m         | 21h 38m
8               | 21h 7m          | 12h 20m

Scaling Efficiency: single fat node (exome best practices, 32 samples)
# workers | runtime  | efficiency
1         | 336h 19m | 100%
2         | 171h 12m | 98.224%
4         | 87h 15m  | 96.369%
8         | 44h 7m   | 95.285%
16        | 25h 13m  | 83.361%
24        | 19h 22m  | 72.38%
32        | 15h 12m  | 69.132%
64        | 14h 55m  | 46.928%
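The efficiency figures can be related to the runtimes with a short Python sketch. It assumes the usual definition, efficiency = T(1) / (n x T(n)), which the slide does not state. That assumption reproduces the 2-worker figure from the listed runtimes; the helper names `hours` and `efficiency` are hypothetical.

```python
def hours(h, m):
    """Convert a runtime given as hours and minutes to fractional hours."""
    return h + m / 60.0


def efficiency(t1, tn, n):
    """Parallel scaling efficiency under the assumed definition:
    single-worker runtime divided by n times the n-worker runtime."""
    return t1 / (n * tn)


t1 = hours(336, 19)   # runtime with 1 worker
t2 = hours(171, 12)   # runtime with 2 workers
print("{:.3%}".format(efficiency(t1, t2, 2)))
```

The 64-worker figure does not follow from this formula with n = 64, so the slide may use a different worker count or baseline for that point.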

Scaling Efficiency: cluster
# compute nodes | 1 worker runtime | 1 worker efficiency | 2 workers runtime | 2 workers efficiency
1               | 158h 7m          | 100.00%             | 80h 31m           | 100.00%
2               | 79h 21m          | 99.63%              | 41h 21m           | 97.36%
4               | 40h 21m          | 97.97%              | 21h 38m           | 93.05%
8               | 21h 7m           | 93.60%              | 12h 20m           | 81.60%

execution trace: 32 samples, 4 nodes, 2 workers
(figure: execution trace with rows labelled 0 to 3)

Next!
Common Workflow Language (1) (CWL) integration
CWL description -> core -> one of:
• workstation
• cluster
• Amazon EC2
{
  …
  "run": {
    "inputs": [
      {
        "inputBinding": { "position": 1, "prefix": "--reverse" },
        "type": "boolean",
        "id": "#reverse"
      },
      {
        "inputBinding": { "position": 2 },
        "type": "File",
        "id": "#input"
      }
    ],
    …
    "class": "Workflow"
}
Shoutout to BOSC CodeFest2015!
(1) https://github.com/common-workflow-language/common-workflow-language
