Howdah
a flexible pipeline framework and applications to analyzing genomic data
Steven Lewis PhD slewis@systemsbiology.org
What is a Howdah?
A howdah is a carrier for an elephant.
The idea is that multiple tasks can be performed
[Diagram: pipeline stages and tasks, with boxes labeled Setup, Map1, Consolidation, Partition, Task, SNSP Task, Reduce1, Break, SubTask, SNP SubTask, Statistics Subtask, Output, Output]
[Diagram: Mapper1 through Mapper4 each send Totals to Reducer1 through Reducer5; each reducer produces a Grand Total]
Every mapper sends its total to each reducer. The reducer makes the grand total before other keys are sent.
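The grand-total pattern can be sketched as a single-process simulation. This is an illustrative sketch only, not code from the Howdah framework and not Hadoop API code: the names `TOTAL_KEY`, `hash_partition`, `run_mapper`, `run_reducer` and `run_job` are hypothetical, and the sketch assumes the values are simple counts and that a sentinel key sorting ahead of every real key carries each mapper's total.

```python
from collections import defaultdict

# Hypothetical sentinel key. The empty string sorts before any other string,
# so a reducer sees the totals first. Assumes no real key is empty.
TOTAL_KEY = ""


def hash_partition(key, num_reducers):
    """Deterministic stand-in for a partitioner: pick a reducer for a key."""
    return sum(key.encode()) % num_reducers


def run_mapper(records, num_reducers):
    """Emit (reducer_index, key, value) triples for one mapper's records."""
    out = []
    local_total = 0
    for key in records:
        local_total += 1
        out.append((hash_partition(key, num_reducers), key, 1))
    # Every mapper sends its total to each reducer.
    for reducer_index in range(num_reducers):
        out.append((reducer_index, TOTAL_KEY, local_total))
    return out


def run_reducer(pairs):
    """Sum the mapper totals first, then turn each key's count into a fraction."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    grand_total = None
    fractions = {}
    for key in sorted(grouped):  # sorted order puts TOTAL_KEY first
        if key == TOTAL_KEY:
            grand_total = sum(grouped[key])
            continue
        fractions[key] = sum(grouped[key]) / grand_total
    return grand_total, fractions


def run_job(mapper_inputs, num_reducers):
    """Run all mappers, route their output, and run each reducer."""
    per_reducer = defaultdict(list)
    for records in mapper_inputs:
        for reducer_index, key, value in run_mapper(records, num_reducers):
            per_reducer[reducer_index].append((key, value))
    return [run_reducer(per_reducer[r]) for r in range(num_reducers)]
```

Because each reducer holds the grand total before it handles any ordinary key, it can compute quantities such as fractions of the whole in one pass, without a second job.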
00000, part-r-00001 … independently and in parallel.
Reference:
ACGTATTACGTACTACTACATAGATGTACAGTACTACAATAGATTCAAACATGATACA
Sequences with ends fit to reference:
ATTACGTACTAC...... ……………... ACAGTACTACAA
CGTATTACGTAC…………………………….……ACTACAATAGATT
ACTACTACATA…………………..…….CAATAGATTCAAA
SNP
Reference:
ACGTATTACGTACTACTACATAGATGTACAG
Sequences:
ACGTATTACGTAC
TTTTACGTACTACTA
GTTTTACGTACTAC
TTTTACGTACTACATAG
CGTATTACGTACTACTA
Where sequences differ from the reference but agree with each other, a single mutation is suspected.
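The SNP rule can be illustrated with a small pileup. This is a sketch under stated assumptions, not the presentation's actual algorithm: the function name `suspected_snps`, the `min_support` threshold, and the read offsets are all hypothetical, and the reads are assumed to be already placed on the reference with no insertions or deletions.

```python
from collections import Counter, defaultdict


def suspected_snps(reference, aligned_reads, min_support=2):
    """Return (position, reference_base, read_base) where reads differ from
    the reference but agree with each other.

    aligned_reads is a list of (offset, read) pairs; the offsets are assumed
    to come from an earlier alignment step.
    """
    # Collect the bases that each read places on each reference position.
    pileup = defaultdict(list)
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if pos < len(reference):
                pileup[pos].append(base)

    calls = []
    for pos in sorted(pileup):
        ref_base = reference[pos]
        variants = Counter(b for b in pileup[pos] if b != ref_base)
        # Suspect a single mutation only when every disagreeing read shows
        # the same base and enough reads support it.
        if len(variants) == 1:
            alt, count = next(iter(variants.items()))
            if count >= min_support:
                calls.append((pos, ref_base, alt))
    return calls


reference = "ACGTATTACGTACTACTACATAGATGTACAG"
reads = [  # offsets are assumed for illustration
    (0, "ACGTATTACGTAC"),
    (3, "TTTTACGTACTACTA"),
    (2, "GTTTTACGTACTAC"),
]
print(suspected_snps(reference, reads))
```

A lone disagreeing read is ignored here because it is more likely a sequencing error than a mutation; the threshold that separates the two cases is a tuning choice.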
Reference:
ACGTATTACGTACTACTACATAGATGTACAGTACTACAATAGATTCAAACATGATACAACACACAGTA
Actual (with deletion):
ACGTATTACGTACTAC|TCAAACATGATACAACACACAGTAAGATAGTTACACGTTTATATATATACC
Fit to actual:
ATTACGTACTAC...... ? ? ? .. TACAACACACAG
Reported fit to reference:
ATTACGTACTAC......................................................................................................TACAACACACAG
With a deletion, the ends fit to the reference much further apart than normal.
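The deletion signal can be sketched by measuring how far apart the two ends of a fragment land on the reference. This is a minimal illustration, not the presentation's implementation: `fit_span` and `deletion_suspected` are hypothetical names, exact string matching stands in for a real aligner, and `normal_span` and `tolerance` are assumed values that would in practice come from the fragment-size distribution.

```python
def fit_span(sequence, left_end, right_end):
    """Return the span from the start of left_end to the end of right_end,
    or None if either end has no exact match."""
    start = sequence.find(left_end)
    if start < 0:
        return None
    right = sequence.find(right_end, start + len(left_end))
    if right < 0:
        return None
    return right + len(right_end) - start


def deletion_suspected(reference, left_end, right_end, normal_span, tolerance):
    """True when the ends fit the reference much further apart than normal."""
    span = fit_span(reference, left_end, right_end)
    if span is None:
        return False
    return span - normal_span > tolerance


reference = ("ACGTATTACGTACTACTACATAGATGTACAGTACTACAATAGAT"
             "TCAAACATGATACAACACACAGTA")
left_end, right_end = "ATTACGTACTAC", "TACAACACACAG"

# normal_span and tolerance are assumed values chosen for this example.
print(fit_span(reference, left_end, right_end))
print(deletion_suspected(reference, left_end, right_end,
                         normal_span=34, tolerance=10))
```

The excess of the reference span over the normal span also gives a rough estimate of how many bases were deleted.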