LoonyBin: Keeping Language Technologists Sane through Automated Management of (Hyper)Workflows


  1. LoonyBin: Keeping Language Technologists Sane through Automated Management of (Hyper)Workflows • Jonathan Clark and Alon Lavie • Carnegie Mellon University • LREC 2010 • Thursday, May 20, 2010

  2. Outline • Empirical NLP Research • Day-to-day issues • Current problems • LoonyBin’s solutions • Workflows • HyperWorkflows

  3. Empirical NLP • Plumbing: Gluing (Linux) tools together • Recording results • Sanity checking • Running variations • Moving between clusters & schedulers

  18. Proposed Solution: HyperWorkflow Management

  19. LoonyBin • Define the tools (inputs/outputs/parameters → shell commands) • Define the workflow (DAG of steps and dependencies) • Generate & run a shell script
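The three steps on this slide can be illustrated with a minimal Python sketch. It is not LoonyBin's actual implementation or file format: the `workflow` dictionary, the step names, the `./*.sh` commands, and `generate_script` are all hypothetical. The sketch assumes each step already carries its shell command and a list of the steps it depends on, then orders the DAG topologically and emits a Bash script.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step names the steps it must run after
# and the shell command it contributes. None of this is LoonyBin syntax.
workflow = {
    "tokenize": {"after": [], "cmd": "./tokenize.sh corpus.txt > tok.txt"},
    "align": {"after": ["tokenize"], "cmd": "./align.sh tok.txt > align.txt"},
    "extract": {"after": ["align"], "cmd": "./extract.sh align.txt > phrases.txt"},
}


def generate_script(workflow):
    """Return a Bash script that runs every step after its dependencies."""
    # TopologicalSorter expects a mapping from node to its predecessors.
    graph = {name: set(step["after"]) for name, step in workflow.items()}
    lines = ["#!/usr/bin/env bash", "set -e  # stop at the first failing step"]
    for name in TopologicalSorter(graph).static_order():
        lines.append(f"# step: {name}")
        lines.append(workflow[name]["cmd"])
    return "\n".join(lines) + "\n"


script = generate_script(workflow)
print(script)
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, so a malformed workflow is rejected before any script is written.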

  29. [Screenshot: LoonyBin workflow designer, with callouts: Available Tools • Drag and Drop • Tooltips for Params • Machine Assignment]

  33. Generating a Script for W (diagram: steps A and B feed W) • INPUTS: foreignCorpus = A’s output “x”, at …/inputs/f; nativeCorpus = B’s output “y”, at …/inputs/n • OUTPUTS: alignments, at …/outputs/wa • PARAMETERS: fertility = 0.01 • Parameters & dependencies come from the workflow • LoonyBin assigns paths • The Python Tool Descriptor generates: java edu.cmu.Tokenizer ../inputs/f ../inputs/n > ../outputs/wa
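A tool descriptor of the kind described on this slide might look roughly like the following Python sketch. The slides do not show LoonyBin's descriptor API, so the function name `get_commands`, its signature, and the `REQUIRED_*` constants are assumptions made for illustration. Only the input, output, and parameter names and the final command line come from the slide. The slide's command does not show how `fertility` is passed, so the sketch only checks that it was supplied.

```python
import shlex

# Names taken from the slide; the structure around them is hypothetical.
REQUIRED_INPUTS = ("foreignCorpus", "nativeCorpus")
REQUIRED_OUTPUTS = ("alignments",)
REQUIRED_PARAMS = ("fertility",)


def get_commands(inputs, outputs, params):
    """Map assigned paths and parameters to a list of shell commands."""
    for kind, required, given in (
        ("input", REQUIRED_INPUTS, inputs),
        ("output", REQUIRED_OUTPUTS, outputs),
        ("parameter", REQUIRED_PARAMS, params),
    ):
        missing = [name for name in required if name not in given]
        if missing:
            raise ValueError(f"missing {kind}(s): {', '.join(missing)}")

    # Quote paths so that spaces or shell metacharacters cannot break the script.
    foreign = shlex.quote(inputs["foreignCorpus"])
    native = shlex.quote(inputs["nativeCorpus"])
    alignments = shlex.quote(outputs["alignments"])
    # Command line as shown on the slide; fertility is validated but not used here.
    return [f"java edu.cmu.Tokenizer {foreign} {native} > {alignments}"]


commands = get_commands(
    inputs={"foreignCorpus": "../inputs/f", "nativeCorpus": "../inputs/n"},
    outputs={"alignments": "../outputs/wa"},
    params={"fertility": "0.01"},
)
print(commands[0])
```

Keeping the descriptor a pure function from paths and parameters to command strings means the workflow layer can assign paths freely, as the slide describes, without the tool knowing where it runs.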

  34. So far... • Complaints about current implementation of empirical NLP experiments • Define the tools (inputs/outputs/parameters) • Define the workflow (DAG of steps and dependencies) • Generate & run a shell script

  40. HyperWorkflows • HyperWorkflows: Shared substructure in experiments • Encode small variations in a HyperDAG • [Figure: HyperDAG for a machine translation experiment. Steps include Parallel Corpus, Word Alignment, Moses Phrase Table Training, Build Syntactic Translation Model, Stanford Parser, Charniak Parser, Target Language Corpus, Build Language Model, Minimum Error Rate Training, and sentence filtering and decoding steps. Branch labels: moses, syntax, st, ch. Realizations: {syntax-st, syntax-ch, moses}. Callouts: Packing Node • Realizations • Don’t re-run • Organized directory structure & easy-to-parse logs]
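The idea of packing nodes and realizations can be sketched in Python as follows. This is an illustrative model, not LoonyBin's data structure: the `hyperdag` dictionary, its `deps`/`pack` keys, the step names, and the `realizations` function are assumptions. A step's packing node maps a branch label to one alternative upstream step, and a realization is the set of branch labels chosen upstream of a step.

```python
from itertools import product

# Hypothetical HyperDAG loosely modeled on the figure. "deps" are ordinary
# dependencies; "pack" is a packing node: branch label -> alternative step.
hyperdag = {
    "word_alignment": {"deps": [], "pack": {}},
    "stanford_parser": {"deps": [], "pack": {}},
    "charniak_parser": {"deps": [], "pack": {}},
    "parse": {"deps": [], "pack": {"st": "stanford_parser", "ch": "charniak_parser"}},
    "moses_model": {"deps": ["word_alignment"], "pack": {}},
    "syntax_model": {"deps": ["word_alignment", "parse"], "pack": {}},
    "translation_model": {"deps": [], "pack": {"moses": "moses_model", "syntax": "syntax_model"}},
    "decode": {"deps": ["translation_model"], "pack": {}},
}


def realizations(step, hyperdag):
    """Return the set of realizations of a step, each a frozenset of branch labels.

    Simplification: the sketch assumes no two dependencies of one step reach
    the same packing node, so it never has to reject inconsistent choices.
    """
    spec = hyperdag[step]
    option_sets = [realizations(dep, hyperdag) for dep in spec["deps"]]
    if spec["pack"]:
        # One option per branch: the label plus the branch's own realizations.
        packed = set()
        for label, alternative in spec["pack"].items():
            for upstream in realizations(alternative, hyperdag):
                packed.add(frozenset({label}) | upstream)
        option_sets.append(packed)
    # Combine one option from every source; no sources yields one empty realization.
    return {frozenset().union(*combo) for combo in product(*option_sets)}


for name in hyperdag:
    print(name, len(realizations(name, hyperdag)))
```

In this model `decode` has three realizations, matching {syntax-st, syntax-ch, moses} on the slide, while `word_alignment` has exactly one. That is what the "Don't re-run" callout points at: a step upstream of every branch needs to run only once and can be shared by all realizations.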

  44. Multiple Machines and Schedulers • [Diagram: Design Machine (Java) → manually copy Bash script → Home Execution Machine (UNIX) → passwordless SSH → Remote Execution Machines (UNIX). Schedulers shown: Bash, Condor, Sun Grid Engine]

  45. Other Things to Make Life Easier • Sanity checking at each step (embedded in Tool Descriptors) • Copying of files (including to HDFS) • Text-based workflow definition (in SVN) • Open-source LGPL License
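As one possible reading of "sanity checking at each step", a descriptor could emit shell checks that run after its command. The sketch below is an assumption, not LoonyBin's mechanism: `sanity_check_lines` is a hypothetical helper that generates one Bash test per output file and aborts the script if the file is missing or empty.

```python
import shlex


def sanity_check_lines(step_name, output_paths):
    """Return Bash lines that fail the script if an output is missing or empty."""
    lines = []
    for path in output_paths:
        quoted_path = shlex.quote(path)
        message = shlex.quote(
            f"sanity check failed for {step_name}: {path} is missing or empty"
        )
        # `[ -s FILE ]` succeeds only if FILE exists and has a size greater than zero.
        lines.append(f"[ -s {quoted_path} ] || {{ echo {message} >&2; exit 1; }}")
    return lines


checks = sanity_check_lines("word_alignment", ["../outputs/wa"])
print("\n".join(checks))
```

A real check would likely be tool-specific (for example, comparing line counts of parallel files); the non-empty test is only the simplest case.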

  46. WANTED: Users & Contributors • Machine Translation Toolpack (released) • Corpus Processing Toolpack? • Parsing Toolpack? • Question Answering Toolpack? • Resource Directory Toolpack? • Speech Recognition Toolpack?
