LoonyBin:
Keeping Language Technologists Sane
through Automated Management of
(Hyper)Workflows
Jonathan Clark and Alon Lavie
Carnegie Mellon University
LREC 2010
Thursday, May 20, 2010
Outline
Empirical NLP Research
Day-to-day issues
(inputs/outputs/parameters → shell commands)
(DAG of steps and dependencies)
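As a minimal sketch of what a "DAG of steps and dependencies" implies for execution, the Python below orders steps so that each one runs only after the steps it depends on. The step names A, B, and W and the `dependencies` mapping are illustrative assumptions. The sketch uses the standard library's `graphlib` (Python 3.9+), not LoonyBin itself.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: step W consumes the outputs of steps A and B.
# Each key maps a step to the set of steps it depends on.
dependencies = {
    "A": set(),
    "B": set(),
    "W": {"A", "B"},
}

def execution_order(deps):
    """Return one valid serial order in which every step follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(dependencies)
print(order)
```

If the mapping contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is one reason to require that a workflow be acyclic.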
Available Tools
Drag and Drop
Tooltips for Params
Machine Assignment
Python Tool Descriptor
Steps: A, B, W
INPUTS / PARAMETERS / OUTPUTS: foreignCorpus, nativeCorpus, alignments, fertility
Parameters & dependencies from workflow: 0.01, A’s output “x”, B’s output “y”
LoonyBin assigns paths: …/inputs/f, …/inputs/n, …/outputs/wa
java edu.cmu.Tokenizer ../inputs/f ../inputs/n > ../outputs/wa
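The mapping from inputs, parameters, and outputs to a shell command can be sketched roughly as follows. This is not LoonyBin's actual descriptor API. The function name `get_command` and the three name lists are hypothetical, and treating foreignCorpus and nativeCorpus as inputs, fertility as the parameter, and alignments as the output is one reading of the slide.

```python
import shlex

# Hypothetical descriptor, not LoonyBin's actual API: it only illustrates
# the mapping "inputs/outputs/parameters -> shell command".
INPUT_NAMES = ["foreignCorpus", "nativeCorpus"]  # assumed to be the inputs
PARAM_NAMES = ["fertility"]                      # assumed to be the parameter
OUTPUT_NAMES = ["alignments"]                    # assumed to be the output

def get_command(params, inputs, outputs):
    """Build the shell command for this tool from values supplied by the workflow."""
    missing = [n for n in INPUT_NAMES if n not in inputs]
    missing += [n for n in PARAM_NAMES if n not in params]
    missing += [n for n in OUTPUT_NAMES if n not in outputs]
    if missing:
        raise KeyError(f"missing values for: {missing}")
    # The command on the slide takes the two corpora and redirects stdout;
    # it does not show how (or whether) the fertility parameter is passed.
    return "java edu.cmu.Tokenizer {} {} > {}".format(
        shlex.quote(inputs["foreignCorpus"]),
        shlex.quote(inputs["nativeCorpus"]),
        shlex.quote(outputs["alignments"]),
    )

command = get_command(
    params={"fertility": "0.01"},
    inputs={"foreignCorpus": "../inputs/f", "nativeCorpus": "../inputs/n"},
    outputs={"alignments": "../outputs/wa"},
)
print(command)
```

The point of the design is that the descriptor never hard-codes paths: the workflow supplies parameter values and upstream outputs, and the path values are filled in at command-generation time.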
(inputs/outputs/parameters)
(DAG of steps and dependencies)
experiments
[Figure: workflow diagram. Node labels: Parallel Corpus, Target Language Corpus, Filter Corpus, Word Alignment, Stanford Parser, Charniak Parser, Build Syntactic Translation Model, Moses Phrase Table Training, Build Language Model, Minimum Error Rate Training, Decode Sentences. Other labels in the figure: st, ch, {st,ch}, syntax, moses, {syntax-st, syntax-ch, moses}.]
Packing Node
Realizations
Don’t re-run
Organized directory structure & easy-to-parse logs
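One way to picture realizations is as the expansion of every alternative in the figure into a named experiment. The sketch below is an assumption-laden illustration, not LoonyBin's implementation. The `branches` encoding, the `"shared"` tag, and both function names are hypothetical, and the second function only gestures at the "Don’t re-run" idea by skipping work already marked complete.

```python
# Hypothetical encoding of the alternatives on the slide: the "syntax" branch
# can use either parser (st or ch), while the "moses" branch has no parser choice.
branches = {
    "syntax": ["st", "ch"],
    "moses": [None],
}

def realizations(branches):
    """Expand every combination of branch and variant into a realization name."""
    names = []
    for branch, variants in branches.items():
        for variant in variants:
            names.append(branch if variant is None else f"{branch}-{variant}")
    return names

def plan(steps, completed):
    """Return (step, realization) pairs not yet completed, so finished work is not re-run."""
    return [pair for pair in steps if pair not in completed]

all_realizations = realizations(branches)
print(all_realizations)

# A step common to all realizations is listed once (tagged "shared" here);
# realization-specific steps are listed once per realization.
steps = [("Filter Corpus", "shared")] + [("Decode Sentences", r) for r in all_realizations]
completed = {("Filter Corpus", "shared"), ("Decode Sentences", "moses")}
print(plan(steps, completed))
```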
Design Machine (Java) → Home Execution Machine (UNIX): Manually Copy Bash Script
Home Execution Machine (UNIX) → Remote Execution Machine (UNIX), Remote Execution Machine (UNIX): Passwordless SSH
Bash, Sun Grid Engine, Condor
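The execution model above (a bash script run on a home machine, with some steps dispatched to remote machines over passwordless SSH) might be sketched as below. This is an illustration under stated assumptions, not the script LoonyBin generates. The function `to_bash_script`, the host name `remote1`, and the example steps are hypothetical, and Sun Grid Engine and Condor submission are not covered.

```python
import shlex

def to_bash_script(steps):
    """Render (host, command) pairs as a serial bash script.

    host is None for commands run on the home execution machine; otherwise the
    command is wrapped in an ssh call, which assumes passwordless SSH is set up.
    """
    lines = ["#!/usr/bin/env bash", "set -e  # stop at the first failing step"]
    for host, command in steps:
        if host is None:
            lines.append(command)
        else:
            # BatchMode=yes makes ssh fail instead of prompting for a password.
            lines.append(
                f"ssh -o BatchMode=yes {shlex.quote(host)} {shlex.quote(command)}"
            )
    return "\n".join(lines) + "\n"

# Hypothetical steps and host name, for illustration only.
script = to_bash_script([
    (None, "echo preparing"),
    ("remote1", "java edu.cmu.Tokenizer ../inputs/f ../inputs/n > ../outputs/wa"),
])
print(script)
```

Quoting the remote command with `shlex.quote` keeps the output redirection on the remote side rather than letting the local shell interpret it.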
(in SVN)
Tutorial & Software at
rerun a step?
(re)run
data is useful long after we get annoyed with size of data files!
changing tools in your Loon logs -- Build them from SVN every time to ensure you’re executing that version
all steps are run in serial) -- Mid-Term
during execution -- Long-term
“tools” (hierarchical tools) -- Long-term