  1. ComplexHPC Spring School Day 2: KOALA Tutorial
     Submitting Jobs with KOALA
     Nezih Yigitbasi, Delft University of Technology
     May 10, 2011

  2. This Lecture
     [Diagram: Lectures I & II deal with a grid running the KOALA scheduler. Global jobs enter a global queue and are spread over the clusters (load sharing, co-allocation); each cluster has its own local queue, local scheduler (LS), and local jobs.]

  3. Outline
     1. Runners
     2. Preparing the Environment
        • Login to a head-node
        • Set up the environment
        • Set SSH public key authentication
        • Create a Job Description File (JDF)
     3. Job Submission
        • PRunner
        • OMRunner
        • WRunner
     4. Practical Work

  4. Runners
     • Extensible framework to add support for
       • Different application types: sequential/parallel, workflows, Bags of Tasks (BoTs)
       • Different middlewares/standards: Globus, DRMAA
     • Can submit diverse application types to a heterogeneous multi-cluster grid without changing your application
     • Responsible for
       • Stage-in/stage-out
       • Submitting the executable to the middleware
       • Monitoring job status
       • Responding to failures

  5. Runners
     • OMRunner: for OpenMPI co-allocated jobs
     • PRunner: modified OMRunner for ease of use and non-co-allocated jobs (no need to write a job description file)
     • KRunner: Globus job submission tool for clusters using the Globus middleware
     • IRunner: KRunner-based Ibis job submission tool
     • WRunner: for running BoTs and workflows

  6. Outline
     1. Runners
     2. Preparing the Environment
        • Login to a head-node
        • Set up the environment
        • Set SSH public key authentication
        • Create a Job Description File (JDF)
     3. Job Submission
        • PRunner
        • OMRunner
        • WRunner
     4. Practical Work

  7. Preparing the Environment
     • Four-step process
       1. Login to a head-node, e.g., fs0.das4.cs.vu.nl
       2. Set up the environment ($PATH, $LD_LIBRARY_PATH, etc.)
       3. Set SSH public key authentication (passwordless)
       4. Create a Job Description File
     * KOALA hands-on document: http://bit.ly/lrcDVd

  8. Login to a Head-Node
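     Logging in to the DAS-4 head-node at the VU from your own machine looks like the following (a minimal sketch; "username" is a placeholder for your own account name):

         ssh username@fs0.das4.cs.vu.nl

     From fs0 the other head-nodes (e.g., fs3.das4.tudelft.nl) can be reached in the same way.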

  9. Setting the Environment Variables
     • Need to set the required environment variables in your .bashrc
     • PATH should include the runner executables
       • export PATH=$PATH:/home/koala/koala_bin/bin
     • LD_LIBRARY_PATH should include the DRMAA libraries
       • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cm/shared/apps/sge/6.2u5/lib/lx26-amd64/
     • Load the required modules
       • Each module contains the information needed to configure the shell for a specific application
       • module load gcc
       • module load openmpi/gcc/default
     * This info is in the hands-on document
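     Put together, the additions to your .bashrc would look roughly like this (a sketch assembled from the lines above; the exact paths may differ on your system):

         # ~/.bashrc additions for the KOALA runners
         export PATH=$PATH:/home/koala/koala_bin/bin
         export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cm/shared/apps/sge/6.2u5/lib/lx26-amd64/
         module load gcc
         module load openmpi/gcc/default

     After editing, either log in again or run "source ~/.bashrc" so the changes take effect in the current shell.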

  10. Configuring Public Key Authentication
     • Runners use SSH for submitting tasks to remote hosts, e.g., from fs0 to fs3
     • Need passwordless authentication
     • Use public/private key pairs
       • The public key is used to encrypt messages
       • The private key is used to decrypt messages encrypted with the corresponding public key
     • kssh_keygen.sh -all
       • Generates public/private key pairs
       • Pushes the public keys to all head-nodes
     * This info is in the hands-on document
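     In practice this is a single command on the head-node, as described above; the manual equivalent with standard OpenSSH tools is shown only for comparison and is an assumption, not part of the KOALA documentation:

         # KOALA helper script: generate key pairs and push the public keys to all head-nodes
         kssh_keygen.sh -all

         # Assumed manual alternative with standard OpenSSH tools, for illustration:
         ssh-keygen -t rsa                  # generate a key pair; accept the defaults, leave the passphrase empty
         ssh-copy-id fs3.das4.tudelft.nl    # copy the public key to another head-node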

  11. Job Description File (JDF)
     [Annotated JDF example with callouts for: the aggregator (group multiple components), the preferred execution site, the number of processors (4), the path of the executable, the default directory, the stdout/stderr files, the estimated runtime, and a comment.]

  12. Outline
     1. Runners
     2. Preparing the Environment
        • Login to a head-node
        • Set up the environment
        • Set SSH public key authentication
        • Create a Job Description File (JDF)
     3. Job Submission
        • PRunner
        • OMRunner
        • WRunner
     4. Practical Work

  13. PRunner
     • -host <cluster>          the preferred cluster to run the job
     • -c <node_count>          number of nodes
     • -stdout <stdout_file>    file used for standard output
     For more information: http://www.st.ewi.tudelft.nl/koala/
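     A typical invocation combining these options might look as follows (a sketch; the lowercase executable name "prunner" and the sample program are assumptions based on the option names above):

         # Run a sequential job on 1 node of the preferred cluster fs3, capturing standard output
         prunner -host fs3.das4.tudelft.nl -c 1 -stdout myjob.out ./my_program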

  14. OMRunner
     • -optComm         try to optimize communication (place components at the sites with the least latency)
     • -f <jdf_file>    job description file
     • -x <clusters>    comma-separated list of clusters to exclude
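     Combining these options, submitting a co-allocated OpenMPI job might look like this (a sketch; the lowercase executable name "omrunner" and the JDF file name are assumptions):

         # Submit the job described in myjob.jdf, optimizing communication and excluding two clusters
         omrunner -f myjob.jdf -optComm -x fs0.das4.cs.vu.nl,fs3.das4.tudelft.nl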

  15. Workflows
     • Applications with dependencies
     • Directed Acyclic Graph (DAG)
       • Nodes are executables
       • Edges are dependencies (files)

  16. Sample Workflow Description
     [Diagram: job 0 is the parent (root); jobs 1 and 2 are its children, with the output files as the dependencies.]

     <job id="0" name="Task_0_executable">
       <uses file="file0.out" link="output" type="data"/>
       <uses file="file1.out" link="output" type="data"/>
     </job>
     <job id="1" name="Task_1_executable">
       <uses file="file0.out" link="input" type="data"/>
     </job>
     <job id="2" name="Task_2_executable">
       <uses file="file1.out" link="input" type="data"/>
     </job>
     <child ref="1">
       <parent ref="0"/>
     </child>
     <child ref="2">
       <parent ref="0"/>
     </child>

  17. Bag of Tasks (BoT)
     • Conveniently parallel applications
     • A DAG without dependencies
     • Usually used for parameter sweep applications
       • A single executable that runs for a large set of parameters (e.g., Monte Carlo simulations, bioinformatics applications, ...)

  18. Sample BoT Description
     [Diagram: three independent tasks, each reading one input file (range_1.in, range_2.in, range_3.in) and each writing its own primes.out.]

     <job id="0" name="PrimeSearch.py">
       <uses file="range_1.in" link="input"/>
       <uses file="primes.out" link="output"/>
     </job>
     <job id="1" name="PrimeSearch.py">
       <uses file="range_2.in" link="input"/>
       <uses file="primes.out" link="output"/>
     </job>
     <job id="2" name="PrimeSearch.py">
       <uses file="range_3.in" link="input"/>
       <uses file="primes.out" link="output"/>
     </job>

  19. Running BoTs & Workflows
     • Define a DAX (DAG in XML) file
     • Submit with wrunner
       • -f <job_description>
       • -p <policy>
         • single_site: the whole BoT/workflow runs on a single site
         • multi_site: the tasks of the BoT/workflow are distributed over the grid based on the current load of the sites
       • -s <site>: preferred execution site
     • Submit to fs3:
       wrunner -f wf/Diamond.xml -p single_site -s fs3.das4.tudelft.nl
     • Use all sites for execution:
       wrunner -f wf/PrimeSearch.xml -p multi_site

  20. Workflow Engine Architecture
     [Architecture diagram: the workflow and its description are handed to the engine; execution uses DRMAA, SSH, and a custom protocol, with a DFS (distributed file system) for the data.]

  21. Practical Work
     • Follow the steps in the practical work handout to set up your environment
     • After you download and extract the tar file you will have the following directory structure:
       ComplexHPC_11
         OMRunner
         PRunner
         WRunner
         applications   -> executables
         wf             -> DAX files
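     Downloading and unpacking might look roughly like this (a sketch; the archive name and URL are placeholders, not the actual locations from the handout):

         # Fetch and extract the practical-work archive (URL and file name are placeholders)
         wget http://example.org/ComplexHPC_11.tar.gz
         tar xzf ComplexHPC_11.tar.gz
         cd ComplexHPC_11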

  22. Summary
     • How to prepare the environment for the KOALA runners
       • Login to a head-node
       • Set up the environment
       • Set SSH public key authentication
       • Create a Job Description File (JDF/DAX)
     • How to submit jobs using KOALA
       • Sequential jobs (PRunner, no JDF)
       • Parallel jobs (OMRunner, with a JDF)
       • Workflows (WRunner, with a DAX)
       • BoTs (WRunner, with a DAX)

  23. More Information
     • Nezih Yigitbasi: M.N.Yigitbasi@tudelft.nl, http://www.st.ewi.tudelft.nl/~nezih/
     • KOALA Project: http://st.ewi.tudelft.nl/koala
     • PDS publication database: http://www.pds.twi.tudelft.nl
