Tutorial: Advanced (Batch) Job Scripting
Robert Barthel, SCC, KIT
Steinbuch Centre for Computing (SCC)
www.bwhpc-c5.de
Tutorial: Adv. Batch Job Scripting / R. Barthel 11/10/2017
How to read the following slides
Abbreviation/Colour code     Full meaning
$ command -opt value         $ = prompt of the interactive shell
                             The full prompt may look like: user@machine:path $
                             The command has been entered in the interactive shell session
<integer> <string>           <> = Placeholder for integer, string etc.
foo, bar                     Metasyntactic variables
${WORKSHOP}                  /pfs/data1/software_uc1/bwhpc/kit/workshop/2017-10-11
Goal
Be descriptive!
  Comment your code, e.g. via header sections of the script and its functions.
  Use decipherable names for variables and functions.
Organise and structure!
  Break complex scripts into simpler blocks, e.g. use functions.
  Use exit codes.
  Use standardized parameter flags for script invocation.
Write a job script that runs interactively
→ Then add the part for MOAB
Script structure:
  Header
  Declarations (MSUB + defaults)
  Functions
  Input handling
  Main section
  Footer
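The script structure named on the Goal slide can be sketched as follows. This is only a minimal illustration, not a script from the workshop: the option -n, the variable names and the function names are assumptions made up for the example. It runs interactively as written; the #MSUB line is a plain comment to bash.

```bash
#!/bin/bash
## Header: purpose, author, usage (all names below are illustrative)
## Usage: skeleton.sh [-n <integer>]

## Declarations (MSUB + defaults)
#MSUB -l nodes=1:ppn=1
default_repeats=1

## Functions
usage() {
    echo "Usage: ${0##*/} [-n <integer>]"
}

## Input handling: sets the global variable "repeats"
parse_input() {
    local opt OPTIND=1
    repeats=${default_repeats}
    while getopts "n:" opt; do
        case "${opt}" in
            n) repeats=${OPTARG} ;;
            *) usage >&2; return 2 ;;
        esac
    done
}

## Main section
main() {
    parse_input "$@" || return $?
    local i
    for (( i = 1; i <= repeats; i++ )); do
        echo "step ${i} of ${repeats}"
    done
}

## Footer: report the exit code
## (a real job script would end with: exit ${exit_code})
main "$@"
exit_code=$?
echo "finished with exit code ${exit_code}"
```

Keeping input handling and the main section in functions makes it easy to test each part in an interactive shell before the MOAB header is added.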
Best Practices – Common Problems (1)
Do not run your code, application or job on login nodes or in ${HOME}:
  for interactive jobs use msub -I -V
Multinode job:
  use workspaces
  Producing Tbytes of scratch files and >10000 files: change your application code.
  Need help? Apply for Tiger Team Support.
Single-node job:
  use ${TMPDIR}: HowTo → Case 1
Chain jobs: HowTo → Case 2
Many sequential tiny jobs:
  Bundle to one big job: HowTo → Case 3
Handling walltime based job aborts: HowTo → Case 4
Use of MPI/OpenMP parallelisation in jobs:
  https://indico.scc.kit.edu/indico/event/310/material/slides/13.pdf
Rev: MOAB variables
http://www.bwhpc-c5.de/wiki/index.php/Batch_Jobs#Moab_Environment_Variables
MSUB variables:
    #!/bin/bash
    #MSUB -N test
    #MSUB -l nodes=1:ppn=1,mem=5mb
    #MSUB -l walltime=00:05:00
    #MSUB -m n
    #MSUB -v my_own_variables="arguments"
Case 1: Jobs @ $TMPDIR (1)
If temporary files of job > Gbyte → Run your job at ${TMPDIR}
  but ONLY for single-node jobs
What to do:
  Generate a subdirectory under ${TMPDIR} => ${run_DIR}
  Copy input files to ${run_DIR}
  Change to ${run_DIR} and execute the program
  Copy results to the start directory
How?
Start with templates:
  ${WORKSHOP}/exercises/3/1_job_run_under_local_tmpdir.sh
  + ${WORKSHOP}/exercises/3/{1_gen_files,1_gen_files.inp}
Case 1: Jobs @ $TMPDIR (2)
Code snip:
    #!/bin/bash
    ...
    ## a) Tutorial ToDo: load modules INTEL + MKL if not loaded

    ## b) Define your run directory under tmpdir
    ##    incorporating username and JobID/PID
    mkdir -pv "${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"

    ## c) Tutorial ToDo: Check existence of run directory

    ## d) Copy files from submit directory
    ##    to run directory
    cd $MOAB_SUBMITDIR
    cp -pv gen_files.x "${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"
    ## Check if copy succeeded
    cp -pv gen_files.inp "${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"

    ## e) Change to run directory (check if succeeded) and start binary + input file
    cd "${TMP}/${USER}.${MOAB_JOBID}"
    ./1_gen_files.x 1_gen_files.inp

    ## f) Tutorial ToDo: check run status

    ## g) transfer files to submit directory
    cp -pv files_*.out "${MOAB_SUBMITDIR}"

    ## h) Tutorial ToDo: clean up run_DIR
TASK/ToDo: 1 min
  * Generalise blue code avoiding repetition
  * Write code for a-h
  * Redirect output of binary
${WORKSHOP}/exercises/3/1_job_run_under_local_tmpdir.sh
Case 1: Jobs @ $TMP (3)
Solution! Decl. + a-c:
    ## 1) Define full path of your binary
    EXE="${MOAB_SUBMITDIR:-${PWD}}/1_gen_files.x"
    ## 2) Define output file
    ##    = Name of executable + JOBID or PID
    output="$(basename ${EXE})_${MOAB_JOBID:-$$}.log"
    ## 3) Define full path input files
    input="${MOAB_SUBMITDIR:-${PWD}}/1_gen_files.inp"
    ## 4) Define input files to be copied
    copy_list="${EXE} ${input}"
    ## 5) Define files to be copied back after run, i.e. output file
    save_list="${output} files_*.out"

    ## a) Load modules INTEL + MKL if not loaded
    for mod in compiler/intel numlib/mkl; do
        module list 2>&1 | grep "${mod}" > /dev/null || module load "${mod}"
    done
    ## b) Define your run directory and generate via mkdir
    run_DIR="${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"
    mkdir -pv "${run_DIR}"
    ## c) Check existence of run directory
    if [ ! -d "${run_DIR}" ]; then
        echo "ERROR: Run DIR = ${run_DIR} does not exist"; exit 1
    fi
${WORKSHOP}/solutions/3/1_generalised_job_run_under_local_tmpdir.sh
Case 1: Jobs @ $TMP (4)
Solution! Part d-h:
    ## d) Change to SubmitDir or PWD / Copy files from submit_DIR to run_DIR
    cd "${MOAB_SUBMITDIR:-${PWD}}"
    for X in ${copy_list}; do
        cp -pv "${X}" "${run_DIR}"
        if [ $? -ne 0 ]; then echo "ERROR: Copy of ${X} failed"; exit 1; fi
    done
    ## e) Change to run DIR and start binary
    cd "${run_DIR}"
    if [ $? -ne 0 ]; then echo "ERROR: Entering ${run_DIR} failed"; exit 1; fi
    ## EXE and input hold full paths; inside run_DIR use the copied files by name
    "./$(basename "${EXE}")" "$(basename "${input}")" > "${output}" 2>&1
    ## f) Check run status
    if [ $? -ne 0 ]; then
        echo "WARNING: ${EXE} did not run properly!"
    fi
    ## g) Transfer output files to submit directory
    cd "${run_DIR}"
    for X in ${save_list}; do
        cp -pv "${X}" "${MOAB_SUBMITDIR}"
        if [ $? -ne 0 ]; then echo "WARNING: Copy of ${X} failed"; fi
    done
    ## h) Clean up run directory
    rm -f "${run_DIR}"/*; rmdir "${run_DIR}"; exit
${WORKSHOP}/solutions/3/1_generalised_job_run_under_local_tmpdir.sh
Case 2: Chain Jobs (1)
Idea:
  Do N consecutive jobs via N MOAB batch jobs
Goal:
  Do everything in one script
  Submit only at the beginning
"Pre-step": generate a script that runs interactively
Result:
  ${WORKSHOP}/exercises/3/2_chain_job.sh
Case 2: Chain Jobs (2)
    #!/bin/bash
    ## Defaults
    loop_max=1
    cmd='sleep 2'
    ## Check if counter environment variable is set
    if [ -z "${myloop_counter}" ]; then
        echo "ERROR: myloop_counter is undefined, stop chain job"; exit 1
    fi
    ## Only continue if below loop_max
    if [ ${myloop_counter} -lt ${loop_max} ]; then
        ## Increase counter
        let myloop_counter+=1
        ## Print current job number
        echo "Chain job iteration = ${myloop_counter}"
        ## Execute your command
        echo " -> executing ${cmd}"
        ${cmd}
        if [ $? -eq 0 ]; then
            ## Continue only if last command was successful
            export myloop_counter=${myloop_counter}
            ./${0}
        else
            ## Terminate chain
            echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
            exit 1
        fi
    fi
Run interactively:
    $ export myloop_counter=0
    $ ./2_interactive_chain_job
${WORKSHOP}/exercises/3/2_chain_job.sh
Case 2: Chain Jobs (2) → How for MOAB?
    #!/bin/bash
    #MSUB ...
    ## Defaults
    loop_max=1
    cmd='sleep 2'
    ## Check if counter environment variable is set
    if [ -z "${myloop_counter}" ]; then
        echo "ERROR: myloop_counter is undefined, stop chain job"; exit 1
    fi
    ## only continue if below loop_max
    if [ ${myloop_counter} -lt ${loop_max} ]; then
        ## increase counter
        let myloop_counter+=1
        ## print current job number
        echo "Chain job iteration = ${myloop_counter}"
        ## Execute your command
        echo " -> executing ${cmd}"
        ${cmd}
        if [ $? -eq 0 ]; then
            ## continue only if last command was successful
            export myloop_counter=${myloop_counter}
            ./${0}
        else
            ## Terminate chain
            echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
            exit 1
        fi
    fi
TASK/ToDo: 5 min
  * add the parts for MOAB
Case 2: Chain Jobs (3) → for Moab
Solution!
    #!/bin/bash
    #MSUB -l nodes=1:ppn=1,walltime=00:00:05,pmem=5mb
    ## Defaults
    loop_max=1
    cmd='sleep 2'
    ## Check if counter environment variable is set
    if [ -z "${myloop_counter}" ]; then
        echo "ERROR: myloop_counter is undefined, stop chain job"; exit 1
    fi
    ## only continue if below loop_max
    if [ ${myloop_counter} -lt ${loop_max} ]; then
        ## increase counter
        let myloop_counter+=1
        ## print current job number
        echo "Chain job iteration = ${myloop_counter}"
        ## Execute your command
        echo " -> executing ${cmd}"
        ${cmd}
        if [ $? -eq 0 ]; then
            ## continue only if last command was successful
            msub -v myloop_counter=${myloop_counter} ./2_chain_job.sh
        else
            ## Terminate chain
            echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
            exit 1
        fi
    fi
Submit:
    $ msub -v myloop_counter=0 ./2_chain_job.sh
${WORKSHOP}/solutions/3/2_chain_job.sh
Case 2: Chain Jobs (4)
moab_chain_job.sh + interactive_chain_job.sh = one script
→ USE bash programming to generalise and unify your batch job scripts
    ...
    if [ $? -eq 0 ]; then
        ## continue only if last command was successful
        if [ ! -z "${MOAB_JOBNAME}" ]; then
            ## If MOAB_JOBNAME environment variable is defined
            ## -> this script is under MOAB "control"
            msub -v myloop_counter=${myloop_counter} ./generalised_chain_job.sh
        else
            export myloop_counter=${myloop_counter}
            ./${0}
        fi
    else
        ## Terminate chain
        echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
        exit 1
    fi
    ...
${WORKSHOP}/solutions/3/2_generalised_chain_job.sh
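The MOAB/interactive switch above can be moved into a function, in line with the advice to break scripts into simpler blocks. The following is a minimal sketch and not part of the workshop solutions: the function name next_link_command is hypothetical, and it only prints the command it would run so that it can be tried without a batch system.

```bash
#!/bin/bash
## Print (not run) the command that would start the next chain link.
## Arguments: <script> <counter>; the function name is illustrative.
next_link_command() {
    local script=$1 counter=$2
    if [ -n "${MOAB_JOBNAME}" ]; then
        ## MOAB_JOBNAME is set -> the script runs under MOAB "control"
        echo "msub -v myloop_counter=${counter} ${script}"
    else
        ## interactive shell -> call the script directly
        echo "env myloop_counter=${counter} ${script}"
    fi
}

## Dry run: show what would be executed for chain link 3
next_link_command ./generalised_chain_job.sh 3
```

In a real chain job the printed command would be executed instead of echoed; printing it first is a cheap way to check both branches in an interactive session.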
Chain Jobs – Alternative (1)
Problem of moab_chain_job.sh: Waiting time!
Solution: two scripts + msub -l depend=afterok:<jobID>
1. script:
    #!/bin/bash
    #MSUB ...
    ## Define your command
    cmd='sleep 3'
    ## Execute your command
    echo " -> executing ${cmd}"
    ${cmd}
    ## Check if it terminated correctly
    if [ $? -ne 0 ]; then
        ## Terminate chain
        echo "ERROR: ${cmd} of chain job no. ${myloop_counter:-1} terminated unexpectedly"
        exit 1
    fi
${WORKSHOP}/solutions/3/2_chain_link_job.sh
Chain Jobs – Alternative (2)
2. script:
    #!/bin/bash
    max_nojob=${1:-5}
    chain_link_job=${PWD}/2_chain_link_job.sh
    dep_type="${2:-afterok}"
    counter=1
    while [ ${counter} -le ${max_nojob} ]; do
        ## Differ msub_opt depending on chain link number
        if [ ${counter} -eq 1 ]; then
            msub_opt=""
        else
            msub_opt="-l depend=${dep_type}:${jobID}"
        fi
        echo "Chain job iteration = ${counter}"
        echo "   msub -v myloop_counter=${counter} ${msub_opt} ${chain_link_job}"
        ## Store job ID for next iteration: keep the output of the msub
        ## command with empty lines removed
        jobID=$(msub -v myloop_counter=${counter} ${msub_opt} ${chain_link_job} 2>&1 | sed '/^$/d')
        ## Check if an ERROR occurred
        if [[ "${jobID}" =~ "ERROR" ]]; then
            echo "   -> submission failed!"; exit 1
        else
            echo "   -> job number = ${jobID}"
        fi
        ## Increase counter
        let counter+=1
    done
${WORKSHOP}/solutions/3/2_moab_submitter_f_chain_job.sh
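The two pieces of logic in the submitter script, building the dependency option and extracting the job ID from the msub output, can be tried without a batch system. The sketch below is illustrative only: build_msub_opt and parse_jobid are hypothetical helper names, and the msub output is simulated with printf on the assumption that msub prints the job ID surrounded by empty lines.

```bash
#!/bin/bash
## Build the dependency option for all but the first chain link.
## Arguments: <dep_type> <previous jobID or empty>
build_msub_opt() {
    local dep_type=$1 prev_jobID=$2
    if [ -n "${prev_jobID}" ]; then
        echo "-l depend=${dep_type}:${prev_jobID}"
    fi
}

## Read submitter output from stdin and drop empty or blank lines.
parse_jobid() {
    sed -e '/^[[:space:]]*$/d'
}

## Dry run with simulated msub output (assumption: job ID between empty lines)
jobID=$(printf '\n12345\n' | parse_jobid)
echo "next link would use: msub $(build_msub_opt afterok "${jobID}") <script>"
```

Separating these helpers from the submission loop keeps the loop short and lets the string handling be checked interactively first.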
Case 3: Pseudo Parallelisation (1)
If you have many (>100) tiny jobs (subjobs):
Pack them into one job (masterjob) that does the following:
  Determine the number of cores granted by the queueing system
  Queue the subjobs and assign them step by step to free cores of the masterjob
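The masterjob idea can be sketched with plain bash job control. This is a minimal illustration under stated assumptions, not the workshop's solution: run_joblist is a hypothetical function name, the number of cores is passed in by hand instead of being read from the queueing system, and the subjobs are simple echo commands written to a temporary directory.

```bash
#!/bin/bash
## run_joblist <file> <max_parallel>
## Runs each non-empty line of <file> as one subjob and keeps at most
## <max_parallel> subjobs running at the same time.
run_joblist() {
    local joblist=$1 max_parallel=$2 line
    while IFS= read -r line; do
        [ -z "${line}" ] && continue
        ## block until fewer than max_parallel subjobs are running
        while [ "$(jobs -rp | wc -l)" -ge "${max_parallel}" ]; do
            sleep 0.1
        done
        ## detach stdin so a subjob cannot consume the job list
        bash -c "${line}" < /dev/null &
    done < "${joblist}"
    ## wait for the remaining subjobs
    wait
}

## Demo: 8 tiny subjobs on 4 "cores" (the value 4 is an assumption)
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    echo "echo \"Hello ${i}\" > '${workdir}/out.${i}'"
done > "${workdir}/joblist.txt"
run_joblist "${workdir}/joblist.txt" 4
ls "${workdir}" | grep -c '^out\.'
```

Polling the job table is simple but coarse; for very short subjobs the polling interval becomes a noticeable overhead, which is one reason to consider a dedicated tool.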
Case 3: Pseudo Parallelisation - Alternative
Parbatch → MPI task based
Example: job script + joblist.txt
Job script:
    #!/bin/bash
    #MSUB -l nodes=1:ppn=4
    #MSUB -l mem=15mb
    #MSUB -l walltime=00:03:00
    module load system/parbatch
    parbatch joblist.txt
joblist.txt:
    hostname ; sleep 2; echo "Hello 1-a"
    hostname ; sleep 2; echo "Hello 2-b"
    hostname ; sleep 2; echo "Hello 3-c"
    hostname ; sleep 2; echo "Hello 4-d"
    hostname ; sleep 2; echo "Hello 5-e"
    hostname ; sleep 2; echo "Hello 6-f"
    hostname ; sleep 2; echo "Hello 7-g"
    hostname ; sleep 2; echo "Hello 8-h"
${WORKSHOP}/exercises/3/3_msub_parbatch.sh
${WORKSHOP}/exercises/3/3_joblist.txt
Case 4: Handling walltime based job aborts
Use: "msub -l signal" and "trap" to abort the job on your own terms
    #!/bin/bash
    ## Pre-termination via MOAB
    ## sending signal with defined offset
    #MSUB -l nodes=1:ppn=1,walltime=00:10:00,mem=1mb
    #MSUB -l signal=15@120
    #MSUB -l advres=workshop_single.5
    cleanup() {
        echo "Cleanup before walltime reached"
        exit
    }
    trap cleanup 15
    echo "Repeating \"sleep 1\" loop until SIGTERM"
    while true; do
        sleep 1
    done
${WORKSHOP}/exercises/3/4_handle_aborts.sh
MOAB sends SIGTERM (kill -15) 120 seconds before walltime is reached
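The trap mechanism can be tried interactively without MOAB. In the sketch below a background timer stands in for the signal that MOAB would send; the function name run_until_sigterm and the 1 second delay are assumptions for illustration, and BASHPID requires bash 4 or newer.

```bash
#!/bin/bash
## Run an endless loop in a subshell until SIGTERM arrives, then clean up.
run_until_sigterm() {
    (
        cleanup() {
            echo "Cleanup before walltime reached"
            exit 0
        }
        trap cleanup TERM
        self=${BASHPID}
        ## stand-in for MOAB's "signal=15@<offset>": send SIGTERM after 1 s
        ( sleep 1; kill -TERM "${self}" ) &
        while true; do
            sleep 0.2
        done
    )
}

run_until_sigterm
```

Bash runs the trap only after the current foreground command has returned, so the loop body should consist of short steps; a single long-running command would delay the cleanup until it finishes.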
Best Practices – Batch jobs with input parsing
Not working:
  msub your_script -x argument → msub will interpret -x as one of its own options
Solution:
(A) Submit a wrapper script:
    #!/bin/bash
    your_script -x argument
(B) Export your script options and arguments to an environment variable; read in that variable during runtime of the script, cf. wiki:
    if [ -n "${SCRIPT_FLAGS}" ]; then
        if [ -z "${*}" ]; then
            set -- ${SCRIPT_FLAGS}
        fi
    fi
(C) Use the msub wrapper via:
    $ module load system/msub_addon/1.0
    $ msub <options> job.sh
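Solution (B) can be combined with getopts so that the same script accepts its options either on the command line or from the environment. This is a sketch under assumptions: the function name parse_flags and the single option -x are illustrative, and SCRIPT_FLAGS is taken to be a plain space-separated string without quoted arguments.

```bash
#!/bin/bash
## Parse "-x <value>" from the arguments, or from SCRIPT_FLAGS if no
## arguments were given. Prints the parsed value.
parse_flags() {
    ## fall back to SCRIPT_FLAGS only if no arguments were given
    if [ -n "${SCRIPT_FLAGS}" ] && [ $# -eq 0 ]; then
        set -- ${SCRIPT_FLAGS}    ## unquoted on purpose: split into words
    fi
    local opt OPTIND=1 x_value=""
    while getopts "x:" opt; do
        case "${opt}" in
            x) x_value=${OPTARG} ;;
            *) return 2 ;;
        esac
    done
    echo "x=${x_value}"
}

## Interactive use: options on the command line
parse_flags -x argument
## Batch use: options handed over through the environment
SCRIPT_FLAGS="-x from_env" parse_flags
```

Because the word splitting is deliberate, arguments containing spaces cannot be passed this way; a wrapper script as in (A) is the safer choice for those.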
Best Practices – Common problems (2)
Manually defining MPI tasks for mpirun?
Wrong: do not use these forms if your job solely does MPI:
    mpirun --machinefile=file binary
    mpirun -n <int> binary
→ Correct way:
    mpirun binary
  (because the resource manager tells mpirun what to do)
If you want to know about the hosts allocated to the job in your script:
(A) Use the msub wrapper via:
    $ module load system/msub_addon/1.0
    $ msub <options> job.sh
(B) Write a loop into your batch job script → returns the hostname of each task:
    for tasks in $(srun hostname); do
        echo $tasks
    done
Best Practices - Common problems (3)
Core binding
ALWAYS use the MPI options
    -bind-to core
    -map-by core|socket|node
    ...