Tutorial: Advanced (Batch) Job Scripting, Robert Barthel, SCC, KIT



SLIDE 1

Tutorial: Advanced (Batch) Job Scripting

Robert Barthel, SCC, KIT

Steinbuch Centre for Computing (SCC)
www.bwhpc-c5.de

SLIDE 2

Tutorial: Adv. Batch Job Scripting / R. Barthel 11/10/2017

How to read the following slides

Abbreviation/Colour code    Full meaning
$ command -opt value        $ = prompt of the interactive shell. The full prompt may look like: user@machine:path$. The command has been entered in the interactive shell session.
<integer>, <string>         <> = placeholder for integer, string etc.
foo, bar                    Metasyntactic variables
${WORKSHOP}                 /pfs/data1/software_uc1/bwhpc/kit/workshop/2017-10-11

SLIDE 3

Goal

Be descriptive!
    Comment your code, e.g. via header sections of scripts and functions.
    Use decipherable names for variables and functions.

Organise and structure!
    Break complex scripts into simpler blocks, e.g. use functions.
    Use exit codes.
    Use standardized parameter flags for script invocation.

Write a job script that runs interactively
    → Then add the part for MOAB

Script layout:
    Header
    Declarations (MSUB + defaults)
    Functions
    Input handling
    Main section
    Footer
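The script layout named on this slide can be sketched as a small Bash skeleton. This is only an illustration, not the workshop template: the job name `skeleton`, the `-n` option, the variable `repeat` and the functions `usage`, `run_steps` and `parse_args` are all hypothetical.

```bash
#!/bin/bash
## Header: what the script does, author, usage
## Usage: ./skeleton.sh [-n <integer>]

## Declarations (MSUB + defaults)
#MSUB -N skeleton
repeat=1

## Functions
usage() {
   echo "Usage: $0 [-n <integer>]"
}

run_steps() {
   step=1
   while [ "${step}" -le "${repeat}" ] ; do
      echo "step ${step} of ${repeat}"
      step=$((step + 1))
   done
}

## Input handling (standard single-letter flags via getopts)
parse_args() {
   OPTIND=1
   while getopts "n:" opt ; do
      case "${opt}" in
         n) repeat="${OPTARG}" ;;
         *) usage ; return 2 ;;
      esac
   done
}

## Main section
parse_args "$@" || exit 2
run_steps

## Footer
echo "finished"
```

Because the MSUB line is an ordinary comment for Bash, the same file can be tried interactively first and handed to the batch system afterwards.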

SLIDE 4

Best Practices – Common Problems (1)

Do not run your code, application or job on login nodes or in ${HOME}:
    For interactive jobs use: msub -I -V
    Multinode job: use workspaces.
        Producing Tbyte of scratch files and >10000 files: change your application code.
        Need help? Apply for Tiger Team Support.
    Singlenode job: use ${TMPDIR}: HowTo → Case 1

Chain jobs: HowTo → Case 2

Many sequential tiny jobs: bundle them into one big job: HowTo → Case 3

Handling walltime based job aborts: HowTo → Case 4

Use of MPI/OpenMP parallelisation in jobs:
    https://indico.scc.kit.edu/indico/event/310/material/slides/13.pdf

SLIDE 5

Rev: MOAB variables

http://www.bwhpc-c5.de/wiki/index.php/Batch_Jobs#Moab_Environment_Variables

MSUB variables:

    #!/bin/bash
    #MSUB -N test
    #MSUB -l nodes=1:ppn=1,mem=5mb
    #MSUB -l walltime=00:05:00
    #MSUB -m n
    #MSUB -v my_own_variables="arguments"
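Moab environment variables such as MOAB_JOBID, MOAB_SUBMITDIR and MOAB_JOBNAME are only set inside a batch job. A minimal sketch of how a script can stay runnable in an interactive shell is to give each of them a fallback with `${VAR:-default}`; the lower-case names `job_id`, `submit_dir` and `job_name` and the fallback string `interactive` are illustrative choices.

```bash
#!/bin/bash
## Fall back to interactive values when the Moab variables are unset.
job_id="${MOAB_JOBID:-$$}"               # job ID under Moab, else PID of this shell
submit_dir="${MOAB_SUBMITDIR:-${PWD}}"   # submit directory under Moab, else current directory
job_name="${MOAB_JOBNAME:-interactive}"  # job name under Moab, else a fixed label

echo "job_id=${job_id} submit_dir=${submit_dir} job_name=${job_name}"
```

The same expansion is used in the Case 1 solution for the run directory name and for the path of the binary.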

SLIDE 6

Case 1: Jobs @ $TMPDIR (1)

If the temporary files of a job exceed a Gbyte → run your job at ${TMPDIR},
but ONLY if it is a single node job.

What to do:
    Generate a subdirectory under ${TMPDIR} => ${run_DIR}
    Copy to ${run_DIR}
    Change to ${run_DIR} and execute the program
    Copy results to the start DIR

How?

Start with templates:
    ${WORKSHOP}/exercises/3/1_job_run_under_local_tmpdir.sh
    + ${WORKSHOP}/exercises/3/{1_gen_files,1_gen_files.inp}

SLIDE 7

Case 1: Jobs @ $TMPDIR (2)

Code snip:

    #!/bin/bash
    ...
    ## a) Tutorial TODO: load modules INTEL + MKL if not loaded

    ## b) Define your run directory under tmpdir
    ##    incorporating username and JobID/PID
    mkdir -pv "${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"

    ## c) Tutorial TODO: Check existence of run directory

    ## d) Copy files from submit directory
    ##    to run directory
    cd $MOAB_SUBMITDIR
    cp -pv gen_files.x "${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"
    ## Check if copy succeeded
    cp -pv gen_files.inp "${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"

    ## e) Change to run directory (check if succeeded) and start binary + input file
    cd "${TMP}/${USER}.${MOAB_JOBID}"
    ./1_gen_files.x 1_gen_files.inp

    ## f) Tutorial TODO: check run status

    ## g) transfer files to submit directory
    cp -pv files_*.out "${MOAB_SUBMITDIR}"

    ## h) Tutorial TODO: cleanup run_DIR

TASK/TODO: 1 min
    * Generalise blue code avoiding repetition
    * Write code for a-h
    * Redirect output of binary

${WORKSHOP}/exercises/3/1_job_run_under_local_tmpdir.sh

SLIDE 8

Case 1: Jobs @ $TMPDIR (3)

Solution! Decl. + a-c:

    ## 1) Define full path of your binary
    EXE="${MOAB_SUBMITDIR:-${PWD}}/1_gen_files.x"
    ## 2) Define output file
    ##    = Name of executable + JOBID or PID
    output="$(basename "${EXE}")_${MOAB_JOBID:-$$}.log"
    ## 3) Define full path of input files
    input="${MOAB_SUBMITDIR:-${PWD}}/1_gen_files.inp"
    ## 4) Define input files to be copied
    copy_list="${EXE} ${input}"
    ## 5) Define files to be copied back after run, i.e. output files
    save_list="${output} files_*.out"

    ## a) Load modules INTEL + MKL if not loaded
    for mod in compiler/intel numlib/mkl ; do
       module list 2>&1 | grep "${mod}" > /dev/null || module load "${mod}"
    done

    ## b) Define your run directory and generate it via mkdir
    run_DIR="${TMPDIR}/${USER}.${MOAB_JOBID:-$$}"
    mkdir -pv "${run_DIR}"

    ## c) Check existence of run directory
    if [ ! -d "${run_DIR}" ] ; then
       echo "ERROR: Run DIR = ${run_DIR} does not exist"; exit 1
    fi

${WORKSHOP}/solutions/3/1_generalised_job_run_under_local_tmpdir.sh

SLIDE 9

Case 1: Jobs @ $TMPDIR (4)

Solution! Part d-h:

    ## d) Change to SubmitDir or PWD / Copy files from submit_DIR to run_DIR
    cd "${MOAB_SUBMITDIR:-${PWD}}"
    for X in ${copy_list} ; do
       cp -pv "${X}" "${run_DIR}"
       if [ $? -ne 0 ] ; then echo "ERROR: Copy of ${X} failed"; exit 1; fi
    done

    ## e) Change to run DIR and start binary
    cd "${run_DIR}"
    if [ $? -ne 0 ] ; then echo "ERROR: Entering ${run_DIR} failed"; exit 1; fi
    ## EXE and input hold full paths; run the copies located in run_DIR
    "./$(basename "${EXE}")" "$(basename "${input}")" > "${output}" 2>&1

    ## f) Check run status
    if [ $? -ne 0 ] ; then
       echo "WARNING: ${EXE} did not run properly!"
    fi

    ## g) Transfer output files to submit directory
    cd "${run_DIR}"
    for X in ${save_list} ; do
       cp -pv "${X}" "${MOAB_SUBMITDIR}"
       if [ $? -ne 0 ] ; then echo "WARNING: Copy of ${X} failed"; fi
    done

    ## h) Cleanup run directory
    rm -f "${run_DIR}"/* ; rmdir "${run_DIR}" ; exit

${WORKSHOP}/solutions/3/1_generalised_job_run_under_local_tmpdir.sh
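In the solution, cleanup step h) is only reached when no earlier step calls `exit 1`. One possible variation, which is not part of the workshop solution, is to register the cleanup with `trap ... EXIT` so that it runs on every exit path. The sketch below assumes a hypothetical helper `run_in_tmp` that takes a label and a command; the directory naming merely follows the pattern used on the slides.

```bash
#!/bin/bash
## Sketch only: run a command inside a private directory under TMPDIR and
## remove that directory afterwards, whether the command succeeded or not.
## The function body is a subshell, so cd and trap do not affect the caller.
run_in_tmp() (
   run_DIR="${TMPDIR:-/tmp}/${USER:-user}.${MOAB_JOBID:-$$}.$1"
   shift
   mkdir -p "${run_DIR}" || exit 1
   trap 'rm -rf "${run_DIR}"' EXIT     # cleanup runs on every exit path
   cd "${run_DIR}" || exit 1
   "$@"                                # run the payload inside run_DIR
   status=$?
   exit "${status}"                    # hand the payload status to the caller
)

run_in_tmp demo sh -c 'echo "working in $(pwd)"'
echo "exit status of payload: $?"
```

Results still have to be copied back before the payload returns, because the run directory is gone once the function ends.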

SLIDE 10

Case 2: Chain Jobs (1)

Idea:
    Do N consecutive jobs via N MOAB batch jobs

Goal:
    Do everything in one script
    Submit only at the beginning

"Pre-step": generate a script that runs interactively

Result:
    ${WORKSHOP}/exercises/3/2_chain_job.sh

SLIDE 11

Case 2: Chain Jobs (2)

    #!/bin/bash
    ## Defaults
    loop_max=1
    cmd='sleep 2'

    ## Check if counter environment variable is set
    if [ -z "${myloop_counter}" ] ; then
       echo "ERROR: myloop_counter is undefined, stop chain job"; exit 1
    fi

    ## Only continue if below loop_max
    if [ ${myloop_counter} -lt ${loop_max} ] ; then
       ## Increase counter
       let myloop_counter+=1

       ## Print current job number
       echo "Chain job iteration = ${myloop_counter}"

       ## Execute your command
       echo " -> executing ${cmd}"
       ${cmd}

       if [ $? -eq 0 ] ; then
          ## Continue only if last command was successful
          export myloop_counter=${myloop_counter}
          ./${0}
       else
          ## Terminate chain
          echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
          exit 1
       fi
    fi

Run interactively:

    $ export myloop_counter=0
    $ ./2_interactive_chain_job

${WORKSHOP}/exercises/3/2_chain_job.sh

SLIDE 12

Case 2: Chain Jobs (2) → How for MOAB?

    #!/bin/bash
    #MSUB ...
    ## Defaults
    loop_max=1
    cmd='sleep 2'

    ## Check if counter environment variable is set
    if [ -z "${myloop_counter}" ] ; then
       echo "ERROR: myloop_counter is undefined, stop chain job"; exit 1
    fi

    ## Only continue if below loop_max
    if [ ${myloop_counter} -lt ${loop_max} ] ; then
       ## Increase counter
       let myloop_counter+=1

       ## Print current job number
       echo "Chain job iteration = ${myloop_counter}"

       ## Execute your command
       echo " -> executing ${cmd}"
       ${cmd}

       if [ $? -eq 0 ] ; then
          ## Continue only if last command was successful
          export myloop_counter=${myloop_counter}
          ./${0}
       else
          ## Terminate chain
          echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
          exit 1
       fi
    fi

TASK/TODO: 5 min
    * Add the parts for MOAB

SLIDE 13

Case 2: Chain Jobs (3) → for Moab

Solution!

    #!/bin/bash
    #MSUB -l nodes=1:ppn=1,walltime=00:00:05,pmem=5mb
    ## Defaults
    loop_max=1
    cmd='sleep 2'

    ## Check if counter environment variable is set
    if [ -z "${myloop_counter}" ] ; then
       echo "ERROR: myloop_counter is undefined, stop chain job"; exit 1
    fi

    ## Only continue if below loop_max
    if [ ${myloop_counter} -lt ${loop_max} ] ; then
       ## Increase counter
       let myloop_counter+=1

       ## Print current job number
       echo "Chain job iteration = ${myloop_counter}"

       ## Execute your command
       echo " -> executing ${cmd}"
       ${cmd}

       if [ $? -eq 0 ] ; then
          ## Continue only if last command was successful
          msub -v myloop_counter=${myloop_counter} ./2_chain_job.sh
       else
          ## Terminate chain
          echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
          exit 1
       fi
    fi

Submit the first chain link:

    $ msub -v myloop_counter=0 ./2_chain_job.sh

${WORKSHOP}/solutions/3/2_chain_job.sh

SLIDE 14

Case 2: Chain Jobs (4)

moab_chain_job.sh + interactive_chain_job.sh = generalised_chain_job.sh
→ USE bash programming to generalise and unify your batch job scripts

    ...
       if [ $? -eq 0 ] ; then
          ## Continue only if last command was successful
          if [ -n "${MOAB_JOBNAME}" ] ; then
             ## If the MOAB_JOBNAME environment variable is defined
             ## -> this script is under MOAB "control"
             msub -v myloop_counter=${myloop_counter} ./generalised_chain_job.sh
          else
             export myloop_counter=${myloop_counter}
             ./${0}
          fi
       else
          ## Terminate chain
          echo "ERROR: ${cmd} of chain job no. ${myloop_counter} terminated unexpectedly"
          exit 1
       fi
    ...

${WORKSHOP}/solutions/3/2_generalised_chain_job.sh
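The branch on MOAB_JOBNAME can be tried without a batch system by printing the command instead of running it. The following is a minimal dry-run sketch; the helper `next_link_cmd` and the use of `env` for the interactive case are illustrative assumptions, not taken from the workshop solution.

```bash
#!/bin/bash
## Sketch: decide how the next chain link would be started.
## next_link_cmd only prints the command line, so nothing is submitted.
next_link_cmd() {
   counter="$1" ; script="$2"
   if [ -n "${MOAB_JOBNAME}" ] ; then
      ## Running under Moab: hand the counter to the next batch job
      echo "msub -v myloop_counter=${counter} ${script}"
   else
      ## Running interactively: pass the counter via the environment
      echo "env myloop_counter=${counter} ${script}"
   fi
}

next_link_cmd 3 ./generalised_chain_job.sh
```

Keeping the decision in one function means the rest of the script is identical for batch and interactive runs.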

SLIDE 15

Chain Jobs – Alternative (1)

Problem of moab_chain_job.sh: waiting time!

Solution: two scripts + msub -l depend=afterok:<jobID>

1. script:

    #!/bin/bash
    #MSUB ...
    ## Define your command
    cmd='sleep 3'

    ## Execute your command
    echo " -> executing ${cmd}"
    ${cmd}

    ## Check if the command terminated correctly
    if [ $? -ne 0 ] ; then
       ## Terminate chain
       echo "ERROR: ${cmd} of chain job no. ${myloop_counter:-1} terminated unexpectedly"
       exit 1
    fi

${WORKSHOP}/solutions/3/2_chain_link_job.sh

SLIDE 16

Chain Jobs – Alternative (2)

2. script:

    #!/bin/bash
    max_nojob=${1:-5}
    chain_link_job=${PWD}/2_chain_link_job.sh
    dep_type="${2:-afterok}"
    counter=1
    while [ ${counter} -le ${max_nojob} ] ; do
       ## Differ msub_opt depending on chain link number
       if [ ${counter} -eq 1 ] ; then
          msub_opt=""
       else
          msub_opt="-l depend=${dep_type}:${jobID}"
       fi
       echo "Chain job iteration = ${counter}"
       echo "   msub -v myloop_counter=${counter} ${msub_opt} ${chain_link_job}"
       ## Store job ID for the next iteration: capture the output of the
       ## msub command and strip its empty lines
       jobID=$(msub -v myloop_counter=${counter} ${msub_opt} ${chain_link_job} 2>&1 | sed '/^$/d')
       ## Check if an ERROR occurred
       if [[ "${jobID}" =~ "ERROR" ]] ; then
          echo "   -> submission failed!" ; exit 1
       else
          echo "   -> job number = ${jobID}"
       fi
       ## Increase counter
       let counter+=1
    done

${WORKSHOP}/solutions/3/2_moab_submitter_f_chain_job.sh
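To see how each job ID is threaded into the next `-l depend=` option, the submitter loop can be run as a dry run that prints the msub command lines instead of executing them. This is a sketch under stated assumptions: `submit_chain` is a hypothetical function, `link.sh` is a placeholder script name, and the IDs `1001`, `1002`, ... are made up in place of what msub would print.

```bash
#!/bin/bash
## Sketch: print the msub command lines of a dependency chain (dry run).
submit_chain() {
   max_nojob="${1:-5}" ; dep_type="${2:-afterok}"
   counter=1 ; jobID=""
   while [ "${counter}" -le "${max_nojob}" ] ; do
      if [ "${counter}" -eq 1 ] ; then
         msub_opt=""                                  # first link has no dependency
      else
         msub_opt="-l depend=${dep_type}:${jobID}"    # later links wait for the previous job
      fi
      echo "msub -v myloop_counter=${counter} ${msub_opt} link.sh"
      jobID="$((1000 + counter))"   # placeholder for the ID msub would return
      counter=$((counter + 1))
   done
}

submit_chain 3
```

All links are queued at once, so the waiting time of each link overlaps with the run time of its predecessor.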

SLIDE 17

Case 3: Pseudo Parallelisation (1)

If you have many (>100) tiny jobs (subjobs):
    Pack them into one job (masterjob) doing:
        Define the number of cores obtained from the queueing system
        Queue the subjobs and assign them step by step to free cores of the masterjob
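One way to sketch the masterjob idea in plain shell is to let `xargs -P` keep a fixed number of subjobs running. This is not the workshop's implementation. It assumes that MOAB_PROCCOUNT holds the number of cores of the masterjob (with a fallback of 2 for interactive tests), that each input line is one simple shell command, and that the available `xargs` supports `-P`; `run_subjobs` is a hypothetical name. Note that `xargs` interprets quotes and backslashes in its input, so the sketch suits simple command lines only.

```bash
#!/bin/bash
## Sketch: run a list of tiny subjobs on a fixed number of cores.
ncores="${MOAB_PROCCOUNT:-2}"

run_subjobs() {
   ## Reads one command per line from stdin. xargs -P keeps at most
   ## ${ncores} subjobs running and starts the next one as soon as
   ## a slot becomes free.
   xargs -P "${ncores}" -I {} sh -c '{}'
}

printf '%s\n' 'echo subjob 1' 'echo subjob 2' 'echo subjob 3' 'echo subjob 4' | run_subjobs
```

The order of the output lines is not fixed, because subjobs finish independently of each other.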

SLIDE 18

Case 3: Pseudo Parallelisation - Alternative

Parbatch → MPI task based

Example: job script + joblist.txt

Job script:

    #!/bin/bash
    #MSUB -l nodes=1:ppn=4
    #MSUB -l mem=15mb
    #MSUB -l walltime=00:03:00
    module load system/parbatch
    parbatch joblist.txt

joblist.txt:

    hostname; sleep 2; echo "Hello 1-a"
    hostname; sleep 2; echo "Hello 2-b"
    hostname; sleep 2; echo "Hello 3-c"
    hostname; sleep 2; echo "Hello 4-d"
    hostname; sleep 2; echo "Hello 5-e"
    hostname; sleep 2; echo "Hello 6-f"
    hostname; sleep 2; echo "Hello 7-g"
    hostname; sleep 2; echo "Hello 8-h"

${WORKSHOP}/exercises/3/3_msub_parbatch.sh
${WORKSHOP}/exercises/3/3_joblist.txt

SLIDE 19

Case 4: Handling walltime based job aborts

Use: "msub -l signal" and "trap" to abort the job on your own terms

    #!/bin/bash
    ## Pre-termination via MOAB
    ## sending signal with defined offset
    #MSUB -l nodes=1:ppn=1,walltime=00:10:00,mem=1mb
    #MSUB -l signal=15@120
    #MSUB -l advres=workshop_single.5

    cleanup() {
       echo "Cleanup before walltime reached"
       exit
    }
    trap cleanup 15

    echo "Repeating \"sleep 1\" loop until SIGTERM"
    while true ; do
       sleep 1
    done

${WORKSHOP}/exercises/3/4_handle_aborts.sh

MOAB sends SIGTERM (kill -15) 120 seconds before the walltime is reached.
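The handler can be tried locally before submitting, by sending signal 15 by hand instead of waiting for Moab. In this minimal sketch a child shell installs the same kind of trap and signals itself; the variable `result` and the `echo "not reached"` marker are illustrative.

```bash
#!/bin/bash
## Sketch: exercise a SIGTERM handler without a batch system.
## The child shell installs the trap, sends itself signal 15 (as Moab
## would do before the walltime limit) and leaves through cleanup.
result=$(sh -c '
   cleanup() { echo "Cleanup before walltime reached" ; exit 0 ; }
   trap cleanup 15
   kill -15 $$
   echo "not reached"
')
echo "${result}"
```

In a real job the cleanup function would typically save intermediate results within the signal offset requested via `-l signal`.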

SLIDE 20

Best Practices – Batch jobs with input parsing

Not working:

    msub your_script -x argument
    → msub will interpret -x as one of its own options

Solution:

(A) Submit a wrapper script:

    #!/bin/bash
    your_script -x argument

(B) Export your script options and arguments to an environment variable; read in that variable during runtime of the script, cf. wiki:

    if [ -n "${SCRIPT_FLAGS}" ] ; then
       if [ -z "${*}" ] ; then
          set -- ${SCRIPT_FLAGS}
       fi
    fi

(C) Use the msub wrapper via:

    $ module load system/msub_addon/1.0
    $ msub <options> job.sh
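Solution (B) can be combined with standard option parsing. The sketch below takes the options from SCRIPT_FLAGS only when the script received no command line arguments and then parses them with `getopts`; the function `parse_flags`, the `-x` option and the variable `x_value` are illustrative assumptions.

```bash
#!/bin/bash
## Sketch of solution (B): read options from SCRIPT_FLAGS if no
## arguments were given on the command line.
parse_flags() {
   if [ -n "${SCRIPT_FLAGS}" ] && [ -z "${*}" ] ; then
      set -- ${SCRIPT_FLAGS}          # word splitting is intended here
   fi
   x_value="" ; OPTIND=1
   while getopts "x:" opt ; do
      case "${opt}" in
         x) x_value="${OPTARG}" ;;
         *) return 2 ;;
      esac
   done
   echo "x_value=${x_value}"
}

export SCRIPT_FLAGS="-x argument"
parse_flags
```

Arguments given directly still take precedence, so the same script works for interactive calls such as `your_script -x argument`.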

SLIDE 21

Best Practices – Common problems (2)

Manual defining of MPI tasks for mpirun?

False: do not use the following if your job solely does MPI:

    mpirun --machinefile=file binary
    mpirun -n <int> binary

→ Correct way:

    mpirun binary

(because the resource manager tells mpirun what to do)

If you want to know about the hosts allocated to your job in your script:

(A) Use the msub wrapper via:

    $ module load system/msub_addon/1.0
    $ msub <options> job.sh

(B) Write a loop into your batch job script → returns the hostname of each task:

    for tasks in $(srun hostname) ; do
       echo $tasks
    done

SLIDE 22

Best Practices - Common problems (3)

Core binding

ALWAYS use the MPI options:

    --bind-to core --map-by core|socket|node ...

SLIDE 23

Thank you for your attention!