European HTCondor Workshop December 2014 summary, Ian Collier


SLIDE 1

European HTCondor Workshop December 2014 summary

Ian Collier (Brian Bockelman, Greg Thain, Todd Tannenbaum) GDB 10th December 2014

SLIDE 2

Background

European HTCondor Admins Workshop

– At CERN, December 8th-9th 2014
– Idea at HEPiX in Nebraska
– Several years since last European Condor Week
– 30-40 people in the room
– 5-10 remote
– Followed by individual meetings today & tomorrow

Agenda & slides:

https://indico.cern.ch/event/272794/

Notes:

https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20141208

SLIDE 3

European HTCondor Meeting 8/9 December

Agenda included:

– Introduction to HT Computing & HTCondor
– Essentials of setting up and running HTCondor
– Site experiences
– Monitoring
– Advanced management of HTCondor: Condor Scripting, Job Scheduling, Security, Putting your users in a box
– HTCondor & European grid
– Integrating HTCondor & private clouds
– Ask/Stump the experts panel discussions

SLIDE 4

Introductory Sessions

Talks by Greg Thain & Todd Tannenbaum – see slides

HTComputing – emphasis on getting work done by ensuring job slots are utilised, as opposed to having the fastest machines possible

SLIDE 5

Introductory Sessions

High performance

SLIDE 6

Introductory Sessions

High throughput

SLIDE 7

Introductory Sessions

Talks by Greg Thain & Todd Tannenbaum – see slides

HTComputing – emphasis on getting work done by ensuring job slots are utilised, as opposed to having the fastest machines possible

Tension between the maximum number of machines (by minimizing constraints on them) and the number of jobs run (jobs everywhere)
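The throughput argument can be put in rough numbers. The sketch below is only back-of-envelope arithmetic: the slot counts, job lengths and utilisation figures are hypothetical and were not presented at the workshop.

```python
def jobs_per_day(slots, hours_per_job, utilisation):
    """Jobs finished per day by a pool of identical slots.

    utilisation is the fraction of time a slot is actually running a job.
    """
    return slots * utilisation * 24 / hours_per_job


# Hypothetical pool A: fewer, faster machines that often sit idle.
fast_pool = jobs_per_day(slots=100, hours_per_job=2.0, utilisation=0.60)

# Hypothetical pool B: more, slower slots that are kept busy.
busy_pool = jobs_per_day(slots=400, hours_per_job=4.0, utilisation=0.95)

print(f"fast machines: {fast_pool:.0f} jobs/day")
print(f"busy slots:    {busy_pool:.0f} jobs/day")
```

With these assumed figures the pool of slower but well-used slots finishes more jobs per day, which is the sense in which high-throughput computing favours slot utilisation over raw machine speed.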

SLIDE 8

Introductory Sessions – Using HTCondor

Jobs state their requirements and preferences, and attributes about themselves:

Requirements:

– I require a Linux/x86 platform
– I require 500MB RAM

Preferences ("Rank"):

– I prefer a machine in the chemistry department
– I prefer a machine with the fastest floating point

Custom Attributes:

– I am a job of type “analysis”
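The job side of this can be sketched in a few lines of Python. This is a simplified model using plain dicts and functions, not HTCondor's ClassAd language or its Python bindings; the machine names, dictionary keys and numbers are hypothetical, and combining the two preferences as a tuple is a modelling choice made here.

```python
# Simplified stand-in for a job ad (not real ClassAd syntax).
job = {
    "owner": "todd",             # hypothetical user name
    "job_type": "analysis",      # custom attribute from the slide
    "request_memory_mb": 500,
}


def job_requirements(machine):
    """Hard constraints: Linux on x86 with at least the requested RAM."""
    return (
        machine["opsys"] == "LINUX"
        and machine["arch"] in ("INTEL", "X86_64")
        and machine["memory_mb"] >= job["request_memory_mb"]
    )


def job_rank(machine):
    """Soft preference: chemistry machines first, then faster floating point."""
    return (machine["department"] == "chemistry", machine["kflops"])


# Hypothetical machines to evaluate the job's expressions against.
machines = [
    {"name": "chem01", "opsys": "LINUX", "arch": "X86_64",
     "memory_mb": 2048, "department": "chemistry", "kflops": 900_000},
    {"name": "phys01", "opsys": "LINUX", "arch": "X86_64",
     "memory_mb": 4096, "department": "physics", "kflops": 1_500_000},
    {"name": "win01", "opsys": "WINDOWS", "arch": "X86_64",
     "memory_mb": 8192, "department": "chemistry", "kflops": 2_000_000},
    {"name": "tiny01", "opsys": "LINUX", "arch": "X86_64",
     "memory_mb": 256, "department": "chemistry", "kflops": 500_000},
]

# Requirements filter machines out; rank only orders the survivors.
acceptable = [m for m in machines if job_requirements(m)]
preferred = sorted(acceptable, key=job_rank, reverse=True)
preferred_names = [m["name"] for m in preferred]
print(preferred_names)
```

The point of the split is that a requirement can leave a job unmatched, whereas a rank never does: it only decides which of the acceptable machines is tried first.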

SLIDE 9

Introductory Sessions – Using HTCondor

Machines specify:

Requirements:

– Require that jobs run only when there is no keyboard activity
– Never run jobs labeled as “production”

Preferences ("Rank"):

– I prefer to run Todd’s jobs

Custom attributes:

– I am a machine in the chemistry department
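The machine side mirrors the job side and can be modelled the same way. Again this is a plain-Python sketch, not real startd configuration: the 15-minute idle threshold, the dictionary keys and the example jobs are assumptions made for illustration.

```python
# Simplified stand-in for a machine ad (not real ClassAd syntax).
machine = {
    "name": "chem01",
    "department": "chemistry",   # custom attribute from the slide
    "keyboard_idle_s": 1800,     # seconds since the last keyboard activity
}

# Assumption: 15 minutes without input counts as "no keyboard activity".
IDLE_THRESHOLD_S = 900


def machine_requirements(job):
    """Hard constraints: keyboard idle, and never run 'production' jobs."""
    return (
        machine["keyboard_idle_s"] >= IDLE_THRESHOLD_S
        and job["job_type"] != "production"
    )


def machine_rank(job):
    """Soft preference: Todd's jobs are preferred over everyone else's."""
    return 1 if job["owner"] == "todd" else 0


# Hypothetical jobs asking to run on this machine.
jobs = [
    {"id": 1, "owner": "alice", "job_type": "analysis"},
    {"id": 2, "owner": "todd", "job_type": "analysis"},
    {"id": 3, "owner": "todd", "job_type": "production"},
]

admitted = [j for j in jobs if machine_requirements(j)]
ordered = sorted(admitted, key=machine_rank, reverse=True)
ordered_ids = [j["id"] for j in ordered]
print(ordered_ids)
```

Job 3 is refused outright even though it belongs to Todd, because requirements are checked before rank is considered.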

SLIDE 10

Introductory Sessions – Using HTCondor

HTCondor brings them together

[Diagram: a submit node (schedd, fed by condor_submit), three execute nodes (startd) and a central manager (collector, negotiator)]
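A toy version of the matchmaking step may make the roles clearer: the schedd holds job ads, each startd advertises a machine ad, and the central manager pairs them when both sides' requirements are satisfied. The `negotiate` function and all ads below are hypothetical; the real negotiator also accounts for user priorities, machine rank and preemption, which this sketch leaves out.

```python
def negotiate(job_ads, machine_ads):
    """Toy negotiation cycle: give each job the best free machine, if any."""
    matches = {}
    free = list(machine_ads)
    for job in job_ads:
        # A match needs agreement from both sides.
        candidates = [
            m for m in free
            if job["requirements"](m) and m["requirements"](job)
        ]
        if not candidates:
            continue  # the job stays idle until a later cycle
        best = max(candidates, key=job["rank"])
        matches[job["id"]] = best["name"]
        free.remove(best)  # a claimed machine is no longer on offer
    return matches


# Machine ads, as the startds might advertise them to the collector.
machine_ads = [
    {"name": "slot1@chem01", "memory_mb": 2048,
     "requirements": lambda j: j["job_type"] != "production"},
    {"name": "slot1@phys01", "memory_mb": 4096,
     "requirements": lambda j: True},
]

# Job ads, as they might sit in the schedd's queue after submission.
job_ads = [
    {"id": "1.0", "job_type": "production",
     "requirements": lambda m: m["memory_mb"] >= 500,
     "rank": lambda m: m["memory_mb"]},
    {"id": "2.0", "job_type": "analysis",
     "requirements": lambda m: m["memory_mb"] >= 500,
     "rank": lambda m: m["memory_mb"]},
    {"id": "3.0", "job_type": "analysis",
     "requirements": lambda m: m["memory_mb"] >= 8192,
     "rank": lambda m: 0},
]

matches = negotiate(job_ads, machine_ads)
print(matches)
```

In this run the production job can only land on phys01, the first analysis job takes chem01, and the third job stays unmatched because no machine has the memory it requires.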

SLIDE 11

Site Experiences

Fermilab, INFN Milan, Instituto de Astrofísica de Canarias (IAC) & RAL presented:

– Their experience deploying & running HTCondor
– FNAL started ~20 years ago, RAL last year
– Approaches to monitoring & ‘care and feeding’
– Integrating with the European Grid: issues with CREAM & ARC CEs
– Integrating with virtualisation & clouds

SLIDE 13

Advanced Topics

See slides. Topics included:

Scripting Condor – APIs etc
Job/Startd Policy and Config
User and Group scheduling
Security
Putting your users in a box:

– Protecting:
  the machine from the job
  the job from the machine
  one job (and user) from another
– Containers, CPU affinity, PID namespaces, mount under scratch, named chroots, control groups (cgroups), Docker

SLIDE 14

Panels

See linked notes. Questions discussed include:

– What alternatives are there to queues for organizing host groups and job priorities?
– Is there any way to throttle job submission from a misbehaving user who submits a large number of jobs that fail immediately?
– Status of AFS integration
– How to restrict worker node (WN) admission to a white list without introducing inefficiencies, management nightmares...?

SLIDE 15

Links etc

HTCondor Home:

– http://research.cs.wisc.edu/htcondor/

Agenda & notes again

– https://indico.cern.ch/event/272794/
– https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20141208

SLIDE 16

Questions?