European HTCondor Workshop December 2014 summary - Ian Collier



SLIDE 1

European HTCondor Workshop December 2014 summary

Ian Collier (Brian Bockelman, Greg Thain, Todd Tannenbaum)
GDB 10th December 2014

SLIDE 2

Background

  • European HTCondor Admins Workshop

– At CERN, December 8th-9th 2014
– Idea at HEPiX in Nebraska
– Several years since last European Condor Week
– 30-40 people in the room
– 5-10 remote
– Followed by individual meetings today & tomorrow

  • Agenda & slides:

https://indico.cern.ch/event/272794/

  • Notes:

https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20141208

SLIDE 3

European HTCondor Meeting 8/9 December

  • Agenda included:

– Introduction to HT Computing & HTCondor
– Essentials of setting up and running HTCondor
– Site experiences
– Monitoring
– Advanced management of HTCondor

  • Condor Scripting, Job Scheduling, Security, Putting your users in a box

– HTCondor & European grid
– Integrating HTCondor & private clouds
– Ask/Stump the experts panel discussions

SLIDE 4

Introductory Sessions

  • Talks by Greg Thain & Todd Tannenbaum – see slides
  • HTComputing – emphasis on getting work done by ensuring job slots are utilised, as opposed to having the fastest machines possible
SLIDE 5

Introductory Sessions

High performance

SLIDE 6

Introductory Sessions

High throughput

SLIDE 7

Introductory Sessions

  • Tension between the maximum number of machines (by minimizing constraints on them) and the number of jobs run (jobs everywhere)
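
The throughput argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is purely illustrative: the pool sizes and per-job runtimes are invented numbers, not figures from the workshop, and the helper name `jobs_per_day` is hypothetical. It assumes independent jobs and slots that are never idle.

```python
def jobs_per_day(slots: int, hours_per_job: float) -> float:
    """Jobs completed per day when every slot is kept busy with independent jobs."""
    return slots * 24 / hours_per_job


# Hypothetical pools (numbers invented for illustration only).
fast_pool = jobs_per_day(slots=10, hours_per_job=1.0)   # a few fast machines
wide_pool = jobs_per_day(slots=100, hours_per_job=2.0)  # many slower machines

print(f"fast pool: {fast_pool:.0f} jobs/day")
print(f"wide pool: {wide_pool:.0f} jobs/day")
```

Under these assumptions the larger pool of slower machines finishes more jobs per day, which is the sense in which keeping many slots occupied matters more than individual machine speed.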
SLIDE 8

Introductory Sessions – Using HTCondor

Jobs state their requirements and preferences, and attributes about themselves:

  • Requirements:

– I require a Linux/x86 platform
– I require 500MB RAM

  • Preferences ("Rank"):

– I prefer a machine in the chemistry department
– I prefer a machine with the fastest floating point

  • Custom Attributes

– I am a job of type “analysis”

SLIDE 9

Introductory Sessions – Using HTCondor

Machines specify:

  • Requirements:

– Require that jobs run only when there is no keyboard activity
– Never run jobs labeled as “production”

  • Preferences ("Rank"):

– I prefer to run Todd’s jobs

  • Custom attributes

– I am a machine in the chemistry department
SLIDE 10

Introductory Sessions – Using HTCondor

HTCondor brings them together

[Diagram: Submit Node (schedd) with condor_submit; three Execute Nodes (startd); Central Manager (collector, negotiator)]
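
The two-sided matching described on the previous slides can be sketched in a few lines of Python. This is a toy model, not HTCondor's implementation or its ClassAd language: the `Ad` class, the `match` function, and all attribute names (`os`, `arch`, `memory_mb`, `department`, `job_type`, `owner`) are hypothetical, and using the machine's rank as a tie-breaker is an assumption of this sketch. The keyboard-activity policy and factors such as user priorities are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Attrs = Dict[str, Any]


@dataclass
class Ad:
    """Simplified stand-in for a ClassAd: attributes plus two policy functions."""
    attrs: Attrs
    requirements: Callable[[Attrs], bool]  # must be true of the other party
    rank: Callable[[Attrs], float]         # higher means more preferred


def match(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Pick the machine the job prefers among mutually acceptable machines."""
    candidates = [
        m for m in machines
        if job.requirements(m.attrs) and m.requirements(job.attrs)
    ]
    if not candidates:
        return None
    # Job rank decides; machine rank breaks ties (an assumption of this sketch).
    return max(candidates, key=lambda m: (job.rank(m.attrs), m.rank(job.attrs)))


def machine_ad(name: str, memory_mb: int, department: str) -> Ad:
    """Build a machine ad carrying the example policy from the slides."""
    return Ad(
        attrs={"name": name, "os": "LINUX", "arch": "X86_64",
               "memory_mb": memory_mb, "department": department},
        requirements=lambda j: j["job_type"] != "production",
        rank=lambda j: 1.0 if j["owner"] == "todd" else 0.0,
    )


job = Ad(
    attrs={"owner": "todd", "job_type": "analysis"},
    requirements=lambda m: (m["os"] == "LINUX" and m["arch"] == "X86_64"
                            and m["memory_mb"] >= 500),
    rank=lambda m: 1.0 if m["department"] == "chemistry" else 0.0,
)

machines = [
    machine_ad("physics01", 2048, "physics"),
    machine_ad("chem01", 1024, "chemistry"),
    machine_ad("small01", 256, "chemistry"),
]

best = match(job, machines)
print(best.attrs["name"] if best else "no match")
```

With these made-up ads, `small01` is excluded by the job's memory requirement and the job's preference for the chemistry department selects `chem01` over `physics01`. A job labeled "production" would be rejected by every machine in this list.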

SLIDE 11

Site Experiences

  • Fermilab, INFN Milan, Instituto de Astrofísica de Canarias (IAC) & RAL presented:

– Their experience deploying & running HTCondor
– FNAL started ~20 years ago, RAL last year
– Approaches to monitoring & ‘care and feeding’
– Integrating with the European Grid

  • Issues with CREAM & ARC CEs

– Integrating with virtualisation & clouds

SLIDE 13

Advanced Topics

See slides. Topics included:

  • Scripting Condor – APIs etc
  • Job/Startd Policy and Config
  • User and Group scheduling
  • Security
  • Putting your users in a box:

– Protecting

  • the machine from the job
  • the job from the machine
  • one job (and user) from another

– Containers, CPU affinity, PID namespaces, mount under scratch, named chroots, Control Groups (cgroups), Docker

SLIDE 14

Panels

  • See linked notes. Questions discussed include:

– What alternatives are there to queues for organizing host groups and job priorities?
– Is there any way to throttle job submission from a misbehaving user who submits a large number of jobs that fail immediately?
– Status of AFS integration
– How to control/restrict WN admission to a white list without introducing inefficiencies, management nightmares...?

SLIDE 15

Links etc

  • HTCondor Home:

– http://research.cs.wisc.edu/htcondor/

  • Agenda & notes again

– https://indico.cern.ch/event/272794/
– https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20141208

SLIDE 16

Questions

?