At HTCondor Week 2019 Presented by Igor Sfiligoi, UCSD for the PRP team
An opportunistic HTCondor pool inside an interactive-friendly Kubernetes cluster
HTCondor Week, May 2019 1
regional networking project
between 10Gbps and 100Gbps
(GDC)
PRP
Figure: map of the PRPv2 Nautilus cluster. Labels include transoceanic nodes (Guam, Australia, Korea, Singapore, Netherlands), 10G 35TB nodes at UvA, KISTI, U of Guam and U of Queensland, the CENIC/PW Link, and 40G FIONA nodes at UIUC, U Hawaii, NCAR-WY, UWashington, I2 Chicago, I2 NYC, I2 Kansas City and UIC.
all the resources PRP has to offer
No congestion
Idle compute resources
Time for opportunistic use
from higher priority ones
Priorities natively supported in Kubernetes
a high priority pod needs the resources
Preemption out of the box
Perfect for opportunistic use
https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
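As a rough illustration of the priority mechanism described above, the following sketch builds two PriorityClass manifests as plain Python dicts and attaches one to a pod spec. The class names (`interactive`, `opportunistic`) and the numeric values are assumptions made for this example, not the ones used on PRP.

```python
import json


def priority_class(name: str, value: int, description: str) -> dict:
    """Build a Kubernetes PriorityClass manifest as a plain dict."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,  # a higher value means a higher scheduling priority
        "globalDefault": False,
        "description": description,
    }


# Hypothetical class names and values, chosen only for illustration.
INTERACTIVE = priority_class("interactive", 1000, "Interactive and service pods")
OPPORTUNISTIC = priority_class("opportunistic", 10, "Preemptible backfill pods")


def with_priority(pod_spec: dict, pc: dict) -> dict:
    """Return a copy of pod_spec that requests the given priority class."""
    return {**pod_spec, "priorityClassName": pc["metadata"]["name"]}


if __name__ == "__main__":
    # Print the manifests for inspection.
    print(json.dumps([INTERACTIVE, OPPORTUNISTIC], indent=2))
```

With classes like these, the scheduler may evict pods in the lower class when a pending pod in the higher class cannot otherwise be placed.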
PRP wanted to give opportunistic resources to Open Science Grid (OSG) users
But OSG does not have native support for Kubernetes
We thus instantiated an HTCondor pool
Putting HTCondor in a set
HTCondor deals nicely with ephemeral IPs
Persistency needed for the Schedd(s)
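One way the Schedd persistency requirement might be met is with a PersistentVolumeClaim mounted over the Schedd's spool directory. The sketch below is an assumption about how that could look, not the PRP setup: the claim name, the size, and the spool path `/var/lib/condor/spool` are all illustrative choices.

```python
import copy


def schedd_storage(claim_name: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest for the Schedd's job queue."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": claim_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def mount_spool(pod_spec: dict, claim_name: str,
                spool_dir: str = "/var/lib/condor/spool") -> dict:
    """Return a copy of pod_spec with the claim mounted in every container.

    spool_dir is an assumed location; adjust it to match the SPOOL setting
    of the HTCondor installation inside the image.
    """
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("volumes", []).append(
        {"name": "spool", "persistentVolumeClaim": {"claimName": claim_name}}
    )
    for container in spec["containers"]:
        container.setdefault("volumeMounts", []).append(
            {"name": "spool", "mountPath": spool_dir}
        )
    return spec
```

The point of the design is that the job queue survives a pod restart, while the Collector and Startds can stay stateless.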
Collector and Schedd(s) deployed as high priority service pods
Startds deployed as low priority pods
Pure opportunistic
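The split between high priority service pods and low priority Startd pods could be expressed roughly as follows. This is a sketch only: the image names, the priority class names (`interactive`, `opportunistic`) and the replica counts are hypothetical and would need to match whatever is actually defined in the cluster.

```python
def condor_deployment(name: str, image: str,
                      priority_class_name: str, replicas: int) -> dict:
    """Build an apps/v1 Deployment manifest for one HTCondor daemon type."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # The priority class decides who gets preempted first.
                    "priorityClassName": priority_class_name,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


# Hypothetical images and class names, for illustration only.
collector = condor_deployment(
    "condor-collector", "example/condor-collector", "interactive", 1)
startds = condor_deployment(
    "condor-startd", "example/condor-startd", "opportunistic", 20)
```

Because the Startd pods sit in the low priority class, Kubernetes can evict them whenever a higher priority pod needs the resources, and the Deployment controller will try to recreate them when capacity frees up again.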
Everything was working nicely, until we let in real users
OSG users have gotten used to relying on Containers
But the HTCondor Startd is already running inside a container!
So I need to provide user-specific execute pods
Not without elevated privileges
Having idle Startd pods not OK anymore
Keeping pods without users not OK anymore
How do I manage fair share between different pod types?
How am I to know what Container images users want?
I know how to implement this.
I was told this is coming.
In OSG-land, glideinWMS solves this for me.
No concrete plans on how to address these yet.
Ideally, I do want to use user-provided, per-job Containers
not an option due to opportunistic nature
But Kubernetes pods are made of several Containers
Pretty sure currently not supported
Diagram: a Pod holding an HTCondor container next to a User job container
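On the Kubernetes side alone, a pod with an HTCondor container next to a user job container is easy to describe. The sketch below shows such a manifest with a shared scratch volume. The image names, container names and mount path are hypothetical, and the sketch says nothing about how the Startd would hand a job to the second container, which is the part believed to be unsupported.

```python
def two_container_pod(name: str, condor_image: str, user_image: str) -> dict:
    """Build a Pod manifest with an HTCondor container and a user container.

    Both containers mount the same emptyDir volume so they could exchange
    job files; the mount path is an arbitrary choice for this example.
    """
    scratch_mount = {"name": "scratch", "mountPath": "/scratch"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "volumes": [{"name": "scratch", "emptyDir": {}}],
            "containers": [
                {"name": "htcondor", "image": condor_image,
                 "volumeMounts": [dict(scratch_mount)]},
                {"name": "user-job", "image": user_image,
                 "volumeMounts": [dict(scratch_mount)]},
            ],
        },
    }
```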
It has been pointed out to me that the latest CentOS supports unprivileged Singularity
Have not tried it out
Cannot currently assume all of my nodes have a recent-enough kernel
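A node could be probed for one prerequisite of unprivileged Singularity along these lines. This is a minimal sketch that assumes the relevant requirement is unprivileged user namespaces and that the kernel exposes the `user.max_user_namespaces` sysctl under `/proc/sys`. A positive value is necessary but not sufficient, and this has not been tested against the PRP nodes.

```python
from pathlib import Path

# Sysctl exposing the per-user limit on user namespaces (0 disables them).
USERNS_SYSCTL = Path("/proc/sys/user/max_user_namespaces")


def userns_limit_allows(text: str) -> bool:
    """Interpret the sysctl contents: any positive integer means enabled."""
    try:
        return int(text.strip()) > 0
    except ValueError:
        return False


def node_allows_user_namespaces(path: Path = USERNS_SYSCTL) -> bool:
    """Return True if the sysctl exists and holds a positive limit.

    A missing or unreadable file is treated as unsupported, which is the
    expected outcome on kernels that predate this sysctl.
    """
    try:
        return userns_limit_allows(path.read_text())
    except OSError:
        return False


if __name__ == "__main__":
    print("user namespaces enabled:", node_allows_user_namespaces())
```

A check like this could run at Startd pod startup, so that only nodes passing it advertise Singularity support.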
pool in the PRP Kubernetes cluster
any otherwise-unused cycles
forces us to have multiple execute pod types
currently needed, hoping for more automation in the future
This work was partially funded by US National Science Foundation (NSF) awards CNS-1456638, CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-1450871, OAC-1659169 and OAC-1841530.