Connecting Resources with Science via HTCondor-CE | Brian Lin, OSG

SLIDE 1

Connecting Resources with Science | OSG All Hands 2017 | Brian Lin

Connecting Resources with Science via HTCondor-CE

Brian Lin, OSG All Hands 2017

SLIDE 2

A fundamental problem of scientific computing at scale is matchmaking
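
Matchmaking means pairing each job with a resource whose properties satisfy the job's requirements, and whose own requirements the job satisfies in turn. HTCondor does this with ClassAds; the toy model below only illustrates the two-way idea in plain Python and is not HTCondor's API. The `Party` class, the `matches` and `matchmake` functions, the attribute names, and the greedy first-fit strategy are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Ad = Dict[str, Any]  # attribute name -> value, loosely modeled on a ClassAd


@dataclass
class Party:
    """A job or a resource: its own attributes plus a predicate over the other side's attributes."""
    name: str
    attrs: Ad
    requirements: Callable[[Ad], bool]


def matches(job: Party, resource: Party) -> bool:
    # Two-way match: the job must accept the resource AND the resource must accept the job.
    return job.requirements(resource.attrs) and resource.requirements(job.attrs)


def matchmake(jobs: List[Party], resources: List[Party]) -> Dict[str, Optional[str]]:
    """Greedy first-fit: give each job the first still-free resource it matches (None if none)."""
    free = list(resources)
    result: Dict[str, Optional[str]] = {}
    for job in jobs:
        chosen = next((r for r in free if matches(job, r)), None)
        if chosen is not None:
            free.remove(chosen)  # each resource serves at most one job in this toy
        result[job.name] = chosen.name if chosen is not None else None
    return result


jobs = [
    Party("gpu-job", {"RequestCpus": 1}, lambda r: r.get("HasGPU", False)),
    Party("cpu-job", {"RequestCpus": 4}, lambda r: r["Cpus"] >= 4),
]
resources = [
    Party("node-a", {"Cpus": 8, "HasGPU": False}, lambda j: True),
    Party("node-b", {"Cpus": 2, "HasGPU": True}, lambda j: j["RequestCpus"] <= 2),
]
print(matchmake(jobs, resources))  # {'gpu-job': 'node-b', 'cpu-job': 'node-a'}
```

Greedy first-fit keeps the sketch short; a real matchmaker also ranks candidates and has to cope with thousands of jobs and resources changing state, which is the scale problem the next slides address.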

SLIDE 3

SLIDE 4

Managing Scale

SLIDE 5

Managing Scale

SLIDE 6

The OSG Model

[Diagram: User Submit, OSG, and Site Gateway]

SLIDE 7

The OSG Model

[Diagram: User Submit, OSG, and Site Gateway]

SLIDE 8

The OSG Model

[Diagram: User Submit, OSG, and Site Gateway]

SLIDE 9

The OSG Model

[Diagram: User Submit, OSG, and Site Gateway]

SLIDE 10

HTCondor-CE: Site Gateway

  • Site gateway = HTCondor-CE on the batch system's submit host

  • OSG entry point for pilot jobs
  • Filter and transform incoming jobs for compatibility with site policy

  • Based on core HTCondor features

[Diagram: Site Gateway (HTCondor-CE) in front of the Site Submit Software]
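
The "filter and transform" step can be pictured as a list of routes, each pairing an acceptance test with a rewrite of the job. The sketch below assumes that model only: `admit`, `cap_memory`, `ROUTES`, the owner filter, and the 2048 MB cap are hypothetical, and HTCondor-CE itself expresses this through its Job Router configuration rather than Python.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

JobAd = Dict[str, Any]
# Hypothetical route: (name, filter predicate, transform function).
Route = Tuple[str, Callable[[JobAd], bool], Callable[[JobAd], JobAd]]


def admit(job: JobAd, routes: List[Route]) -> Optional[Tuple[str, JobAd]]:
    """Return (route name, transformed copy) for the first route whose filter accepts the job, else None."""
    for name, accepts, transform in routes:
        if accepts(job):
            # Transform a copy so the incoming job ad itself is left untouched.
            return name, transform(dict(job))
    return None  # filtered out: no route accepts this job


def cap_memory(job: JobAd) -> JobAd:
    # Made-up site policy: never request more than 2048 MB from the batch system.
    job["RequestMemory"] = min(job.get("RequestMemory", 2048), 2048)
    return job


ROUTES: List[Route] = [
    # Made-up filter: only accept pilots submitted by these owners.
    ("osg-pilots", lambda j: j.get("Owner") in {"osg", "cms"}, cap_memory),
]

print(admit({"Owner": "cms", "RequestMemory": 4096}, ROUTES))  # ('osg-pilots', {'Owner': 'cms', 'RequestMemory': 2048})
print(admit({"Owner": "unknown"}, ROUTES))  # None
```

Returning `None` models a job the gateway refuses; copying the ad before transforming keeps the original request available for comparison.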

SLIDE 11

The OSG Model: HTCondor-based

[Diagram: User Submit, OSG, and Site Gateway]

SLIDE 12

HTCondor-CE: Central Collector

  • Central storage for site details
  • Takes advantage of core HTCondor ‘advertising’ feature

  • Allows us to transition away from extra supporting software and protocols

[Diagram: the Site Gateway sends "Site Information" to OSG]
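
In HTCondor, daemons publish "ads" describing themselves to a collector, which other components then query. The toy in-memory collector below sketches that pattern for site gateways; `ToyCollector`, its methods, the attribute names, and the expiry window are assumptions of the sketch, not HTCondor's interface.

```python
import time
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class StoredAd:
    attrs: Dict[str, Any]
    received_at: float


class ToyCollector:
    """Keeps the latest ad from each site gateway; ads not refreshed within `lifetime` seconds are dropped."""

    def __init__(self, lifetime: float = 900.0) -> None:
        self.lifetime = lifetime
        self._ads: Dict[str, StoredAd] = {}

    def advertise(self, name: str, attrs: Dict[str, Any], now: Optional[float] = None) -> None:
        # A newer ad from the same gateway replaces the older one.
        ts = time.time() if now is None else now
        self._ads[name] = StoredAd(dict(attrs), ts)

    def query(self, now: Optional[float] = None) -> List[str]:
        # Expire stale ads, then return the names of gateways still advertising.
        ts = time.time() if now is None else now
        self._ads = {n: ad for n, ad in self._ads.items() if ts - ad.received_at <= self.lifetime}
        return sorted(self._ads)

    def get(self, name: str) -> Optional[Dict[str, Any]]:
        ad = self._ads.get(name)
        return None if ad is None else dict(ad.attrs)


collector = ToyCollector(lifetime=60)
collector.advertise("ce-a", {"TotalCpus": 100}, now=0)
collector.advertise("ce-b", {"TotalCpus": 50}, now=30)
print(collector.query(now=70))  # ['ce-b']: ce-a's ad is 70 s old and has expired
```

Because gateways push their own details and stale entries simply age out, the central store needs no separate information-publishing service, which is the point of the slide's last bullet.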

SLIDE 13

HTCondor-CE: Scalable

  • Benefit from HTCondor scale improvements

  • Last round of scale tests by Edgar in 2015

  • 16k* jobs, 2 ports per job, with a start-up rate of 70 jobs/min

  • Scales horizontally!

* bottlenecked by the backend cluster
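
A back-of-envelope reading of the scale-test numbers, assuming "16k" means 16,000, a constant start-up rate, and no jobs finishing during the ramp-up. The helper names are hypothetical; only the 70 jobs/min and 2 ports per job figures come from the slide.

```python
JOBS = 16_000            # assumption: "16k" read as 16,000
PORTS_PER_JOB = 2        # from the slide
START_RATE_PER_MIN = 70  # from the slide


def ramp_up_minutes(jobs: int, rate_per_min: float) -> float:
    """Minutes needed to start `jobs` jobs at a constant start-up rate."""
    return jobs / rate_per_min


def ports_in_use(jobs: int, ports_per_job: int) -> int:
    """Ports held when every job is running at once."""
    return jobs * ports_per_job


minutes = ramp_up_minutes(JOBS, START_RATE_PER_MIN)
print(f"{minutes:.0f} min (about {minutes / 60:.1f} h) to reach {JOBS} running jobs")
# 229 min (about 3.8 h) to reach 16000 running jobs
print(f"{ports_in_use(JOBS, PORTS_PER_JOB)} ports in use at full load")
# 32000 ports in use at full load
```

Under these assumptions a single test ran for nearly four hours before reaching full load, which is why the start-up rate matters as much as the peak job count.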

SLIDE 14

HTCondor-CE: In the Wild

Site | Cluster Type | Site Policy
Vanderbilt | Slurm | Stakeholder jobs run in preferred Slurm partitions; incoming jobs modified to accommodate hyper-threading
Purdue | HTCondor | Avoid subclusters that can’t run OSG jobs
Purdue | PBS | Set PBS accounting group based on job submitter
Nebraska | Slurm | GPU jobs should run under a separate Slurm partition
Nebraska | HTCondor | Jobs need to run inside Docker containers
Syracuse | HTCondor | Jobs run under custom VM infrastructure
Langston University | HTCondor | Separate cluster for specific OSG jobs via chained CEs!
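
Two of the policies in the table, sketched as job transforms a gateway could apply before submission. This is hypothetical: the partition names, the submitter-to-accounting-group mapping, and the attribute names `Owner`, `RequestGPUs`, `Partition`, and `AccountingGroup` are illustrative assumptions, not the sites' real configuration.

```python
from typing import Any, Dict

JobAd = Dict[str, Any]

# Hypothetical mapping from submitter to PBS accounting group.
ACCOUNTING_GROUPS = {"cms": "cms_grp", "atlas": "atlas_grp"}


def nebraska_slurm_policy(job: JobAd) -> JobAd:
    """GPU jobs run under a separate Slurm partition (partition names are made up)."""
    routed = dict(job)
    routed["Partition"] = "gpu" if job.get("RequestGPUs", 0) > 0 else "batch"
    return routed


def purdue_pbs_policy(job: JobAd) -> JobAd:
    """Set the PBS accounting group from the job's submitter, with a made-up fallback group."""
    routed = dict(job)
    routed["AccountingGroup"] = ACCOUNTING_GROUPS.get(job.get("Owner", ""), "osg_grp")
    return routed


print(nebraska_slurm_policy({"Owner": "cms", "RequestGPUs": 1})["Partition"])  # gpu
print(purdue_pbs_policy({"Owner": "ligo"})["AccountingGroup"])  # osg_grp
```

Each policy stays local to its site, which matches the summary's point that site-local, flexible configuration is both a strength and a source of administrative overhead.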

SLIDE 15

HTCondor-CE: Job Router, HTCondor backend

[Diagram: inside the Site Gateway (HTCondor-CE), the Job Router turns a job with Distro = RHEL7 into one with VM_NAME = "ITS-SL72-OSG...", which is passed to the Site Submit Software]

Syracuse (HTCondor): Jobs run under custom VM infrastructure
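
The Syracuse route can be modeled as a lookup from the requested distribution to a VM image name. This is a sketch under assumptions: `route_to_vm`, the `vm_by_distro` mapping, and the image name used below are hypothetical (the real value is elided on the slide), and the actual rewrite lives in the CE's Job Router route configuration.

```python
from typing import Any, Dict, Optional

JobAd = Dict[str, Any]


def route_to_vm(job: JobAd, vm_by_distro: Dict[str, str]) -> Optional[JobAd]:
    """Return a routed copy of `job` with VM_NAME set from its requested Distro, or None if no image fits."""
    vm = vm_by_distro.get(job.get("Distro", ""))
    if vm is None:
        return None  # this route does not apply; the job is left unrouted
    routed = dict(job)
    routed["VM_NAME"] = vm
    return routed


# Hypothetical image name standing in for the site's real one.
VMS = {"RHEL7": "example-rhel7-image"}
print(route_to_vm({"Distro": "RHEL7"}, VMS))  # {'Distro': 'RHEL7', 'VM_NAME': 'example-rhel7-image'}
```

Because the backend is HTCondor, the routed job keeps its ClassAd form and the new attribute can be consumed directly by the site's own HTCondor configuration.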

SLIDE 16

HTCondor-CE: Job Router, non-HTCondor backend

[Diagram: inside the Site Gateway (HTCondor-CE), a job with User = “cms”; CPUs = 3 passes through the Job Router, Gridmanager, and blahp, and reaches the Site Submit Software as Partition = “high_prio”; CPUs = 2]

Vanderbilt (Slurm): Stakeholder jobs run in preferred Slurm partitions; incoming jobs are modified to accommodate hyper-threading
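
A sketch of the Vanderbilt rewrite, covering only the Job Router step; the rewritten job then travels through the Gridmanager and blahp to Slurm. The actual Vanderbilt rule is not given, so this is one hypothetical rule consistent with the slide's numbers: stakeholder jobs go to `high_prio`, and requested CPUs are treated as hyper-threads and rounded up to physical cores at two threads per core (3 becomes 2). `STAKEHOLDERS`, `THREADS_PER_CORE`, the `general` partition, and `route_for_slurm` are all assumptions.

```python
import math
from typing import Any, Dict

JobAd = Dict[str, Any]

STAKEHOLDERS = {"cms"}  # assumption: which submitters count as stakeholders
THREADS_PER_CORE = 2    # assumption: two hyper-threads per physical core


def route_for_slurm(job: JobAd) -> JobAd:
    """Hypothetical Vanderbilt-style rewrite applied before the job is handed to the batch system."""
    routed = dict(job)
    # Stakeholder jobs go to the preferred partition; "general" is a made-up default.
    routed["Partition"] = "high_prio" if job.get("User") in STAKEHOLDERS else "general"
    # Convert requested CPUs (hyper-threads) to physical cores, rounding up.
    routed["CPUs"] = math.ceil(job.get("CPUs", 1) / THREADS_PER_CORE)
    return routed


print(route_for_slurm({"User": "cms", "CPUs": 3}))  # {'User': 'cms', 'CPUs': 2, 'Partition': 'high_prio'}
```

Rounding up rather than down keeps a job from receiving fewer hardware threads than it asked for.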

SLIDE 17

HTCondor-CE: Looking Forward

  • We have pilot job tracking and introspection

  • Missing: easy payload job introspection and history

SLIDE 18

HTCondor-CE: Summary

Pros

  • Public, uniform job entry point
  • Scalable
  • Site-local, flexible configuration

Cons

  • Site-local, flexible configuration
  • Administrative overhead

SLIDE 19

Not ready to run your own HTCondor-CE?

See next talk on OSG-hosted CEs!

SLIDE 20

Site Admin Sessions

  • Office Hours: Thursday @ 9 AM
  • Site Installation Overview: Thursday @ 11 AM

SLIDE 21

Questions?
