What is Condor? Specialized job and resource management system

Condor and the Grid
Authors: D. Thain, T. Tannenbaum, and M. Livny
Presenter: Ibrahim H Suslu
CSC 7700 Data Intensive Distributed Computing, Fall 2006

What is Condor?
• Specialized job and resource management system (RMS) for compute-intensive jobs
1. Users submit their jobs to Condor
2. Condor chooses when and where to run them, based upon a policy
3. Condor monitors their progress
4. Condor informs the user upon completion
[Figure labels: Submit, Jobs, Feedback]

Condor Provides
• A job management mechanism
• Scheduling policy
• Priority scheme
• Resource monitoring
• Resource management
(like other full-featured systems)

Why Condor?
• High-throughput computing
– Provides large amounts of fault-tolerant computational power
– Effective utilization of resources
• Opportunistic computing
– Uses resources whenever they are available
• ClassAds
– Resource allocation language that describes resources and jobs
• Job checkpoint and migration
– Record a checkpoint and resume the application from it
– A checkpoint permits a job to migrate from one machine to another
• Remote system calls
– Preserve the local execution environment

The Philosophy of Flexibility
• Let communities grow naturally
– Relationships and obligations will develop according to user necessity
• Plan without being picky
– Be prepared to retry or reassign work when failures come
• Leave the owner in control
– Happy owners → more resources → higher throughput
• Lend and borrow
– Collaborate with related fields
• Understand previous research

Condor Kernel
[Figure: Condor kernel. Labels: User, Problem Solver (Master-Worker, DAGMan), Agent (schedd), Matchmaker (central manager), Resource (startd), Shadow, Sandbox, Job, ClassAds, claim, plan of jobs, details of the environment]
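The "plan without being picky" principle above (be prepared to retry or reassign work when failures come) can be illustrated with a minimal Python sketch. It is not Condor code: `run_with_reassignment`, `flaky_task`, and the machine names are hypothetical, and real agents apply far richer retry policies.

```python
from typing import Callable, List, Tuple


def run_with_reassignment(
    task: Callable[[str], str],
    machines: List[str],
    attempts_per_machine: int = 2,
) -> Tuple[str, str]:
    """Try the task on each machine in turn, retrying before reassigning."""
    failures = []
    for machine in machines:
        for attempt in range(1, attempts_per_machine + 1):
            try:
                return machine, task(machine)
            except RuntimeError as exc:
                # Record the failure, then retry or move to the next machine.
                failures.append((machine, attempt, str(exc)))
    raise RuntimeError(f"task failed everywhere: {failures}")


def flaky_task(machine: str) -> str:
    """Hypothetical job that always fails on one machine."""
    if machine == "node-a":
        raise RuntimeError("owner reclaimed the machine")
    return f"done on {machine}"


machine, output = run_with_reassignment(flaky_task, ["node-a", "node-b"])
print(machine, output)
```

The caller never has to pick the "right" machine up front: failures are treated as expected events and the work simply moves on.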

Typical Condor Pool
[Figure: typical Condor pool]

Flocking
• Links pools of resources
• Gateway flocking
– Organizational level
– Transparent
• Direct flocking
– One individual to another organization

Planning and Scheduling
• Planning
– Acquisition of resources by users
– Concerned with ‘what’ and ‘where’
• Scheduling
– Management of a resource by its owner
– Concerned with ‘who’ and ‘when’

Matchmaker
• Bridge between planning and scheduling
• Agents and resources advertise their characteristics and requirements as ClassAds
• Pairs that satisfy each other’s constraints are created
• Both parties are informed
• Claiming: independent authorization and authentication

Condor Architecture overview I
Condor Architecture overview II
[Figures: Condor architecture overviews I and II]
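The matchmaking steps above (advertise, pair parties whose constraints are mutually satisfied, inform both) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Condor matchmaker: ads are plain dictionaries, requirements are ordinary Python functions, and every name and attribute value below is hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Ad = Dict[str, Any]  # hypothetical stand-in for a ClassAd


def accepts(ad: Ad, other: Ad) -> bool:
    """Return True if `ad`'s own requirements hold for `other`."""
    requirements: Callable[[Ad, Ad], bool] = ad["requirements"]
    return bool(requirements(ad, other))


def matchmake(agents: List[Ad], resources: List[Ad]) -> List[Tuple[str, str]]:
    """Pair each agent with the first free resource where both sides accept."""
    free = list(resources)
    matches: List[Tuple[str, str]] = []
    for agent in agents:
        for resource in free:
            if accepts(agent, resource) and accepts(resource, agent):
                matches.append((agent["name"], resource["name"]))
                free.remove(resource)  # one match per resource in this sketch
                break
    return matches


def job_requirements(my: Ad, other: Ad) -> bool:
    return (other["arch"] == "INTEL" and other["opsys"] == "LINUX"
            and other["disk"] > my["disk_usage"])


def machine_requirements(my: Ad, other: Ad) -> bool:
    return my["load"] < 3000


agents = [
    {"name": "job-small", "disk_usage": 6000, "requirements": job_requirements},
    {"name": "job-huge", "disk_usage": 900000, "requirements": job_requirements},
]
resources = [
    {"name": "machine-a", "arch": "INTEL", "opsys": "LINUX", "disk": 600000,
     "load": 100, "requirements": machine_requirements},
]

for agent_name, resource_name in matchmake(agents, resources):
    # Both parties are informed; claiming happens afterwards, between them.
    print(f"notify {agent_name} and {resource_name} of the match")
```

Here `job-huge` stays unmatched because its disk requirement exceeds what `machine-a` advertises. The matchmaker only introduces the two parties; the claim itself, with its own authentication and authorization, is left to them.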

ClassAds
• Resource allocation language
– Attribute name-value pairs
– No specific schema
• Requirements
– Constraints; for a match these must evaluate to true
• Rank
– Desirability of a match

Job ClassAd
    [
    MyType = "Job"
    TargetType = "Machine"
    Requirements = ((other.Arch == "INTEL" && other.OpSys == "LINUX") && other.Disk > my.DiskUsage)
    Rank = (Memory * 10000) + KFlops
    Cmd = "/home-exe"
    Department = "CompSci"
    Owner = "tannenba"
    DiskUsage = 6000
    ]

Machine ClassAd
    [
    MyType = "Machine"
    TargetType = "Job"
    Machine = "tnt.isi.edu"
    Requirements = (Load < 3000)
    Rank = dept == self.dept
    Arch = "Intel"
    OpSys = "Linux"
    Disk = 600000
    ]

Problem Solvers
• High-level structure built on top of the Condor agent
• Manages a large number of jobs
– Concerned with the application-specific details of ordering and task selection
• Relies on a Condor agent in two ways
– Uses the agent as a service for reliably executing jobs
– Uses the agent to make the problem solver itself reliable
• Two are provided with Condor
– Master-Worker (MW)
• System for solving a problem of indeterminate size on a large and unreliable workforce
– Directed acyclic graph manager (DAGMan)
• Service for executing multiple jobs with dependencies in a declarative form

Split Execution
• Facilitates successful remote execution of jobs
• The shadow represents the user to the system
– Has the information that specifies the job at run time
• Executables, arguments, input files, ...
• The sandbox is responsible for giving the job a safe place to play
– Creates an environment for job execution
• A matched sandbox and shadow form the universe

Condor Universes
• Create a specific job environment
• Defined by a matched sandbox and shadow
• Different universes provide different functionality for your job:
– Standard → Support for transparent process checkpoint and restart
– Vanilla → Run any serial job
– Java → Provide a complete Java environment
– Globus → Manage your Grid jobs

Standard Universe
• Requires re-linking your program with a special library provided by Condor
• Allows checkpointing and remote system calls
– Checkpointing
• Condor’s process checkpointing mechanism saves all the state of a process into a checkpoint file
• Memory, CPU, I/O, job details, etc.
• The process can then be restarted from right where it left off
– Remote system calls
• Provide an I/O service over a secure RPC channel
• Provide remote access to the user’s home storage device
– Multi-process jobs are not allowed
– Interprocess communication is not allowed

Vanilla Universe
• You can run any program
– C/C++/Perl/Python/Fortran/Java/Lisp…
– No checkpointing: if your job is interrupted or the machine crashes, Condor has to restart it from the beginning
– No remote system calls
• Input and output files
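The checkpoint-and-restart idea behind the Standard Universe can be illustrated at application level with a short Python sketch. This is only an analogy and an assumption-laden one: Condor's mechanism saves the state of the whole process transparently, whereas the hypothetical `run_with_checkpoints` below saves a small state dictionary explicitly with `pickle`.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, fail_at=None):
    """Sum 0..total_steps-1, saving state after every step."""
    if os.path.exists(checkpoint_path):
        # Restart: resume from right where the previous run left off.
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated eviction")
        state["acc"] += state["step"]
        state["step"] += 1
        # Write to a temporary file, then rename, so that a crash during
        # the write never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["acc"]


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "job.ckpt")
        try:
            run_with_checkpoints(10, path, fail_at=6)  # interrupted run
        except RuntimeError:
            pass
        # Second run (conceptually on another machine) picks up at step 6.
        return run_with_checkpoints(10, path)


print(demo())
```

Without the checkpoint file the second run would start again from step 0, which is the situation the Vanilla Universe slide describes.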

Java Universe
• Works better for Java programs
• Checks for a valid Java environment
• Distinguishes Java environment exceptions from program exceptions
• No checkpointing
• Remote I/O

Globus Universe
• Advantages of using Condor-G to manage your Grid jobs
– Full-featured queuing service
– Credential management
– Fault tolerance
• Disadvantages
– No matchmaking or dynamic scheduling of jobs
– No job checkpoint or migration
– No remote system calls
• “Gliding in” combines the reach of Condor-G with the features of Condor

Condor-G
• Computation management agent for Grid computing
– Merges Globus and Condor technologies
[Figure: layered stack. Labels: Application, problem solver…; Job submission; Condor-G; Resource discovery, authentication…; Globus Toolkit; Job execution; Condor; Processing, storage…]

Access to Data in Condor
• Use a shared filesystem if available
• No shared filesystem?
– Condor can transfer files
• Can automatically send back changed files
• Atomic transfer of multiple files
• Can be encrypted over the wire
– Remote I/O socket
– Standard Universe can use remote system calls

Which Universe?
• Standard:
– Good for mixed Condor pools, flocked pools, and the Grid at large
• Vanilla:
– Good for a Condor pool of identical machines
• Java:
– Good for Java applications
• Globus:
– Good for Globus jobs
