
Re: [HTCondor-users] Parallel job with flocking, condor_tail does not work, upload/download to/from a running job, slots in Claimed-Idle state, ...



On 3/25/19 2:14 PM, Alexander Prokhorov wrote:



Also keep in mind that if you have any idle parallel universe jobs in your queue, the dedicated scheduler is going to try its best to claim resources for each of those jobs, and those resources are going to be claimed/idle until the scheduler is able to claim enough resources for the job to start.

This strategy is fine for me now. Can I be sure that deadlock will not happen if multiple parallel jobs are waiting in the queue at the same time?


Each machine that is willing to be scheduled by the parallel universe declares, via the DedicatedScheduler attribute, the one schedd it is willing to allow parallel jobs from. Because only one schedd can run parallel jobs on any given machine, and because each schedd is single-threaded, there is no way to hit deadlock. There are two phases to scheduling dedicated jobs -- first, the acquisition of resources, and second, assigning those resources to jobs. As the job queue can change between these two phases, it is possible for the dedicated scheduler to acquire resources for job A, but before those resources are given to it by the negotiator, a higher-priority job, B, arrives, and B will then start before A.
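
To make the two phases concrete, here is a rough toy model in Python. It is only a sketch under assumptions, not how the schedd is actually implemented: the class ToyDedicatedScheduler, its methods, and the "larger number means higher priority" convention are all made up for illustration. It shows slots being acquired on behalf of job A and then handed to a higher-priority job B that arrived between the two phases. Because a single scheduler holds every claimed slot in one pool, no two parties ever sit on partial sets of slots waiting for each other.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    priority: int       # assumption of this toy model: larger value = higher priority
    slots_needed: int


@dataclass
class ToyDedicatedScheduler:
    """Hypothetical single-threaded scheduler that owns all claimed slots."""
    queue: list = field(default_factory=list)
    claimed: int = 0    # slots that are claimed but idle

    def submit(self, job):
        self.queue.append(job)

    def acquire(self, free_slots):
        """Phase 1: claim slots for the idle jobs; return how many were claimed."""
        wanted = sum(j.slots_needed for j in self.queue) - self.claimed
        got = max(0, min(wanted, free_slots))
        self.claimed += got
        return got

    def assign(self):
        """Phase 2: hand claimed slots to jobs that fit, highest priority first."""
        started = []
        for job in sorted(self.queue, key=lambda j: -j.priority):
            if job.slots_needed <= self.claimed:
                self.claimed -= job.slots_needed
                started.append(job.name)
        self.queue = [j for j in self.queue if j.name not in started]
        return started


def demo():
    sched = ToyDedicatedScheduler()
    sched.submit(Job("A", priority=1, slots_needed=4))
    sched.acquire(free_slots=4)                         # slots requested because of A
    sched.submit(Job("B", priority=5, slots_needed=4))  # B arrives between the phases
    first = sched.assign()                              # B takes the slots
    sched.acquire(free_slots=4)                         # next cycle claims slots again
    second = sched.assign()                             # now A starts
    return first, second
```

Calling demo() returns the job names started in each assignment pass, so you can see that A is delayed by one cycle but not blocked forever.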


-greg