Re: [HTCondor-users] Parallel job with flocking, condor_tail does not work, upload/download to/from a running job, slots in Claimed-Idle state, ...



Dear Greg,

Thank you for the explanation.

All the best,
Alexander A. Prokhorov


On 25 Mar 2019, at 22:40, Greg Thain <gthain@xxxxxxxxxxx> wrote:

On 3/25/19 2:14 PM, Alexander Prokhorov wrote:



Also keep in mind that if you have any idle parallel universe jobs in your queue, the dedicated scheduler is going to try its best to claim resources for each of those jobs, and those resources are going to be claimed/idle until the scheduler is able to claim enough resources for the job to start.

This strategy is fine for me now. Can I be sure that deadlock will not happen if multiple parallel jobs are waiting in the queue at the same time?


Each machine that is willing to be scheduled by the parallel universe declares, via the DedicatedScheduler attribute, the one schedd it is willing to allow parallel jobs from. Because only one schedd can run parallel jobs on any given machine, and because each schedd is single-threaded, there is no way to hit deadlock. There are two phases to scheduling dedicated jobs -- first, the acquisition of resources, and second, assigning those resources to jobs. As the job queue can change between these two phases, it is possible that the dedicated scheduler acquires resources for job A, but before those resources are given to it by the negotiator, a higher-priority job, B, arrives and starts before A.
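
The two phases described above can be illustrated with a small toy model. This is only a sketch and not HTCondor code: the class ToyDedicatedScheduler, the Job fields, the numeric priority (higher runs first), and the slot counts are all hypothetical names and values chosen for illustration. It shows one single-threaded scheduler that first claims slots (which sit claimed/idle) and then, in a separate step, assigns them to whichever queued job ranks highest at that moment.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # assumption of this toy model: higher value runs first
    slots_needed: int


class ToyDedicatedScheduler:
    """Hypothetical single-threaded model of one scheduler owning all dedicated slots."""

    def __init__(self):
        self.queue = []        # jobs waiting to start
        self.claimed_idle = 0  # slots claimed but not yet running a job
        self.started = []      # job names, in the order they started

    def submit(self, job):
        self.queue.append(job)

    def acquire(self, slots):
        # Phase 1: resources are claimed and wait idle until they are assigned.
        self.claimed_idle += slots

    def assign(self):
        # Phase 2: hand claimed slots to queued jobs, highest priority first.
        # The queue may have changed since phase 1 ran.
        for job in sorted(self.queue, key=lambda j: -j.priority):
            if job.slots_needed <= self.claimed_idle:
                self.claimed_idle -= job.slots_needed
                self.started.append(job.name)
        self.queue = [j for j in self.queue if j.name not in self.started]


def demo():
    sched = ToyDedicatedScheduler()
    sched.submit(Job("A", priority=1, slots_needed=2))
    sched.acquire(2)                                    # slots requested on behalf of A
    sched.submit(Job("B", priority=5, slots_needed=2))  # B arrives between the phases
    sched.assign()                                      # B takes the slots claimed for A
    sched.acquire(2)                                    # next cycle: claim again
    sched.assign()                                      # now A can start
    return sched.started, sched.claimed_idle


if __name__ == "__main__":
    print(demo())
```

In this model A is delayed by B but never blocked for good: because a single scheduler makes every assignment in sequence, no two jobs can each hold part of what the other needs.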


-greg
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/