
Re: [Condor-users] Start only one job instance with multiple slots available



2010/11/24 Horvátth Szabolcs <szabolcs@xxxxxxxxxxxxx>
What I'd like to have is simply not starting JobTypeA if one of the slots already runs a JobTypeA.
I was thinking about publishing the type of the currently running jobs as machine attributes and modifying the start expression based on that,
but I haven't found a way to read them from the start expression.

With the slot approach this is impossible to get working correctly 100% of the time. There's a race condition between when a node is claimed and when the job's attributes are pushed into the slot's attributes and made available to the other slots. You could start multiple instances of the restricted job at the same time on the same machine, and they wouldn't know about each other until it was too late to do anything about it (or they'd all just end up killing each other).

You might have better luck with the dynamic machine approach. That might be made to work: the entire machine is considered as a whole for the job and, if I understand it correctly, the remainder is made available to other jobs only after the current job starts to run, so the race condition should be eliminated. But I may be wrong about that.

The traditional approach to solving this problem is to dedicate one slot, say slot1, to these jobs and have a preempt expression that always allows jobs of other types running there to be preempted by JobTypeA jobs when they show up in the queue. That way the restricted jobs don't get starved.
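
As a rough illustration of the dedicated-slot idea, a startd configuration might look something like the sketch below. It is untested and rests on assumptions: the job attribute JobType is hypothetical (the jobs would have to advertise it themselves, e.g. with +JobType = "A" in the submit file), and the preference for type-A jobs is expressed here through the machine RANK rather than a hand-written PREEMPT expression. Whether and how quickly a running job is actually displaced also depends on the rest of the pool's policy.

```
# Sketch only, not a tested policy.
# JobType is a hypothetical custom job attribute, set in the submit file:
#   +JobType = "A"

# Type-A jobs may start only on slot1; all other jobs may start on any slot.
# (=!= is true when JobType is undefined, so untagged jobs are unaffected.)
START = (SlotID == 1) || (TARGET.JobType =!= "A")

# On slot1, rank type-A jobs above everything else so that a type-A job
# can displace a lower-ranked job currently running there.
RANK = ifThenElse((SlotID == 1) && (TARGET.JobType =?= "A"), 1, 0)
```

Because each machine has exactly one slot1, at most one type-A job can run per machine, and no slot has to inspect what the other slots are running.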

- Ian