Re: [Condor-users] State transition for preempted jobs and its implication with Condor-G
- Date: Mon, 18 Feb 2008 11:42:31 -0600
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [Condor-users] State transition for preempted jobs and its implication with Condor-G
On Feb 15, 2008, at 1:56 PM, Barnett P. Chiu wrote:
> When a job is temporarily suspended by a higher priority job, what
> state does it go into? I got the impression that the job state will
> become idle and the job will sit in the queue, waiting for a match
> again. But will it go through the 'hold' state before becoming 'idle',
> and if so, will this transition (R -> (H?) -> I) be reflected in condor_q?
>
> I guess one way a preempted job could go into the 'hold' state is
> when that particular job is being checkpointed (file staging is
> involved, hence the hold state).
The startd may choose to suspend a running job for a number of reasons
(configurable by the admin), one of which may be a job running on a
different slot on that machine. In this case, the suspended job will
be marked as Running in the job queue.
A startd may also decide to evict a job from the execution machine.
One reason for this is there's another job the startd would rather run
in that slot. In this case, the evicted job returns to Idle status,
awaiting another match.
> This reminds me of another question: when a job is submitted in
> Condor-G, the grid manager on the remote gatekeeper will forward this
> job to Condor (assuming the underlying batch system is Condor) and
> let it schedule the job, but in which universe will the site's
> native Condor run the job?
The default for GRAM is to submit the job in the vanilla universe.
> If the job ends up being scheduled as a vanilla job, then how would
> it receive a checkpointing service? Is it the case that the
> jobmanager, in the meantime, also somehow watches over the job while
> it is being executed on the worker node, so that checkpointing could
> still be achieved even though it is being run as a vanilla job?
>
> Of course, the thoughts above are based on my impression that Condor-G
> does support checkpointing, but I am not sure at which level it is
> achieved. Or do Condor-G jobs not support checkpointing at all?
Condor-G and GRAM do not directly support checkpointing of jobs. The
batch scheduler behind GRAM may support it, though.
> Is there a possibility that the jobmanager on the gatekeeper could
> somehow "inform" its native Condor to schedule jobs in a universe
> other [...]
When Condor is the batch system behind GRAM, the client can make
additions and modifications to the Condor submit file that GRAM writes
with the 'condor_submit' RSL attribute. Here's an example of how to
use it in a Condor-G submit file:
globus_rsl = (condor_submit=(universe standard)(priority 10))
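If you generate submit files from a script, a line of that shape can be assembled mechanically. This is only a minimal sketch: the helper name `condor_submit_rsl` is hypothetical, it simply formats name/value pairs the same way as the example above, and it does no quoting or escaping of values.

```python
def condor_submit_rsl(settings: dict) -> str:
    """Format settings as a 'condor_submit' RSL clause (hypothetical helper)."""
    # Each setting becomes "(name value)", matching the hand-written example.
    pairs = "".join(f"({name} {value})" for name, value in settings.items())
    return f"(condor_submit={pairs})"


# Reproduce the line from the example above.
line = "globus_rsl = " + condor_submit_rsl({"universe": "standard", "priority": 10})
print(line)  # globus_rsl = (condor_submit=(universe standard)(priority 10))
```

The resulting line goes into the Condor-G submit file as in the example above.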
Thanks and regards,
UW-Madison Condor Team