Re: [HTCondor-users] numjobstarts vs numshadowstarts
- Date: Mon, 23 Mar 2015 16:40:51 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] numjobstarts vs numshadowstarts
On 3/23/2015 11:59 AM, Suchandra Thapa wrote:
> Are there any situations where numjobstarts will be different than
> numshadowstarts? Is this something that'll occur frequently?
NumShadowStarts is incremented by the schedd whenever it launches a
condor_shadow (or, in the case of a local universe job, when the schedd
launches a condor_starter on the submit machine).
NumJobStarts is incremented by the condor_starter or condor_gridmanager
right before it spawns the job, but after the execute node has been
successfully claimed and the job's input files have been transferred.
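As a toy model of the ordering just described (this is not HTCondor's actual code; the class name, function name, and the transfer_ok flag are made up for illustration), one run attempt of a job that uses a shadow could be sketched like this:

```python
from dataclasses import dataclass


@dataclass
class JobCounters:
    """Hypothetical stand-in for the two job ad attributes."""
    NumShadowStarts: int = 0
    NumJobStarts: int = 0


def run_attempt(job, transfer_ok=True):
    """Model one run attempt; return True if the job was spawned."""
    # The schedd launches a condor_shadow: counted immediately.
    job.NumShadowStarts += 1
    # Input file transfer happens next; a failure here ends the
    # attempt before NumJobStarts is touched.
    if not transfer_ok:
        return False
    # Incremented right before the job itself is spawned.
    job.NumJobStarts += 1
    return True


job = JobCounters()
run_attempt(job, transfer_ok=False)  # attempt fails during input transfer
run_attempt(job)                     # retry succeeds
print(job)  # JobCounters(NumShadowStarts=2, NumJobStarts=1)
```

In this sketch a failed attempt still counts as a shadow start but never as a job start, which is why the two counters can drift apart.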
I could imagine several scenarios where they will be different. Some
examples:
1. If the job specifies a universe that does not launch a shadow (e.g.
grid universe, local universe), NumJobStarts would exceed NumShadowStarts.
2. If the condor_shadow is successfully started but encounters some
error before spawning the job, such as an error transferring the input
files or spawning the job itself (e.g. the execute node is missing
required shared libraries, the executable does not exist on the execute
node, etc.), then NumShadowStarts could exceed NumJobStarts.
3. If the job is a parallel universe job, NumJobStarts is incremented
for each node (MPI rank) that joins the computation. Thus NumJobStarts
would likely exceed NumShadowStarts.
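If you want to spot such jobs yourself, a minimal sketch is below. It assumes you already have each job ad as a plain Python dict; the helper name compare_start_counts and the sample ads are hypothetical, and treating a missing attribute as 0 is my assumption rather than documented behavior. The notes it prints are only hints drawn from the scenarios above, not a diagnosis.

```python
def compare_start_counts(ad):
    """Return (shadow_starts, job_starts, note) for one job ad dict."""
    # Assumption: an attribute missing from the ad means the counter
    # was never incremented, so treat it as 0.
    shadow = ad.get("NumShadowStarts", 0)
    job = ad.get("NumJobStarts", 0)
    if job > shadow:
        note = "more job starts (e.g. grid/local or parallel universe)"
    elif shadow > job:
        note = "more shadow starts (e.g. failure before the job spawned)"
    else:
        note = "equal"
    return shadow, job, note


# Hand-written sample ads, not real condor_q output.
ads = [
    {"ClusterId": 1, "NumShadowStarts": 1, "NumJobStarts": 1},
    {"ClusterId": 2, "NumShadowStarts": 3, "NumJobStarts": 1},
    {"ClusterId": 3, "NumShadowStarts": 1, "NumJobStarts": 4},
    {"ClusterId": 4},  # never started: neither attribute present
]
for ad in ads:
    print(ad["ClusterId"], *compare_start_counts(ad))
```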
Hope the above helps,