
Re: [HTCondor-users] Job status of jobs when using USER_JOB_WRAPPER



There isn't an easy way for HTCondor to determine whether a USER_JOB_WRAPPER has started the job. HTCondor treats the wrapper as part of the job for all practical purposes. This includes when the job's status is reported as Running and the exit status that's reported back to the user.
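
To illustrate the exec behaviour Brian describes in the quoted message below, here is a minimal sketch in Python. It is not HTCondor code: the wrapper source, the run_wrapped helper, and the exit code 97 are all hypothetical and chosen only for this example, and subprocess stands in for whatever launches the wrapper. It assumes a Unix-like system where exec replaces the calling process.

```python
import subprocess
import sys

# Hypothetical wrapper: it tries to replace itself with the user's command.
# If exec succeeds, this process *becomes* the job; if exec fails, the
# wrapper is still running and exits with its own code.
WRAPPER_SOURCE = r"""
import os, sys
try:
    os.execvp(sys.argv[1], sys.argv[1:])
except OSError:
    sys.exit(97)  # arbitrary wrapper-only code, an assumption of this sketch
"""


def run_wrapped(cmd):
    """Run cmd through the wrapper and return the exit status the parent sees."""
    result = subprocess.run([sys.executable, "-c", WRAPPER_SOURCE, *cmd])
    return result.returncode


# exec succeeds: the parent sees the user command's exit status.
job_status = run_wrapped([sys.executable, "-c", "import sys; sys.exit(3)"])

# exec fails (no such command): the parent sees the wrapper's exit status.
wrapper_status = run_wrapped(["no-such-command-for-this-sketch"])

print(job_status, wrapper_status)
```

In both cases the parent only ever sees one process and one exit status, which is why the wrapper and the job are hard to tell apart from the outside.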

 - Jaime

> On Apr 13, 2017, at 9:28 AM, Weiming Shi <swmtrc@xxxxxxxxx> wrote:
> 
> Hi Brian,
> 
> Thanks for your detailed explanation.
> 
> To clarify my question a little: by JobStatus, I mean the ST field
> returned by a 'condor_q -nobatch' query from a submit node. It seems
> that this JobStatus is not designed to match the STAT field (i.e.,
> PROCESS STATE CODE) that could be collected by a 'ps aux' query on
> the condor execute node where a job is executed.
> 
> Is there an easy way for a user to know the exact PROCESS STATE
> CODE of the process that executes a job, other than issuing a remote
> ssh command?
> 
> Thanks
> 
> On Wed, Apr 12, 2017 at 4:08 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
>> Hi Weiming,
>> 
>> The job's exit status will be the Unix exit status of the process launched by HTCondor.  If a USER_JOB_WRAPPER is used, then it is the exit status of the wrapper.
>> 
>> NOTE - HTCondor launches the USER_JOB_WRAPPER, with the expectation that the job wrapper will call 'exec' on the user's binaries.  So,
>> - If the job wrapper succeeds in calling 'exec', then the job's exit status refers to the command requested by the user.
>> - If the job wrapper fails in calling 'exec', then the job's exit status refers to the exit status of the wrapper.
>> 
>> Since the setup can be awfully confusing for end-users, it's suggested that job wrappers are used sparingly - or, at least, kept relatively simple!
>> 
>> Brian
>> 
>>> On Apr 1, 2017, at 4:40 AM, Weiming Shi <swmtrc@xxxxxxxxx> wrote:
>>> 
>>> Dear Condor users and developers,
>>> 
>>> I have a question regarding the job status returned by 'condor_q
>>> -nobatch' when USER_JOB_WRAPPER is used. I am trying to understand
>>> whether the returned job status is the status of the user job wrapper
>>> or the status of the user application wrapped by the user job wrapper.
>>> 
>>> Thanks

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project