
[Condor-users] Error from starter, jobs put on hold



Hello everyone,
 
Every day around 20 jobs from our central submitter are put on hold.
condor_q -l says:
 
LastHoldReason = "Error from starter on pc-name.ourlocalnetwork.plymouth.ac.uk: STARTER failed to receive file(s) from <x.x.x.x:19086> Download acknowledgment missing attribute: Result"
LastHoldReasonCode = 11
LastHoldReasonSubCode = 0
 
Is there any way to release these jobs automatically, or to avoid them being put on hold in the first place?
I think this happens because we reset the PCs at midnight, and some of them are transferring results back at exactly that moment.
 
If the results cannot be transferred back, it would be good to simply restart the job somewhere else, or mark it as idle and run it again. Has anyone had the same problem and solved it? Or can someone think of a way to solve it?
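
To make the idea concrete, here is a rough, untested sketch of the kind of automatic release I have in mind. It assumes that `condor_q -l` prints one `Attribute = value` pair per line with a blank line between jobs, and that held jobs carry `ClusterId`, `ProcId`, `HoldReason` and `HoldReasonCode`. The function names, the fallback to the `LastHold*` attributes, and the choice to match on code 11 plus the "failed to receive file" text are my own assumptions, taken only from the output above. The script only builds the `condor_release` command line; it does not run anything.

```python
def parse_classads(text):
    """Split `condor_q -l` style output into one dict per job (assumed layout)."""
    jobs, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current job ad.
            if current:
                jobs.append(current)
                current = {}
            continue
        if " = " not in line:
            continue  # skip anything that is not an attribute line
        name, value = line.split(" = ", 1)
        current[name.strip()] = value.strip().strip('"')
    if current:
        jobs.append(current)
    return jobs


def jobs_to_release(jobs, code="11", marker="failed to receive file"):
    """Return 'cluster.proc' ids whose hold reason matches the transfer failure."""
    ids = []
    for job in jobs:
        # Fall back to the LastHold* attributes shown in the output above.
        reason = job.get("HoldReason", job.get("LastHoldReason", ""))
        reason_code = job.get("HoldReasonCode", job.get("LastHoldReasonCode", ""))
        if reason_code != code or marker not in reason:
            continue
        if "ClusterId" in job and "ProcId" in job:
            ids.append(f'{job["ClusterId"]}.{job["ProcId"]}')
    return ids


def release_command(ids):
    """Build (but do not run) a condor_release command for the given job ids."""
    return ["condor_release", *ids]
```

Something like this could run from cron on the submitter shortly after the midnight reset, feeding it the output of `condor_q -l` and executing the resulting command only when the id list is not empty.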
 
Best regards,
 
Michael Hess