
Re: [Condor-users] Error from starter, jobs put on hold



Re the below...

See the condor_submit man page, and check out periodic_release.  Maybe you'd be happy with something like the following in your submit file:
  periodic_release = LastHoldReasonCode == 11

You could even make it conditional on the time as well (i.e. only around midnight) if you wanted.
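
As a rough sketch of a time-based variant (not tested against any particular Condor version): rather than matching the wall-clock hour, the expression below waits until the job has been on hold for ten minutes, so the rebooted PCs have time to come back. It assumes the job ad attributes CurrentTime and EnteredCurrentStatus are available when periodic_release is evaluated. The 600-second delay is an arbitrary illustrative value. It is also worth checking with condor_q -l on a job that is still held whether the code shows up as HoldReasonCode or LastHoldReasonCode, and adjusting the attribute name to match.

```
# Sketch only: release jobs held with reason code 11,
# but not until 10 minutes (600 s, an assumed value) after the hold began.
periodic_release = (LastHoldReasonCode == 11) && ((CurrentTime - EnteredCurrentStatus) > 600)
```

Keeping the hold-code test in the expression means jobs held for other reasons (for example, by hand with condor_hold) stay on hold.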

---
Todd Tannenbaum
Dept of Computer Sciences
University of Wisconsin-Madison
..Sent from a Palm Treo 680..

-----Original Message-----

From:  "Michael Hess" <michael.hess@xxxxxxxxxxxxxx>
Subj:  [Condor-users] Error from starter, jobs put on hold
Date:  Tue Dec 26, 2006 10:48 am
Size:  1K
To:  <condor-users@xxxxxxxxxxx>

Hello everyone,
 
Every day, around 20 jobs from our central submitter are put on hold.
condor_q -l says:
 
LastHoldReason = "Error from starter on pc-name.ourlocalnetwork.plymouth.ac.uk: STARTER failed to receive file(s) from <x.x.x.x:19086> Download acknowledgment missing attribute: Result"
LastHoldReasonCode = 11
LastHoldReasonSubCode = 0
 
Is there any way to release these jobs automatically or to avoid them being put on hold?
I think this happens because we reset the PCs at midnight and some of them are transferring results back at that moment.
 
It would be good to just restart the job somewhere else, or mark it as idle and run it again, if the results cannot be transferred back. Has anyone had the same problem and solved it? Or can someone think of a way of solving this?
 
Best regards,
 
Michael Hess
