
Re: [Condor-users] Behavior of Condor jobs held for file transfer errors



On 6/20/2012 3:00 PM, Steven C Timm wrote:
What about held grid universe jobs, which first require a
   condor_rm
and then a
   condor_rm -forcex
Is there any way to do that with SYSTEM_PERIODIC_REMOVE?

Steve Timm


Do you always want to simply remove held grid jobs?

If so, you can put the following into the submit file of a grid universe job:

   +nonessential = true

This tells Condor to simply abort (remove) any problematic job instead of putting it on hold. Condor will try to remove the job nicely, but will not let it stick around in the queue even if it fails to confirm what happened on the execute node. So placing the nonessential attribute in the job ad is equivalent to doing condor_rm followed by condor_rm -forcex any time the job would otherwise have gone on hold.
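
For illustration only, a grid universe submit file carrying this attribute might look like the sketch below. The grid_resource value, the executable, and the file names are hypothetical placeholders, not taken from any real setup; only the +nonessential line comes from the advice above.

   # Hypothetical grid universe submit file (all names are placeholders)
   universe      = grid
   grid_resource = gt2 gatekeeper.example.org/jobmanager-pbs
   executable    = my_job.sh
   output        = my_job.out
   error         = my_job.err
   log           = my_job.log

   # Remove the job outright instead of placing it on hold
   +nonessential = true

   queue

Because the attribute is set per job, jobs submitted without it keep the normal hold behavior.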

regards,
Todd





-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: Wednesday, June 20, 2012 2:56 PM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] Behavior of Condor jobs held for file transfer errors


Jobs that are on hold can be removed automatically, either with the periodic_remove expression in the job submit file or with the SYSTEM_PERIODIC_REMOVE expression in the Condor configuration on the submit machine.

Example:

SYSTEM_PERIODIC_REMOVE = HoldReasonCode == 12 || HoldReasonCode == 14

The HoldReasonCodes are defined in the manual:

http://research.cs.wisc.edu/condor/manual/v7.6/10_Appendix_A.html#82773
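
For the per-job route, the same test can go into the submit file as periodic_remove. This is a minimal sketch, assuming the same two hold codes as in the example above. The JobStatus == 5 guard (5 is the Held state) is an addition that makes the intent explicit, and the one-hour grace period in the second variant is an arbitrary choice.

   # Submit file: remove this job if it is held for one of these reasons
   periodic_remove = (JobStatus == 5) && (HoldReasonCode == 12 || HoldReasonCode == 14)

   # Alternative: only remove after the job has been held for an hour
   # periodic_remove = (JobStatus == 5) && (HoldReasonCode == 12 || HoldReasonCode == 14) && ((time() - EnteredCurrentStatus) > 3600)

Before settling on which codes to match, it may help to see why jobs are currently held:

   condor_q -hold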

--Dan

On 6/20/12 12:25 PM, Myung Cho wrote:
Hi, I did a quick search for this topic but haven't found any
relevant posts. Is there a way to change/specify the default behavior
in Condor for jobs with file transfer errors? For our jobs, any error
in file transfer, for example a missing file specified in
transfer_output_files, seems to leave the job in the held state
forever. Is there a way for the job to just complete with an error? I
would rather see it finish with an error reported than have it hang
around in the hold state.

Thanks.
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/





--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
Condor Project Technical Lead          1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685