
[Condor-users] on_exit_remove and multiple runs into a single job cluster


I am trying to submit multiple runs from a single submit description file, and I want the jobs that do not succeed (RC=129) to return to the queue and wait for another opportunity to run. I am using the on_exit_remove command as follows:

universe                = vanilla
on_exit_remove          = (ExitBySignal == TRUE) || (ExitCode != 129)
run_as_owner            = false
Executable              = teste-bind.bat
Arguments               = ENG_TESTE 5
InitialDir              = caso-$(process)
when_to_transfer_output = ON_EXIT
output                  = teste-bind-$(process).out
error                   = teste-bind-$(process).err
Log                     = ..\teste-bind-job.log
queue 20
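
To illustrate how the single file above becomes 20 separate runs, here is a minimal Python sketch that mimics the macro expansion. It assumes $(process) takes the values 0 through 19 for queue 20, as HTCondor's $(Process) macro does. The function expand_runs is a hypothetical helper that models only the three lines using the macro.

```python
# Sketch: how "queue 20" expands $(process) into per-run values.
# Assumption: $(process) runs from 0 to N-1 for "queue N", like HTCondor's
# $(Process) macro. Only InitialDir, output and error are modelled here.

def expand_runs(count: int) -> list[dict[str, str]]:
    """Return the per-process InitialDir/output/error values for `queue count`."""
    runs = []
    for process in range(count):
        runs.append({
            "InitialDir": f"caso-{process}",
            "output": f"teste-bind-{process}.out",
            "error": f"teste-bind-{process}.err",
        })
    return runs


if __name__ == "__main__":
    for run in expand_runs(20):
        print(run["InitialDir"], run["output"], run["error"])
```

Since relative paths are resolved against each run's InitialDir, each run should write its .out and .err files inside its own caso-N directory, while all 20 runs share the single log file one level up.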

Under some conditions the exit code is set to 129, meaning that the job should be placed back into the Idle state and run again in the next negotiation cycle.
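
The decision I expect for each run can be sketched in Python as below. This is a simplification, not HTCondor's own evaluator: Python booleans stand in for ClassAd values, and on_exit_remove and next_state are hypothetical helpers. Evaluated against a single run's exit status, the expression is False only for a normal exit with code 129, which is the case that should send the job back to Idle.

```python
# Sketch of the decision made by:
#   on_exit_remove = (ExitBySignal == TRUE) || (ExitCode != 129)
# Assumption: Python booleans stand in for ClassAd values. When a job dies
# from a signal, ExitCode is undefined, but TRUE || UNDEFINED is TRUE in
# ClassAd logic, so returning early on exit_by_signal gives the same result.
from typing import Optional


def on_exit_remove(exit_by_signal: bool, exit_code: Optional[int]) -> bool:
    """True: the job leaves the queue. False: it goes back to Idle."""
    if exit_by_signal:
        return True
    return exit_code != 129


def next_state(exit_by_signal: bool, exit_code: Optional[int]) -> str:
    """Describe what should happen to one run after it exits."""
    if on_exit_remove(exit_by_signal, exit_code):
        return "leaves the queue"
    return "back to Idle"


if __name__ == "__main__":
    for outcome in [(False, 0), (False, 129), (True, None)]:
        print(outcome, "->", next_state(*outcome))
```

For example, next_state(False, 129) returns "back to Idle", while next_state(False, 0) and next_state(True, None) both return "leaves the queue".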

My problem is that every job that ended with RC=129 was terminated instead of being placed back into the Idle state.

When I ran the same procedure with queue 1, it behaved as expected: if the error condition was not met, the job ran to completion, and when the error condition was raised, the job was placed back into the Idle state.

It seems that the on_exit_remove expression is not applied to each of the multiple runs queued by my submit description file. Is there any way to make it apply to each individual run?

Attached is the log file of the runs.



Attachment: teste-bind-job.log