
Re: [Condor-users] on_exit_remove and multiple runs into a single job cluster




I noticed this only after I had sent the message.
Sorry about that. I will check my scripts to find out why they did not work.


Klaus




Matthew Farrellee <matt@xxxxxxxxxx>
Sent by: condor-users-bounces@xxxxxxxxxxx

31/10/2008 01:57

Please respond to
Condor-Users Mail List <condor-users@xxxxxxxxxxx>

To
Condor-Users Mail List <condor-users@xxxxxxxxxxx>
cc
Subject
Re: [Condor-users] on_exit_remove and multiple runs into a single job cluster





According to your log, your jobs are exiting with code 0, not 129, so I'd
expect them to be marked as Completed and removed instead of put back
into the Idle state.

You should run a program that you know exits with code 129 to test this.
I just did and it worked fine, though I'm not running Windows. If the
program is exiting with 129 and Condor is not recognizing that on
Windows then that would be a bug.
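One way to run such a test is sketched below in Python. This is only an illustration, not something from the original exchange: the helper names on_exit_remove and run_and_get_exit_code are hypothetical, and the first function mirrors the submit-file expression with plain Python booleans rather than real ClassAd evaluation (it does not model undefined attributes). The second helper starts a child Python process that exits with a chosen code, so the value reported by the operating system can be compared with what the job log shows.

```python
import subprocess
import sys


def on_exit_remove(exit_by_signal: bool, exit_code: int) -> bool:
    """Plain-Python mirror of the submit-file expression
    (ExitBySignal == TRUE) || (ExitCode != 129).

    Returns True when the job would leave the queue and False when it
    would stay in the queue to be run again.
    Assumption: both attributes are defined; ClassAd semantics for
    undefined values are not modelled here.
    """
    return exit_by_signal or exit_code != 129


def run_and_get_exit_code(code: int) -> int:
    """Start a child Python process that exits with `code` and return
    the exit code reported by the operating system."""
    child = [sys.executable, "-c", f"import sys; sys.exit({code})"]
    return subprocess.run(child).returncode


if __name__ == "__main__":
    for code in (0, 129):
        observed = run_and_get_exit_code(code)
        print(f"exit code {observed}: remove from queue = "
              f"{on_exit_remove(False, observed)}")
```

If the child reports 129 here but the job log still records a different return value, the exit code is probably being lost somewhere between the program and the wrapper that Condor actually runs.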

Best,


matt

kschwarz@xxxxxxxxxxxxxx wrote:
> Hi,
>
> I am trying to submit multiple runs from a single submit description file
> and want the jobs that do not succeed (RC=129) to return to the queue
> and wait for another opportunity to run. I am using the on_exit_remove
> command as follows:
>
> #
> universe                = vanilla
> on_exit_remove          = (ExitBySignal == TRUE) || (ExitCode != 129)
> run_as_owner            = false
> Executable              = teste-bind.bat
> Arguments               = ENG_TESTE 5
> InitialDir              = caso-$(process)
> when_to_transfer_output = ON_EXIT
> output                  = teste-bind-$(process).out
> error                   = teste-bind-$(process).err
> Log                     = ..\teste-bind-job.log
> queue 20
>
> Under some conditions the exit code is set to 129, meaning that the job
> should be placed back into the Idle state and run again in the next
> negotiation cycle.
>
> My problem is that all jobs that ended with RC=129 were terminated
> instead of being placed back into the Idle state.
>
> I have run this procedure with queue 1 and it behaved as expected: the
> job ran to the end when the error condition was not met, and was placed
> back into the Idle state when the error condition was raised.
>
> It seems that the on_exit_remove expression is not applied to multiple
> runs as specified in my submit description file. Is there any way to make
> it apply to each individual run?
>
> Attached is the log file of the runs.
>  
>
>
> Klaus
> This message is intended solely for the use of its addressee and may
> contain privileged or confidential information. If you are not the
> addressee you should not distribute, copy or file this message. In this
> case, please notify the sender and destroy its contents immediately.
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/