
Re: [Condor-users] Help needed in using job_lease_duration



In my case, the job is not killed and keeps running even after exceeding the time limit set in job_lease_duration.  When I vacate jobs using condor_vacate, I can see the jobs being rescheduled, but this does not happen when there is a network failure.  I just want to test whether Condor can detect a network failure, identify the disconnected machine, and reschedule its jobs on another machine.  Have you tried this scenario, and do you know whether it will work?

On Thu, Jun 5, 2008 at 7:30 PM, Robert Rati <rrati@xxxxxxxxxx> wrote:
job_lease_duration sets how long the lease is for a given job,
and leases are renewed so long as the submitter and execute node are in
contact.  In your situation, job_lease_duration would only cause the
job to be killed if the execute machine loses contact with the
submitter for longer than job_lease_duration and the job doesn't
complete.  Here's more information on job leases:

http://www.cs.wisc.edu/condor/manual/v7.1/2_15Special_Environment.html#6012
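The rule described above can be sketched as a small shell function. This is only an illustration of the reasoning, not Condor code: lease_expired and its arguments are hypothetical names, and the timestamps are made-up seconds.

```shell
# Hypothetical helper, not part of Condor: models the lease rule above.
# Usage: lease_expired LAST_CONTACT NOW LEASE_DURATION   (all in seconds)
# Succeeds (exit status 0) only when contact has been lost for longer
# than the lease duration.
lease_expired() {
    last_contact=$1
    now=$2
    lease=$3
    [ $(( now - last_contact )) -gt "$lease" ]
}

# Contact lost 90 s ago with a 120 s lease: the lease is still valid.
if lease_expired 1000 1090 120; then echo "expired"; else echo "valid"; fi

# Contact lost 130 s ago with a 120 s lease: the lease has expired.
if lease_expired 1000 1130 120; then echo "expired"; else echo "valid"; fi
```

While the submitter and execute node stay in contact, the lease keeps being renewed, so last_contact keeps moving forward and the check never succeeds. That is why a job can run for 10 minutes with a 120 second lease.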

You probably want to use periodic_remove in your job submit file
(if the limit applies only to that job) or SYSTEM_PERIODIC_REMOVE
as a configuration parameter for the schedd in condor_config
(if the limit is meant to apply to all jobs).

Job:
http://www.cs.wisc.edu/condor/manual/v7.1/condor_submit.html

Schedd:
http://www.cs.wisc.edu/condor/manual/v7.1/3_3Configuration.html
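
As a rough sketch of the periodic_remove approach, the script below writes a submit file that removes a job once it has been running for more than 10 minutes. The file name limit_runtime.sub, the 600 second limit, and the /bin/sleep test job are assumptions chosen for illustration; adjust the expression to your own policy.

```shell
# Write a hypothetical submit file; the name and the 600 s limit are
# illustrative assumptions, not values from this thread.
cat > limit_runtime.sub <<'EOT'
Universe        = vanilla
Executable      = /bin/sleep
Arguments       = 900
Log             = limit_runtime.log
# JobStatus == 2 means the job is running; EnteredCurrentStatus is the
# time it entered that state.  Remove it after more than 600 seconds.
periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 600)
Queue
EOT

# Submit it with: condor_submit limit_runtime.sub
```

The same expression could be assigned to SYSTEM_PERIODIC_REMOVE in condor_config to apply the limit to every job handled by that schedd.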

Rob

Lakshmi Narayanan wrote:
> Hi Rob,
>
> I am trying to run a vanilla universe job.  I have set the
> job_lease_duration to 120 sec.  But the job is running for more than 10
> mins.
>
> Following is my job submit file
>
> --------------------------------------------------------------------------
> #!/bin/bash
>
> rm rnd_job.error.* rnd_job.output.* rnd_job.log
> rm rnd_job.input.*
> php arrange.php
>
> condor_status
> date
> condor_submit <<EOT
> Universe                = vanilla
> Executable              = /usr/local/bin/php
> Arguments               = findexectime.php input_jobs.php rnd_job.input.\$(Process).php
> Output                  = rnd_job.output.\$(Process)
> Error                   = rnd_job.error.\$(Process)
> Log                     = rnd_job.log
> Transfer_input_files    = findexectime.php,input_jobs.php,rnd_job.input.\$(Process).php
> Should_transfer_files   = YES
> When_to_transfer_output = ON_EXIT
> job_lease_duration      = 120
> Queue $1
> EOT
> condor_wait rnd_job.log
> date
> -----------------------------------------------------------------------
>
> On Wed, Jun 4, 2008 at 10:54 PM, Robert Rati <rrati@xxxxxxxxxx> wrote:
>
>     What job are you trying to run (namely how long is it expected to run),
>     and what values are you using for job_lease_duration?  Have you tried
>     setting job_lease_duration to 1?  When you say it's not working, do you
>     mean the job is not being rescheduled and it is running longer than the
>     value you have set for JobLeaseDuration?
>
>     Rob
>
>     Lakshmi Narayanan wrote:
>      > Hi,
>      >
>      > I am trying to use job_lease_duration for my vanilla job to see whether
>      > the job submitted is rescheduled from that machine to another one if it
>      > is down.  But it is not working.  Can anyone help me by giving the
>      > correct usage of this and the configuration needed to achieve this?
>      >
>      > Thanks
>      >
>      > --
>      > Lakshmi Narayanan
>      > 98409 89530
>      >
>      >
>      >
>     ------------------------------------------------------------------------
>      >
>      > _______________________________________________
>      > Condor-users mailing list
>      > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>      > subject: Unsubscribe
>      > You can also unsubscribe by visiting
>      > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>      >
>      > The archives can be found at:
>      > https://lists.cs.wisc.edu/archive/condor-users/
>
> --
> Lakshmi Narayanan
> 98409 89530
>
>



--
Lakshmi Narayanan
98409 89530