
Re: [Condor-users] Avoid restarting of jobs



Peter,

What version of Condor are you using? What job universe is being used for the jobs that are restarted?

In modern versions of Condor, in the vanilla universe, I would expect file permission problems to cause jobs to go on hold by default rather than to run again automatically.

--Dan

On 11/22/11 12:54 AM, Peter Ellevseth wrote:
Hi
We have had some issues with user permissions and shadow exceptions, because users sometimes cooperate on jobs, so several people own files belonging to one job. When the job exited, it did not have permission to write one or more of the files, and it restarted. To our users it would look like the job was running, while in fact it had restarted several times. It would have been a lot easier if the job had stopped after crashing once.

The log file would usually say something like "Shadow exception.....Job resubmitted". This would vary a little, as we saw a couple of different errors, mostly related to permissions.
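
One way to notice such silent restarts is to count how often the user log mentions a shadow exception. The following is only a rough sketch: the marker text is taken from the description above, the function names are made up for illustration, and the exact wording in a real user log may differ between Condor versions.

```python
from pathlib import Path

# Assumption: marker text as quoted in the message above; the real
# wording in a user log may differ between Condor versions.
MARKER = "shadow exception"


def count_shadow_exceptions(log_text: str) -> int:
    """Count the lines of a user log that mention a shadow exception."""
    return sum(1 for line in log_text.splitlines() if MARKER in line.lower())


def check_log(path: str) -> int:
    """Read the user log at `path` and return the number of matching lines."""
    return count_shadow_exceptions(Path(path).read_text(errors="replace"))
```

Calling check_log("bigJob.log") on the log file named in the submit file returns the number of matching lines; anything above zero suggests the job has been restarted at least once.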

Peter


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Lukas Slebodnik
Sent: 16. november 2011 09:41
To: Condor-Users Mail List
Subject: Re: [Condor-users] Avoid restarting of jobs

Hi,

What do you mean by "something happens to the job"? Could you give an example?

What is in your user log file?
log = bigJob.log

Condor will place a log entry into this file when and where the job begins running or moves (migrates) to another machine ...
If no log entry is specified, Condor does not create a log for this cluster.

Regards,
Lukas

On Wed, Nov 16, 2011 at 09:24:09AM +0100, Peter Ellevseth wrote:
Hi

We have some jobs that print out results continuously while running in Condor. If something happens to the job, Condor restarts it. This is very impractical, as the previous results are then overwritten. Is there a way to force Condor not to restart the jobs? We would prefer that the job exit instead of restarting.

Regards Peter

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/