
Re: [Condor-users] Avoid restarting of jobs



Sorry to dredge up an old thread, but we're seeing this behavior with
our jobs and file permissions.  They do in fact go into HOLD state,
which is not what I want.

Is there a method to forcibly remove the job from the queue when a
permissions problem hits?  (Specifically, we've had issues where the
standard output file gets a permissions problem.)
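
For reference, one approach that might fit is a periodic_remove
expression in the submit description file. The following is a minimal,
untested sketch. It assumes the job really does reach the held state
(JobStatus value 5) when the permissions problem hits, and the
executable and file names are placeholders only.

```
# Hypothetical submit description; myjob.sh and the file names are placeholders.
universe   = vanilla
executable = myjob.sh
output     = myjob.out
error      = myjob.err
log        = myjob.log

# Assumption: JobStatus == 5 means the job is held. This removes the job
# for ANY hold reason, including a manual condor_hold, not only permissions.
periodic_remove = (JobStatus == 5)

queue
```

Narrowing the expression with the HoldReasonCode job attribute would
avoid removing jobs held for unrelated reasons, but the relevant code
values should be checked against the manual for the installed version.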

On Tue, Nov 22, 2011 at 11:22 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
> Peter,
>
> What version of Condor are you using?  What job universe is being used for
> the jobs that are restarted?
>
> In modern versions of Condor in vanilla universe, I would expect file
> permissions problems to cause jobs to go on hold by default rather than
> running again automatically.
>
> --Dan
>
>
> On 11/22/11 12:54 AM, Peter Ellevseth wrote:
>>
>> Hi
>> We have had some issues with user permissions and shadow exceptions,
>> because we sometimes cooperate on jobs and have multiple people owning
>> files for a job. When the job exited, it did not have permission to write
>> one or more of the files, and it restarted. To our users it would look like
>> the job was running, while in fact it had restarted several times. It would
>> have been a lot easier if the job had stopped after crashing once.
>>
>> The log file would usually say something like "Shadow exception.....Job
>> resubmitted". This would vary a little, as there were a couple of errors,
>> mostly related to permissions.
>>
>> Peter
>>
>>
>> -----Original Message-----
>> From: condor-users-bounces@xxxxxxxxxxx
>> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Lukas Slebodnik
>> Sent: 16. november 2011 09:41
>> To: Condor-Users Mail List
>> Subject: Re: [Condor-users] Avoid restarting of jobs
>>
>> Hi,
>>
>> What do you mean by "something happens to the job"? Could you give an
>> example?
>>
>> What is in your user log file?
>> log = bigJob.log
>>
>> Condor will place a log entry into this file when and where the job begins
>> running or moves (migrates)  to another machine ...
>> If no log entry is specified, Condor does not create a log for this
>> cluster.
>>
>> Regards,
>> Lukas
>>
>> On Wed, Nov 16, 2011 at 09:24:09AM +0100, Peter Ellevseth wrote:
>>>
>>> Hi
>>>
>>> We have some jobs that print out results continuously while running in
>>> Condor. If something happens to the job, Condor restarts it. This is
>>> very impractical, as the previous results are then overwritten. Is
>>> there a way to force Condor not to restart the jobs? It would be preferable
>>> to us if the job exited instead of restarting.
>>>
>>> Regards Peter
>>>
>>> _______________________________________________
>>> Condor-users mailing list
>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
>>> with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/condor-users/