
Re: [Condor-users] Avoid restarting of jobs



Hi

We are using Condor version 7.4.4 and the vanilla universe.

We solved this by not using file transfer. The problem would arise when Condor wrote the files back and multiple users were owners. We use ACLs to handle our multi-user situation, but this didn't work well with Condor. When Condor writes files it uses 744 permissions, so only the owner got write permission and the ACLs were in practice ignored.
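
For anyone reading this in the archive, a minimal sketch of a submit description file with file transfer turned off might look like the following. It assumes the submit and execute machines share a file system and a UID domain, so the job writes its files directly as the submitting user. The names bigJob.sh, bigJob.out and bigJob.err are hypothetical; only bigJob.log appears earlier in this thread.

```text
# Sketch only: vanilla universe job that relies on a shared file system.
universe              = vanilla
# Hypothetical executable and output names, for illustration.
executable            = bigJob.sh
output                = bigJob.out
error                 = bigJob.err
log                   = bigJob.log
# Do not let Condor copy files to and from the execute machine;
# the job reads and writes the shared directory itself.
should_transfer_files = NO
queue
```

With file transfer disabled the job can only match machines that see the same file system, so this trades some flexibility in where the job runs for not having Condor write the output files back.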

We would still like to find out how to keep Condor from restarting the jobs, and instead have it stop the job and present an error message.
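
One possible approach, offered as an untested sketch rather than a confirmed answer from this thread, is a periodic_hold expression in the submit description file that puts the job on hold if it ever returns to the idle state after having started once. It assumes the NumJobStarts job attribute is maintained for vanilla universe jobs in the Condor version in use, which should be checked against the manual for that version.

```text
# Sketch only: hold the job instead of letting it run a second time.
# JobStatus == 1 means Idle; NumJobStarts counts how many times the
# job has started executing (assumed to be available in this version).
periodic_hold = (NumJobStarts >= 1) && (JobStatus == 1)
```

Periodic expressions are only evaluated at intervals, so a job that is rematched very quickly could still start again before the expression is checked. A held job stays in the queue, and its hold reason is visible to the user.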

Peter

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael Di Domenico
Sent: 7 March 2012 15:20
To: Condor-Users Mail List
Subject: Re: [Condor-users] Avoid restarting of jobs

Sorry to dredge up an old thread, but we're seeing this behavior with our jobs and file permissions. They do in fact go into the HOLD state, which is not what I want.

Is there a method to forcibly remove the job from the queue when a permissions problem hits? (Specifically, we've had issues where the standard output file gets a permissions problem.)
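
As an illustration of what such a method could look like (a sketch, not something confirmed in this thread), a periodic_remove expression in the submit description file can remove a job once it is in the held state:

```text
# Sketch only: remove the job from the queue once it is Held.
# JobStatus == 5 means Held. This fires for ANY hold reason,
# including a manual condor_hold, not only permissions problems.
periodic_remove = (JobStatus == 5)
```

The expression could probably be narrowed by also testing the HoldReasonCode attribute, but the code values for permissions and file transfer errors should be looked up in the manual for the installed version rather than guessed.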

On Tue, Nov 22, 2011 at 11:22 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
> Peter,
>
> What version of Condor are you using?  What job universe is being used 
> for the jobs that are restarted?
>
> In modern versions of Condor in vanilla universe, I would expect file 
> permissions problems to cause jobs to go on hold by default rather 
> than running again automatically.
>
> --Dan
>
>
> On 11/22/11 12:54 AM, Peter Ellevseth wrote:
>>
>> Hi
>> We have had some issues with user permissions and shadow exceptions, 
>> because our users sometimes cooperate on jobs and have multiple people 
>> owning files for a job. Then when the job exited it did not have 
>> permissions to write one or more of the files and it restarted. For 
>> our users it would look like the job was running, while in fact it 
>> had restarted several times. It would have been a lot easier if the 
>> job had stopped after crashing once.
>>
>> The log file would usually say something like "Shadow 
>> exception.....Job resubmitted". This would vary a little as we saw a 
>> couple of different errors, mostly related to permissions.
>>
>> Peter
>>
>>
>> -----Original Message-----
>> From: condor-users-bounces@xxxxxxxxxxx 
>> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Lukas 
>> Slebodnik
>> Sent: 16 November 2011 09:41
>> To: Condor-Users Mail List
>> Subject: Re: [Condor-users] Avoid restarting of jobs
>>
>> Hi,
>>
>> What do you mean by "something happens to the job"? Could you give an 
>> example?
>>
>> What is in your user log file?
>> log = bigJob.log
>>
>> Condor will place a log entry into this file when and where the job 
>> begins running or moves (migrates)  to another machine ...
>> If no log entry is specified, Condor does not create a log for this 
>> cluster.
>>
>> Regards,
>> Lukas
>>
>> On Wed, Nov 16, 2011 at 09:24:09AM +0100, Peter Ellevseth wrote:
>>>
>>> Hi
>>>
>>> We have some jobs that print out results continuously while running 
>>> in Condor. If something happens to the job, then Condor restarts the 
>>> job. This is very impractical as the previous results will then be 
>>> overwritten. Is there a way to force condor not to restart the jobs? 
>>> It would be preferable to us if the job exits instead of restarting.
>>>
>>> Regards Peter
>>>
>>> _______________________________________________
>>> Condor-users mailing list
>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx 
>>> with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting 
>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/condor-users/