
Re: [Condor-users] Locking errors with dagman?

re the below -

If you know that each job will log to its own event log file (i.e. you do not have multiple jobs writing to the same log file), you can safely put the following into your condor_config file:
to solve the problem.

Happily, the next version of Condor due in Oct (v6.9.5) restructures the way user log locking works, and hopefully headaches like the below will become history in most environments.
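
Before changing any configuration, it can help to confirm that the shared directory really refuses lock requests, since that is what the FileLock::obtain failure in the quoted log suggests. The following is a minimal sketch, assuming Python 3 is available on the submit machine and assuming fcntl-style advisory locks, which may not be the exact call Condor makes. The function name supports_locking is hypothetical and is not part of Condor.

```python
import fcntl
import os
import tempfile


def supports_locking(directory):
    """Try to take an advisory lock on a scratch file in `directory`.

    Returns (True, None) if the lock was obtained, or (False, errno)
    if the filesystem rejected the lock request.
    """
    # Create a throwaway file in the directory under test.
    fd, path = tempfile.mkstemp(dir=directory, suffix=".locktest")
    try:
        try:
            # Non-blocking exclusive lock; fails fast if unsupported.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            return False, exc.errno
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True, None
    finally:
        # Always clean up the scratch file.
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(target, supports_locking(target))
```

Running it once against the Samba-mounted directory and once against a local directory shows whether only the share is affected. If the share reports an errno matching the one in the quoted log, the mount itself is rejecting locks, independent of DAGMan.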


< Sent from a Palm Treo 680 >
-----Original Message-----
From: John Wheez <john@xxxxxxxxxx>
Date: Friday, Sep 28, 2007 3:21 am
Subject: [Condor-users] Locking errors with dagman?
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Reply-To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>

Hi all,
>I'm saving my submit files on a remote file server. The .dag and .sub  
>and log and lock files are all in the same remote directory.
>The submitting OS X machine is connected via Samba to the remote
>Linux location. The dagman process starts the first 5 jobs but then
>stops creating more jobs and the error listed below appears. The first
>5 jobs do run correctly.
>9/28 00:48:32 From submit: Submitting job(s).
>9/28 00:48:32 From submit: Logging submit event(s).
>9/28 00:48:32 From submit: 1 job(s) submitted to cluster 752.
>9/28 00:48:32 	assigned Condor ID (752.0)
>9/28 00:48:32 Just submitted 5 jobs this cycle...
>9/28 00:48:32 FileLock::obtain(1) failed - errno 45 (Operation not  
>9/28 00:48:32 ERROR "Assertion ERROR on (is_locked)" at line 916 in  
>file user_log.C
>I've disabled oplocks on the Samba server as well.
>I read that placing the .lock file on a local drive might help but I
>do not see a way to do this.
>Thanks for any tips.