
Re: [Condor-users] ERROR “Assertion ERROR on (m_lock->isLocked()) ” at line 1312 in file read_user_log.cpp



On May 26, 2010, at 11:56 AM, David Brodbeck wrote:

> 
> On May 26, 2010, at 11:42 AM, Nick LeRoy wrote:
> 
>>> One of the users of my cluster is having DAGMAN job failures.  It seems to
>>> consistently happen with certain job clusters, while others run to
>>> completion.  He gets the following error in dagman.log: …Job was evicted.
>>> (0)    Job was not checkpointed.
>>> 
>>> And in dagman out:
>>> … ERROR “Assertion ERROR on (m_lock->isLocked())” at line 1312 in file
>>> read_user_log.cpp
>> 
>> What version of Condor is this?
> 
> 7.2.5.  Sorry, I should have included that in the post.

OK, after talking to the user again, it turns out he had moved only the individual job log files, not the DAG log file.  I helped him figure out how to move the DAG log off the NFS filesystem, and so far it looks like it's working much better.  So I'm thinking this might be a straightforward NFS locking problem.
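
For anyone who wants a quick sanity check on whether a directory supports file locking at all, here is a minimal sketch in Python.  It is not taken from Condor: the function name can_lock is made up for illustration, and the assumption that an fcntl-style lock is a fair stand-in for what read_user_log.cpp does is mine.  It creates a scratch file in the given directory, tries a non-blocking exclusive lock, and cleans up afterwards.

```python
import fcntl
import os
import tempfile


def can_lock(directory):
    """Return True if an exclusive, non-blocking fcntl lock can be taken
    on a scratch file created inside `directory` (hypothetical helper)."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="locktest-")
    try:
        try:
            # Assumption: an fcntl byte-range lock approximates the lock
            # that the log reader needs; this is not Condor's own code.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always remove the scratch file, whether or not locking worked.
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(target, "lockable" if can_lock(target) else "NOT lockable")
```

Run it with the directory holding the DAG log as the argument.  A failure points at locking being unavailable there; a success only shows that one lock attempt worked, so it does not rule out intermittent NFS locking trouble.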

-- 

David Brodbeck
System Administrator, Linguistics
University of Washington