
Re: [Condor-users] Dagman job cannot start second node



On Mon, 7 Jul 2008, Vigilant Lionel wrote:

We are running on Condor 7.0.1.
I want to use DAG jobs, so I tested with two simple C++ programs:

un submit file:
Universe = standard
Executable = un
Log = un.log
Output = un.out
Error = un.err
Arguments = 35
Queue

7/7 11:24:37 Bootstrapping...
7/7 11:24:37 Number of pre-completed nodes: 0
7/7 11:24:37 Running in RECOVERY mode...
7/7 11:25:37 FileLock::obtain(1) failed - errno 5 (Input/output error)
7/7 11:25:37 ERROR "Assertion ERROR on (m_is_locked)" at line 1125 in file read_user_log.C

(various details removed above)

Okay, my first question is whether un.log is on a shared filesystem. If so, is it possible to move it to a place that's on a local disk on your submit machine?
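
As a rough sketch of what that might look like, here is the same submit file with only the Log line changed. The directory /var/tmp is an assumption on my part (any directory on a local disk of the submit machine should do), so substitute whatever fits your setup:

```
# Same job as above; only the user log location differs.
Universe = standard
Executable = un
# Hypothetical path: assumes /var/tmp is on a local (non-shared) disk
# of the submit machine.
Log = /var/tmp/un.log
Output = un.out
Error = un.err
Arguments = 35
Queue
```

The output and error files can stay where they are; the lock failure in the log above comes from reading the user log.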

You *should* also be able to work around this (somewhat dangerously) by setting ENABLE_USERLOG_LOCKING to false in your configuration, but we just found a bug with that, which is probably in 7.0.1 (it's known to be in 7.0.2). The fix should be in 7.0.4.
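
If you do want to try that, the setting would be a single line in the Condor configuration on the submit machine. This is only a sketch: which configuration file you put it in depends on your installation, and given the bug mentioned above it may not behave correctly before 7.0.4:

```
# Workaround only: disables file locking on user logs.
# Somewhat dangerous, since log reads and writes are no longer
# protected by a lock.
ENABLE_USERLOG_LOCKING = False
```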

Kent Wenger
Condor Team