Re: [Condor-users] Dagman job cannot start second node
- Date: Thu, 10 Jul 2008 12:06:16 -0500 (CDT)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Dagman job cannot start second node
On Mon, 7 Jul 2008, Vigilant Lionel wrote:
We are running on Condor 7.0.1.
I want to use DAG jobs, so I tested with two simple C++ programs.
The submit file for un:
Universe = standard
Executable = un
Log = un.log
Output = un.out
Error = un.err
Arguments = 35
Queue
7/7 11:24:37 Bootstrapping...
7/7 11:24:37 Number of pre-completed nodes: 0
7/7 11:24:37 Running in RECOVERY mode...
7/7 11:25:37 FileLock::obtain(1) failed - errno 5 (Input/output error)
7/7 11:25:37 ERROR "Assertion ERROR on (m_is_locked)" at line 1125 in file read_user_log.C
(various details removed above)
Okay, my first question is whether un.log is on a shared filesystem. If
so, is it possible to move it to a place that's on a local disk on your
submit machine?
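Here is a minimal sketch of that change, applied to the submit file above. Only the Log line differs. The relative un.log is resolved against the job's submit directory, which is presumably the shared location. /tmp/un.log is a hypothetical path, chosen on the assumption that /tmp is on a local disk of the submit machine.

```text
# Sketch only: /tmp/un.log is a hypothetical path, assumed to be
# on a local disk of the submit machine rather than a shared filesystem.
Universe   = standard
Executable = un
Log        = /tmp/un.log
Output     = un.out
Error      = un.err
Arguments  = 35
Queue
```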
You *should* also be able to work around this (somewhat dangerously) by
setting ENABLE_USERLOG_LOCKING to false in your configuration, but we just
found a bug in that setting. It's known to be in 7.0.2 and is probably in
7.0.1 as well. The fix should be in 7.0.4.
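A sketch of what that configuration change might look like. Where the local configuration file lives depends on the installation, so no path is given here. Because of the bug described above, this workaround may not behave correctly on 7.0.1 or 7.0.2.

```text
# Sketch only: add to the submit machine's local Condor configuration.
# Disabling user log locking is risky, and this setting has a known bug
# in 7.0.2 (probably also 7.0.1); the fix is expected in 7.0.4.
ENABLE_USERLOG_LOCKING = False
```

After editing the file, `condor_reconfig` tells running daemons to reread their configuration. Whether that step is strictly required for this particular setting is an assumption here.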
Kent Wenger
Condor Team