
RE: [Condor-users] condor_master startup error



> There are some problems when I start up condor_master on an executing node.
> The MasterLog on this machine:
> 
> --------------------------------------------------------------------
> 6/8 11:22:44 passwd_cache::cache_uid(): getpwnam("condor") failed: Inappropriate ioctl for device
> 
> 6/8 11:22:44 passwd_cache::cache_uid(): getpwnam("condor") failed: Inappropriate ioctl for device

This one's interesting... the getpwnam() library call is failing, but with
an error that isn't documented in its man page.  Ahhh, Linux.  I'm going
to guess that this is (somehow) related to the problem below...
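One possible reading, offered as an assumption rather than a diagnosis: POSIX does not count "user not found" as an error for getpwnam(), so when it returns NULL, errno may just hold a leftover value from some earlier call, and ENOTTY ("Inappropriate ioctl for device") can easily be such a leftover. If that is what happened here, the message would simply mean there is no condor account in this node's password database. The C sketch below uses a hypothetical helper, lookup_user(), built on getpwnam_r(), which reports "not found" and genuine lookup errors separately.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Condor): look up a user in the
 * password database (local files, NIS, LDAP, ...).
 * Returns 1 and stores the uid if the user exists, 0 if there is no
 * such user, and -1 (with errno set) if the lookup itself failed.
 *
 * getpwnam_r() returns 0 with *result == NULL for "not found" and a
 * nonzero error number for real failures, so the two cases can be told
 * apart without trusting a possibly stale errno.
 */
int lookup_user(const char *name, uid_t *uid_out)
{
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t bufsize = hint > 0 ? (size_t)hint : 16384;

    for (;;) {
        char *buf = malloc(bufsize);
        if (buf == NULL)
            return -1;                 /* malloc sets errno to ENOMEM */

        struct passwd pw;
        struct passwd *result = NULL;
        int rc = getpwnam_r(name, &pw, buf, bufsize, &result);

        if (rc == 0 && result != NULL) {
            *uid_out = pw.pw_uid;      /* copy before freeing the buffer */
            free(buf);
            return 1;                  /* user exists */
        }
        free(buf);

        if (rc == 0)
            return 0;                  /* lookup worked; no such user */
        if (rc == ERANGE && bufsize < (1u << 20)) {
            bufsize *= 2;              /* buffer too small: retry larger */
            continue;
        }
        errno = rc;
        return -1;                     /* genuine lookup error */
    }
}
```

Calling lookup_user("condor", &uid) on pragma003 would separate a missing account (return 0) from a real lookup failure (return -1, with errno describing it).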

> 6/8 11:22:44 ******************************************************
> 6/8 11:22:44 ** condor_master (CONDOR_MASTER) STARTING UP
> 6/8 11:22:44 ** /home/lyho/condor_world/condor-6.7.7/sbin/condor_master
> 6/8 11:22:44 ** $CondorVersion: 6.7.7 Apr 27 2005 $
> 6/8 11:22:44 ** $CondorPlatform: I386-LINUX_RH9 $
> 6/8 11:22:44 ** PID = 13543
> 6/8 11:22:44 ******************************************************
> 6/8 11:22:44 Using config file: /home/lyho/condor_world/condor-6.7.7/etc/condor_config
> 6/8 11:22:44 Using local config files: /home/lyho/condor_world/condor-6.7.7/hosts/pragma003/condor_config.local
> 6/8 11:22:44 FileLock::obtain(1) failed - errno 37 (No locks available)
> 6/8 11:22:44 ERROR "Can't get lock on "/home/lyho/condor_world/condor-6.7.7/hosts/pragma003/InstanceLock"" at line 899 in file master.C
> ---------------------------------------------------------------------
> 
> I can start up condor_master on the other machines that share the same
> file system with this one.
> 
> Any idea what causes this error?

I've seen this before.  The problem is with NFS: in our local pool we
have one machine that refuses to work nicely with NFS locks.  I can tell
from the location of your local config file that your LOCAL_DIR points
to a shared file system (probably $(RELEASE_DIR)/hosts/$(HOSTNAME)).
This is fine, but you really must change the LOCK variable.  The Condor
manual has this to say about LOCK:

------
Condor needs to create lock files to synchronize access to various log
files. Because of problems with network file systems and file locking
over the years, we highly recommend that you put these lock files on a
local partition on each machine. If you do not have your $(LOCAL_DIR) on
a local partition, be sure to change this entry. 

If no value for LOCK is provided, the value of LOG is used. 
------

The solution is to pick a location on a local file system (using the
same path on all your machines) and set LOCK to a directory under it.
That directory should be owned by user condor.
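Before pointing LOCK at a candidate directory, it may be worth confirming that locking actually works there. The C sketch below is illustrative only: probe_lock() is a hypothetical helper, not part of Condor, and it assumes fcntl()-style POSIX record locks, which may not be exactly what Condor's FileLock uses. On Linux, errno 37 is ENOLCK, the "No locks available" error shown in the MasterLog.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: take and immediately release an exclusive
 * fcntl() lock on a scratch file inside `dir`.
 * Returns 0 if locking works, otherwise the errno of the failing call
 * (e.g. ENOLCK when an NFS server's lock service is unreachable).
 * Assumption: Condor's own FileLock code may lock differently.
 */
int probe_lock(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/.lock_probe.%ld",
                     dir, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof path)
        return ENAMETOOLONG;

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return errno;              /* e.g. ENOENT, EACCES */

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;           /* exclusive (write) lock ... */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                  /* ... covering the whole file */

    int err = 0;
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        err = errno;               /* lock request refused */
    } else {
        fl.l_type = F_UNLCK;       /* release it again */
        fcntl(fd, F_SETLK, &fl);
    }

    close(fd);
    unlink(path);                  /* remove the scratch file */
    return err;
}
```

Comparing probe_lock() on /home/lyho/condor_world/condor-6.7.7/hosts/pragma003 with a local candidate directory should show whether the shared path is the culprit; strerror() turns a nonzero result into a readable message.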

Mike Yoder
Principal Member of Technical Staff
Direct : +1.408.321.9000
Fax    : +1.408.321.9030
Mobile : +1.408.497.7597
yoderm@xxxxxxxxxx

Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com