
[condor-users] Starter crashes, Shadow exception



Dear condor-users,

I installed condor-6.6.1 on our computers, one with glibc 2.3 (the master) 
and the others with glibc 2.2. Each machine has a condor user, with the 
home dir on the local disk in /var/home/condor . The release directory 
(glibc-2.2 version) is exported over nfs to the clients, and is mounted in 
/var/home/condor/release .

Now here's the problem: Jobs which are scheduled to run on the clients are 
sent back to the master, creating this entry in the Job log:
....
007 (013.000.000) 03/30 20:07:47 Shadow exception!
        Can no longer talk to condor_starter on execute machine (192.168.185.14)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...

On the execute machine, the StarterLog says: 

3/30 20:07:47 couldn't create dir /var/home/condor/execute/dir_2064: Permission denied

(See below for context in StarterLog)

/var/home/condor/execute has mode drwxrwxrwt (1777), and creating a directory 
there by hand works. The condor daemons are started as root, as recommended.
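To narrow this down, one could check what the execute directory looks like to the process that creates it. Below is a minimal Python sketch, assuming a Linux execute node; `check_execute_dir` is a hypothetical helper, not part of Condor. It reports the mode bits and tries to create and remove a `dir_<pid>` subdirectory, imitating the name seen in the StarterLog. Because root bypasses ordinary permission checks on a local disk, the result means most when run under whatever account actually creates the directory. This is an assumption, since the log does not say which uid the starter uses at that point.

```python
import os
import stat


def check_execute_dir(path: str) -> dict:
    """Report the mode bits of `path` and try to create a dir_<pid>
    subdirectory, roughly mirroring the step that fails in the StarterLog.
    Hypothetical diagnostic helper, not part of Condor."""
    st = os.stat(path)
    mode = st.st_mode
    report = {
        "mode": stat.filemode(mode),           # e.g. 'drwxrwxrwt'
        "octal": oct(stat.S_IMODE(mode)),      # e.g. '0o1777'
        "world_writable": bool(mode & stat.S_IWOTH),
        "sticky": bool(mode & stat.S_ISVTX),
        "owner_uid": st.st_uid,
        "effective_uid": os.geteuid(),         # root (0) skips permission checks
    }
    test_dir = os.path.join(path, f"dir_{os.getpid()}")
    try:
        os.mkdir(test_dir)
        os.rmdir(test_dir)                     # clean up after the probe
        report["mkdir_ok"] = True
    except OSError as exc:
        report["mkdir_ok"] = False
        report["error"] = exc.strerror         # e.g. 'Permission denied'
    return report


if __name__ == "__main__":
    print(check_execute_dir("/var/home/condor/execute"))
```

Comparing `effective_uid` and `mkdir_ok` between a root shell and a shell of the condor user would show whether the failure depends on identity rather than on the directory's mode.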

The funny thing is that it _did_ work when I used one of the glibc 2.2 
boxes as the master and had the release directory on a local disk. But I 
don't see why having the release dir on an NFS mount should prevent dirs 
from being created on the local disk.

So, has anybody already had a similar problem? Or can you give me a hint 
about what might be going wrong here?

Thanks in advance,

Michael


StarterLog:
[...]
3/30 20:07:47 ******************************************************
3/30 20:07:47 ** condor_starter (CONDOR_STARTER) STARTING UP
3/30 20:07:47 ** $CondorVersion: 6.6.1 Feb  5 2004 $
3/30 20:07:47 ** $CondorPlatform: I386-LINUX-RH72 $
3/30 20:07:47 ** PID = 2064
3/30 20:07:47 ******************************************************
3/30 20:07:47 Using config file: /var/home/condor/condor_config
3/30 20:07:47 Using local config files: /var/home/condor/condor_config.local
3/30 20:07:47 DaemonCore: Command Socket at <192.168.185.14:32941>
3/30 20:07:47 Done setting resource limits
3/30 20:07:47 Starter communicating with condor_shadow <192.168.185.96:36570>
3/30 20:07:47 Submitting machine is "bommel.bcc.local"
3/30 20:07:47 couldn't create dir /var/home/condor/execute/dir_2064: Permission denied
3/30 20:07:47 Failed to initialize JobInfoCommunicator, aborting
3/30 20:07:47 Unable to start job.
3/30 20:07:47 **** condor_starter (condor_STARTER) EXITING WITH STATUS 1
[...]

-- 
Michael Schmuker
University of Frankfurt
Chair of Cheminformatics

