
Re: [condor-users] Starter crashes, Shadow exception



Can you create the directory (e.g., /var/home/condor/execute/dir_2064) by hand when logged in as the 'nobody' or 'condor' account, rather than as root or yourself?
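
If it helps, here is a minimal sketch of that check in Python. It is not part of Condor; the function names probe_mkdir and ancestor_modes are made up for illustration, and I am assuming the starter ends up creating the directory as 'nobody' or 'condor' rather than as root. The script tries to create and remove a scratch directory under the given path and prints the mode of every directory on the way down, since a parent that the account cannot search would also produce "Permission denied" even when the execute directory itself is world-writable.

```python
import errno
import os
import stat
import tempfile


def probe_mkdir(parent):
    """Try to create and remove a scratch directory under `parent`.

    Returns (True, None) on success, or (False, message) on failure.
    """
    try:
        scratch = tempfile.mkdtemp(prefix="dir_probe_", dir=parent)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, exc.errno)
        return False, "%s: %s" % (name, exc.strerror)
    os.rmdir(scratch)
    return True, None


def ancestor_modes(path):
    """Return (directory, mode string, searchable) rows from / down to `path`.

    `searchable` comes from os.access, which checks against the real
    uid/gid of the calling process.
    """
    path = os.path.abspath(path)
    parts = []
    while True:
        parts.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    rows = []
    for p in reversed(parts):
        try:
            mode = stat.filemode(os.stat(p).st_mode)
        except OSError as exc:
            mode = "?? (%s)" % exc.strerror
        rows.append((p, mode, os.access(p, os.X_OK)))
    return rows


if __name__ == "__main__":
    import sys

    # Default path is the execute directory from the original post.
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/home/condor/execute"
    print("uid=%d euid=%d" % (os.getuid(), os.geteuid()))
    for p, mode, searchable in ancestor_modes(target):
        print("%-40s %s searchable=%s" % (p, mode, searchable))
    print(probe_mkdir(target))
```

Run it on the execute machine once as each account in question (for instance from a root shell via su) and compare the output. If probe_mkdir fails only for one account, the searchable column should show which directory on the path is blocking it.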

Alexander Klyubin

Michael Schmuker wrote:
Dear condor-users,

I installed condor-6.6.1 on our computers, one with glibc 2.3 (the master) and the others with glibc 2.2. Each machine has a condor user whose home directory is on the local disk in /var/home/condor. The release directory (glibc-2.2 version) is exported over NFS to the clients and mounted at /var/home/condor/release.

Now here's the problem: Jobs which are scheduled to run on the clients are sent back to the master, creating this entry in the Job log:
....
007 (013.000.000) 03/30 20:07:47 Shadow exception!
Can no longer talk to condor_starter on execute machine (192.168.185.14)
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...


On the execute machine, the StarterLog says:

3/30 20:07:47 couldn't create dir /var/home/condor/execute/dir_2064: Permission denied


(See below for context in StarterLog)

/var/home/condor/execute is rwxrwxrwxt, and dir creation works if I do it by hand. The condor daemons are started as root, as recommended.

The funny thing is that it _did_ work when I used one of the glibc 2.2 boxes as the master and had the release directory on a local disk. But I don't see why having the release dir on an NFS mount should prevent dirs from being created on the local disk.

So, has anybody had a similar problem? Or can you give me a hint as to what might be going wrong here?

Thanks in advance,

Michael


StarterLog:
[...]
3/30 20:07:47 ******************************************************
3/30 20:07:47 ** condor_starter (CONDOR_STARTER) STARTING UP
3/30 20:07:47 ** $CondorVersion: 6.6.1 Feb 5 2004 $
3/30 20:07:47 ** $CondorPlatform: I386-LINUX-RH72 $
3/30 20:07:47 ** PID = 2064
3/30 20:07:47 ******************************************************
3/30 20:07:47 Using config file: /var/home/condor/condor_config
3/30 20:07:47 Using local config files: /var/home/condor/condor_config.local
3/30 20:07:47 DaemonCore: Command Socket at <192.168.185.14:32941>
3/30 20:07:47 Done setting resource limits
3/30 20:07:47 Starter communicating with condor_shadow <192.168.185.96:36570>
3/30 20:07:47 Submitting machine is "bommel.bcc.local"
3/30 20:07:47 couldn't create dir /var/home/condor/execute/dir_2064: Permission denied
3/30 20:07:47 Failed to initialize JobInfoCommunicator, aborting
3/30 20:07:47 Unable to start job.
3/30 20:07:47 **** condor_starter (condor_STARTER) EXITING WITH STATUS 1
[...]

