
[Condor-users] condor q high availability problem



Howdy,

I'm currently implementing high availability for our
condor queue. I've set up SPOOL to point to a shared space, as
described in section 3.10.1 of the condor manual.
However, this shared space is ONLY writable and readable by the condor
user - this is a current limitation of the way we're creating and
sharing a common mount between the machines which will run the schedd.

Currently, things seem to be running OK. I've switched one machine
over to the new configuration, and it locks the spool and manages the
queue without problems. The other machines will be switched over during
downtime.

However, I'm seeing the following error messages in my SchedLog. I
assume this has to do with the limitations of our mount. Is this
a serious problem which will bite us later on? As I
mentioned, at the moment, things seem to be running fine.

SchedLog:3/7 16:25:58 (fd:11) (pid:2235) Error: Unable to chown
'/opt/sw/Sponge/share/spool/cluster1.proc0.subproc0' from 109 to
42407.1089
SchedLog:3/7 16:25:58 (fd:11) (pid:2235) (1.0) Failed to chown
/opt/sw/Sponge/share/spool/cluster1.proc0.subproc0 from 109 to
42407.1089. Job may run into permissions problems when it starts.

SchedLog.old:3/7 16:09:29 (fd:11) (pid:1688) Error: Unable to chown
'/opt/sw/Sponge/share/spool/cluster569.proc43.subproc0' from 42407 to
109.109
SchedLog.old:3/7 16:09:29 (fd:11) (pid:1688) (569.43) Failed to chown
/opt/sw/Sponge/share/spool/cluster569.proc43.subproc0 from 42407 to
109.109.  User may run into permissions problems when fetching
sandbox.
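
To gauge how many jobs are affected, a rough Python sketch along these
lines could scan the log for the "Failed to chown" messages. The pattern
and the function name find_chown_failures are my own assumptions, inferred
only from the two messages quoted above; it assumes each message sits on
one line in the real log (they are wrapped in this mail) and that the
"42407.1089" form is uid.gid.

```python
import re

# Pattern inferred from the "Failed to chown" messages quoted above.
# Assumption: the target is written as uid.gid, e.g. 42407.1089.
CHOWN_FAILURE = re.compile(
    r"\((?P<job>\d+\.\d+)\) Failed to chown (?P<path>\S+) "
    r"from (?P<src_uid>\d+) to (?P<dst_uid>\d+)\.(?P<dst_gid>\d+)"
)


def find_chown_failures(lines):
    """Return one dict per 'Failed to chown' message found in lines."""
    failures = []
    for line in lines:
        match = CHOWN_FAILURE.search(line)
        if match:
            failures.append(match.groupdict())
    return failures


# One of the messages from above, joined back onto a single line.
sample = [
    "3/7 16:25:58 (fd:11) (pid:2235) (1.0) Failed to chown "
    "/opt/sw/Sponge/share/spool/cluster1.proc0.subproc0 from 109 to "
    "42407.1089. Job may run into permissions problems when it starts.",
]

for failure in find_chown_failures(sample):
    print(failure["job"], failure["path"],
          failure["src_uid"], failure["dst_uid"])
```

For a real log, the lines of the SchedLog file would be passed in place
of the sample list.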

Any comments or suggestions are most welcome.

Regards,

James