
Re: [Condor-users] condor q high availability problem



Hi James, if this works out for you, other people would be interested
to know what type of shared disk space you used for the
high-availability schedd. If I remember correctly, each of the
cluster* directories in the schedd spool area is supposed to be owned
by the individual user who is running the job, at least if you use a
UNIX uid/gid model where every Condor user runs under an independent uid.
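One quick way to see whether the spool entries ended up with the ownership described above is to list the owner of each cluster* entry. The following is a minimal Python sketch, not a Condor tool. It assumes the spool path that appears in the SchedLog excerpts (/opt/sw/Sponge/share/spool), and the helper names spool_dir_owners and user_name are hypothetical.

```python
import pwd
from pathlib import Path

# Assumption: the shared spool path taken from the SchedLog excerpts in this thread.
SPOOL = "/opt/sw/Sponge/share/spool"


def spool_dir_owners(spool=SPOOL):
    """Return {entry_name: (uid, gid)} for every cluster* entry in the spool."""
    owners = {}
    for entry in sorted(Path(spool).glob("cluster*")):
        st = entry.stat()
        owners[entry.name] = (st.st_uid, st.st_gid)
    return owners


def user_name(uid):
    """Map a uid to a login name, falling back to the number if it is unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # uid has no local passwd entry


if __name__ == "__main__":
    for name, (uid, gid) in spool_dir_owners().items():
        print(f"{name}: uid={uid} ({user_name(uid)}) gid={gid}")
```

If every entry reports the condor uid instead of the submitting user's uid, the chown calls are failing as the log suggests.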

Steve Timm


On Fri, 7 Mar 2008, Wojtek Goscinski wrote:

Howdy,

I'm currently in the process of implementing high availability for our
Condor queue. I've set SPOOL to point to a shared space, as
described in section 3.10.1 of the Condor manual.
However, this shared space is ONLY readable and writable by the condor
user. This is a current limitation of the way we're creating and
sharing a common mount between the machines that will run the schedd.

Currently, things seem to be running OK. I've switched one machine
over to the new configuration, and it locks the spool and manages the
queue without problems. The other machines will be switched over during
downtime.

However, I'm seeing the following error messages in my SchedLog. I
assume they are caused by the limitations of our mount. Is this a
serious problem that will bite us later on? As I mentioned, at the
moment things seem to be running fine.

SchedLog:3/7 16:25:58 (fd:11) (pid:2235) Error: Unable to chown
'/opt/sw/Sponge/share/spool/cluster1.proc0.subproc0' from 109 to
42407.1089
SchedLog:3/7 16:25:58 (fd:11) (pid:2235) (1.0) Failed to chown
/opt/sw/Sponge/share/spool/cluster1.proc0.subproc0 from 109 to
42407.1089. Job may run into permissions problems when it starts.

SchedLog.old:3/7 16:09:29 (fd:11) (pid:1688) Error: Unable to chown
'/opt/sw/Sponge/share/spool/cluster569.proc43.subproc0' from 42407 to
109.109
SchedLog.old:3/7 16:09:29 (fd:11) (pid:1688) (569.43) Failed to chown
/opt/sw/Sponge/share/spool/cluster569.proc43.subproc0 from 42407 to
109.109.  User may run into permissions problems when fetching
sandbox.
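To judge how widespread these failures are, the "Unable to chown" lines can be counted by ownership direction. This is a minimal Python sketch that assumes each log entry sits on a single line in the real SchedLog (the wrapping above comes from the mail client). The regular expression is written against the excerpts shown here, not against any documented Condor log format, and the names chown_failures and summarize are hypothetical.

```python
import re
from collections import Counter

# Matches entries such as:
#   Error: Unable to chown '/path/cluster1.proc0.subproc0' from 109 to 42407.1089
CHOWN_ERR = re.compile(
    r"Unable to chown '(?P<path>[^']+)' from (?P<old_uid>\d+) "
    r"to (?P<uid>\d+)\.(?P<gid>\d+)"
)


def chown_failures(lines):
    """Yield (path, old_uid, new_uid, new_gid) for each chown error line."""
    for line in lines:
        m = CHOWN_ERR.search(line)
        if m:
            yield m["path"], int(m["old_uid"]), int(m["uid"]), int(m["gid"])


def summarize(lines):
    """Count failures per (old_uid, new_uid) ownership direction."""
    return Counter((old, new) for _, old, new, _ in chown_failures(lines))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python chown_summary.py SchedLog SchedLog.old
    for log_path in sys.argv[1:]:
        with open(log_path, errors="replace") as fh:
            print(log_path, dict(summarize(fh)))
```

Only the "Error: Unable to chown" form is matched, so each failure is counted once even though the schedd logs a second "Failed to chown" line for the same job.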

Any comments or suggestions are most welcome.

Regards,

James

--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.