
Re: [Condor-users] condor q high availability problem



Hi Steve,

Sorry this is so delayed. The shared space was just a regular
NFS-exported directory, which wasn't exported in the required fashion.
The permission problems with the directories in spool which couldn't
be chowned didn't turn out to be the killer. The basic result was that
it failed when trying to manage the lock file (SCHEDD.lock): root on
machine A couldn't write to a file owned by root on machine B.
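
For reference, the HA setup itself is essentially what section 3.10.1 of
the manual describes; roughly the following on each submit machine (the
spool path and timing values here are illustrative, not our exact ones):

    # Point SPOOL at the shared directory and let condor_master decide
    # which machine's schedd holds the lock (SCHEDD.lock lives in SPOOL).
    SPOOL             = /shared/condor/spool
    MASTER_HA_LIST    = SCHEDD
    HA_LOCK_URL       = file:$(SPOOL)
    HA_LOCK_HOLD_TIME = 300
    HA_POLL_PERIOD    = 60
    # Stop condor_preen from cleaning the lock file out of the spool.
    VALID_SPOOL_FILES = $(VALID_SPOOL_FILES), SCHEDD.lock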

The fix is obviously a properly exported NFS mount.
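
In case it helps anyone else hitting the same thing, the exports entry we
ended up with looks roughly like the following (server side; hostnames and
path are illustrative). The important options are rw and no_root_squash,
so that root on the schedd machines isn't mapped to nobody and can manage
the lock file:

    # /etc/exports on the NFS server holding the shared spool
    /shared/condor/spool  schedd1.example.org(rw,sync,no_root_squash) schedd2.example.org(rw,sync,no_root_squash)

With the default root_squash, root on the clients gets mapped to nobody,
which is exactly the "root on machine A can't write to a file owned by
root on machine B" failure described above.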

Regards,

James

On Sat, Mar 8, 2008 at 12:41 AM, Steven Timm <timm@xxxxxxxx> wrote:
> Hi James--if this works out for you, there are other people who
>  would be interested to know what type of shared disk space you
>  were using for the high-availability schedd.  If I remember correctly,
>  each of the cluster* directories in the schedd spool area is
>  supposed to be owned by the individual user who is
>  running the job, at least if you are in a unix uid/gid model where
>  every condor user is running as an independent uid.
>
>  Steve Timm
>
>
>
>
>  On Fri, 7 Mar 2008, Wojtek Goscinski wrote:
>
>  > Howdy,
>  >
>  > I'm currently in the process of implementing high availability of our
>  > condor queue. I've set up SPOOL to point to a shared space, as
>  > described in section 3.10.1 of the condor manual.
>  > However, this shared space is ONLY writable and readable by the condor
>  > user - this is a current limitation of the way we're creating and
>  > sharing a common mount between the machines which will run the schedd.
>  >
>  > Currently, things seem to be running ok. I've switched one machine
>  > over to the new configuration and it locks the spool and manages the
>  > queue without problems - other machines will be switched over during
>  > downtime.
>  >
>  > However, I'm seeing the following error messages in my SchedLog. I
>  > assume this has to do with the limitations of our mount. I'm wondering
>  > if this is a serious problem which will bite us later on. As I
>  > mentioned, at the moment things seem to be running fine.
>  >
>  > SchedLog:3/7 16:25:58 (fd:11) (pid:2235) Error: Unable to chown
>  > '/opt/sw/Sponge/share/spool/cluster1.proc0.subproc0' from 109 to
>  > 42407.1089
>  > SchedLog:3/7 16:25:58 (fd:11) (pid:2235) (1.0) Failed to chown
>  > /opt/sw/Sponge/share/spool/cluster1.proc0.subproc0 from 109 to
>  > 42407.1089. Job may run into permissions problems when it starts.
>  >
>  > SchedLog.old:3/7 16:09:29 (fd:11) (pid:1688) Error: Unable to chown
>  > '/opt/sw/Sponge/share/spool/cluster569.proc43.subproc0' from 42407 to
>  > 109.109
>  > SchedLog.old:3/7 16:09:29 (fd:11) (pid:1688) (569.43) Failed to chown
>  > /opt/sw/Sponge/share/spool/cluster569.proc43.subproc0 from 42407 to
>  > 109.109.  User may run into permissions problems when fetching
>  > sandbox.
>  >
>  > Any comments or suggestions are most welcome.
>  >
>  > Regards,
>  >
>  > James
>  >
>
>  --
>  ------------------------------------------------------------------
>  Steven C. Timm, Ph.D  (630) 840-8525
>  timm@xxxxxxxx  http://home.fnal.gov/~timm/
>  Fermilab Computing Division, Scientific Computing Facilities,
>  Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
>