Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] high availability schedd - condor_submit failover

Date: Thu, 04 Jun 2009 08:51:43 -0500
From: Matthew Farrellee <matt@xxxxxxxxxx>
Subject: Re: [Condor-users] high availability schedd - condor_submit failover

Rob de Graaf wrote:
> Hello,
> 
> I'm looking into setting up a highly available job queue. I've followed 
> the instructions in section 3.10.1 of the manual, and set up a spool dir 
> on NFS. I've set SCHEDD_NAME = ha-schedd@ on the machines sharing that 
> spool, and when I disconnect the active schedd, another will spawn and 
> take control of the queue.
> 
> According to the manual, I should be able to submit jobs to the highly 
> available queue using condor_submit -n ha-schedd@ but this only works on 
> the machine which is the active schedd, not on the possible replacement 
> schedds. When I try from one of the backup schedds, I get:
> 
> user@host:~/job$ condor_submit -n ha-schedd@ job.submit
> Submitting job(s)
> ERROR: Failed to connect to queue manager ha-schedd@
> AUTHENTICATE:1003:Failed to authenticate with any method
> AUTHENTICATE:1004:Failed to authenticate using GSI
> GSI:5003:Failed to authenticate.  Globus is reporting error (851968:45).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
> AUTHENTICATE:1004:Failed to authenticate using KERBEROS
> AUTHENTICATE:1004:Failed to authenticate using FS
> 
> I'm guessing this is because I'm using host-based authentication, and 
> "remote" submission to the active schedd requires a stronger mechanism? 
> Is there a simple way around this, or do I have to set up Kerberos / GSI 
> to be able to have condor_submit failover? I only need job submission to 
> work from those machines that may become the active schedd at some point.
> 
> Thanks!
> 
> Rob

It's probably because not all of your submit nodes are set up with the same
security settings. Your backup may be asking for stronger authentication
than your primary.
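
One way to check for such a mismatch is to compare the security-related
settings of the two machines side by side. The following is a minimal
sketch, not a Condor tool: it assumes you have saved each machine's
configuration as plain "NAME = value" text (for example from
condor_config_val -dump), and that the settings of interest start with
SEC_. The function names and the sample values are hypothetical; only the
GSI, KERBEROS, FS list is taken from the error output quoted above.

```python
def parse_config(text):
    """Parse 'NAME = value' lines into a dict, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Normalize the name to upper case so lookups are case-insensitive.
        settings[name.strip().upper()] = value.strip()
    return settings


def security_diff(primary_text, backup_text, prefix="SEC_"):
    """Return {name: (primary_value, backup_value)} for settings that differ.

    Only names starting with `prefix` are compared; a setting missing on
    one side is reported with None for that side.
    """
    primary = parse_config(primary_text)
    backup = parse_config(backup_text)
    names = sorted(n for n in set(primary) | set(backup) if n.startswith(prefix))
    return {
        n: (primary.get(n), backup.get(n))
        for n in names
        if primary.get(n) != backup.get(n)
    }


# Hypothetical sample input: the primary value is invented for illustration.
primary_cfg = """
SCHEDD_NAME = ha-schedd@
SEC_DEFAULT_AUTHENTICATION_METHODS = FS, CLAIMTOBE
"""
backup_cfg = """
SCHEDD_NAME = ha-schedd@
SEC_DEFAULT_AUTHENTICATION_METHODS = GSI, KERBEROS, FS
"""

for name, (on_primary, on_backup) in security_diff(primary_cfg, backup_cfg).items():
    print(f"{name}: primary={on_primary!r} backup={on_backup!r}")
```

Any setting this reports is a candidate for the difference in behaviour
between the active schedd and the backups; an empty result would suggest
looking elsewhere.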

Best,


matt

