Re: [Condor-users] newbie negotiator error
- Date: Tue, 22 Nov 2011 13:17:32 -0800
- From: Tom Melendez <tom@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] newbie negotiator error
Thanks very much for your response. You are correct: I had
HOSTALLOW_NEGOTIATOR set to CONDOR_HOST, and when I opened it up, that
error went away. And yes, the error was in the CollectorLog as well.
I have other questions, but I'll ask them in a separate thread.
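For reference, here is a minimal sketch of the setting involved. It assumes the condor-server/condor-exec layout described in the original message below. The collector checks NEGOTIATOR-level requests against this list. One such request is the negotiator's "Getting startd private ads" query in the log below. Newer configuration files spell the setting ALLOW_NEGOTIATOR. With it set to $(CONDOR_HOST), the collector refuses a negotiator running on condor-exec, which is consistent with the failed read in that log. The hostname on the commented-out line is hypothetical.

```text
# condor_config on condor-server, where the collector runs.
# Only hosts matching this list may make NEGOTIATOR-level
# requests to the collector (e.g. fetching startd private ads).
HOSTALLOW_NEGOTIATOR = $(CONDOR_HOST)

# Widening the list (hypothetical hostname) would also admit a
# negotiator on condor-exec, but the cleaner fix is to run only
# one negotiator in the pool.
# HOSTALLOW_NEGOTIATOR = $(CONDOR_HOST), condor-exec.example.com
```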
On Mon, Nov 21, 2011 at 7:43 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
> You should only need one negotiator daemon in your condor pool, not one on
> each machine.
> It seems likely your security settings do not allow the failing negotiator
> to access the collector (i.e. the ALLOW_NEGOTIATOR configuration setting).
> You should be able to see that by looking in the CollectorLog.
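Here is a sketch of the single-negotiator layout suggested above. It assumes the roles from the original message: condor-server is the central manager, and condor-exec only submits and executes. DAEMON_LIST controls which daemons condor_master starts on each machine. The CONDOR_HOST value is a placeholder for the FQDN used in /etc/hosts. In condor_config, comments must start at the beginning of a line, so they are kept on separate lines.

```text
# condor_config on condor-server (central manager, submit, execute):
# the only machine in the pool running COLLECTOR and NEGOTIATOR.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD

# condor_config on condor-exec (submit and execute only):
# no COLLECTOR or NEGOTIATOR; point it at the central manager.
DAEMON_LIST = MASTER, SCHEDD, STARTD
# hypothetical FQDN for the central manager
CONDOR_HOST = condor-server.example.com
```

Restarting Condor on condor-exec, for example with the package's start/stop scripts, should then leave a single negotiator in the pool.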
> On 11/20/11 1:59 AM, Tom Melendez wrote:
>> Hi Folks,
>> I'm a newbie to condor (as in, "today") and I've done the "personal
>> condor" tutorial without issue. I've also reviewed some of the slides
>> on the site. I'm now trying to span my job across multiple machines,
>> but can see from the job log that only one is executing it.
>> On the "other machine", I see this in the error log:
>> 11/19 23:46:27 ---------- Started Negotiation Cycle ----------
>> 11/19 23:46:27 Phase 1: Obtaining ads from collector ...
>> 11/19 23:46:27 Getting all public ads ...
>> 11/19 23:46:27 Sorting 9 ads ...
>> 11/19 23:46:27 Getting startd private ads ...
>> 11/19 23:46:27 condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
>> 11/19 23:46:27 IO: Failed to read packet header
>> 11/19 23:46:27 Couldn't fetch ads: communication error
>> 11/19 23:46:27 Aborting negotiation cycle
>> Just a little about my setup to give you some context:
>> I have two machines (technically, these are VMs):
>> condor-server: this is the central manager and has submit, manager and
>> execute abilities.
>> condor-exec: this has execute and submit abilities
>> - both are running Ubuntu 10.04; I installed condor via the packages and
>> use the start/stop scripts to run it
>> - both machines are on the same subnet and I have entries in
>> /etc/hosts that point to each other with FQDNs.
>> - I did not set the NO_DNS option. I did set the DEFAULT_DOMAIN_NAME
>> option, but I don't think I need it given the /etc/hosts entries above
>> - I tried setting the NETWORK_INTERFACE option to the IP of the
>> condor-exec VM, with no luck
>> - both machines are running all of the same daemons. This is contrary
>> to some of the docs I've seen online (seems like
>> - the allow read and allow write options in the condor_config on both
>> machines are set to *
>> - the condor_host var in condor-exec points to the hostname of the
>> - condor_status can see both machines (the slots are all "unclaimed")
>> - condor_q on condor-server shows the jobs, condor_q on condor-exec does
>> - I have no file I/O. At this point, I'm just using the simple.c
>> example from here:
>> Any ideas or suggestions are greatly appreciated.