
Re: [Condor-users] newbie negotiator error



Hi Dan,

Thanks very much for your response.  You are correct: I had
HOSTALLOW_NEGOTIATOR set to CONDOR_HOST, and when I opened it up, that
error went away.  And yes, the error was in the CollectorLog as well.
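For reference, the setting discussed here lives in condor_config. The fragment below is a minimal sketch of one way to "open it up". It assumes that the collector runs on condor-server and reads this file. The exact value used in this thread is not stated, and condor-exec.example.com is a placeholder hostname. In the negotiator log quoted below, errno 104 is ECONNRESET ("Connection reset by peer") on Linux. That is consistent with the collector closing a connection it refuses to authorize.

```
# condor_config read by the collector on condor-server (hypothetical example).
# Original value: only the negotiator on the central manager may query the collector.
#HOSTALLOW_NEGOTIATOR = $(CONDOR_HOST)

# Opened up to also admit the negotiator running on condor-exec
# (condor-exec.example.com is a placeholder; substitute the real FQDN).
HOSTALLOW_NEGOTIATOR = $(CONDOR_HOST), condor-exec.example.com
```

After editing, running condor_reconfig on condor-server should make the collector re-read the setting. ALLOW_NEGOTIATOR, the name used in Dan's reply, is the newer name for the same setting.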

I have other questions, but I'll ask them in a separate thread.

Thanks!

Tom


On Mon, Nov 21, 2011 at 7:43 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
> Tom,
>
> You should only need one negotiator daemon in your condor pool, not one on
> each machine.
>
> It seems likely your security settings do not allow the failing negotiator
> to access the collector (i.e. the ALLOW_NEGOTIATOR configuration setting).
> You should be able to see that by looking in the CollectorLog.
>
> --Dan
>
> On 11/20/11 1:59 AM, Tom Melendez wrote:
>>
>> Hi Folks,
>>
>> I'm a newbie to condor (as in, "today") and I've done the "personal
>> condor" tutorial without issue.  I've also reviewed some of the slides
>> on the site.  I'm now trying to span my job across multiple machines,
>> but can see from the job log that only one is executing it.
>>
>> On the "other machine", I see this in the error log:
>> 11/19 23:46:27 ---------- Started Negotiation Cycle ----------
>> 11/19 23:46:27 Phase 1:  Obtaining ads from collector ...
>> 11/19 23:46:27   Getting all public ads ...
>> 11/19 23:46:27   Sorting 9 ads ...
>> 11/19 23:46:27   Getting startd private ads ...
>> 11/19 23:46:27 condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
>> 11/19 23:46:27 IO: Failed to read packet header
>> 11/19 23:46:27 Couldn't fetch ads: communication error
>> 11/19 23:46:27 Aborting negotiation cycle
>>
>> Just a little about my setup to give you some context:
>>
>> I have two machines (technically, these are VMs):
>> condor-server: this is the central manager and has submit, manager and
>> execute abilities.
>> condor-exec: this has execute and submit abilities
>> - both are running Ubuntu 10.04; I installed condor via the packages and
>> use the start/stop scripts to run it
>> - both machines are on the same subnet and I have entries in
>> /etc/hosts that point to each other with FQDNs.
>> - I did not set the NO_DNS option. I did set the DEFAULT_DOMAIN_NAME
>> option, but I don't think I need it, given the /etc/hosts entries above
>> - I tried setting the NETWORK_INTERFACE option to the IP of the
>> condor-exec VM, with no luck
>> - both machines are running all of the same daemons.  This is contrary
>> to some of the docs I've seen online (seems like
>> - the allow read and allow write options in the condor_config on both
>> machines are set to *
>> - the CONDOR_HOST variable on condor-exec points to the hostname of
>> condor-server
>> - condor_status can see both machines (the slots are all "unclaimed")
>> - condor_q on condor-server shows the jobs, condor_q on condor-exec does
>> not
>> - I have no file I/O.  At this point, I'm just using the simple.c
>> example from here:
>>
>> http://research.cs.wisc.edu/condor/tutorials/cw2005-condor/submit_first.html
>>
>> Any ideas or suggestions greatly appreciated.
>>
>> Thanks,
>>
>> Tom
>
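Dan's first point, that a pool needs only one negotiator, can be expressed directly in each machine's configuration. The fragment below is a minimal sketch of that layout for the two VMs described above. It assumes that each machine has its own local condor_config, and it uses condor-server.example.com as a placeholder FQDN. The two sections belong in two different files.

```
# condor-server: central manager; also submits and executes jobs
CONDOR_HOST = condor-server.example.com
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD

# condor-exec (separate file): submits and executes only.
# No COLLECTOR or NEGOTIATOR, so no second negotiation cycle runs here.
CONDOR_HOST = condor-server.example.com
DAEMON_LIST = MASTER, SCHEDD, STARTD
```

With this layout, only condor-server runs a negotiator that contacts the collector, so HOSTALLOW_NEGOTIATOR can stay at $(CONDOR_HOST). The simplest way to make sure the master stops the daemons it no longer lists is to restart Condor on condor-exec with the package's start/stop scripts.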