
Re: [HTCondor-users] condor_write(): Socket closed when trying to write



On Wed, Aug 06, 2014 at 03:10:53PM -0500, john alexander sanabria ordonez wrote:
> Hi,
> 
> I have two HTCondor clusters (8.0.6), each cluster has two nodes (a master
> and worker node), and I want to enable the flocking service.
> 
> The first cluster has the following members
> master01.demo01.org -> 192.168.251.2
> wn01.demo01.org -> 192.168.251.3
> 
> And these are the members of the second cluster
> master02.demo02.org -> 192.168.252.2
> wn02.demo02.org -> 192.168.252.3
> 
> I have added some configuration variables to the condor configuration files
> and I am able to successfully run the command condor_status -pool master0*.
> 
> However, when I try to run a "large" task (which runs the hostname command
> 350 times), the flocking mechanism does not work. The task is launched from
> master01.demo01.org.

Did you set the "FLOCK_FROM" macro in the condor_config for each pool?
Doing so allows the negotiators from other pools to be authorized to
connect to the schedd. From the looks of your log snippet, I think the
schedd on master01 did not authorize the negotiator on master02 and
hung up the connection. (Do you see "PERMISSION DENIED" in the SchedLog?)
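
For reference, here is a minimal sketch of one-directional flocking from
master01 to the second pool, using the hostnames from your message. Which
config files these lines go in, and whether your security settings need
more than this, are assumptions on my part, so treat it as a starting
point rather than a verified configuration.

```
# On master01.demo01.org (the submit machine whose jobs should flock out):
# list the central manager of the pool to flock to.
FLOCK_TO = master02.demo02.org

# On the second pool (master02.demo02.org and wn02.demo02.org):
# list the submit machines that are allowed to flock in.
FLOCK_FROM = master01.demo01.org
```

After editing, run condor_reconfig on each machine, and check the value a
machine actually picks up with condor_config_val FLOCK_TO (or FLOCK_FROM).
To flock in the other direction as well, mirror the two settings.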


Cheers,
-zach