
Re: [HTCondor-users] Multiple CCB on a single server



Hi Todd,

I found some error messages like this in the log of CCB (CollectorLog):

04/02/18 17:07:47 condor_write(): Socket closed when trying to write 46 bytes to SHADOW <10.40.249.13:9618?addrs=10.40.249.13-9618&noUDP&sock=13209_af72_11500> on <10.40.249.13:35898>, fd is 14
04/02/18 17:07:47 Buf::write(): condor_write() failed

10.40.249.13 is the IP address of the submit node, where the CCB is also running.

I am not sure whether this is an issue. As the EXITSTATUS of the jobs is 0, I assume that the jobs finished successfully. These condor_write failures usually happen at the end of a job's life cycle. Could you shed some light on what could cause these condor_write failures?


Thanks

On Mon, Apr 2, 2018 at 2:03 PM, Weiming Shi <swmtrc@xxxxxxxxx> wrote:
Hi Todd,

Thanks for your information.

Our use case is like this:

We have some execute nodes in a private network that do not have public IP addresses. We would like them to join the condor pool in the public network. CCB seems to be a good way to expose the execute nodes in the private network to the collector in the public network. We would like to make the system more robust by running multiple CCBs. We already have multiple submit nodes (each running a schedd) in the public network, so we decided to colocate a CCB with each schedd. In the configuration of the execute nodes, we specify the list of CCBs that are colocated with the schedds in the public network.

We hoped that the shadow daemon on a submit node would exchange heartbeats with the startd on the execute node in the private network through the local CCB. But it seems that the schedd may connect to a remote CCB on another submit node to communicate with the startd on the execute node in the private network.
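
For illustration, the execute-node setting described above might look like the following minimal sketch. The host names are hypothetical placeholders, and 9618 is assumed only because it is the default collector port. CCB_ADDRESS is the HTCondor configuration variable that names the CCB server(s) a daemon registers with, and it accepts a list.

```
# Execute node in the private network (sketch; host names are hypothetical).
# Each entry is a collector acting as a CCB server, colocated with a schedd.
CCB_ADDRESS = submit1.example.org:9618, submit2.example.org:9618
```

According to the HTCondor manual, a daemon with several CCB servers listed is reachable through any of them, so this setting by itself does not express a preference for a local CCB.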

I am not sure whether my understanding of CCB and the interaction between the daemons is correct. Feel free to correct me or make any suggestions on the setup.


Thanks



On Mon, Apr 2, 2018 at 1:09 PM, Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:
> Is there a way to make a daemon select a local CCB that is
> colocated with the daemon on the same server, if available?

    Not that I'm aware of. However, I'm curious about your use case. In the usual situation, where the execute nodes are behind the firewall and the CCB is on the central manager, this situation should never occur, because the central manager only needs to start connections with the schedds.


- ToddM