
Re: [HTCondor-users] Collector load-balancing



You should be aware that the technique for setting COLLECTOR_LOG that was shown on that page until about 10 minutes ago was deprecated about 5 years ago and stopped working late in the 8.5 series.
 
I have updated the wiki to show the correct method, although you no longer have to set COLLECTOR_LOG at all: in 8.6, HTCondor will choose a default based on the name you give to your daemon.
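The updated wiki text isn't reproduced here. As a rough sketch of the kind of central-manager configuration involved, assuming the shared_port setup described in the original question, the Python below generates condor_config lines for N extra collectors. The helper `extra_collector_config`, the daemon names COLLECTOR2, COLLECTOR3 and so on, and the socket names collector2, collector3 and so on are hypothetical. The `-sock` argument and the `DAEMON_LIST` line reflect our understanding of HTCondor 8.6 and should be checked against the wiki page and the manual. Because 8.6 picks a default log name from the daemon name, the sketch sets no COLLECTOR_LOG.

```python
def extra_collector_config(count: int, first_index: int = 2) -> str:
    """Build condor_config lines that start `count` extra collectors.

    Hypothetical helper: the daemon names (COLLECTOR2, ...) and shared_port
    socket names (collector2, ...) are illustrative choices only.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    lines = []
    names = []
    for i in range(first_index, first_index + count):
        name = f"COLLECTOR{i}"
        names.append(name)
        # Reuse the normal collector binary under a new daemon name.
        lines.append(f"{name} = $(COLLECTOR)")
        # -f: run in the foreground under condor_master.
        # -sock: register a distinct shared_port socket name (assumed usage).
        lines.append(f"{name}_ARGS = -f -sock collector{i}")
    # No *_LOG setting: 8.6 derives a default log name from the daemon name.
    # Ask condor_master to start the extra collectors too.
    lines.append("DAEMON_LIST = $(DAEMON_LIST) " + " ".join(names))
    return "\n".join(lines)


if __name__ == "__main__":
    print(extra_collector_config(5))
```

Generating the lines in a loop keeps the daemon name and socket name in step, which is the detail most easily broken when the block is written by hand for several collectors.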

-tj

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Vladimir Brik
Sent: Tuesday, March 21, 2017 10:09 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Collector load-balancing

Hello.

I'd like to ask for advice/feedback on scaling the collector of our glidein pool.

Some background: the size of the pool varies widely, but it probably never exceeds 12k slots. Slot lifetime can be pretty short, so there is a lot of turnover. Many worker nodes are behind NATs and firewalls, so CCB is used. A pool password is used for authentication. Network latency is probably also an issue. Lastly, our central manager is a VM with 8 virtual CPUs, and it uses shared_port.

Periodically, we've been observing spikes in the number of log entries about timeouts, disconnects, CCB and shared_port failures, and job restarts, so we've concluded that we need to run multiple collectors.

The plan is to follow https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors to create 5 additional collectors with custom socket names (since we use shared_port) and to configure glidein worker nodes to pick COLLECTOR_HOST and CCB_ADDRESS randomly from among those 5 additional collectors.
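The glidein side of this plan can be sketched as follows. This is a minimal illustration, not glideinWMS code: the hostname `cm.example.org`, port 9618, the socket names, and the helper names are all hypothetical placeholders. It assumes that collectors behind shared_port are addressed as `host:port?sock=<name>` in both COLLECTOR_HOST and CCB_ADDRESS. As in the plan, the two settings are chosen independently, so a glidein may be given two different collectors.

```python
import random

# Hypothetical values: substitute the real central manager name and the
# socket names given to the extra collectors.
CM_HOST = "cm.example.org"
SHARED_PORT = 9618  # assumed shared_port port on the central manager
SOCKETS = [f"collector{i}" for i in range(2, 7)]  # the five extra collectors


def collector_address(sock: str) -> str:
    # A shared_port address selects the target daemon with ?sock=<name>.
    return f"{CM_HOST}:{SHARED_PORT}?sock={sock}"


def glidein_config_lines(rng: random.Random) -> list:
    # Pick each setting independently and uniformly at random.
    collector = collector_address(rng.choice(SOCKETS))
    ccb = collector_address(rng.choice(SOCKETS))
    return [f"COLLECTOR_HOST = {collector}", f"CCB_ADDRESS = {ccb}"]


if __name__ == "__main__":
    for line in glidein_config_lines(random.Random()):
        print(line)
```

With uniform random choice, each of the 5 collectors would see roughly 12000 / 5 = 2400 slots at the pool's peak. That split is only even on average, though, and short slot lifetimes mean the per-collector share will fluctuate.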

Does anybody see potential issues with this? Or is there a better approach? Is there anything to be careful of?

Would it be advantageous if glideins used the same collector process for both COLLECTOR_HOST and CCB_ADDRESS? Or would it be better to use some collectors exclusively for CCB_ADDRESS and the others exclusively for COLLECTOR_HOST?
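To make the two alternatives in this question concrete, here is a small sketch of how a glidein might choose its (collector, CCB) pair under each one. It takes no position on which is better. The function `pick_sockets`, the socket names, and the default of 2 CCB-only collectors in the "split" mode are hypothetical choices for illustration.

```python
import random
from typing import Tuple

SOCKETS = [f"collector{i}" for i in range(2, 7)]  # hypothetical socket names


def pick_sockets(strategy: str, rng: random.Random, n_ccb: int = 2) -> Tuple[str, str]:
    """Return (collector_socket, ccb_socket) for one glidein.

    "paired": one collector serves both roles for this glidein.
    "split":  the first n_ccb sockets serve only CCB; the rest only receive ads.
    """
    if strategy == "paired":
        sock = rng.choice(SOCKETS)
        return sock, sock
    if strategy == "split":
        if not 0 < n_ccb < len(SOCKETS):
            raise ValueError("n_ccb must leave at least one socket in each group")
        ccb_pool, ad_pool = SOCKETS[:n_ccb], SOCKETS[n_ccb:]
        return rng.choice(ad_pool), rng.choice(ccb_pool)
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    rng = random.Random()
    print("paired:", pick_sockets("paired", rng))
    print("split: ", pick_sockets("split", rng))
```

The "paired" mode means a single collector failure affects both ad delivery and CCB for the glideins assigned to it. The "split" mode keeps the two kinds of traffic on separate processes, at the cost of having to size each group separately.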

I've heard a few people mention running CCB on separate servers, but I'm not sure why. Are there advantages to this if the central manager has idle cores and isn't running out of ports?


Thanks very much,

Vlad
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/