
[HTCondor-users] CCB problems and high daemon load



Hello all,

we have about 2000 VM worker nodes (~8000 cores) behind a NAT. We start up to 10 VMs every 30 seconds. Sometimes we run into problems with CCB:

CCBClient: Failed to read response from CCB server collector...

Failed to reverse connect to startd workernode via CCB.

In addition, the Collector, Negotiator and Scheduler reach a daemon load of up to 100%, and condor_q / condor_status become slow, even though the machines still have free memory and CPU. The Collector, Negotiator and Scheduler run Condor version 8.4.8/8.4.9, and the worker nodes run version 8.5.7.

The network between the VMs and the Collector looks stable. Our plan is to start additional Collectors with CCB. Would that help? How many Collectors would we need, and how should we configure our system?

Thanks and best regards,

Matthias


Thank you for any help you can provide.