
Re: [Condor-users] Collector (multi-tier) or different machine


On Fri, Sep 17, 2010 at 10:35 AM, Michael O'Donnell <odonnellm@xxxxxxxx> wrote:
Does anyone have any insight as to how one might configure a pool where the collector is located on a different server than the central manager? I am asking because we have expanded our pool and our server (2 CPUs, 3 GHz) is running at 80% CPU load.

That seems high, but not unusual. Matchmaking can be compute-intensive depending on how you have your system configured. I wouldn't expect that load to be sustained -- it should be bursty as negotiation cycles start and stop, since there's a pause between cycles while Condor waits for the collector database to be refreshed.
Any jobs that we submit are taking as long as 10 hours before a match is found and the jobs run. The only thing I can think of is that the loading on the server is causing the problems, which has significantly increased after doubling the size of our pool.

As others have pointed out: that is highly unusual. Are you setting custom attributes for auto-clustering by any chance? I've seen negotiation cycles blow up when the attributes for auto-clustering weren't carefully considered. Is a single negotiation cycle lasting many hours or do many cycles happen before a job is matched? Has your pool ever been completely utilized? Or do you only see a trickle of jobs through the system, and all running on the same few machines?
I have read the wiki (https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors) about using multi-tier collectors, but for our system I think all I need to do is locate the collector on a different machine. I have set up the local configuration files so the COLLECTOR daemon is running on the second host, and the global config file specifies the host of the collector with COLLECTOR_HOST. The manual has no information on doing this, and I am not sure if the Collector daemon is required to run on the CM as well as the second host. After I made these changes, I can no longer query the machines in the pool.

It's okay to run the collector and the negotiator on separate machines, though it's usually unnecessary in all but the largest installations. You're far from being a large installation.
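
If you do want to split them, here is a minimal sketch of the configuration, not a tested recipe. The host names collector.example.org and negotiator.example.org are placeholders I made up, and the ALLOW_NEGOTIATOR line assumes your stock config authorizes the negotiator by pointing at $(COLLECTOR_HOST) (older configs spell it HOSTALLOW_NEGOTIATOR), which would stop matching the negotiator once it lives on another machine. Check your own condor_config before copying any of this.

```
# Hypothetical host names; substitute your own.

# Global config, shared by every machine in the pool.
# Every daemon and tool finds the collector through this setting.
COLLECTOR_HOST = collector.example.org

# Local config on collector.example.org: run the collector here.
DAEMON_LIST = MASTER, COLLECTOR

# Local config on negotiator.example.org: run only the negotiator here.
DAEMON_LIST = MASTER, NEGOTIATOR

# Assumption: the schedds authorize the negotiator by host. If that
# setting currently expands to the collector's host, it needs to name
# the negotiator's host instead.
ALLOW_NEGOTIATOR = negotiator.example.org
```

As far as I know, only the machine named in COLLECTOR_HOST needs a collector; the negotiator's machine does not need one of its own. Run condor_reconfig (or restart the daemons) after changing DAEMON_LIST, then see whether condor_status can query the pool again.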

How often do you have the schedds updating the collector with information (SCHEDD_INTERVAL)? Do your negotiator logs say much about what's going on? Are they seeing new jobs added to the schedds in a timely fashion, or is that information stale?
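
For reference, here is a sketch of the settings behind those questions. The values are illustrative only: 300 seconds is what I believe the default update interval to be, so treat it as an assumption and check the manual for your version.

```
# Submit machines: seconds between schedd updates to the collector.
# (300 is assumed to be the default; verify for your Condor version.)
SCHEDD_INTERVAL = 300

# Negotiator machine: turn up logging in the NegotiatorLog while
# diagnosing, then turn it back down once you have what you need.
NEGOTIATOR_DEBUG = D_FULLDEBUG
```

With the extra logging on, the NegotiatorLog should make it easier to tell whether one cycle runs for hours or many short cycles pass without matching your jobs.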

- Ian

Cycle Computing, LLC
The Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools