On Fri, Sep 17, 2010 at 10:35 AM, Michael O'Donnell
<odonnellm@xxxxxxxx> wrote:
Does anyone have any insight as to how
one might configure a pool where the collector is located on a different
server than the central manager? I am asking because we have expanded
our pool and our server (2 CPUs, 3 GHz) is running at 80% CPU load.
That seems high, but not unusual. Matchmaking can be compute-intensive depending on how you have your system configured. I wouldn't expect that load to be sustained -- it should be bursty as negotiation cycles start and stop, since there's a pause between cycles while Condor waits for the collector database to be refreshed.
Any
jobs that we submit are taking as long as 10 hours before a match is found
and the jobs run. The only thing I can think of is that the loading on
the server is causing the problems, which has significantly increased after
doubling the size of our pool.
As others have pointed out, that is highly unusual. Are you setting custom attributes for auto-clustering by any chance? I've seen negotiation cycles blow up when the attributes for auto-clustering weren't carefully considered. Is a single negotiation cycle lasting many hours, or do many cycles happen before a job is matched? Has your pool ever been completely utilized? Or do you only see a trickle of jobs through the system, all running on the same few machines?
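To illustrate why the choice of auto-clustering attributes matters, here is a toy sketch in Python. It is not Condor's actual implementation: the job dictionaries, the SubmitStamp attribute and the auto_clusters helper are all invented for illustration. The idea is only that jobs sharing the same values for the chosen attributes can be considered as one group, so an attribute that is unique per job produces one group per job.

```python
from collections import defaultdict

def auto_clusters(jobs, significant_attrs):
    """Group job ids by the values of the chosen attributes (toy model)."""
    clusters = defaultdict(list)
    for job in jobs:
        # Jobs with identical values for every significant attribute share a key.
        key = tuple(job.get(attr) for attr in significant_attrs)
        clusters[key].append(job["id"])
    return dict(clusters)

# Hypothetical workload: 1000 otherwise identical jobs from one user.
# SubmitStamp is a made-up attribute that differs for every job.
jobs = [
    {"id": i, "Owner": "alice", "RequestMemory": 1024, "SubmitStamp": 1000 + i}
    for i in range(1000)
]

coarse = auto_clusters(jobs, ["Owner", "RequestMemory"])
fine = auto_clusters(jobs, ["Owner", "RequestMemory", "SubmitStamp"])

print(len(coarse), len(fine))  # 1 1000
```

With the per-job attribute included, every job has to be considered separately, which is the kind of blow-up I mean.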
I have read the wiki (https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors)
about using multi-tier collectors, but for our system I think all I need
to do is locate the collector on a different machine. I have set up the
local configuration files so the COLLECTOR daemon is running on the second
host, and the global config file specifies the host of the collector with
COLLECTOR_HOST. The manual has no information on doing this, and I am not
sure if the Collector daemon is required to run on the CM as well as the
second host. After I made these changes, I can no longer query the machines
in the pool.
It's okay to run the collector and the negotiator on separate machines, though it's usually unnecessary in all but the largest installations. You're far from being a large installation.
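For what it's worth, a minimal sketch of that split might look like the following. The host name is a placeholder I made up, and I'm assuming neither machine needs to run anything besides the master and the one daemon shown (add STARTD or SCHEDD to DAEMON_LIST if they do).

```
# Global config, read by every machine in the pool.
# collector.example.org is a placeholder for your second host.
COLLECTOR_HOST = collector.example.org

# Local config on the collector machine
DAEMON_LIST = MASTER, COLLECTOR

# Local config on the negotiator machine (your current central manager)
DAEMON_LIST = MASTER, NEGOTIATOR
```

The point of the sketch is that the collector only has to run where COLLECTOR_HOST points; the negotiator locates it through the same setting as everything else.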
How often do you have the schedds updating the collector with information (SCHEDD_INTERVAL)? Do your negotiator logs say much about what's going on? Are they seeing new jobs added to the schedds in a timely fashion, or is that information stale?
Regards,
- Ian