Subject: Re: [Condor-users] Collector (multi-tier) or different machine
Same here. Our central manager handles several thousand slots. I do see some slowness in scheduling, but nothing like 10 hours :-)
On Mon, Sep 20, 2010 at 10:36 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
Mike,
It would be helpful to know what processes are using so much CPU on
your central manager. I am used to seeing central managers handle
thousands of slots without trouble (albeit in Linux), so something
must be unusual about your situation.
--Dan
On 9/20/10 8:26 AM, Michael O'Donnell wrote:
Thank you, Mag. I tried a couple of things and will explain what I found.

First, I attempted to move the collector to a second server by
specifying a different host. This did not work, and as a result the
central manager could not pick up any machines in the pool. I then
tried running the collector daemon on both the central manager and a
second server, with the second server set as the COLLECTOR_HOST. This
still did not work. I guess I do not understand the concepts here,
because it seems like one should be able to run the collector, or
multiple collectors, on different hosts.

Second, I decided to change servers for my central manager. We have
about 115 slots in our pool. Every machine, including the central
manager, runs Windows. I moved the central manager from a Windows 2008
(32-bit) server with 2 physical CPUs to a dual quad-core server running
64-bit. My overall CPU load dropped from 80% to about 6-10%,
distributed across all cores. Once I did this, I was finally able to
submit jobs.

We did not have any problems when we were running about 50 slots. As
soon as I doubled our pool size, any submitted jobs would sit in the
pool for up to 10 hours before running. The CPU load increased from
about 20% to 80% after doubling the pool size on the Windows 2008
server (2 physical CPUs). Every machine could be tracked in the pool
with Condor, but jobs were not being submitted because no matches could
be made. When I looked at the ClassAds, there were plenty of machines
available. So either the collector or the negotiator was not working
properly. It was probably the negotiator, but I thought that offloading
the server by moving the collector would help.

As soon as I changed to a dual quad-core server, everything worked
instantly. Based on everything I have read, our old server should have
been plenty to handle such a small pool.

It would be extremely interesting to see a graph of Condor's
performance with increasing pool sizes. I do not know if anyone has any
data on this, but if you do I would love to see it.

I too would like to know, but unfortunately I don't think it's possible.

I don't think the collector is the problem in your situation. How many
machines are in your pool? How many matches are occurring every X
minutes and how many free slots are available?
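
For anyone following along, one rough way to answer the free-slot question above is to tally the slot states the collector reports. This is only a sketch, not a tested tool: it assumes `condor_status` is on the PATH and that `-format "%s\n" State` prints one state per slot, and the helper names `count_slot_states` and `query_slot_states` are made up for illustration. It says nothing about how many matches happen per negotiation cycle.

```python
import subprocess
from collections import Counter


def count_slot_states(status_output):
    """Tally slot states from text that has one State value per line."""
    states = (line.strip() for line in status_output.splitlines())
    # Skip blank lines so trailing newlines do not create an empty key.
    return Counter(state for state in states if state)


def query_slot_states():
    """Ask the collector for every slot's State.

    Assumption: condor_status is installed and can reach the collector.
    The raw string passes a literal backslash-n, as a shell would.
    """
    result = subprocess.run(
        ["condor_status", "-format", r"%s\n", "State"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_slot_states(result.stdout)
```

Calling `query_slot_states()["Unclaimed"]` on a machine in the pool would then give the number of slots that are free to be matched, under the assumptions above.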

On Fri, Sep 17, 2010 at 10:35 AM, Michael O'Donnell <odonnellm@xxxxxxxx> wrote:
>
> Does anyone have any insight as to how one might configure a pool
> where the collector is located on a different server than the
> central manager?
> I am asking because we have expanded our pool and our server
> (2 CPUs, 3 GHz) is running at 80% CPU load. Any jobs that we submit
> are taking as long as 10 hours before a match is found and the jobs
> run. The only thing I can think of is that the load on the server is
> causing the problems; it has increased significantly since we doubled
> the size of our pool.
>
> Our pool consists entirely of Windows systems, we have approximately
> 115 slots, and we are using the strictest security settings (SSL,
> authentication, encryption, authorization, integrity).
>
> I have read the wiki
> (https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors)
> about using multi-tier collectors, but for our system I think all I
> need to do is locate the collector on a different machine. I have set
> up the local configuration files so the COLLECTOR daemon is running on
> the second host, and the global config file specifies the host of the
> collector with COLLECTOR_HOST. The manual has no information on doing
> this, and I am not sure if the collector daemon is required to run on
> the CM as well as the second host. After I made these changes, I can
> no longer query the machines in the pool.
>
> Thank you for your suggestions,
> Mike
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> with a subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>
>