
Re: [Condor-users] Collector (multi-tier) or different machine



Just an added note -

The "Central Manager" is only a convenient name for a common deployment decision - a machine running Collector process(es) and a Negotiator process. Condor has no architectural requirement for where those processes run, only that all nodes can talk to them and they can talk to each other.

Michael, this is a bit of a shot in the dark, but if condor_q -better-analyze shows many possible matches and your jobs still are not running, security or firewall settings (especially on Windows, it seems lately) may be getting in your way. It is entirely possible that only a few of your slots are on machines that have proper communication paths to the Collector and Schedd. You should check your Condor logs for PERMISSION DENIED messages, and verify that no host firewalls are blocking communication.
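For the log check, here is a minimal Python sketch that scans a log for the PERMISSION DENIED text mentioned above. The function names are made up for illustration, and the match is a plain case-insensitive substring search, so it assumes nothing about the rest of the log line format.

```python
from typing import Iterable, List, Tuple

NEEDLE = "PERMISSION DENIED"  # the message text to look for


def find_denials(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing NEEDLE."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Compare in upper case so "Permission denied" is caught as well.
        if NEEDLE in line.upper():
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path: str) -> List[Tuple[int, str]]:
    """Scan one log file; errors="replace" tolerates stray bytes in the log."""
    with open(path, "r", errors="replace") as handle:
        return find_denials(handle)
```

Call scan_log() once per daemon log on each machine, passing whatever path your installation writes its logs to, and print the returned pairs.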

Communication paths you must have open -
 o Negotiator, Schedd, and Startds to Collector
 o Schedd to and from Startds
 o Schedd to and from Negotiator
Communication paths you should have open -
 o Negotiator to Startds
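
A quick way to test one of the paths above from a given machine is to try a TCP connection. This is only a sketch: can_connect is an illustrative name, and 9618 is the Collector's default port, so adjust it if your pool overrides it. Other daemons may not listen on a fixed port, which is why the port is a parameter. A successful connect only shows that this one direction is open at the TCP level. It says nothing about Condor's own authorization settings.

```python
import socket

COLLECTOR_PORT = 9618  # Condor's default Collector port; change if your pool overrides it


def can_connect(host: str, port: int = COLLECTOR_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run it on a Startd or Schedd machine with your central manager's hostname, for example can_connect("cm.example.org"), where the hostname is a placeholder. For the "to and from" paths, repeat the check in the opposite direction.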

Best,


matt


On 09/20/2010 07:36 AM, Dan Bradley wrote:
Mike,

It would be helpful to know which processes are using so much CPU on your
central manager. I am used to seeing central managers handle thousands
of slots without trouble (albeit on Linux), so something must be unusual
about your situation.

--Dan

On 9/20/10 8:26 AM, Michael O'Donnell wrote:

Thank you, Mag. I did a couple of things and will explain what I found out.

First, I attempted to move the collector to a second server by
specifying a different host. This did not work, and as a result the
central manager could not pick up any machines in the pool. I then
tried to have the collector daemon run on both the central manager
server and a second server, with the second server set as the
COLLECTOR_HOST. This still did not work. I guess I do not understand
the concepts here, because it seems like one should be able to have the
collector or multiple collectors run on different hosts.

Second, I decided to change servers for my central manager. We have
about 115 slots in our pool. Every slot is on a Windows machine, and the
central manager runs Windows as well. I moved the central manager from a
Windows 2008 (32-bit) server with 2 physical CPUs to a dual quad-core
server running 64-bit. My overall CPU load dropped from 80% to about
6-10%, distributed across all cores. When I did this I was finally able
to submit jobs.

We did not have any problems when we were running about 50 slots. As
soon as I doubled our pool size, any submitted jobs would sit in the
pool for up to 10 hours before running. The CPU load increased from
about 20% to 80% after doubling the pool size on the Windows 2008
server (2 physical CPUs). Every machine could be tracked in the pool
with Condor, but jobs were not being run because no matches could
be made. If I looked at the ClassAds, there were a ton of machines
that were available. So either the collector was not working properly
or the negotiator was not working. It was probably related to the
negotiator, but I thought that offloading the server by moving the
collector would help.

As soon as I changed to the dual quad-core server, everything worked
instantly. Based on everything I have read, our original server should
have been more than enough to handle such a small pool.


It would be extremely interesting to see a graph noting the
performance of Condor with increasing pool sizes. I do not know if
anyone has any data on this, but if you do I would love to see it.

Thank you,
Mike






From: 	Mag Gam <magawake@xxxxxxxxx>
To: 	Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 	09/18/2010 04:55 AM
Subject: 	Re: [Condor-users] Collector (multi-tier) or different machine
Sent by: 	condor-users-bounces@xxxxxxxxxxx


------------------------------------------------------------------------



I too would like to know, but unfortunately I don't think it's possible.

I don't think the collector is the problem in your situation. How many
machines are in your pool? How many matches are occurring every X
minutes and how many free slots are available?

On Fri, Sep 17, 2010 at 10:35 AM, Michael O'Donnell
<odonnellm@xxxxxxxx> wrote:
>
> Does anyone have any insight as to how one might configure a pool where the
> collector is located on a different server than that of the central manager?
> I am asking because we have expanded our pool and our server (2 CPUs, 3 GHz)
> is running at 80% CPU load. Any jobs that we submit are taking as long as 10
> hours before a match is found and the jobs run. The only thing I can think
> of is that the load on the server is causing the problems; it has increased
> significantly since we doubled the size of our pool.
>
> Our pool includes only Windows systems, we have approximately 115 slots, and
> we are using the strictest of security settings (SSL, authentication,
> encryption, authorization, integrity).
>
> I have read the wiki
> (https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors)
> about using multi-tier collectors, but for our system I think all I need to
> do is locate the collector on a different machine. I have set up the local
> configuration files so the COLLECTOR daemon is running on the second host,
> and the global config file specifies the host of the collector with
> COLLECTOR_HOST. The manual has no information on doing this, and I am not
> sure if the Collector daemon is required to run on the CM as well as the
> second host. After I made these changes, I can no longer query the machines
> in the pool.
>
> Thank you for your suggestions,
> Mike
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>
>