
Re: [Condor-users] Collector (multi-tier) or different machine



Same here. Our central manager handles several thousand slots. I do see some slowness in scheduling, but nothing like 10 hours :-)




On Mon, Sep 20, 2010 at 10:36 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
Mike,

It would be helpful to know what processes are using so much CPU on your central manager.  I am used to seeing central managers handle thousands of slots without trouble (albeit in Linux), so something must be unusual about your situation.
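
One rough way to collect that on a Windows central manager is to save the output of `tasklist /V /FO CSV` and rank the Condor daemons by accumulated CPU time. The Python below is only a sketch under assumptions: it expects the English column names "Image Name" and "CPU Time", it assumes the daemons' image names start with `condor_`, and the `SAMPLE` text is made-up illustration data, not output from Mike's pool.

```python
import csv
import io


def parse_cpu_time(text):
    """Convert tasklist's H:MM:SS CPU Time field to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def condor_cpu_usage(tasklist_csv):
    """Return (image name, CPU seconds) for condor_* processes, busiest first."""
    rows = csv.DictReader(io.StringIO(tasklist_csv))
    usage = [
        (row["Image Name"], parse_cpu_time(row["CPU Time"]))
        for row in rows
        if row["Image Name"].lower().startswith("condor_")
    ]
    return sorted(usage, key=lambda item: item[1], reverse=True)


# Hypothetical sample in the shape of `tasklist /V /FO CSV` output.
SAMPLE = '''"Image Name","PID","Session Name","Session#","Mem Usage","Status","User Name","CPU Time","Window Title"
"condor_master.exe","1200","Services","0","8,112 K","Unknown","N/A","0:00:05","N/A"
"condor_collector.exe","1301","Services","0","22,004 K","Unknown","N/A","0:12:41","N/A"
"condor_negotiator.exe","1302","Services","0","15,900 K","Unknown","N/A","1:03:09","N/A"
"svchost.exe","900","Services","0","5,000 K","Unknown","N/A","0:00:30","N/A"
'''

for name, seconds in condor_cpu_usage(SAMPLE):
    print(f"{name:25s} {seconds:6d} s")
```

CPU Time is cumulative since each process started, so comparing two snapshots taken a few minutes apart gives a better picture of which daemon is busy right now.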

--Dan


On 9/20/10 8:26 AM, Michael O'Donnell wrote:

Thank you Mag. I did a couple things and will explain what I found out.

First, I attempted to move the collector to a second server by specifying a different host. This would not work and as a result the central manager could not pick up any machines in the pool. I then tried to have the collector daemon run on both the central manager server as well as a second server but have the second server be the collector_host. This still did not work. I guess I do not understand the concepts here because it seems like one should be able to have the collector or multiple collectors run on different hosts.

Second, I decided to change servers for my central manager. We have about 115 slots in our pool. Every slot runs Windows, including the central manager. I moved the central manager from a Windows 2008 (32-bit) server with 2 physical CPUs to a dual quad-core machine running as 64-bit. My overall CPU load dropped from 80% to about 6-10%, distributed across all cores. When I did this I was finally able to submit jobs.

We did not have any problems when we were running about 50 slots. As soon as I doubled our pool size, any submitted jobs would sit in the queue for up to 10 hours before running. The CPU load increased from about 20% to 80% after doubling the pool size on the Windows 2008 server (2 physical CPUs). Every machine could be tracked in the pool with Condor, but jobs were not being started because no matches could be made. If I looked at the ClassAds, there were a ton of machines that were available. So either the collector was not working properly or the negotiator was not working. It was probably related to the negotiator, but I thought that offloading the server by moving the collector would help.

As soon as I changed to a dual quad core, everything worked instantly. Based on everything I have read, our server should have been plenty to handle such a small pool.


It would be extremely interesting to see a graph noting the performance of Condor with increasing pool sizes. I do not know if anyone has any data on this, but if you do I would love to see it.

Thank you,
Mike






From: Mag Gam <magawake@xxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 09/18/2010 04:55 AM
Subject: Re: [Condor-users] Collector (multi-tier) or different machine
Sent by: condor-users-bounces@xxxxxxxxxxx





I too would like to know, but unfortunately I don't think it's possible.

I don't think the collector is the problem in your situation. How many
machines are in your pool?  How many matches are occurring every X
minutes and how many free slots are available?
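
For the free-slot count, one option is to dump one slot state per line with `condor_status -format "%s\n" State` and tally the result. The Python below is a minimal sketch, assuming that output has been saved as text; the `SAMPLE` string is invented for illustration and is not data from any real pool.

```python
from collections import Counter


def tally_slot_states(status_output):
    """Count slots per State, given one state name per non-empty line."""
    states = (line.strip() for line in status_output.splitlines())
    return Counter(state for state in states if state)


# Hypothetical sample: one State value per slot.
SAMPLE = """Unclaimed
Claimed
Unclaimed
Owner
Unclaimed
"""

counts = tally_slot_states(SAMPLE)
total = sum(counts.values())
print(f"free (Unclaimed) slots: {counts['Unclaimed']} of {total}")
```

If most slots show up as Unclaimed while jobs sit idle, that points at matchmaking rather than at a shortage of machines.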

On Fri, Sep 17, 2010 at 10:35 AM, Michael O'Donnell <odonnellm@xxxxxxxx> wrote:
>
> Does anyone have any insight as to how one might configure a pool where the
> collector is located on a different server than that of the central manager?
> I am asking because we have expanded our pool and our server (2 CPUs, 3 GHz)
> is running at 80% CPU load. Any jobs that we submit are taking as long as 10
> hours before a match is found and the jobs run. The only thing I can think
> of is that the loading on the server is causing the problems, which has
> significantly increased after doubling the size of our pool.
>
> Our pool includes all windows systems, we have approximately 115 slots and
> we are using the strictest of security settings (SSL, authentication,
> encryption, authorization, integrity).
>
> I have read the wiki
> (https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors)
> about using multi-tier collectors, but for our system I think all I need to
> do is locate the collector on a different machine. I have set up the local
> configuration files so the COLLECTOR daemon is running on the second host,
> and the global config file specifies the host of the collector with
> COLLECTOR_HOST. The manual has no information on doing this, and I am not
> sure if the Collector daemon is required to run on the CM as well as the
> second host. After I made these changes, I can no longer query the machines
> in the pool.
>
> Thank you for your suggestions,
> Mike
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>
>