Re: [Condor-users] Very slow response from condor_q and condor_status



Thanks for the suggestion; we're going to give it a shot and see what
happens. Our plan is to move all three directories (log, spool, execute)
for each host from the NFS mounted /home/condor/hosts directory to local
directories.  Is there any reason why this could be a bad idea?  Having
everything in /home/condor is nice and organized, but it seems like we
should be okay moving stuff around. 

Thanks again!
-Colin

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Saturday, December 03, 2005 12:38 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Very slow response from condor_q and condor_status

On Sat, Dec 03, 2005 at 08:32:07PM -0000, Chris Miles wrote:
> I have the same issue and have put it down to the mounted home
> directory: when I occasionally get the slow response from
> condor_status, copying a file from /home/condor to the local drive is
> also terribly slow and unresponsive. It never bothered me enough to
> find a solution, though.
> 

Problems with NFS file locking, maybe? 

Try putting your log files on a local disk - I could believe that
the problem is Condor writing to a logfile that's on NFS, and either
the lock takes a long time to acquire, or the write itself takes a long
time to finish. Either of those would cause a daemon to just freeze up.
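
A rough way to check that guess is to time a lock and a write in each
candidate directory. The following is only a minimal sketch: the function
name time_lock_and_write is made up for illustration, it uses a plain
POSIX lock via Python's fcntl module, and it does not claim to reproduce
how the Condor daemons lock or write their own log files.

```python
import fcntl
import os
import sys
import tempfile
import time


def time_lock_and_write(directory, payload=b"x" * 4096):
    """Return (lock_seconds, write_seconds) for one locked write in `directory`.

    Hypothetical helper: creates a scratch file, takes an exclusive POSIX
    lock on it, writes and fsyncs a small payload, then cleans up.
    """
    fd, path = tempfile.mkstemp(prefix="locktest-", dir=directory)
    try:
        start = time.perf_counter()
        fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until the lock is granted
        locked = time.perf_counter()
        os.write(fd, payload)
        os.fsync(fd)                     # force the data out to the server/disk
        done = time.perf_counter()
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return locked - start, done - locked
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Pass one or more directories, e.g. an NFS path and a local path.
    for d in sys.argv[1:]:
        lock_s, write_s = time_lock_and_write(d)
        print(f"{d}: lock {lock_s:.4f}s, write+fsync {write_s:.4f}s")
```

Running it a few times against the NFS-mounted directory and against a
local directory, ideally while condor_q is hanging, should show whether
the lock or the write is the slow part.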

-Erik

> Chris
>   ----- Original Message ----- 
>   From: Little, Colin E 
>   To: condor-users@xxxxxxxxxxx 
>   Sent: Friday, December 02, 2005 6:20 PM
>   Subject: [Condor-users] Very slow response from condor_q and condor_status
> 
> 
>   I'm setting up a Condor pool which is currently just barely up and
>   running.  We have 1 Central Master server, 1 submit-only machine and 2
>   execute/submit machines.  All are running Red Hat Enterprise Linux.
>   Occasionally we'll find that condor_q and condor_status will hang for
>   long periods of time (1-2 minutes or more) before responding.  I haven't
>   been able to reliably reproduce it, so I'm hoping that others may have
>   seen something similar.
> 
> 
> 
>   Things we've thought of:
> 
> 
> 
>   It seems that it hangs at times when a job is sitting in the "2
>   Servers match, match, but reject the job for unknown reasons" stage,
>   which I believe is waiting for the Negotiator.  We've lowered
>   NEGOTIATOR_INTERVAL to 30, which seemed to help a bit, but that might
>   have just been wishful thinking.
> 
> 
> 
>   We raised the number of VMs per execute/submit machine from 1 to 2,
>   which seems to have improved it, but that could easily be coincidence.
> 
> 
> 
>   Condor's main directory is in /home/condor, which is an NFS-mounted
>   partition. We suspect that NFS might be hiccupping, preventing the
>   collector from retrieving the status info.
>
>   At this point we don't know if the problem is with the machines, the
>   network, Condor or any of a hundred other things, but any insight into
>   this problem would be very helpful.
> 
> 
> 
>   Thanks a lot.
> 
>   -Colin Little
> 
> 
> 
> 
> 
> ------------------------------------------------------------------------------
> 
> 
>   _______________________________________________
>   Condor-users mailing list
>   Condor-users@xxxxxxxxxxx
>   https://lists.cs.wisc.edu/mailman/listinfo/condor-users