
Re: [Condor-users] Very slow response from condor_q and condor_status



Having everything in /home/condor is nice and organized, but it seems like
we should be okay moving stuff around.
Thanks again!
-Colin

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Saturday, December 03, 2005 12:38 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Very slow response from condor_q and
condor_status

On Sat, Dec 03, 2005 at 08:32:07PM -0000, Chris Miles wrote:
I have the same issue and have put it down to the mounted home
directory: when I occasionally get the slow response from condor_status,
copying a file from /home/condor to the local drive is also terribly
unresponsive and slow. It never bothered me enough to find a solution,
though.


Problems with NFS file locking, maybe?
Try putting your log files on a local disk - I could believe that
the problem is Condor writing to a logfile that's on NFS, and either
the lock takes a long time to acquire, or the write itself takes a long
time to finish. Both of those would cause a daemon to just freeze up.
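
One rough way to check this is to time a locked write in the NFS directory and compare it with the same operation on a local disk. The Python sketch below does that. It is only an illustration under assumptions: the helper name time_locked_append is made up for this example, and it takes a POSIX advisory lock via fcntl.lockf, which may or may not match the locking calls the Condor daemons actually make.

```python
import fcntl
import os
import tempfile
import time


def time_locked_append(directory, payload=b"probe\n"):
    """Time one lock + write + fsync on a scratch file in `directory`.

    Hypothetical helper for illustration; returns (lock_seconds, write_seconds).
    """
    # Create a throwaway file so no real log file is touched.
    fd, path = tempfile.mkstemp(prefix="lockprobe_", dir=directory)
    try:
        t0 = time.monotonic()
        fcntl.lockf(fd, fcntl.LOCK_EX)   # advisory lock; blocks until granted
        t1 = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)                     # push the data to the disk or NFS server
        t2 = time.monotonic()
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
        os.remove(path)
    return t1 - t0, t2 - t1
```

Calling time_locked_append("/home/condor") and time_locked_append("/tmp") a few times, ideally while condor_status is hanging, should show whether the lock or the write is where the time goes. If both numbers stay small on NFS even during a hang, the log files are probably not the culprit.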

-Erik

Chris
----- Original Message -----
  From: Little, Colin E
  To: condor-users@xxxxxxxxxxx
  Sent: Friday, December 02, 2005 6:20 PM
  Subject: [Condor-users] Very slow response from condor_q and condor_status


  I'm setting up a Condor pool which is currently just barely up and
running.  We have 1 Central Master server, 1 submit-only machine and 2
execute/submit machines.  All are running Red Hat Enterprise Linux.
Occasionally we'll find that condor_q and condor_status will hang for
long periods of time (1-2 minutes or more) before responding.  I haven't
been able to reliably reproduce it, so I'm hoping that others may have
seen something similar.



  Things we've thought of:



  It seems that it hangs at times when a job is sitting in the "2
Servers match, match, but reject the job for unknown reasons" stage,
which I believe is waiting for the Negotiator.  We've lowered the
NEGOTIATOR_INTERVAL to 30, which seemed to help a bit, but might have
just been wishful thinking.



  We raised the number of VMs per execute/submit machine from 1 to 2,
which seems to have improved it, but that could easily be coincidence.



  Condor's main directory is in /home/condor, which is an NFS-mounted
partition. We suspect that NFS might be hiccupping, preventing the
collector from retrieving the status info.







  At this point we don't know if the problem is with the machines, the
network, Condor or any of a hundred other things, but any insight into
this problem would be very helpful.



  Thanks a lot.

  -Colin Little






------------------------------------------------------------------------------


  _______________________________________________
  Condor-users mailing list
  Condor-users@xxxxxxxxxxx
  https://lists.cs.wisc.edu/mailman/listinfo/condor-users