We have a relatively small Condor cluster: fifteen machines with a total of
140 CPUs.
We implemented it using Apache Qpid. The Qpid daemon is installed on the
master node. This package provides the queue “server”, the facility that
provides message queuing to the cluster. The Apache Qpid API for C++ is
installed on each cluster node.
Here is what I am seeing and have questions about. When I submit, say, two
very simple jobs (just a sleep command) to two of the nodes, the first job
starts and runs, but the second job sits there for as long as 20 minutes
before it times out.
I am not seeing any errors or other signs of trouble in any of the Condor
logs. If I then run a larger test of, say, 40 jobs that each sleep for 5
seconds, I would expect that, with 140 CPUs available, all 40 jobs would be
picked up as soon as they are submitted and complete in a reasonable amount
of time. What I actually see is that maybe 20 jobs start, then 12 more, then
maybe 8, and then the last few complete. How can I find out how the queue is
actually performing, and what can I do to tune it better?