
Re: [Condor-users] Condor with AMQP



Allen,

Are you using the Startd's work-fetch (job hooks) functionality to pull work from the messaging queues?

If so, you should be looking at the execute nodes to see what the problem may be. It would help if you described in more detail the mechanism you're using to deliver the work to the Startds. Rob Rati implemented just such a system on top of Qpid, which you might be interested in.
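
For illustration only, here is a minimal sketch of the kind of output a work-fetch hook hands back to the Startd: a job ClassAd written as one "Attribute = value" line per attribute. It assumes the hook is a Python script and that a message has already been pulled off the queue and decoded into a dict. The function name job_classad and the dict keys (cmd, args, iwd) are hypothetical, and the Qpid side is left out entirely. Check the exact set of attributes your Condor version needs against the manual.

```python
import sys


def job_classad(message, owner="nobody"):
    """Format a decoded work message as ClassAd text, one attribute per line.

    Assumes the values contain no double quotes, so no escaping is done.
    """
    attrs = [("Cmd", '"%s"' % message["cmd"])]
    if message.get("args"):
        attrs.append(("Arguments", '"%s"' % message["args"]))
    attrs.append(("Owner", '"%s"' % owner))
    attrs.append(("JobUniverse", "5"))  # 5 is the vanilla universe
    attrs.append(("IWD", '"%s"' % message.get("iwd", "/tmp")))
    return "\n".join("%s = %s" % pair for pair in attrs)


if __name__ == "__main__":
    # Hypothetical message; a real hook would read this from the queue.
    work = {"cmd": "/bin/sleep", "args": "5"}
    sys.stdout.write(job_classad(work) + "\n")
```

A hook that finds no work would print nothing, which is why looking at what the hook actually emits on each execute node is a reasonable first check.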

Best,


matt

On 09/20/2010 04:17 PM, Shahaan Ayyub wrote:

Allen,
I have never worked with Qpid, but from a quick look at the
documentation it seems to simply provide a high-level interface, in
your case, to Condor. I am surprised that the native Condor commands
are not working. Otherwise you might have to look for a wrapper around
the native Condor commands.
Sorry I couldn't be of much help to you.
Regards,
Shahaan
On 21/09/2010, at 5:03 AM, Shahaan Ayyub <shahaan@xxxxxxxxx> wrote:

Hi Allen,
What does condor_q -better-analyze say at different points in time,
i.e. when some of the jobs are held whilst others are still running
or have completed?

Regards,
Shahaan

On 21/09/2010, at 3:07 AM, "Berg, Allen" <aberg@xxxxxxxx> wrote:

We have a relatively small Condor cluster: fifteen machines with a
total of 140 CPUs.

We have implemented it using Apache Qpid. The Apache Qpid daemon is
installed on the master node. This package provides the queue
“server”, the facility that provides message queuing to the cluster.
The Apache Qpid API for C++ is installed on each cluster node.

What I have questions about is this: when I submit, say, two very
simple jobs (just a sleep command) for two of the nodes, the first
job will take off and run, while the second job will sit there for
possibly 20 minutes before it times out. I am not seeing any errors
or other indications of weirdness in any of the Condor logs. Then, if
I run a larger test of, say, 40 jobs that each sleep for 5 seconds, I
would expect all 40 jobs to be picked up and run, completing in a
reasonable amount of time. What I really see is that maybe 20 jobs
take off, then 12 will start, then maybe 8, and the last few will
complete. How can I find out how the queue is actually performing,
and what can I do to better tune it?
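
One rough way to put numbers on the "waves" pattern described above is to collect the time at which each job started and group the starts into bursts. The sketch below assumes the start times have already been gathered as seconds since the test began; the function name group_into_waves, the 30-second gap, and the sample times are all made up for illustration.

```python
def group_into_waves(start_times, gap=30.0):
    """Count how many jobs started in each burst.

    A new burst begins whenever more than `gap` seconds pass between
    two consecutive job starts.
    """
    waves = []
    previous = None
    for t in sorted(start_times):
        if previous is None or t - previous > gap:
            waves.append(0)  # open a new burst
        waves[-1] += 1
        previous = t
    return waves


if __name__ == "__main__":
    # Made-up start times in seconds, not measurements from the cluster.
    sample = [0, 1, 2, 300, 301, 600]
    print(group_into_waves(sample))
```

If the gaps between bursts turn out to be regular, that points at a polling or timeout interval somewhere in the delivery path rather than at the jobs themselves.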

Thanks

Allen

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

