
Re: [Condor-users] Condor SOAP hanging schedd.



On 07/06/2010 05:51 PM, Patrick Armstrong wrote:
On 6-Jul-10, at 10:01 AM, Matthew Farrellee wrote:
I imagine that if the Schedd is using CPU and IO bandwidth, then it's just a
matter of the getJobAds response taking a long time to write.
If this happened all the time, I'd guess the Schedd is just
slow. However, it could also be that your client is periodically reading
slowly, for instance if it interleaves reads with computation.

No, I wrote the client myself. All it does is read, and once it has
finished reading, it starts parsing the XML. What actually happened was
that my client was timing out (its timeout was set to 90 seconds, and it took
the schedd about 5 minutes to generate and send the response). My
workaround for now is to just set a huge timeout, which seems to be
working okay.
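A minimal sketch of a client following the same read-everything-then-parse pattern, with the timeout as a parameter so it can be raised as a workaround. It uses only the Python standard library. The SOAP namespace (`urn:condor`), the `constraint` parameter name, and the absence of a transaction element are assumptions for illustration; check them against the schedd's WSDL for your Condor version. Note that `urlopen`'s `timeout` applies to each blocking socket operation, not to the whole transfer, so a 90-second value trips whenever the schedd spends longer than that before sending any bytes.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumptions: namespace URIs and parameter name for getJobAds.
# Verify them against the schedd's WSDL for your Condor version.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
CONDOR_NS = "urn:condor"
ET.register_namespace("SOAP-ENV", SOAP_ENV_NS)
ET.register_namespace("condor", CONDOR_NS)


def build_get_job_ads_envelope(constraint: str = "TRUE") -> bytes:
    """Build a minimal SOAP 1.1 getJobAds request (no transaction element)."""
    env = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{CONDOR_NS}}}getJobAds")
    ET.SubElement(call, "constraint").text = constraint
    return ET.tostring(env, encoding="utf-8", xml_declaration=True)


def fetch_job_ads(url: str, timeout: float, constraint: str = "TRUE") -> ET.Element:
    """POST getJobAds, read the entire response, and only then parse it."""
    req = urllib.request.Request(
        url,
        data=build_get_job_ads_envelope(constraint),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
        method="POST",
    )
    # `timeout` bounds each blocking socket operation (connect, each read),
    # not the total transfer time.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = resp.read()  # read everything first...
    return ET.fromstring(payload)  # ...then parse
```

Raising the timeout, e.g. `fetch_job_ads("http://schedd.example.org:1234/", timeout=600)` with a hypothetical host and port, corresponds to the "huge timeout" workaround.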

I think there might actually be a bug here, since the schedd seems to
choke if the client times out. You can test this by running getJobAds
against a schedd, then canceling the request before it completes. The
schedd will then sit spinning the CPU indefinitely until the master eventually
kills it.
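A rough sketch of that reproduction as a standalone Python script: it sends a getJobAds request over a raw socket, reads for a few seconds, then hangs up before the response is complete, mimicking a client-side timeout. The SOAP envelope, the `urn:condor` namespace, and the endpoint URL are assumptions, as is the premise that simply closing the socket matches what a timed-out client does. After running it, the schedd's CPU usage can be watched with a tool such as `top`.

```python
import socket
import time
from urllib.parse import urlsplit

# Assumption: minimal SOAP 1.1 envelope for getJobAds; verify the namespace
# and parameter names against the schedd's WSDL before relying on it.
GET_JOB_ADS_BODY = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"'
    ' xmlns:condor="urn:condor">'
    "<SOAP-ENV:Body><condor:getJobAds><constraint>TRUE</constraint>"
    "</condor:getJobAds></SOAP-ENV:Body></SOAP-ENV:Envelope>"
).encode("utf-8")


def build_request(host: str, path: str, body: bytes) -> bytes:
    """Assemble a raw HTTP/1.1 POST carrying the SOAP body."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        'SOAPAction: ""\r\n'
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def abort_get_job_ads(url: str, read_for: float = 5.0) -> int:
    """Send getJobAds, read for at most `read_for` seconds, then hang up.

    Returns the number of response bytes received before the socket closed.
    """
    parts = urlsplit(url)
    deadline = time.monotonic() + read_for
    received = 0
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=read_for) as sock:
        sock.sendall(build_request(parts.netloc, parts.path or "/", GET_JOB_ADS_BODY))
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                chunk = sock.recv(65536)
            except socket.timeout:
                break  # our simulated client-side timeout
            if not chunk:
                break  # the server finished before we gave up
            received += len(chunk)
    # Leaving the `with` block closes the socket, possibly mid-response.
    return received
```

For example, `abort_get_job_ads("http://schedd.example.org:1234/", read_for=5.0)` against a hypothetical schedd endpoint; a short `read_for` makes it likely the schedd is still generating or sending the response when the connection drops.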

It could be that I'm interpreting this wrong though. Any thoughts?

--patrick

It may just be that serializing the job ads is taking quite a bit of time.

If you have a simple program that can demonstrate the CPU spinning, please send it along.

Best,


matt