
Re: [Condor-users] Condor SOAP hanging schedd.



On 6-Jul-10, at 10:01 AM, Matthew Farrellee wrote:
> I imagine if the Schedd is using CPU and IO bandwidth then it's just a matter of the response to the getJobAds taking a long time to write. If this happened all the time I'd imagine maybe the Schedd is just slow. However, it could be that your client is periodically reading slowly. Maybe the client is interleaving reads with computation.

No, I wrote the client myself. All it does is read, and only once it has finished reading does it start parsing the XML. What actually happened was that my client was timing out: its timeout was set to 90 seconds, and the schedd took about 5 minutes to generate and send the response. My workaround for now is to set a very large timeout, which seems to be working okay.
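To make the timeout behaviour concrete, here is a minimal Python sketch, not my actual client. It assumes a plain TCP read loop with a per-read timeout; `read_all` and `slow_server` are hypothetical names, and `slow_server` is only a local stand-in for a schedd that is slow to answer.

```python
import socket
import threading
import time


def read_all(host, port, request, timeout):
    """Send `request`, then read until the server closes the connection.

    `timeout` applies to each socket operation (connect, send, recv),
    not to the request as a whole. Raises socket.timeout if one of
    them waits longer than that.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # empty read: server closed the connection
                break
            chunks.append(chunk)
        return b"".join(chunks)


def slow_server(delay):
    """Hypothetical stand-in for a slow schedd.

    Listens on a free local port, waits `delay` seconds after each
    request, then sends a tiny reply. Returns the port number.
    """
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def serve():
        while True:
            conn, _ = srv.accept()
            with conn:
                try:
                    conn.recv(1024)           # read the request
                    time.sleep(delay)         # pretend to build the response
                    conn.sendall(b"<jobs/>")  # placeholder payload
                except OSError:
                    pass  # the client already gave up and hung up

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]
```

With `slow_server(0.5)`, calling `read_all(..., timeout=0.1)` raises `socket.timeout`, while `timeout=5` returns the reply. Scaled up, that is the same situation as a 90 second timeout against a response that takes about 5 minutes.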

I think there might actually be a bug here, since the schedd seems to choke if the client times out. You can test this by running getJobAds against a schedd, then canceling the request before it completes. The schedd will just sit there spinning the CPU until the Master eventually kills it.
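A rough way to script that test is sketched below in Python. It is only an outline under assumptions: `abandon_request` is a hypothetical helper, and the caller has to supply the host, port, and the raw bytes of a real getJobAds request, none of which are reproduced here.

```python
import socket
import time


def abandon_request(host, port, request, wait):
    """Send `request`, wait `wait` seconds without reading, then hang up.

    This mimics a client that gives up before the response has
    arrived. `request` must be the complete bytes of a real request;
    building it is left to the caller.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        time.sleep(wait)  # the server is presumably still working here
    # Leaving the with-block closes the socket, so the server's later
    # writes go to a peer that is no longer there.
```

After the call returns, watching the schedd's CPU usage should show whether it recovers or keeps spinning.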

I could be misinterpreting this, though. Any thoughts?

--patrick