
Re: [HTCondor-users] RuntimeError: Failed to receive remote ad.



On 3/23/2018 2:41 PM, Larry Martell wrote:
I have a python script that makes this call:

schedd.xquery(requirements="ClusterId == %d" % id)

Sometimes it throws an exception 'RuntimeError: Failed to receive remote ad.'


Hi Larry,

About how often is "sometimes"... 50% of the time? 10% of the time? 1 in 5,000,000?

How many queries or transactions per minute are you trying to do? For instance, if you submit 2,000 jobs and then make 2,000 separate xquery() calls (one per job) every 10 seconds, that could be a problem, since each call is its own round trip to the schedd. It is better to submit the jobs and then do ONE query that fetches all 2,000 jobs every minute or so (i.e., batch your queries).
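A minimal sketch of the batching idea, assuming the htcondor Python bindings. The helper names `build_constraint`, `group_by_cluster` and `poll_clusters` are hypothetical, and `schedd` is assumed to be an `htcondor.Schedd` (or anything with a compatible `query()` method). Instead of one query per cluster, all cluster ids are folded into a single constraint and the schedd is asked once.

```python
from collections import defaultdict


def build_constraint(cluster_ids):
    """Fold many cluster ids into ONE ClassAd constraint string."""
    ids = sorted(set(int(c) for c in cluster_ids))
    return " || ".join("ClusterId == %d" % c for c in ids)


def group_by_cluster(ads):
    """Group dict-like job ads by their ClusterId attribute."""
    grouped = defaultdict(list)
    for ad in ads:
        grouped[int(ad["ClusterId"])].append(ad)
    return dict(grouped)


def poll_clusters(schedd, cluster_ids, projection=("ClusterId", "ProcId", "JobStatus")):
    """One round trip to the schedd for all clusters, not one per cluster.

    Hypothetical helper: `schedd` is assumed to be an htcondor.Schedd.
    Constraint and projection are passed positionally, so the call does
    not depend on the keyword name used for the projection argument.
    """
    constraint = build_constraint(cluster_ids)
    if not constraint:
        return {}  # nothing to ask about, so skip the schedd entirely
    return group_by_cluster(schedd.query(constraint, list(projection)))
```

Usage would be something like `poll_clusters(htcondor.Schedd(), submitted_ids)` once a minute, where `submitted_ids` is a hypothetical list of cluster ids your script recorded at submit time. If all of your jobs share an owner or some custom attribute, a constraint on that attribute is simpler than a long `||` chain.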

Maybe try Schedd.query() instead of xquery()? I ask because in the most recent versions of HTCondor, Schedd.query() reports more information about failures than Schedd.xquery() does, and query()'s implementation is simpler and thus less likely to fail intermittently. The only disadvantage of query() compared with xquery() is that your Python program may need more RAM, because query() buffers all results in memory instead of streaming them (probably only an issue if you are fetching many attributes from many thousands of jobs).
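A hedged sketch of the trade-off between the two calls, assuming the htcondor Python bindings and that both `query()` and `xquery()` accept (constraint, projection) as their first two positional arguments. `fetch_job_ads` and `count_matching` are hypothetical helpers: `query()` hands back the whole result set at once, while `xquery()` yields ads as you iterate, so with streaming an error can show up partway through the loop. Doing the iteration inside the `try` catches a `RuntimeError` from either path.

```python
import sys


def fetch_job_ads(schedd, constraint, projection, stream=False):
    """Fetch job ads, buffered via query() or streamed via xquery().

    Hypothetical helper; `schedd` is assumed to be an htcondor.Schedd.
    """
    if stream:
        # xquery() returns an iterator: ads are read as you loop, so memory
        # stays small, but a failure may be raised mid-iteration.
        return schedd.xquery(constraint, projection)
    # query() reads every ad before returning, so the whole result set is
    # held in memory at once.
    return list(schedd.query(constraint, projection))


def count_matching(schedd, constraint, stream=False):
    """Count matching jobs; report schedd failures instead of crashing."""
    try:
        # Iterating inside the try also catches errors raised while streaming.
        return sum(1 for _ in fetch_job_ads(schedd, constraint, ["ClusterId"], stream))
    except RuntimeError as err:
        # e.g. "Failed to receive remote ad."; recent releases give more
        # detail from query() than from xquery().
        print("schedd query failed: %s" % err, file=sys.stderr)
        return None
```

Switching between the two is then a single flag, which makes it easy to check whether the intermittent failure follows xquery().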

Finally, what version of HTCondor are you using? (always good to include this)

P.S. It is always good practice to pass a "projection" argument to every call to query() or xquery() unless you truly need all 80+ attributes of every job.
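As an illustration of projection, a summary of job states only needs the `JobStatus` attribute of each job, not the full ad. The helper `status_summary` is hypothetical; the code-to-name table below reflects HTCondor's numeric `JobStatus` values, and ads are assumed to be dict-like (supporting `.get()`).

```python
from collections import Counter

# HTCondor's numeric JobStatus values.
JOB_STATUS_NAMES = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def status_summary(ads):
    """Count ads by JobStatus name; each ad needs only JobStatus."""
    counts = Counter()
    for ad in ads:
        counts[JOB_STATUS_NAMES.get(ad.get("JobStatus"), "Unknown")] += 1
    return dict(counts)
```

With the bindings this might be called as `status_summary(schedd.xquery("true", ["JobStatus"]))`, so the schedd sends one attribute per job rather than every attribute of every job.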

regards,
Todd