Re: [HTCondor-users] RuntimeError: Failed to receive remote ad.
- Date: Mon, 26 Mar 2018 15:29:36 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] RuntimeError: Failed to receive remote ad.
On 3/23/2018 2:41 PM, Larry Martell wrote:
> I have a python script that makes this call:
>
> schedd.xquery(requirements="ClusterId == %d" % id)
>
> Sometimes it throws an exception 'RuntimeError: Failed to receive remote ad.'
About how often is "sometimes"? 50% of the time? 10% of the time? 1
in 5,000,000?
How many queries / transactions per minute are you trying to do? For
instance, if you submit 2,000 jobs and then subsequently do 2,000
xquery() calls every 10 seconds, that could be a problem. It is better to
submit the jobs and then do ONE query that fetches all 2,000 jobs every
minute or so (i.e., batch your queries).
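To illustrate the batching idea, here is a minimal sketch. The helper names build_cluster_constraint and fetch_clusters are hypothetical, and my_cluster_ids stands in for whatever list of cluster IDs your script already tracks. It assumes the xquery(requirements=...) call from the original script and folds many per-cluster lookups into a single request.

```python
def build_cluster_constraint(cluster_ids):
    """Return one ClassAd expression matching any of the given cluster IDs."""
    ids = sorted({int(c) for c in cluster_ids})
    if not ids:
        raise ValueError("need at least one cluster id")
    return " || ".join("(ClusterId == %d)" % c for c in ids)


def fetch_clusters(schedd, cluster_ids):
    """Run a single xquery() for all clusters and group the ads by ClusterId."""
    constraint = build_cluster_constraint(cluster_ids)
    by_cluster = {}
    for ad in schedd.xquery(requirements=constraint):
        by_cluster.setdefault(ad["ClusterId"], []).append(ad)
    return by_cluster


# Usage, assuming the htcondor Python bindings are installed:
#   import htcondor, time
#   schedd = htcondor.Schedd()
#   while True:
#       jobs = fetch_clusters(schedd, my_cluster_ids)  # one query per cycle
#       time.sleep(60)
```

The schedd then sees one request per polling cycle, no matter how many clusters you are watching.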
Maybe try Schedd.query() instead of xquery()? I ask because in the most
recent versions of HTCondor, the Schedd.query() method gives more
information about failures than Schedd.xquery(), and query()'s
implementation is also less complex and thus less likely to have an
intermittent failure. The only disadvantage of query() compared to xquery()
is that your python program may need more RAM, since all your results are
buffered in memory instead of streamed (probably only an issue if you
are fetching many attributes from many thousands of jobs).
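A rough sketch of that swap, using a hypothetical helper named jobs_in_cluster. It assumes failures still surface as RuntimeError, as in the reported traceback. Other versions of the bindings may raise a different exception type.

```python
def jobs_in_cluster(schedd, cluster_id):
    """Fetch all job ads for one cluster with query() instead of xquery()."""
    constraint = "ClusterId == %d" % cluster_id
    try:
        # query() hands back every matching ad at once (buffered in memory),
        # whereas xquery() yields them lazily as they arrive.
        return list(schedd.query(constraint))
    except RuntimeError as err:
        # Assumption: errors are raised as RuntimeError, as in the report.
        print("query for cluster %d failed: %s" % (cluster_id, err))
        raise


# Usage, assuming the htcondor Python bindings are installed:
#   import htcondor
#   ads = jobs_in_cluster(htcondor.Schedd(), 1234)
#   print(len(ads), "jobs found")
```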
Finally, what version of HTCondor are you using? (always good to include
p.s. it is always a good practice to add a "projection" argument to
every call to query() or xquery() unless you truly need all 80+
attributes about every job.
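
For example, a minimal sketch of passing a projection. The helper names and the WANTED attribute list are made up for illustration. Note that I believe the keyword for this argument on query() has differed between releases of the bindings, so the sketch passes it positionally there.

```python
# Only the attributes the script actually needs (illustrative choice).
WANTED = ["ClusterId", "ProcId", "JobStatus"]


def job_status_stream(schedd, cluster_id, attrs=WANTED):
    """Streamed lookup: xquery() with a projection."""
    constraint = "ClusterId == %d" % cluster_id
    return schedd.xquery(requirements=constraint, projection=list(attrs))


def job_status_list(schedd, cluster_id, attrs=WANTED):
    """Buffered lookup: query() with the attribute list as 2nd positional arg."""
    constraint = "ClusterId == %d" % cluster_id
    return schedd.query(constraint, list(attrs))


# Usage, assuming the htcondor Python bindings are installed:
#   import htcondor
#   for ad in job_status_list(htcondor.Schedd(), 1234):
#       print(ad["ProcId"], ad["JobStatus"])
```

Each returned ad then carries only the requested attributes, which means less data for the schedd to send and for your script to hold.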