
Re: [HTCondor-users] RuntimeError: Failed to receive remote ad.



On Mon, Mar 26, 2018 at 4:29 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
> On 3/23/2018 2:41 PM, Larry Martell wrote:
>>
>> I have a python script that makes this call:
>>
>> schedd.xquery(requirements="ClusterId == %d" % id)
>>
>> Sometimes it throws an exception 'RuntimeError: Failed to receive remote
>> ad.'
>>
>
> Hi Larry,

Sorry for the lack of reply, but I am no longer working for the
company I was using condor with.

> About how often is "sometimes"... 50% of the time? 10% of the time? 1 in
> 5,000,000?

Very infrequently. We were submitting around 4,000 jobs a day,
and this would happen maybe once every 10 days. But when it happened,
it was always with the same job.

> How many queries / transactions per minute are you trying to do?  For
> instance, if you submit 2,000 jobs and then subsequently do 2,000 xquery()
> calls every 10 seconds, that could be a problem.... better to submit the
> jobs and then do ONE query that fetches all 2,000 jobs every minute or
> so.... (i.e. batch your queries)

Yeah, I am doing something like that. Batching the queries would be a good improvement.
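For reference, the batching Todd describes could look roughly like the sketch below: one query per polling interval covering every tracked cluster, with the results grouped by ClusterId, instead of one xquery() per cluster. This is a minimal sketch under assumptions. `build_constraint`, `poll_once`, `PROJECTION` and `FakeSchedd` are hypothetical names invented here, and `FakeSchedd` only stands in for `htcondor.Schedd` so the example runs without a pool. The arguments to query() are passed positionally so the sketch does not depend on the keyword name used by a particular bindings version.

```python
from collections import defaultdict

# Only the attributes we need (assumed set), not all 80+ per job.
PROJECTION = ["ClusterId", "ProcId", "JobStatus"]


def build_constraint(cluster_ids):
    """Build ONE ClassAd constraint that matches every tracked cluster."""
    return " || ".join("ClusterId == %d" % cid for cid in sorted(cluster_ids))


def poll_once(schedd, cluster_ids):
    """Fetch all tracked jobs with a single query and group them by cluster.

    `schedd` is assumed to expose query(constraint, projection), as
    htcondor.Schedd does; arguments are passed positionally.
    """
    ads = schedd.query(build_constraint(cluster_ids), PROJECTION)
    by_cluster = defaultdict(list)
    for ad in ads:
        by_cluster[ad["ClusterId"]].append(ad)
    # A tracked cluster absent from the result is no longer in the queue.
    return dict(by_cluster)


class FakeSchedd:
    """Hypothetical stand-in for htcondor.Schedd so the sketch runs anywhere."""

    def __init__(self, jobs):
        self.jobs = jobs
        self.calls = 0

    def query(self, constraint, projection):
        self.calls += 1
        # Naive parser, good enough for "ClusterId == N || ClusterId == M".
        wanted = {int(term.split("==")[1]) for term in constraint.split("||")}
        return [
            {name: job[name] for name in projection}
            for job in self.jobs
            if job["ClusterId"] in wanted
        ]


fake = FakeSchedd([
    {"ClusterId": 101, "ProcId": 0, "JobStatus": 2, "Cmd": "/bin/a"},
    {"ClusterId": 101, "ProcId": 1, "JobStatus": 1, "Cmd": "/bin/a"},
    {"ClusterId": 102, "ProcId": 0, "JobStatus": 2, "Cmd": "/bin/b"},
    {"ClusterId": 103, "ProcId": 0, "JobStatus": 1, "Cmd": "/bin/c"},
])
status = poll_once(fake, [101, 102])
print(fake.calls, sorted(status))  # 1 [101, 102]
```

Against a real pool you would pass `htcondor.Schedd()` instead of the fake and call `poll_once` about once a minute. With thousands of clusters, a broader constraint (for example on the job's Owner) may keep the expression shorter than a long chain of `||` terms.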

> Maybe try Schedd.query() instead of xquery()?  I ask because in the most
> recent versions of HTCondor, the Schedd.query() method gives more
> information about failures than Schedd.xquery(), and also query()'s
> implementation is less complex and thus less likely to have an intermittent
> failure.  The only disadvantage to query() over xquery() is your python
> program may need to use more RAM as all your results are buffered in memory
> instead of streamed (probably only an issue if you are fetching many
> attributes from many thousands of jobs...).

I will keep this in mind for future work.
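One way to keep both options open is a small wrapper that uses query() (whole result buffered in memory) by default and xquery() (ads streamed one at a time) only when asked, and that logs which constraint was running when a RuntimeError such as "Failed to receive remote ad." comes back. This is a hedged sketch, not the bindings' own API: `fetch_jobs` and `FlakySchedd` are hypothetical, and `FlakySchedd` merely simulates a stream that fails after its first ad so the behaviour can be seen without a schedd.

```python
import logging

log = logging.getLogger("condor-poll")


def fetch_jobs(schedd, constraint, projection, stream=False):
    """Yield job ads matching `constraint`.

    stream=False calls query(): the full result is buffered in memory first.
    stream=True calls xquery(): ads arrive one at a time from an iterator,
    so a failure can surface midway through the loop.
    """
    method = schedd.xquery if stream else schedd.query
    try:
        for ad in method(constraint, projection):
            yield ad
    except RuntimeError as err:
        # Record which query failed before re-raising to the caller.
        log.error("schedd query failed (constraint=%r): %s", constraint, err)
        raise


class FlakySchedd:
    """Hypothetical stand-in: query() buffers, xquery() fails after one ad."""

    ADS = [{"ClusterId": 7, "ProcId": 0}, {"ClusterId": 7, "ProcId": 1}]

    def query(self, constraint, projection):
        return list(self.ADS)

    def xquery(self, requirements, projection):
        yield self.ADS[0]
        raise RuntimeError("Failed to receive remote ad.")


schedd = FlakySchedd()
buffered = list(fetch_jobs(schedd, "ClusterId == 7", ["ClusterId", "ProcId"]))
print(len(buffered))  # 2

received = []
try:
    for ad in fetch_jobs(schedd, "ClusterId == 7", ["ClusterId", "ProcId"], stream=True):
        received.append(ad)
except RuntimeError:
    print("stream failed after", len(received), "ad(s)")  # stream failed after 1 ad(s)
```

The wrapper re-raises rather than swallowing the error, so the caller still decides whether to retry; the logged constraint identifies the failing job, which matters when, as here, the failure always hit the same job.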

> Finally, what version of HTCondor are you using? (always good to include
> this)
>
> p.s. it is always a good practice to add a "projection" argument to every
> call to query() or xquery() unless you truly need all 80+ attributes about
> every job.