
Re: [Condor-users] Condor 7.2.4 / 7.4.1 — "Can't find resource with ClaimId" errors from startd



Daniel,

I recommend the following course of action to debug this problem further:

If you haven't already, turn on verbose debugging information in the execute node configuration:

ALL_DEBUG = D_FULLDEBUG D_COMMAND

When the problem happens, send the full StartLog to condor-admin@xxxxxxxxxxxx

It may also be useful to see the collector and negotiator logs for the same time period and with the same extra debugging options.
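
Once the logs are collected, a small script can help line up the error bursts with what the collector and negotiator were doing at the same moment. The following is only a rough sketch, not part of Condor: the function name summarize_claim_errors is made up for illustration, and it assumes each log entry sits on a single line that starts with an "MM/DD HH:MM:SS" timestamp, as in the messages quoted in this thread. Extra debug levels may add text between the timestamp and the message, so the pattern tolerates that.

```python
import re

# Assumed line shape (taken from the errors quoted in this thread):
# 02/23 13:57:48 Error: can't find resource with ClaimId (<ip:port>#...#...)
CLAIM_ERROR = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s.*?"
    r"can't find resource with ClaimId \((?P<claim>[^)]*)\)"
)

def summarize_claim_errors(lines):
    """Return {claim_id: (count, first_stamp, last_stamp)} for matching lines.

    Hypothetical helper: it only counts the ClaimId errors and records
    when each claim id was first and last reported.
    """
    summary = {}
    for line in lines:
        match = CLAIM_ERROR.match(line)
        if match is None:
            continue  # not a ClaimId error line
        claim = match.group("claim")
        stamp = match.group("stamp")
        count, first, _ = summary.get(claim, (0, stamp, stamp))
        summary[claim] = (count + 1, first, stamp)
    return summary
```

Passing an open StartLog file object to summarize_claim_errors gives one entry per claim id; the first and last timestamps mark the window to inspect in the other daemons' logs. The location of the StartLog depends on the local configuration.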

--Dan

Daniel Pittman wrote:
G'day.

We have been having some problems with "Can't find resource with ClaimId"
errors between our submission and execution nodes.  These initially occurred
on 7.2.4, and after reading the release notes I updated to 7.4.1 and retested.

Sadly, we still seem to be getting them, and I can't quite understand why.

An example set of errors is:

02/23 13:57:48 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:57:48 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:49 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:49 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:59:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:59:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)

We were getting regular bursts of these, affecting every execution attempt for
five minutes or more, even though the central manager was matching the jobs
successfully during negotiation.


I can't work out where to go next in debugging these, and the manual hasn't
shed any more light on the problem.  Can anyone advise what I should look into
when we encounter this again to trace it to a root cause?

Regards,
        Daniel