
[Condor-users] Condor 7.2.4 / 7.4.1 — "Can't find resource with ClaimId" errors from startd



G'day.

We have been having some problems with "can't find resource with ClaimId"
errors between our submission and execution nodes.  These initially occurred
on 7.2.4, and after reading the release notes I updated to 7.4.1 and retested.

Sadly, we still seem to be getting them, and I can't quite understand why.

An example set of errors is:

02/23 13:57:48 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:57:48 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:49 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:58:49 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:59:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)
02/23 13:59:19 Error: can't find resource with ClaimId (<192.168.12.27:11734>#1266891856#1#...)

These come in regular bursts: every execution attempt fails for five minutes
or more, even though the central manager matches the jobs successfully during
negotiation.
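
To put numbers on a burst, something like the following could be run over the
startd log.  It is only a rough sketch: it assumes each error occupies a single
line in the log file, in exactly the format quoted in this message, and the
function name and the `year` parameter are made up for illustration (the log
timestamps carry no year, so one has to be supplied).

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches one error line in the format quoted in this message, e.g.
# 02/23 13:57:48 Error: can't find resource with ClaimId (<ip:port>#...#...)
LINE_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Error: can't find resource with ClaimId \((.+)\)\s*$"
)


def summarize_claim_errors(lines, year=2010):
    """Group the errors by ClaimId.

    Returns {claim_id: (count, first_seen, last_seen)}.
    `year` is an assumption supplied by the caller, because the
    log timestamps only contain month, day and time.
    """
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not one of the ClaimId errors
        stamp = datetime.strptime(
            f"{year}/{match.group(1)}", "%Y/%m/%d %H:%M:%S"
        )
        seen[match.group(2)].append(stamp)
    return {cid: (len(ts), min(ts), max(ts)) for cid, ts in seen.items()}
```

Feeding it the open log file (`summarize_claim_errors(open(path))`, with `path`
pointing at the startd log) gives, per ClaimId, how many errors were logged and
over what time span, which can then be lined up against the negotiator and
schedd logs for the same period.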


I can't work out where to go next in debugging these, and the manual hasn't
shed any more light on the problem.  Can anyone advise what I should look into
when we encounter this again to trace it to a root cause?

Regards,
        Daniel
-- 
✣ Daniel Pittman            ✉ daniel@xxxxxxxxxxxx            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons