
[Condor-users] Negotiator crashing



It looks like my Negotiator keeps crashing. If I look in the NegotiatorLog I see this:

6/22 14:09:08 ERROR "Assertion ERROR on (resource_hash.insert( ResourceName, ResourceAd ) == 0)" at line 785 in file Accountant.cpp

I can restart it, but it dies again after 30 seconds.
Can anyone give me some pointers on how to troubleshoot this? AFAIK nothing has changed that would explain it. I've got a lot of jobs in the queue, and they seem to be running, but some of the jobs I submit just fail, and the log says it is unable to contact the negotiator.
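For context, the failing check can be modeled roughly as follows. This is a hypothetical Python sketch, not Condor's code. It assumes that `resource_hash.insert()` returns 0 on success and a non-zero value when the key is already present, and that the key comes from each startd ad's `Name` attribute. Both points are assumptions. Under them, two startd ads that share a resource name would make the accounting phase stop at exactly this assertion.

```python
# Hypothetical model of the failing check in the accounting phase.
# This is NOT Condor's source; it only mirrors the assertion text.
# Assumption: insert() returns 0 on success and -1 if the key already exists.


class RejectingHashTable:
    """A hash table whose insert() rejects duplicate keys."""

    def __init__(self):
        self._table = {}

    def insert(self, key, value):
        if key in self._table:
            return -1  # duplicate key rejected
        self._table[key] = value
        return 0


def build_resource_hash(startd_ads):
    """Insert each startd ad keyed by its resource name, asserting like the log."""
    resource_hash = RejectingHashTable()
    for ad in startd_ads:
        # Assumption: the resource name is taken from the ad's Name attribute.
        resource_name = ad["Name"]
        assert resource_hash.insert(resource_name, ad) == 0, (
            f"duplicate resource name {resource_name!r}"
        )
    return resource_hash
```

With distinct names, `build_resource_hash([{"Name": "slot1@a"}, {"Name": "slot2@a"}])` succeeds. Passing two ads that are both named `slot1@a` raises `AssertionError`.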

Thanks!
--Peter

Here's more of the log.


6/22 14:13:33 ******************************************************
6/22 14:13:33 ** condor_negotiator (CONDOR_NEGOTIATOR) STARTING UP
6/22 14:13:33 ** /opt/osg-shared/se/app/site/condor-7.2.1/sbin/condor_negotiator
6/22 14:13:33 ** SubsystemInfo: name=NEGOTIATOR type=NEGOTIATOR(4) class=DAEMON(1)
6/22 14:13:33 ** Configuration: subsystem:NEGOTIATOR local:<NONE> class:DAEMON
6/22 14:13:33 ** $CondorVersion: 7.2.1 Feb 18 2009 BuildID: 133382 $
6/22 14:13:33 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
6/22 14:13:33 ** PID = 7737
6/22 14:13:33 ** Log last touched 6/22 14:09:08
6/22 14:13:33 ******************************************************
6/22 14:13:33 Using config source: /opt/osg-shared/se/app/site/condor/etc/condor_config
6/22 14:13:33 Using local config sources:
6/22 14:13:33    /opt/osg-local/condor/condor_config.local
6/22 14:13:33 DaemonCore: Command Socket at <10.0.10.39:36051>
6/22 14:13:33 About to rotate ClassAd log /opt/osg-local/condor/spool/Accountantnew.log
6/22 14:13:34 NEGOTIATOR_SOCKET_CACHE_SIZE = 16
6/22 14:13:34 PREEMPTION_REQUIREMENTS = ( (CurrentTime - EnteredCurrentState) > (1 * (60 * 60)) && RemoteUserPrio > SubmittorPrio * 1.2 ) || (MY.NiceUser == True)
6/22 14:13:34 ACCOUNTANT_HOST = None (local)
6/22 14:13:34 NEGOTIATOR_INTERVAL = 25 sec
6/22 14:13:34 NEGOTIATOR_TIMEOUT = 30 sec
6/22 14:13:34 MAX_TIME_PER_SUBMITTER = 31536000 sec
6/22 14:13:34 MAX_TIME_PER_PIESPIN = 31536000 sec
6/22 14:13:34 PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
6/22 14:13:34 NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
6/22 14:13:34 NEGOTIATOR_POST_JOB_RANK = None
6/22 14:13:34 ---------- Started Negotiation Cycle ----------
6/22 14:13:34 Phase 1:  Obtaining ads from collector ...
6/22 14:13:34   Getting all public ads ...
6/22 14:13:34   Sorting 176 ads ...
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
6/22 14:13:34   Getting startd private ads ...
6/22 14:13:34 Got ads: 176 public and 123 private
6/22 14:13:34 Public ads include 7 submitter, 137 startd
6/22 14:13:34 Phase 2:  Performing accounting ...
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Could not lookup state --- assuming not claimed
6/22 14:13:34 Phase 3:  Sorting submitter ads by priority ...
6/22 14:13:34 Phase 4.1:  Negotiating with schedds ...
6/22 14:13:34 Negotiating with nysgrid@xxxxxxxxxx at <10.0.10.39:58621>
6/22 14:13:34 0 seconds so far
6/22 14:13:34     Request 296163.00000:
6/22 14:13:34 Rejected 296163.0 nysgrid@xxxxxxxxxx <10.0.10.39:58621>: no match found
6/22 14:13:34     Got NO_MORE_JOBS;  done negotiating
6/22 14:13:34 Negotiating with ijstokes@xxxxxxxxxx at <10.0.10.39:58621>
6/22 14:13:34 0 seconds so far
6/22 14:13:34     Request 291916.00000:
6/22 14:13:34 Rejected 291916.0 ijstokes@xxxxxxxxxx <10.0.10.39:58621>: no match found
6/22 14:13:34     Got NO_MORE_JOBS;  done negotiating
6/22 14:13:34 Phase 4.2:  Negotiating with schedds ...
6/22 14:13:34 Negotiating with nysgrid@xxxxxxxxxx at <10.0.10.39:58621>
6/22 14:13:34 0 seconds so far
6/22 14:13:34     Request 296163.00000:
6/22 14:13:34 Rejected 296163.0 nysgrid@xxxxxxxxxx <10.0.10.39:58621>: insufficient priority
6/22 14:13:34     Got NO_MORE_JOBS;  done negotiating
6/22 14:13:34 Negotiating with ijstokes@xxxxxxxxxx at <10.0.10.39:58621>
6/22 14:13:34 0 seconds so far
6/22 14:13:34 Phase 4.3:  Negotiating with schedds ...
6/22 14:13:34 Negotiating with ijstokes@xxxxxxxxxx at <10.0.10.39:58621>
6/22 14:13:34 0 seconds so far
6/22 14:13:34     Request 291916.00000:
6/22 14:13:34 Rejected 291916.0 ijstokes@xxxxxxxxxx <10.0.10.39:58621>: no match found
6/22 14:13:34     Got NO_MORE_JOBS;  done negotiating
6/22 14:13:34 ---------- Finished Negotiation Cycle ----------
6/22 14:13:59 ---------- Started Negotiation Cycle ----------
6/22 14:13:59 Phase 1:  Obtaining ads from collector ...
6/22 14:13:59   Getting all public ads ...
6/22 14:13:59   Sorting 176 ads ...
6/22 14:13:59   Getting startd private ads ...
6/22 14:13:59 Got ads: 176 public and 123 private
6/22 14:13:59 Public ads include 7 submitter, 137 startd
6/22 14:13:59 Phase 2:  Performing accounting ...
6/22 14:13:59 ERROR "Assertion ERROR on (resource_hash.insert( ResourceName, ResourceAd ) == 0)" at line 785 in file Accountant.cpp
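The crash happens in Phase 2 (accounting), right after the startd ads are fetched. If the key is derived from the ad name, as assumed earlier, it may be worth checking whether two startd ads report the same name. The hypothetical helper below, `duplicate_names`, scans text in the format printed by `condor_status -long` and reports any name that appears in more than one ad. It assumes ads are separated by blank lines and that each ad has one `Name = "..."` line. It is a diagnostic sketch, not a Condor tool.

```python
from collections import Counter


def duplicate_names(condor_status_long: str) -> dict:
    """Return {name: count} for Name values that appear in more than one ad.

    Assumption: ads are separated by blank lines and each ad contains a
    line of the form  Name = "slot1@host"  (as in `condor_status -long`).
    """
    counts = Counter()
    for ad_text in condor_status_long.split("\n\n"):
        for line in ad_text.splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "Name":
                # Strip surrounding whitespace and the ClassAd string quotes.
                counts[value.strip().strip('"')] += 1
                break  # count at most one Name per ad
    return {name: n for name, n in counts.items() if n > 1}
```

The input could come from `subprocess.run(["condor_status", "-long"], capture_output=True, text=True).stdout`. An empty result would suggest that duplicate names do not explain this crash.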