
[Condor-users] COD troubleshooting



Hi,

I have a problem activating claims using COD (Condor 6.9.0). This is what I'm doing:

> condor_cod request -addr "<131.225.212.148:39446>" -classad ci.out
Successfully sent CA_REQUEST_CLAIM to startd at <131.225.212.148:39446>
Result ClassAd written to ci.out
ID of new claim is: "<131.225.212.148:39446>#1167216341#4"

> condor_cod activate -id "<131.225.212.148:39446>#1167216341#4" - classad ci.out -jobad TestCod
Attempt to send CA_ACTIVATE_CLAIM to startd <131.225.212.148:39446> failed
Reply ClassAd returned 'Failure' but does not have the ErrorString attribute

On the worker node, these are the last two lines in the StartdLog before the startd crashes:

12/27 11:50:05 DaemonCore: Command received via TCP from condor@fcdfcaf444 from host <131.225.240.106:45123>
12/27 11:50:05 DaemonCore: received command 1000 (CA_AUTH_CMD), calling handler (command_classad_handler)

while in the MasterLog:

12/27 11:55:30 The STARTD (pid 15721) died due to signal 11
12/27 11:55:30 All daemons are gone.  Exiting.
12/27 11:55:32 **** condor_master (condor_MASTER) EXITING WITH STATUS 0

TestCod is a file with the following 2 lines:

Cmd="/bin/ps"
Args="-aux"

Am I using condor_cod the right way? Is there a way to get more debugging information to understand what happened?

Thanks
Renzo