
Re: [Condor-users] Jobs stuck in condor_starter



Steffen,

Is there anything logged in the corresponding ShadowLog on the submit side? If not, it may be helpful to add D_SYSCALLS to the SHADOW_DEBUG config setting.
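
For example, something like the following in the submit machine's local config file should do it. This is only a sketch: which file holds your local overrides, and whether SHADOW_DEBUG already has flags set at your site, are assumptions to adjust.

```
# Submit-side Condor config: log remote syscall traffic in the ShadowLog.
# If SHADOW_DEBUG is already set, append D_SYSCALLS to the existing flags
# rather than replacing them.
SHADOW_DEBUG = D_SYSCALLS
```

After that, run condor_reconfig on the submit machine so the change is picked up. I would expect it to apply to shadows started from then on, so the extra detail should show up the next time one of the stuck jobs is matched.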

--Dan

Steffen Grunewald wrote:

I've found a job cluster that won't run. Jobs are matched against a slot,
output and error files are created, but condor_starter never transfers
control to the real Executable (which is a Perl script).

In the slot's StarterLog, these messages appear every hour:

8/19 13:40:48 ERROR "Assertion ERROR on (result)" at line 384 in file NTsenders.C
8/19 13:40:48 condor_write(): Socket closed when trying to write 168 bytes to <10.100.200.93:60802>, fd is 5
8/19 13:40:48 Buf::write(): condor_write() failed
8/19 13:40:48 ERROR "Assertion ERROR on (result)" at line 875 in file NTsenders.C

A side effect is that there now appear to be more jobs in the R state than slots
available (809 free slots, 814 R jobs).

How should I interpret the assert() error?

Condor version 7.0.4

Regards, Steffen