
[Condor-users] Assert in NTsenders.cpp



We've been running along for a while with Condor v7.2.2 on a few
platforms with no issues.

We recently tried to increase MAX_JOBS_RUNNING on the main submit
host, and now we're starting to see:

condor_read(): timeout reading 5 bytes from <ip_addr:port_num>
IO: Failed to read packet header
ERROR "Assertion ERROR on (result)" at line 384 in file NTsenders.cpp

Then the node seems to kick the jobs out.

We tried backing off MAX_JOBS_RUNNING, but now we can't seem to
make this error go away.  Jobs will run for a while, but eventually
this error pops up.
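
To get a feel for how often the timeout shows up and against which
peers, something like the rough sketch below could tally it from a log.
The line format is assumed from the message quoted above, and the
function name count_timeouts is hypothetical, not part of Condor.

```python
import re
from collections import Counter

# Pattern assumed from the message quoted above. Real log lines may carry
# a timestamp prefix, so search() is used rather than match().
TIMEOUT_RE = re.compile(
    r"condor_read\(\): timeout reading \d+ bytes from <([^>]+)>"
)

def count_timeouts(lines):
    """Return a Counter mapping peer address -> number of read timeouts."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

It can be fed the lines of whichever daemon log contains the message,
for example by passing an open file object, and the result shows whether
the timeouts cluster on a few addresses or are spread across all of them.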

I guess the first question is: what is actually failing here?

The submit host seems to be running with plenty of resources and we do
see the shadow processes running.

Is there some contention in the box or software that I'm not readily
able to see or don't know to look for?