[HTCondor-users] Shadow exception



I am having issues with one of the machines in my cluster. Jobs sent to it keep failing with ‘Shadow exception’, and the log shows, for example:


01/18/21 23:58:11 condor_read(fd=17 <>,,size=5,timeout=10,flags=0,non_blocking=0)

01/18/21 23:58:11 condor_read(): Socket closed abnormally when trying to read 5 bytes from <>, errno=104 Connection reset by peer

01/18/21 23:58:11 Stream::get(int) failed to read padding

01/18/21 23:58:11 CLOSE TCP <> fd=17

01/18/21 23:58:11 Starter pid 5973 exited with status 1


Now, the really strange part is that if I keep fiddling around with the startd machine (checking logs, running condor_status, etc.), the job eventually starts. I have no idea which action makes it start, but it does.


The startd machine is running a newer version of HTCondor (8.8.10) than the rest of the cluster, which runs 8.6. Could that be the issue?


I added startd_debug = D_NETWORK, but didn’t really learn anything from the extra output. Are there any other debug levels I should enable?