I am having some issues with one of the machines in my cluster. I keep getting a ‘Shadow exception’, e.g.:
01/18/21 23:58:11 condor_read(fd=17 <127.0.0.1:21523>,,size=5,timeout=10,flags=0,non_blocking=0)
01/18/21 23:58:11 condor_read(): Socket closed abnormally when trying to read 5 bytes from <127.0.0.1:21523>, errno=104 Connection reset by peer
01/18/21 23:58:11 Stream::get(int) failed to read padding
01/18/21 23:58:11 CLOSE TCP <127.0.0.1:31043> fd=17
01/18/21 23:58:11 Starter pid 5973 exited with status 1
Now, the really strange part is that if I keep fiddling around with the startd machine (checking logs, running condor_status, etc.), the job just magically starts. I have no idea which of those actions makes it start, but it does.
The startd machine is running a newer version of Condor (8.8.10) than the rest of the cluster, which is running 8.6. Could that be the issue?
I added startd_debug = D_NETWORK, but didn’t really learn anything from it. Are there any other useful debug levels I should try?