[HTCondor-users] Shadow exception



Gents

 

I am having some issues with one of the machines in my cluster. I keep getting ‘Shadow exception’ errors, e.g.:

 

01/18/21 23:58:11 condor_read(fd=17 <127.0.0.1:21523>,,size=5,timeout=10,flags=0,non_blocking=0)

01/18/21 23:58:11 condor_read(): Socket closed abnormally when trying to read 5 bytes from <127.0.0.1:21523>, errno=104 Connection reset by peer

01/18/21 23:58:11 Stream::get(int) failed to read padding

01/18/21 23:58:11 CLOSE TCP <127.0.0.1:31043> fd=17

01/18/21 23:58:11 Starter pid 5973 exited with status 1

 

Now, the really strange part is that if I keep fiddling around with the startd machine (checking logs, running condor_status, etc.), the job eventually starts on its own. I have no idea which action makes it start, but it does.

 

The startd machine is running a newer version of HTCondor (8.8.10) than the rest of the cluster, which runs 8.6. Could that be the issue?

 

I added startd_debug = D_NETWORK, but didn’t really learn anything from it. Are there any other useful debug levels I should try?
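
For reference, a minimal sketch of how that setting would typically look in the startd machine's local configuration. The file it lives in is an assumption (sites differ), and the change only takes effect once the daemon re-reads its configuration, for example after running condor_reconfig on that machine.

```
# Local HTCondor configuration on the startd machine
# (exact file location is site-specific and assumed here)

# Add network-level tracing to the startd's log output
STARTD_DEBUG = D_NETWORK
```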

 

Peter