Last week we faced a strange schedd behaviour.
The server running the schedd was shut down abruptly (a cold shutdown). The schedd runs this version:
[root@schedd-03 ~]# condor_version
$CondorVersion: 8.2.9 Aug 13 2015 BuildID: 335839 $
We restarted it about a minute later and found that all the jobs that had been running before the shutdown were in IDLE status,
although they were still running on the WN. In the schedd log I found these messages:
11/17/16 10:07:14 Marked job 1024870.0 as IDLE
11/17/16 10:07:14 Marked job 1024871.0 as IDLE
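In case it helps to list every affected job, here is a minimal Python sketch that pulls the job IDs out of such messages. It assumes the log lines have exactly the format quoted above; the function name `jobs_marked_idle` and the `sample` text are only illustrative and not part of HTCondor.

```python
import re

# Pattern for schedd log entries of the form quoted above, e.g.
# "11/17/16 10:07:14 Marked job 1024870.0 as IDLE".
# Assumption: timestamp is MM/DD/YY HH:MM:SS and the job ID is cluster.proc.
MARKED_IDLE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Marked job (\d+\.\d+) as IDLE"
)

def jobs_marked_idle(log_text):
    """Return (timestamp, job_id) pairs for each 'Marked job ... as IDLE' entry."""
    return MARKED_IDLE.findall(log_text)

# Illustrative input: the two messages from the schedd log quoted above.
sample = (
    "11/17/16 10:07:14 Marked job 1024870.0 as IDLE\n"
    "11/17/16 10:07:14 Marked job 1024871.0 as IDLE\n"
)

for timestamp, job_id in jobs_marked_idle(sample):
    print(timestamp, job_id)
```

The resulting job IDs could then be compared with what is still running on the WN.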
I wasn't able to reproduce the issue on a test schedd instance running this condor version:
[root@ui01 ~]# condor_version
$CondorVersion: 8.4.9 Sep 29 2016 BuildID: 382747 $
I tried the following actions:
- restarting condor
- shutting down the server
- killing all the condor processes with kill -9
In every case the startd got its shadow connection back:
11/23/16 11:58:44 (pid:26843) Accepted request to reconnect from <22.214.171.124:26166>
11/23/16 11:58:44 (pid:26843) Ignoring old shadow <126.96.36.199:9618?addrs=188.8.131.52-9618&noUDP&sock=2400_c45b_1>
11/23/16 11:58:44 (pid:26843) Communicating with shadow <184.108.40.206:9618?addrs=220.127.116.11-9618&noUDP&sock=3913_cff8_1>
Why were the jobs marked as IDLE?
Thanks in advance.