[HTCondor-users] job really running but idle on the schedd after a cold shutdown


Last week we observed some strange behaviour from one of our schedds.

The server running the schedd was shut down uncleanly (a cold shutdown).

[root@schedd-03 ~]# condor_version
$CondorVersion: 8.2.9 Aug 13 2015 BuildID: 335839 $

We restarted it about a minute later and found that all the jobs that had been running before the shutdown were in IDLE status,
even though they were still running on the worker nodes (WNs). In the schedd log I found messages like these:

11/17/16 10:07:14 Marked job 1024870.0 as IDLE
11/17/16 10:07:14 Marked job 1024871.0 as IDLE
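
To collect every affected job ID for cross-checking against the worker nodes, a small Python sketch like the following could pull them out of the schedd log. It assumes the exact message format shown above; the function name `jobs_marked_idle` is hypothetical, and the sample lines stand in for reading the actual log file.

```python
import re

# Hypothetical helper. Assumes lines shaped exactly like the excerpt above:
#   "11/17/16 10:07:14 Marked job 1024870.0 as IDLE"
MARKED_IDLE = re.compile(
    r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) Marked job (\d+\.\d+) as IDLE\s*$"
)


def jobs_marked_idle(lines):
    """Return (timestamp, job_id) pairs for each 'Marked job ... as IDLE' line."""
    hits = []
    for line in lines:
        m = MARKED_IDLE.match(line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits


if __name__ == "__main__":
    # Sample input copied from the log excerpt; in practice, iterate over the log file.
    sample = [
        "11/17/16 10:07:14 Marked job 1024870.0 as IDLE",
        "11/17/16 10:07:14 Marked job 1024871.0 as IDLE",
    ]
    for ts, job in jobs_marked_idle(sample):
        print(ts, job)
```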

I wasn't able to reproduce the issue on a test schedd instance running this HTCondor version:

[root@ui01 ~]# condor_version
$CondorVersion: 8.4.9 Sep 29 2016 BuildID: 382747 $

I tried the following actions:

- restarting condor
- shutting down the server
- killing all the condor processes with kill -9

In every case, the startd accepted a reconnect request from the shadow:

11/23/16 11:58:44 (pid:26843) Accepted request to reconnect from <>
11/23/16 11:58:44 (pid:26843) Ignoring old shadow <>
11/23/16 11:58:44 (pid:26843) Communicating with shadow <>

Why were the jobs on the production schedd marked as IDLE?

Thanks in advance

