
[HTCondor-users] job really running but idle on the schedd after a cold shutdown



Hi

Last week we ran into some strange schedd behaviour.

The server running the schedd was shut down uncleanly (a cold shutdown).

***
[root@schedd-03 ~]# condor_version
$CondorVersion: 8.2.9 Aug 13 2015 BuildID: 335839 $
***

We restarted it about a minute later and found that all the jobs that had been running before the shutdown were in IDLE status, even though they were still running on the worker nodes (WNs). The schedd log contains messages like these:

^^^
11/17/16 10:07:14 Marked job 1024870.0 as IDLE
11/17/16 10:07:14 Marked job 1024871.0 as IDLE
^^^
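To get the full list of affected jobs from the schedd log (for example, to compare with what condor_q reports), a small helper like the one below could be used. This is a hypothetical sketch, not an HTCondor tool: the function name `jobs_marked_idle` is made up, and the line format is assumed only from the two log lines shown above.

```python
import re
from typing import List, Tuple

# Assumed line format, taken from the excerpt above, e.g.
# "11/17/16 10:07:14 Marked job 1024870.0 as IDLE"
# Other HTCondor versions may log this differently.
IDLE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Marked job (?P<job>\d+\.\d+) as IDLE$"
)


def jobs_marked_idle(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, cluster.proc) for each 'Marked job ... as IDLE' line."""
    hits = []
    for line in lines:
        m = IDLE_RE.match(line.strip())
        if m:
            hits.append((m.group("ts"), m.group("job")))
    return hits


if __name__ == "__main__":
    sample = [
        "11/17/16 10:07:14 Marked job 1024870.0 as IDLE",
        "11/17/16 10:07:14 Marked job 1024871.0 as IDLE",
    ]
    for ts, job in jobs_marked_idle(sample):
        print(ts, job)
```

Lines that do not match the assumed pattern are simply skipped, so the whole SchedLog can be passed in.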

I wasn't able to reproduce the issue on a test schedd instance running this Condor version:

***
[root@ui01 ~]# condor_version
$CondorVersion: 8.4.9 Sep 29 2016 BuildID: 382747 $
***

I tried the following actions:

- restarting condor
- shutting down the server
- killing all the condor processes with kill -9

and in every case the startd got a reconnection from the shadow:

^^^
11/23/16 11:58:44 (pid:26843) Accepted request to reconnect from <90.147.168.55:26166>
11/23/16 11:58:44 (pid:26843) Ignoring old shadow <90.147.168.55:9618?addrs=90.147.168.55-9618&noUDP&sock=2400_c45b_1>
11/23/16 11:58:44 (pid:26843) Communicating with shadow <90.147.168.55:9618?addrs=90.147.168.55-9618&noUDP&sock=3913_cff8_1>
^^^
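To make the hand-over in these startd lines explicit, here is a hypothetical Python helper (not part of HTCondor; the names `shadow_handover` and `sock_id` are made up) that pulls out the ignored and the new shadow addresses and reads their `sock` parameter. It assumes the log format shown in the excerpt above.

```python
import re
from typing import Dict, List, Optional

# Assumed format, from the excerpt above:
# "... Ignoring old shadow <host:port?...&sock=ID>"
# "... Communicating with shadow <host:port?...&sock=ID>"
SHADOW_RE = re.compile(
    r"(?P<kind>Ignoring old shadow|Communicating with shadow) <(?P<addr>[^>]+)>"
)


def shadow_handover(lines: List[str]) -> Dict[str, Optional[str]]:
    """Return the old (ignored) and new shadow addresses found in the lines."""
    result: Dict[str, Optional[str]] = {"old": None, "new": None}
    for line in lines:
        m = SHADOW_RE.search(line)
        if not m:
            continue  # e.g. the "Accepted request to reconnect" line
        key = "old" if m.group("kind").startswith("Ignoring") else "new"
        result[key] = m.group("addr")
    return result


def sock_id(addr: str) -> Optional[str]:
    """Return the value of the 'sock' parameter of an address, or None."""
    _, _, query = addr.partition("?")
    for part in query.split("&"):
        name, _, value = part.partition("=")
        if name == "sock":
            return value
    return None
```

On the three lines above it reports sock 2400_c45b_1 for the ignored shadow and 3913_cff8_1 for the one the startd kept talking to, i.e. the reconnect was accepted from a different shadow address.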


Why were the jobs marked as IDLE?

Thanks in advance

Ale
