[Condor-users] Job run time limit ?



Hi,

We're using Condor to run jobs that take a long time. Some that took 27 hours completed without trouble. Is there a maximum run time limit? We ask because we launched a longer job and it stopped after approximately 65 hours (we tried again twice):

000 (044.009.000) 09/09 15:44:56 Job submitted from host: <172.18.45.80:51293>
001 (044.009.000) 09/09 15:50:18 Job executing on host: <192.168.1.15:49234>
......
007 (044.009.000) 09/12 09:18:00 Shadow exception!
Can no longer talk to condor_starter on execute machine (192.168.1.15)
0 - Run Bytes Sent By Job
2176017 - Run Bytes Received By Job
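
As a sanity check on the "approximately 65 hours" figure, the elapsed time between the "Job executing" event (001) and the "Shadow exception" event (007) can be computed from the two timestamps above. This is only a minimal sketch: the event log does not record a year, so `PLACEHOLDER_YEAR` is an arbitrary assumption that does not affect the difference between two September dates, and `parse_event_time` is a hypothetical helper, not part of Condor.

```python
from datetime import datetime

# The event log omits the year; this value is an arbitrary placeholder
# (an assumption), used only so that both timestamps parse consistently.
PLACEHOLDER_YEAR = 2000


def parse_event_time(stamp: str) -> datetime:
    """Parse an event-log timestamp of the form 'MM/DD HH:MM:SS'."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR}/{stamp}", "%Y/%m/%d %H:%M:%S")


# Timestamps copied from events 001 and 007 in the log above.
started = parse_event_time("09/09 15:50:18")
failed = parse_event_time("09/12 09:18:00")

elapsed = failed - started
hours = elapsed.total_seconds() / 3600
print(f"Ran for {elapsed} (about {hours:.1f} hours)")
```

The result is 65 hours, 27 minutes and 42 seconds for job 44.9.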



In the Shadow log:

9/12 09:12:10 (44.7) (10025): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.23)" at line 63 in file NTreceivers.C
9/12 09:12:57 (44.4) (10013): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.22)" at line 63 in file NTreceivers.C
9/12 09:13:04 (44.6) (10023): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.23)" at line 63 in file NTreceivers.C
9/12 09:14:06 (44.8) (10026): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.15)" at line 63 in file NTreceivers.C
9/12 09:14:14 (44.1) (10010): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.20)" at line 63 in file NTreceivers.C
9/12 09:14:18 (44.0) (10009): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.20)" at line 63 in file NTreceivers.C
9/12 09:15:00 (44.3) (10012): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.21)" at line 63 in file NTreceivers.C
9/12 09:15:06 (44.2) (10011): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.21)" at line 63 in file NTreceivers.C
9/12 09:15:14 (44.5) (10014): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.22)" at line 63 in file NTreceivers.C
9/12 09:18:00 (44.9) (10151): ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.15)" at line 63 in file NTreceivers.C
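
To make the pattern in these entries easier to see, here is a small sketch that tallies them by execute machine and measures how far apart the first and last errors are. The tuples are transcribed by hand from the Shadow log lines above; the variable names are illustrative, and the year is again an arbitrary placeholder because the log does not include one.

```python
from collections import Counter
from datetime import datetime

# (time, job id, execute host) transcribed from the Shadow log lines above.
errors = [
    ("09:12:10", "44.7", "192.168.1.23"),
    ("09:12:57", "44.4", "192.168.1.22"),
    ("09:13:04", "44.6", "192.168.1.23"),
    ("09:14:06", "44.8", "192.168.1.15"),
    ("09:14:14", "44.1", "192.168.1.20"),
    ("09:14:18", "44.0", "192.168.1.20"),
    ("09:15:00", "44.3", "192.168.1.21"),
    ("09:15:06", "44.2", "192.168.1.21"),
    ("09:15:14", "44.5", "192.168.1.22"),
    ("09:18:00", "44.9", "192.168.1.15"),
]

# Number of failed shadows per execute machine.
per_host = Counter(host for _, _, host in errors)

# All entries are dated 9/12; the year 2000 is a placeholder assumption.
times = [datetime.strptime(f"2000/09/12 {t}", "%Y/%m/%d %H:%M:%S")
         for t, _, _ in errors]
span = max(times) - min(times)

for host, count in sorted(per_host.items()):
    print(f"{host}: {count} job(s)")
print(f"{len(errors)} errors across {len(per_host)} hosts within {span}")
```

All ten jobs (44.0 to 44.9), two on each of the five execute machines, lost contact with their starter within a window of 5 minutes 50 seconds.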


Thanks for your help,
Jérôme Jaglale