Subject: [Condor-users] Why does my submitted job run for 2 minutes, then get suspended and unsuspended every 10 minutes?
Hi,
I have set up Condor on two machines sharing a filesystem over NFS (Linux, Fedora Core 3). The installation and configuration went fine. But when I submit the simple example job sh_loop.cmd, it
executes for a couple of minutes, then gets suspended, some time later gets unsuspended, and then gets
evicted. Can anyone please help me with this? I am including the sh_loop.log file
below in the hope that it helps. This log file is for a single
machine; I see the same problem with both a single machine and multiple machines.
000 (002.000.000) 04/25 20:02:23 Job submitted from host: <127.0.0.1:32878>
...
001 (002.000.000) 04/25 20:02:25 Job executing on host: <127.0.0.1:32877>
...
010 (002.000.000) 04/25 20:02:30 Job was suspended.
Number of processes actually suspended: 2
...
011 (002.000.000) 04/25 20:12:31 Job was unsuspended.
...
004 (002.000.000) 04/25 20:12:32 Job was evicted.
(0) Job was not checkpointed.
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...
001 (002.000.000) 04/25 20:22:29 Job executing on host: <127.0.0.1:32877>
...
010 (002.000.000) 04/25 20:22:33 Job was suspended.
Number of processes actually suspended: 2
...
011 (002.000.000) 04/25 20:32:34 Job was unsuspended.
...
004 (002.000.000) 04/25 20:32:34 Job was evicted.
(0) Job was not checkpointed.
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...
001 (002.000.000) 04/25 20:42:28 Job executing on host: <127.0.0.1:32877>
...
010 (002.000.000) 04/25 20:42:33 Job was suspended.
Number of processes actually suspended: 2
...
011 (002.000.000) 04/25 20:52:36 Job was unsuspended.
...
004 (002.000.000) 04/25 20:52:36 Job was evicted.
(0) Job was not checkpointed.
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...
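To make the timing in the log above easier to read, here is a minimal Python sketch (a standalone helper, not part of Condor) that picks out the event header lines and prints the seconds between consecutive events. It assumes only the header layout visible in the log ("NNN (cluster.proc.subproc) MM/DD HH:MM:SS text"); the event names are taken from the messages in the log, the year 2005 is an arbitrary placeholder because the log omits the year, and SAMPLE is an excerpt of the first cycle of the log.

```python
import re
from datetime import datetime

# Event codes paired with the messages they carry in the log above.
EVENT_NAMES = {
    0: "submitted",
    1: "executing",
    4: "evicted",
    10: "suspended",
    11: "unsuspended",
}

# Header line of each event: "NNN (cluster.proc.subproc) MM/DD HH:MM:SS text"
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")

# Assumption: the log has no year, so a placeholder year is prepended to give
# strptime a complete date. Only differences between timestamps matter here.
ASSUMED_YEAR = 2005


def parse_events(log_text):
    """Return a list of (event_code, datetime) tuples from the log text."""
    events = []
    for line in log_text.splitlines():
        m = HEADER.match(line)
        if m:  # detail lines and "..." separators are skipped
            code = int(m.group(1))
            ts = datetime.strptime(f"{ASSUMED_YEAR}/{m.group(5)}", "%Y/%m/%d %H:%M:%S")
            events.append((code, ts))
    return events


def transitions(events):
    """Return (label, seconds) for each pair of consecutive events."""
    out = []
    for (c1, t1), (c2, t2) in zip(events, events[1:]):
        label = f"{EVENT_NAMES.get(c1, c1)} -> {EVENT_NAMES.get(c2, c2)}"
        out.append((label, int((t2 - t1).total_seconds())))
    return out


SAMPLE = """\
000 (002.000.000) 04/25 20:02:23 Job submitted from host: <127.0.0.1:32878>
...
001 (002.000.000) 04/25 20:02:25 Job executing on host: <127.0.0.1:32877>
...
010 (002.000.000) 04/25 20:02:30 Job was suspended.
Number of processes actually suspended: 2
...
011 (002.000.000) 04/25 20:12:31 Job was unsuspended.
...
004 (002.000.000) 04/25 20:12:32 Job was evicted.
...
001 (002.000.000) 04/25 20:22:29 Job executing on host: <127.0.0.1:32877>
"""

if __name__ == "__main__":
    for label, secs in transitions(parse_events(SAMPLE)):
        print(f"{label:28s} {secs:5d} s")
```

Run on this excerpt, it reports 5 seconds from "executing" to "suspended", 601 seconds (about 10 minutes) suspended, 1 second from "unsuspended" to "evicted", and 597 seconds before the job starts executing again, which matches the repeating cycle in the full log.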