
[Condor-users] Evictions



I just rebuilt my test bed grid with Condor 6.7.14 and ran a small C++ program that searches for prime numbers as a test.  For some reason the program keeps getting evicted from the nodes.  It eventually completed without any errors, but it took a very long time because of the many evictions.
 
Is there something simple I can do to the configuration to stop these evictions?
 
Here is part of the log file; the error file was empty.
 
Example from Log:
 
001 (010.000.000) 01/28 14:54:41 Job executing on host: <192.168.0.2:32773>
...
004 (010.000.000) 01/28 14:54:41 Job was evicted.
 (0) Job was not checkpointed.
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
 224  -  Run Bytes Sent By Job
 13518504  -  Run Bytes Received By Job
 
After being evicted multiple times, it finally ran almost half an hour later:
 
001 (010.000.000) 01/28 15:23:43 Job executing on host: <192.168.0.1:32775>
...
005 (010.000.000) 01/28 15:23:55 Job terminated.
 (1) Normal termination (return value 0)
  Usr 0 00:00:03, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
  Usr 0 00:00:03, Sys 0 00:00:00  -  Total Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
 1096  -  Run Bytes Sent By Job
 13520753  -  Run Bytes Received By Job
 2440  -  Total Bytes Sent By Job
 94631776  -  Total Bytes Received By Job
 
I submitted 28 copies of this job, and each one had the same problem with evictions; there were no other jobs in the queue.  All 28 jobs eventually completed without errors.
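To put a number on the evictions per job, something like the following could tally them from the user log. This is only a rough sketch: it assumes every event header looks like the ones in the excerpt above (a three-digit event code followed by the job ID in parentheses, with 004 meaning "Job was evicted"), and the function name `count_evictions` is made up for illustration.

```python
import re
from collections import Counter

# Event header as seen in the excerpt: "004 (010.000.000) 01/28 14:54:41 ..."
# Assumption: the three parenthesised fields are cluster, proc and subproc.
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) ")


def count_evictions(log_text):
    """Return a Counter mapping 'cluster.proc' to its number of 004 events."""
    evictions = Counter()
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if match and match.group(1) == "004":
            job_id = f"{int(match.group(2))}.{int(match.group(3))}"
            evictions[job_id] += 1
    return evictions
```

Reading the log file into a string and passing it to `count_evictions` would give one count per job, which makes it easy to see whether all 28 jobs were evicted about equally often.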
 
Thanks,
Steve Broughton
University of Idaho