
Re: [Condor-users] Evictions



I just noticed from the log that all the evictions come from the nodes; the job only completes on the master, which is also the submitting machine and the NFS server for the Condor installation binaries.  This test program worked when I had a Condor 6.7.12 install with all the same configuration settings.
 
001 (010.009.000) 01/28 15:24:08 Job executing on host: <192.168.0.1:32775>
...
005 (010.009.000) 01/28 15:24:18 Job terminated.
 (1) Normal termination (return value 0)
  Usr 0 00:00:04, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
  Usr 0 00:00:04, Sys 0 00:00:00  -  Total Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
 1112  -  Run Bytes Sent By Job
 13520876  -  Run Bytes Received By Job
 1112  -  Total Bytes Sent By Job
 13520876  -  Total Bytes Received By Job
...
001 (010.010.000) 01/28 15:24:24 Job executing on host: <192.168.0.1:32775>
...
005 (010.010.000) 01/28 15:24:36 Job terminated.
 (1) Normal termination (return value 0)
  Usr 0 00:00:05, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
  Usr 0 00:00:05, Sys 0 00:00:00  -  Total Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
 1184  -  Run Bytes Sent By Job
 13521426  -  Run Bytes Received By Job
 1184  -  Total Bytes Sent By Job
 13521426  -  Total Bytes Received By Job
...
001 (010.011.000) 01/28 15:24:41 Job executing on host: <192.168.0.1:32775>
...
005 (010.011.000) 01/28 15:24:53 Job terminated.
 (1) Normal termination (return value 0)
  Usr 0 00:00:05, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
  Usr 0 00:00:05, Sys 0 00:00:00  -  Total Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
 1176  -  Run Bytes Sent By Job
 13521365  -  Run Bytes Received By Job
 1176  -  Total Bytes Sent By Job
 13521365  -  Total Bytes Received By Job
...
001 (010.001.000) 01/28 15:24:54 Job executing on host: <192.168.0.2:32773>
...
004 (010.001.000) 01/28 15:24:55 Job was evicted.
 (0) Job was not checkpointed.
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
 224  -  Run Bytes Sent By Job
 13518504  -  Run Bytes Received By Job
...
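
To check the "all evictions are from the nodes" observation across the whole log rather than by eye, something like the following could tally outcomes per execute host. It is only a rough sketch: the function name tally_by_host and the sample text are made up for illustration, and it assumes the user log always looks like the excerpts above (a 3-digit event code, the job id in parentheses, and a 001 "executing on host" event before each 004 or 005 event for the same job).

```python
import re
from collections import defaultdict

# Event header lines look like:
# 001 (010.009.000) 01/28 15:24:08 Job executing on host: <192.168.0.1:32775>
EVENT_RE = re.compile(r"^(\d{3}) \((\d+\.\d+\.\d+)\) \S+ \S+ (.*)$")
HOST_RE = re.compile(r"<([\d.]+):\d+>")


def tally_by_host(log_text):
    """Count evictions (004) and terminations (005) per execute host."""
    last_host = {}  # job id -> host from its most recent 001 event
    counts = defaultdict(lambda: {"evicted": 0, "terminated": 0})
    for line in log_text.splitlines():
        m = EVENT_RE.match(line)
        if not m:
            continue  # usage lines, byte counts, and "..." separators
        code, job, rest = m.groups()
        if code == "001":
            h = HOST_RE.search(rest)
            if h:
                last_host[job] = h.group(1)
        elif code in ("004", "005") and job in last_host:
            key = "evicted" if code == "004" else "terminated"
            counts[last_host[job]][key] += 1
    return {host: dict(c) for host, c in counts.items()}


# Two events copied from the excerpt above, used as a small sample.
sample = """\
001 (010.011.000) 01/28 15:24:41 Job executing on host: <192.168.0.1:32775>
...
005 (010.011.000) 01/28 15:24:53 Job terminated.
...
001 (010.001.000) 01/28 15:24:54 Job executing on host: <192.168.0.2:32773>
...
004 (010.001.000) 01/28 15:24:55 Job was evicted.
"""
print(tally_by_host(sample))
```

Running it on the full log file (read with open(...).read()) should show whether 192.168.0.1 is the only host with terminations and whether every eviction belongs to another address.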

----- Original Message -----
Sent: Tuesday, January 31, 2006 10:14 AM
Subject: [Condor-users] Evictions

I just rebuilt my test bed grid with Condor 6.7.14 and ran a little test C++ program that searches for prime numbers.  For some reason the program gets evicted from the nodes.  It eventually completed without any errors, but took a very long time with a lot of evictions.
 
Is there something simple I can do to the configuration to stop these evictions?
 
Here is part of the log file; the error file was empty.
 
Example from Log:
 
001 (010.000.000) 01/28 14:54:41 Job executing on host: <192.168.0.2:32773>
...
004 (010.000.000) 01/28 14:54:41 Job was evicted.
 (0) Job was not checkpointed.
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
 224  -  Run Bytes Sent By Job
 13518504  -  Run Bytes Received By Job
 
After being evicted multiple times, it finally ran almost half an hour later:
 
001 (010.000.000) 01/28 15:23:43 Job executing on host: <192.168.0.1:32775>
...
005 (010.000.000) 01/28 15:23:55 Job terminated.
 (1) Normal termination (return value 0)
  Usr 0 00:00:03, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
  Usr 0 00:00:03, Sys 0 00:00:00  -  Total Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
 1096  -  Run Bytes Sent By Job
 13520753  -  Run Bytes Received By Job
 2440  -  Total Bytes Sent By Job
 94631776  -  Total Bytes Received By Job
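
The byte totals in that last event give a rough count of how many times job 010.000.000 was evicted before it finished. The arithmetic below is only a sketch: it assumes every evicted attempt moved about the same number of bytes as the one evicted attempt shown above (224 sent, 13518504 received), which the log excerpt does not confirm for the attempts not shown.

```python
# Figures copied from the log excerpts above for job 010.000.000.
total_sent = 2440            # Total Bytes Sent By Job (final 005 event)
final_run_sent = 1096        # Run Bytes Sent By Job (successful run)
evicted_run_sent = 224       # Run Bytes Sent By Job (one evicted attempt)

total_recv = 94631776        # Total Bytes Received By Job (final 005 event)
final_run_recv = 13520753    # Run Bytes Received By Job (successful run)
evicted_run_recv = 13518504  # Run Bytes Received By Job (one evicted attempt)

# Assumption: each evicted attempt transferred about as much as the one shown.
evictions_from_sent = (total_sent - final_run_sent) / evicted_run_sent
evictions_from_recv = (total_recv - final_run_recv) / evicted_run_recv

print(evictions_from_sent, round(evictions_from_recv, 3))
```

Both ratios come out at about 6, which would mean roughly six evicted attempts, each receiving about 13.5 MB again, before the run that completed.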
 
I queued 28 instances of this job from the submit file, and each one had this problem with evictions; there were no other jobs in the queue.  All 28 jobs eventually completed without errors.
 
Thanks,
Steve Broughton
University of Idaho


_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users