
[Condor-users] Job stopped



Hi, below is the state of my Condor pool. My problem is this: job 12.0, submitted by user mateus to the schedd on atena.solidos.quimica.ufjf.br, was interrupted when user rafael submitted job 22.0 on fenix.solidos.quimica.ufjf.br. Now rafael's job is running on the same machine where mateus's job was running. Why? This is very bad! Mateus will be furious with me :-)

[aryjr@atena ~]$ condor_q -global

-- Schedd: fenix.solidos.quimica.ufjf.br : <192.168.1.182:45514>
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  18.0   mateus          8/7  11:02   1+08:27:00 R  0   468.8 Ga-g-al2o3-2D.sh
  19.0   mateus          8/7  11:04   1+08:24:57 R  0   478.5 Ga-g-al2o3-3D.sh
  20.0   mateus          8/7  11:04   1+07:27:35 R  0   459.0 Ga-g-al2o3-5D.sh
  21.0   mateus          8/7  11:04   1+07:26:47 R  0   468.8 Ga-g-al2o3-6D.sh
  22.0   rafael          8/8  15:43   0+03:45:04 R  0   1386.7 ca-wplan-4aguas-1.

5 jobs; 0 idle, 5 running, 0 held

-- Schedd: atena.solidos.quimica.ufjf.br : <192.168.1.107:32806>
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  12.0   mateus          8/7  10:54   1+05:18:38 I  0   1669.9 g-al2o3duplicada_z

1 jobs; 1 idle, 0 running, 0 held

-- Schedd: onyx.solidos.quimica.ufjf.br : <192.168.1.176:32887>
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   deyse           8/7  15:33   1+04:40:45 R  0   634.8 hct-cl-3R-A.sh
  11.0   deyse           8/7  16:26   1+03:47:58 R  0   644.5 hct-cl-3R-B.sh
  15.0   deyse           8/8  12:07   0+08:07:24 R  0   302.7 gibbisita.sh
  16.0   deyse           8/8  14:09   0+06:05:19 R  0   371.1 mgcl2.sh
  19.0   deyse           8/8  14:57   0+05:16:50 R  0   253.9 hct-cl-1H.sh

5 jobs; 0 idle, 5 running, 0 held

The analysis of the stopped job is:

[aryjr@atena ~]$ condor_q -analyze 12

-- Submitter: atena.solidos.quimica.ufjf.br : <192.168.1.107:32806> : atena.solidos.quimica.ufjf.br
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
012.000:  Run analysis summary.  Of 25 machines,
     23 are rejected by your job's requirements
      1 reject your job because of their own requirements
      1 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Tue Aug  7 16:19:48 2007
        Last failed match: Wed Aug  8 20:16:15 2007
        Reason for last match failure: insufficient priority

Thanks very much! I'm getting a lot of help on this list!