[Condor-users] my parallel universe error , 4 match but reject the job for unknown reasons

Dear All,

I am a new Condor user. I have two quad-core computers running
Ubuntu Linux 7.10 and Condor 6.8.8, and I am trying to add them to a
Condor pool to run parallel jobs. I configured both of them as dedicated
resources (named mpi0 and mpi1), and mpi0 is, in addition, the dedicated
scheduler. (Our last Condor pool has no dedicated scheduler.)
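
A minimal sketch of the kind of local configuration this setup describes,
assuming the dedicated-resource example from the Condor manual was
followed. The host name mpi0.x.x.x is a placeholder, and the actual
settings on mpi0 and mpi1 may differ:

```
# Assumed local config on each dedicated resource (mpi0 and mpi1).
# Name of the dedicated scheduler; the host part is a placeholder.
DedicatedScheduler = "DedicatedScheduler@mpi0.x.x.x"
# Advertise the attribute in the machine ClassAd.
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
# Always run jobs, and never suspend, preempt, or kill them.
START        = True
SUSPEND      = False
CONTINUE     = True
PREEMPT      = False
KILL         = False
WANT_SUSPEND = False
WANT_VACATE  = False
# Prefer jobs that come from the dedicated scheduler.
RANK         = Scheduler =?= $(DedicatedScheduler)
```

In a setup like this, the DedicatedScheduler string would need to be
identical on both machines and point at the schedd on mpi0.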

However, parallel jobs only run on the dedicated scheduler (mpi0),
and on mpi1 I get the error:

"4 match but reject the job for unknown reasons"

I think this problem may occur because my scheduler
is a quad-core machine, but I don't know how to fix it.

Below are some details from one of my attempts:

Submit description file:
universe = parallel
executable = /bin/sleep
arguments = 30
machine_count = 3
log    = logfile
error  = err

mpi1@.....$ condor_q -analyze

-- Submitter: mpi1.x.x.x : <x.x.x.x:46536> : mpi1.x.x.x
008.000:  Run analysis summary.  Of 25 machines,
     21 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job

1 jobs; 1 idle, 0 running, 0 held