
[Condor-users] Processing jobs in parallel-universe-queue



I installed Condor 6.8.6 a few weeks ago

(so I am still new to condor).


We are running Condor on a small pool of 6 machines.

One of them is the central manager, submit machine and scheduler (it also

acts as the dedicated scheduler), as well as an execute machine.

The rest of the pool are execute machines (configured as

dedicated resources). The execute machines are 4-core machines

(2 x dual-core CPUs).


We are experiencing 2 problems with parallel job submissions.


1 )

I submit job1, which requires 4 CPUs on, say, node1.

After some time it is executed.

Then I submit job2, which again requires 4 CPUs on node1.

This one stays in idle state, because no more CPUs are available on node1.

Finally, I submit job3 to node2.

The strange thing is that this job stays idle until job2 is executed.

Since node2 is free, I do not see a reason why

it should stay idle and wait for job2.

It looks like the job queue for the parallel universe is processed

in strict FIFO order. Is this normal behavior for the parallel

universe, or am I missing something?
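
For illustration, a submit description file for job1 would look roughly like the sketch below. The executable, arguments, log file name and fully qualified host name are placeholders, not our actual values. job2 is the same, and job3 targets node2 instead.

```
# Sketch of a parallel universe submit file for job1.
# Executable, arguments, log name and host name are placeholders.
universe      = parallel
executable    = /bin/sleep
arguments     = 60
# Ask for 4 CPUs (one per core of the node).
machine_count = 4
# Restrict the job to node1.
requirements  = (Machine == "node1.example.com")
log           = job1.log
queue
```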


Note: In the vanilla universe, job management works

as expected: job3 is executed right after submission.



2 )

After a job for the parallel universe is submitted to the queue,

it stays idle for some time. Sometimes it is executed within tens

of seconds, sometimes only after a few minutes. We usually use

condor_reschedule, which helps to execute the job

(at least we think it helps). Jobs in the vanilla universe

are executed right after they are submitted (assuming

there are free CPUs to run the job).

Is this normal behavior for the parallel universe,

or is it just due to our Condor configuration?

If it is the configuration, how can I change it?



If you need any configuration files, log files or anything else,

just tell me and I will send them.


Thanks in advance for any help or suggestion.