I installed Condor 6.8.6 a few weeks ago
(so I am still new to Condor).
We are running Condor on a small pool of 6 machines.
One of them is the central manager, submit machine and scheduler (it also
acts as the dedicated scheduler), and is an execute machine as well.
The rest of the pool are execute machines (configured as
dedicated resources). The execute machines are 4-core machines.
We are experiencing 2 problems with parallel job submissions.

The first problem: I submit job1, which requires 4 CPUs on, say, node1.
After some time it is executed.
Then I submit job2, which again requires 4 CPUs on node1.
This one stays idle, because no more CPUs are available on node1.
Finally I submit job3 to node2.
The strange thing is that this job stays idle until job2 is executed.
But since node2 is free, I do not see a reason why
it should stay idle and wait for job2.
It looks like the job queue for the parallel universe is processed
strictly in FIFO order. Is this normal behavior for the parallel
universe, or am I missing something?
Note: In the vanilla universe, job management works
as expected: job3 is executed right after submission.
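
To make clear what I mean by strict FIFO, here is a toy Python model of
the situation above. It is only a sketch of the behavior I think I am
seeing, not Condor code: the function names and the queue representation
are made up for illustration, and I do not know how the dedicated
scheduler actually decides which job to start.

```python
# Toy model only: NOT Condor code. All names here are hypothetical and
# exist just to illustrate the two queue-processing policies.
from collections import deque

CORES_PER_NODE = 4


def schedule_strict_fifo(queue, free_cores):
    """Start jobs in queue order and stop at the first job that does not fit."""
    started = []
    pending = deque(queue)
    while pending:
        name, node, cores = pending[0]
        if free_cores[node] < cores:
            break  # the head of the queue blocks every job behind it
        free_cores[node] -= cores
        started.append(name)
        pending.popleft()
    return started


def schedule_skip_blocked(queue, free_cores):
    """Start every job whose node currently has enough free cores."""
    started = []
    for name, node, cores in queue:
        if free_cores[node] >= cores:
            free_cores[node] -= cores
            started.append(name)
    return started


def demo():
    # job1 already occupies all 4 cores of node1; node2 is completely free.
    queue = [("job2", "node1", 4), ("job3", "node2", 4)]
    fifo = schedule_strict_fifo(queue, {"node1": 0, "node2": CORES_PER_NODE})
    skip = schedule_skip_blocked(queue, {"node1": 0, "node2": CORES_PER_NODE})
    return fifo, skip


print(demo())  # prints ([], ['job3'])
```

With the strict FIFO policy nothing starts, because job2 is at the head
of the queue and cannot run; this matches what I observe in the parallel
universe. With the second policy job3 starts immediately on node2, which
is what I observe in the vanilla universe and what I expected here.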

The second problem: after a parallel universe job is submitted to the queue,
it stays idle for some time. Sometimes it is executed within tens
of seconds, sometimes only after a few minutes. We usually run
condor_reschedule, which helps to get the job executed
(at least we think it helps). Vanilla universe jobs
are executed right after they are submitted (assuming
there are free CPUs to run them).
Is this normal behavior for the parallel universe,
or is it just due to our Condor configuration?
If it is the configuration, how can I change it?

If you need any configuration files, log files or anything else,
just tell me and I will send them.
Thanks in advance for any help or suggestions.