Re: [condor-users] Condor Pool and multiple cpus
- Date: Thu, 25 Sep 2003 20:11:05 -0500
- From: Alain Roy <roy@xxxxxxxxxxx>
- Subject: Re: [condor-users] Condor Pool and multiple cpus
> VIRTUAL_MACHINE_TYPE_1 = cpus=1, mem=50%
> NUM_VIRTUAL_MACHINES_TYPE_1 = 1
Why do you want to advertise the CPU as having only 50% of the machine's memory?
By the way, it's probably simpler to just set NUM_CPUS instead.
> But I find that the jobs I start are just idling in the queue.
This may or may not be due to your change to force a single CPU--I suspect
3 reject your job because of their own requirements
The important thing is to look at the requirements of the machines and
figure out why they aren't being met.
Here's what you do.
1) Pick one of the jobs and one of the machines. Say you pick job 5.0 and
machine foo.example.com. (I don't know the real host names.)
2) Look at what a job advertises:
condor_q -l 5.0
This will give you the ClassAd for the job. Notice the Requirements of the
job and the attributes that they reference. For instance, one of the
requirements may be "Disk > DiskUsage". Disk is an attribute of the machine
and DiskUsage is an attribute of the job, so look at DiskUsage in the job's
ClassAd to see what it is.
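
If the ClassAd is long, a small script can help sort out which attributes in the Requirements come from the job and which are expected from the machine. Here is a rough Python sketch of that idea. It assumes the output has one "Name = value" pair per line; the sample text and the helper names (parse_long_ad, referenced_names) are made up for illustration and are not part of Condor.

```python
import re

# Hypothetical excerpt of `condor_q -l 5.0` output; the values are made up.
SAMPLE_JOB_AD = """\
MyType = "Job"
ClusterId = 5
ProcId = 0
DiskUsage = 2500
ImageSize = 40000
Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk > DiskUsage)
"""


def parse_long_ad(text):
    """Split 'Name = value' lines into a dict of raw, unevaluated strings."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def referenced_names(expression):
    """Return the identifiers used in an expression, ignoring quoted strings.

    This is a crude text scan, not a ClassAd parser: keywords such as
    TRUE or UNDEFINED would also be reported as names.
    """
    without_strings = re.sub(r'"[^"]*"', "", expression)
    return sorted(set(re.findall(r"[A-Za-z_]\w*", without_strings)))


job = parse_long_ad(SAMPLE_JOB_AD)
names = referenced_names(job["Requirements"])
from_job = [n for n in names if n in job]
from_machine = [n for n in names if n not in job]

print("defined in the job ad:", from_job)
print("expected from the machine ad:", from_machine)
```

The names in the second list are the ones to look up in the machine's ClassAd in the next step.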
3) Look at what a machine advertises:
condor_status -l foo.example.com
Again, this gives you the ClassAd for the machine. Look again at the
requirements and the related attributes. Sometimes machines have tricky
requirements to track down.
In your case, the requirements for the machine are not being met. You can
figure out why by going through the machine's Requirements one clause at a
time and checking each against the attributes of the job and the machine.
Some common problems you'll encounter:
1) The machine must be idle for a certain amount of time, but it hasn't
been idle long enough. (condor_q -analyze can't tell that this is a problem
with the machine instead of the job.)
2) The job requires more memory than the machine has.
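
To see how both problems can show up at once, here is a toy Python model of the two-sided check. It is only a sketch: real Requirements are ClassAd expressions that Condor evaluates itself, and every value, threshold, and function name below is an assumption made up for illustration. It does assume the usual units, with Memory in MB, ImageSize and Disk in KB, and KeyboardIdle in seconds.

```python
# Toy model only: each side's Requirements is written as a Python function
# taking (my_ad, target_ad). All attribute values are made up.

job = {"ImageSize": 40000, "DiskUsage": 2500}
machine = {"Memory": 32, "Disk": 100000, "KeyboardIdle": 120}


def job_requirements(my, target):
    # Job side: the machine needs enough disk and enough memory.
    enough_disk = target["Disk"] > my["DiskUsage"]
    enough_memory = target["Memory"] * 1024 >= my["ImageSize"]
    return enough_disk and enough_memory


def machine_requirements(my, target):
    # Machine side: only start jobs after 15 minutes without keyboard use.
    return my["KeyboardIdle"] > 15 * 60


def explain_match(job_ad, machine_ad):
    """Return a list of reasons the pair does not match (empty if it does)."""
    problems = []
    if not job_requirements(job_ad, machine_ad):
        problems.append("job's requirements reject the machine")
    if not machine_requirements(machine_ad, job_ad):
        problems.append("machine's requirements reject the job")
    return problems


print(explain_match(job, machine))
```

A match needs both sides to agree, which is why you have to read the job's ClassAd and the machine's ClassAd together.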
Does it seem like a pain to analyze these requirements? It is, but we're
working on making this better. In Condor 6.5.5, we have "condor_analyze",
which is an advanced version of "condor_q -analyze" that tries to do this
analysis for you. If you have Condor 6.5.5, give it a shot. Even if you do
have it, going through this exercise may be useful to help you understand
how Condor works.
I hope this helps!