
Re: [condor-users] Condor Pool and multiple cpus




VIRTUAL_MACHINE_TYPE_1 = cpus=1, mem=50%
NUM_VIRTUAL_MACHINES_TYPE_1 = 1

Why do you want to advertise the CPU as having only 50% of the machine's memory?


By the way, it's probably simpler to just use NUM_CPUS:

http://www.cs.wisc.edu/condor/manual/v6.5/3_3Configuration.html#SECTION00438000000000000000

But I find that the jobs I start are just idling in the queue.

This may or may not be due to your change to force a single CPU--I suspect it's independent.


3 reject your job because of their own requirements

The important thing is to look at the requirements of the machines and figure out why they aren't being met.


Here's what you do.

1) Pick one of the jobs and one of the machines. Say you pick job 5.0 and machine foo.example.com. (I don't know the real host names.)

2) Look at what a job advertises:

condor_q -l 5.0

This will give you the ClassAd for the job. Notice two things: the Requirements of the job, and the attributes that they reference. For instance, one of the requirements may be "Disk > DiskUsage". Disk will be an attribute of the machine, and DiskUsage an attribute of the job, so look at the DiskUsage in the job to see what it is.

3) Look at what a machine advertises:

condor_status -l foo.example.com

Again, this gives you the ClassAd for the machine. Look again at the requirements and the related attributes. Sometimes machines have tricky requirements to track down.

In your case, the requirements for the machine are not being met. You can figure out why by looking at the requirements of the machine and seeing what the problem is.
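If comparing the two ads by eye gets tedious, the comparison can be sketched in a few lines of Python. This is only an illustration: the ad excerpts are made-up values (real output from "condor_q -l" and "condor_status -l" has many more attributes), parse_classad is a hypothetical helper rather than anything Condor ships, and it only handles simple "Name = Value" lines, not full ClassAd expressions.

```python
def parse_classad(text):
    """Parse simple 'Name = Value' lines from long-form ClassAd output into a dict."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip blank lines and anything that is not 'Name = Value'
        value = value.strip()
        try:
            ad[name.strip()] = int(value)          # numeric attributes
        except ValueError:
            ad[name.strip()] = value.strip('"')    # strings and expressions, kept as text
    return ad

# Hypothetical excerpt of "condor_q -l 5.0" (values invented for illustration)
JOB_AD = """\
DiskUsage = 2500
ImageSize = 180000
Requirements = (Arch == "INTEL") && (Disk > DiskUsage)
"""

# Hypothetical excerpt of "condor_status -l foo.example.com"
MACHINE_AD = """\
Name = "foo.example.com"
Disk = 1000
Memory = 256
"""

job = parse_classad(JOB_AD)
machine = parse_classad(MACHINE_AD)

# Check the one clause from the example above by hand: Disk > DiskUsage
disk_ok = machine["Disk"] > job["DiskUsage"]
print("Job requirements:", job["Requirements"])
print("Disk > DiskUsage on", machine["Name"], "->", disk_ok)
```

With these invented numbers the clause fails, which is the kind of mismatch you are looking for. The same check works in the other direction for attributes that the machine's requirements reference in the job.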

Some common problems you'll encounter:

1) The machine must be idle for a certain amount of time, but it hasn't been idle long enough. (condor_q -analyze can't tell that this is a problem with the machine instead of the job.)

2) The job requires more memory than the machine has.
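The memory case is easy to check with a little arithmetic. The sketch below assumes the usual units (Memory advertised by the machine in megabytes, ImageSize reported by the job in kilobytes) and a requirement of the form "Memory * 1024 >= ImageSize"; check the Requirements in your own job ad, since yours may differ. The 512 MB machine and the job sizes are invented, and memory_fits is a hypothetical helper.

```python
def memory_fits(machine_memory_mb, job_image_size_kb):
    """Return True if the job's image fits in the memory the machine advertises."""
    return machine_memory_mb * 1024 >= job_image_size_kb

# Hypothetical machine with 512 MB of RAM, advertised with mem=50%
physical_mb = 512
advertised_mb = physical_mb * 50 // 100   # 256 MB is what the pool sees

small_job_kb = 180000   # roughly 176 MB
large_job_kb = 300000   # roughly 293 MB

print(memory_fits(advertised_mb, small_job_kb))  # fits in the advertised 256 MB
print(memory_fits(advertised_mb, large_job_kb))  # too big for the advertised 256 MB
print(memory_fits(physical_mb, large_job_kb))    # would fit if all 512 MB were advertised
```

This is one reason to think twice about advertising only half of the memory: a job that would fit on the real machine can be rejected because of the smaller advertised figure.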

Does it seem like a pain to analyze these requirements? It is, but we're working on making this better. In Condor 6.5.5, we have "condor_analyze", which is an advanced version of "condor_q -analyze" that tries to do this analysis for you. If you have Condor 6.5.5, give it a shot. Even if you do have it, going through this exercise may be useful to help you understand how Condor works.

I hope this helps!

-alain

