
Re: [Condor-users] jobs only executing on their node



On Tuesday, 22 November, 2011 at 8:11 PM, Tom Melendez wrote:
Hi Folks,

Working on setting up my own small two-node test cluster and have
worked through some issues thanks to the list. Now I'm seeing that my
job is only executing on the machine in which it was submitted. I see
from condor_status that all of the slots from both machines are
present. But when I submit the job, despite how many I queue up, they
are only processed on the one node. I haven't seen anything in the
logs to even signal to me that the job is being considered on another
node.

Any ideas?

I'd say 9 times out of 10 it's the FileSystemDomain setting that's blocking jobs from spreading out across machines in a new pool like this.

But you can use condor_q to get a better idea of which machines match your job's requirements. Just run this against a job that's idle in your queue:

condor_q -better-analyze <clusterid>.<procid>

Post the results if they're unclear.
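
If the analysis does show the other node being rejected on FileSystemDomain, here is a minimal sketch of one way around it, assuming the two machines do not share a filesystem. When a job does not request file transfer, Condor adds a clause to the job's Requirements that the execute machine's FileSystemDomain match the submit machine's, and FILESYSTEM_DOMAIN defaults to each machine's own full hostname, so only the submit machine qualifies. Turning on file transfer in the submit file removes that clause. The executable and file names below are placeholders, not taken from the original job:

```
# Hypothetical submit description file; my_job.sh and the output names are placeholders.
universe                = vanilla
executable              = my_job.sh
output                  = my_job.$(Cluster).$(Process).out
error                   = my_job.$(Cluster).$(Process).err
log                     = my_job.log

# Ship the job's files to the execute machine instead of assuming a shared filesystem.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

queue 10
```

If the machines really do share a filesystem, the other route is to set FILESYSTEM_DOMAIN (and usually UID_DOMAIN) to the same value in the Condor configuration on both nodes and reconfigure.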

Regards,
- Ian



---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing