
Re: [Condor-users] Condor and diskless beowulf cluster


We've run Condor pools of up to 128 nodes (512 cores) in a netbooted config with no trouble. In this configuration, the head node has a complete local Linux installation but it, along with the compute nodes, gets Condor via NFS from a clustered NAS. The compute nodes all have local disks for swap and scratch, but that's it - everything else resides on the NFS servers.

Users can cripple this configuration by doing stupid I/O (e.g. all 256 cores trying to create small files as fast as possible, all in the same directory), but putting Condor on the local disks wouldn't help.

Having said that, we recently moved the Condor installation to each node's local disk. We did this mostly to become more "standard", since most other sites seem to do things this way. The biggest drawback is that the log files are scattered all over the place now, whereas they used to be in one tidy directory tree on the NFS servers.



Chance Reschke
Biochemistry Department
University of Washington

On Nov 8, 2007, at 12:12 AM, Steffen Grunewald wrote:

On Wed, Nov 07, 2007 at 05:30:36PM -0500, Vasil Lalov wrote:

I am currently in the process of building a small grid of 2
clusters. One of them is already up and running, and Condor is working.

The second cluster is a diskless node cluster on which there is no
Condor installation at this point.

Do I need to install Condor on each diskless compute node as a Condor execute machine? Since the entire OS of the compute nodes is loaded from the head node, will installing Condor on the head node as
master, execute and submission machine be enough?

Some years ago we had a single Condor installation on an NFS volume (though with only ~10 nodes). Since you can have
individual config files for the nodes (selected by hostname), you can still
configure everything as you like (DAEMON_LIST, START, NETWORK_INTERFACE
for the head node[s]).
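
A minimal sketch of what such a setup could look like, assuming a shared installation under /nfs/condor and a head node whose short hostname is "head". The paths, the hostname and the IP address are placeholders for illustration, not values from any real site:

```
# /nfs/condor/etc/condor_config (global file, read by every node)
RELEASE_DIR       = /nfs/condor
# Per-node directory for log, spool and execute, keyed by hostname
LOCAL_DIR         = $(RELEASE_DIR)/hosts/$(HOSTNAME)
# Per-node overrides, selected by hostname
LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
# Assumption: nodes without their own .local file should still start
REQUIRE_LOCAL_CONFIG_FILE = FALSE
# Default role for compute nodes: execute only
DAEMON_LIST       = MASTER, STARTD
START             = TRUE

# /nfs/condor/etc/head.local (read by the head node only)
DAEMON_LIST       = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
# Placeholder address of the cluster-internal interface
NETWORK_INTERFACE = 192.168.1.1
```

With this layout the compute nodes need no files of their own; add STARTD to the head node's DAEMON_LIST if it should also run jobs.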

It might make sense to move some stuff to ramdisks to reduce the load on the network... at the expense of available memory though, so there's a tradeoff.
