
Re: [Condor-users] Condor and diskless beowulf cluster



I'm aware of a cluster here at Fermilab that has 500 nodes all
sharing the same set of Condor binaries via NFS.  The tradeoff
is that if the network goes away for any significant length of time,
or is down when the machines get power cycled, Condor doesn't start
properly and you have to go back and start it manually.  But it does
make upgrades faster.  I don't like the technique myself, but
the admins who do it that way swear by it.
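
The usual workaround is an init script that waits for the NFS mount
before launching condor_master -- something along these lines (an
untested sketch; the path and timeout are placeholders, not what
those admins actually run):

    #!/bin/sh
    # Wait (up to 5 minutes) for the NFS-mounted Condor release
    # directory to appear before starting the master.
    CONDOR_HOME=/nfs/condor          # placeholder mount point
    for i in $(seq 1 30); do
        [ -x "$CONDOR_HOME/sbin/condor_master" ] && break
        sleep 10
    done
    if [ -x "$CONDOR_HOME/sbin/condor_master" ]; then
        exec "$CONDOR_HOME/sbin/condor_master"
    else
        echo "Condor NFS volume never showed up, not starting" >&2
        exit 1
    fi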

Steve Timm


On Thu, 8 Nov 2007, Steffen Grunewald wrote:

On Wed, Nov 07, 2007 at 05:30:36PM -0500, Vasil Lalov wrote:
Hello,

I am currently in the process of building a small mini grid of 2
clusters. One of them is already up and running and condor is working
fine.

The second cluster is a diskless node cluster on which there is no
Condor installation at this point.

Do I need to install Condor on each diskless compute node
as a Condor execute machine?  Since the entire OS of the compute nodes
is loaded from the head node, will installing Condor on the head node as
a master, execute, and submit machine be enough?

Some years ago we had a single Condor installation on an NFS volume
(but with only ~10 nodes). Since Condor lets you have individual
config files per node (selected by hostname), you can still
configure everything as you like. (DAEMON_LIST, START, NETWORK_INTERFACE
for the head node[s])
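
For example (a rough sketch with made-up paths -- adjust RELEASE_DIR
and the per-node file names to your own layout), the shared
condor_config can pull in a per-host file:

    ## in the shared condor_config on the NFS volume:
    RELEASE_DIR       = /nfs/condor
    LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/nodes/$(HOSTNAME).local

    ## etc/nodes/head.local -- head node runs the full daemon set:
    DAEMON_LIST       = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
    NETWORK_INTERFACE = 192.168.1.1

    ## etc/nodes/node01.local -- diskless execute-only node:
    DAEMON_LIST = MASTER, STARTD
    START       = TRUE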

It might make sense to move some of Condor's working directories to
ramdisks to reduce the load on the network... at the expense of
available memory, though, so there's a tradeoff.
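
E.g. (just a sketch; the mount point and size are arbitrary) put
Condor's LOCAL_DIR on a tmpfs so the log/spool/execute traffic stays
off the wire:

    ## /etc/fstab on each diskless node:
    ##   tmpfs   /var/condor   tmpfs   size=256m,mode=0755   0 0

    ## per-node Condor config:
    LOCAL_DIR = /var/condor
    LOG       = $(LOCAL_DIR)/log
    SPOOL     = $(LOCAL_DIR)/spool
    EXECUTE   = $(LOCAL_DIR)/execute

Keep in mind that everything in a tmpfs is gone after a power cycle --
harmless for execute directories, but you do lose the local logs.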

Cheers,
Steffen


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.