I have a 48-node Linux cluster running Condor. The hard drive on one of the nodes crashed, and the node was rebuilt. I have tried, unsuccessfully, to get Condor running again on the rebuilt node (it worked fine on that node before the crash, and it still works fine on the other 47 machines). The Condor base install is in /home/condor, which is shared across all of the nodes via NFS. The condor user exists on the rebuilt node. All I did was create /var/lock/condor/InstanceLock with the proper permissions and configure the Condor services to start.

The Condor services start on the rebuilt node without any errors, but the master node never sees it: condor_status does not list the rebuilt node. It is as if no information were being sent from the rebuilt node to the master node. However, I know that network communication between the two is working, since the rebuilt node loads the Condor services from the NFS mount.

The following services are running on the rebuilt node:


and their log files show no errors.

Has anyone seen this problem before? How exactly do the Condor clients communicate with the master node? Is it via a specific TCP or UDP port? I've disabled iptables on both the master and the client, to no avail. The weird thing is that all of the other clients show up.
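To make the port question concrete, here is a rough Python sketch of a check that could be run on the rebuilt node: it tries to open a plain TCP connection to the collector on the master node. It assumes the collector listens on the default port 9618; a site configuration may override this. It only tests TCP. Condor daemons may send their status updates over UDP, and a connectionless UDP send cannot be confirmed this way.

```python
import socket

# Assumption: the collector on the master (central manager) node listens on
# the default port 9618; change this if your configuration uses another port.
COLLECTOR_PORT = 9618


def can_connect(host: str, port: int = COLLECTOR_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any socket-level failure (refused, timed out, unresolvable host) is
    reported as False. This says nothing about UDP reachability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage, with a hypothetical hostname: `can_connect("master-node.example")`. If this returns True from one of the 47 working nodes but False from the rebuilt one, the difference is on the network path rather than in the NFS-shared install.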

Any help would be appreciated.

