[Condor-users] flocking question



Hi, I'm attempting to get flocking working from a dedicated cluster to a cluster of workstations, with only partial success.

I think that the issue may be related to internal vs. external networks, and so would like some input from the condor community.

We have a dedicated condor cluster which communicates entirely on an internal 10.101.x.x network, via an internal network interface on each node and on the head node. The head node and 10 submit nodes also have an external interface, through which people log in and submit jobs. However, the schedd, startd, etc. all talk to each other within this pool on the internal network.

We would like this cluster to be able to flock to a workstation cluster that is completely on the external network. The head node of this workstation cluster can also see the internal network, but its condor_config has explicitly set the interface to be the external one.

FLOCK_TO and FLOCK_FROM were set on both head nodes, each pointing at the explicit external interface of the other head node, and ALLOW_READ and ALLOW_WRITE were set to allow all machines on the external network to interact with the head node of the workstation cluster.
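
As a rough sketch, the setup described above corresponds to something like the following condor_config fragments. The hostnames and the 192.0.2.x addresses are placeholders, not the real values, and NETWORK_INTERFACE is assumed to be the setting used to pin the workstation head to its external interface.

```
# --- Dedicated cluster head node (placeholder names) ---
# Pool to flock to, and pool allowed to flock here
FLOCK_TO   = ws-head.example.org
FLOCK_FROM = ws-head.example.org

# --- Workstation cluster head node (placeholder names) ---
# Assumed: bind condor to the external interface only
NETWORK_INTERFACE = 192.0.2.10
FLOCK_TO   = cluster-head.example.org
FLOCK_FROM = cluster-head.example.org
# Let every machine on the external network read from and write to this head
ALLOW_READ  = 192.0.2.*
ALLOW_WRITE = 192.0.2.*
```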

What I've seen is matchmaking between the two head nodes, and the workstation nodes preparing to receive the job but timing out. I've increased the timeout from 2 to 20 minutes without an improvement.

My question is: can the schedd on the submit nodes, which normally talks to the other nodes over the internal network and interface, contact condor on the workstations over the external interface? If not, what options do we have for getting the clusters to flock?


thanx,

rob