Re: [Condor-users] Getting mad trying to flocking in condor



On Jul 4, 2012, at 2:57 AM, Michell Guzman Cancimance wrote:

I'm going mad trying to flock a job from cluster A (master.cluster.org, 172.18.0.2) to cluster B (cl-master.mycluster.org, 178.12.100.2).
Each cluster has a master and two worker nodes. The nodes in cluster A have arch X86_64, and the nodes in cluster
B have arch INTEL (32-bit). I have configured the flocking section of condor_config on the master node of each cluster (master.cluster.org and cl-master.mycluster.org), following the steps in (http://research.cs.wisc.edu/condor/manual/v6.8/5_2Connecting_Condor.html). When I run a job in each cluster separately, it works fine, but when I submit a job that requires arch INTEL to cluster A (the cluster whose nodes have arch X86_64), expecting it to
flock to cluster B, it doesn't work. I have tried a lot of things without any success. I would appreciate any help in solving this problem.

Here are a couple things to try:

* Ensure you are setting should_transfer_files and when_to_transfer_output in your submit file, so that Condor isn't restricting the jobs to run only on machines that have the same shared filesystem.

* Run 'condor_status -submitters' on cluster B and see if there's an ad from the schedd on cluster A. If it's there, then the schedd from A is successfully flocking to B, though possibly not getting any matches.

* If A is flocking to B but not getting any matches, run 'condor_q -analyze -pool cl-master.mycluster.org' to see if any job or machine requirements are preventing a match.

* Check that you have FLOCK_TO set on A and FLOCK_FROM set on B.
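
As a rough sketch of what the first and last suggestions could look like for the two clusters described above: the hostnames come from the original message, while the file name job.sub, the executable my_job, and the output file names are placeholders I made up for illustration. Your pools may also need their host-based authorization settings adjusted so that the daemons on B accept connections from A's schedd. That part is not shown here, since it depends on your existing configuration.

```
# --- job.sub: submit description file, submitted on cluster A (hypothetical names) ---
universe                = vanilla
executable              = my_job
# Ask for a 32-bit machine, which only cluster B provides
requirements            = (Arch == "INTEL")
# Use Condor's file transfer instead of assuming a shared filesystem
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = my_job.out
error                   = my_job.err
log                     = my_job.log
queue

# --- condor_config on master.cluster.org (cluster A, the submitting side) ---
FLOCK_TO = cl-master.mycluster.org

# --- condor_config on cl-master.mycluster.org (cluster B, the receiving side) ---
FLOCK_FROM = master.cluster.org
```

After changing either condor_config, run condor_reconfig on that machine so the daemons pick up the new values.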

Thanks and regards,
Jaime Frey
UW-Madison Condor Team