Re: [Condor-users] Not FLOCKing.



When in doubt, blame firewalls. Can all the execute nodes see all the submit nodes and vice versa?

The firewalls need to allow traffic on the high ephemeral port range you have configured, as well as on a few well-defined ports.

 

From your machine names I am guessing that firewalls are not the problem this time round, so check the documentation to see on which machines FLOCK_TO and FLOCK_FROM need to be set.
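
As a rough sketch of how the two settings usually pair up: FLOCK_TO goes on the submit machine and names the central manager of the pool to flock to, while FLOCK_FROM goes in the remote pool and names the submit machines allowed in. The fully qualified hostnames here are my assumption, built from the domain in your FLOCK_FROM line, so adjust them to your machines.

```
# On the TB1 submit machine (pc55), in condor_config.local:
# central manager(s) of the pool(s) this schedd may flock to.
# Hostname is assumed, not taken from your configuration.
FLOCK_TO = pc50.eisc.univalle.edu.co

# On the TB2 machines (central manager pc50 and, as far as I recall,
# the execute nodes too, since they use it for authorization):
# submit machines that are allowed to flock in.
FLOCK_FROM = pc55.eisc.univalle.edu.co
```

After editing, condor_reconfig on the affected machines should pick up the change.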

 

BTW, a hint I found useful when testing flocking: set the Requirements in the submit file so that the job can only run on one specific machine in the other pool.
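
A minimal sketch of such a test submit file, assuming one of the TB2 execute nodes advertises itself as pc51.eisc.univalle.edu.co (that hostname is a guess; use whatever condor_status reports as the Machine attribute in the remote pool):

```
# Test submit file pinned to a single machine in the remote pool
Universe = vanilla
Executable = /bin/hostname
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# Assumed hostname; replace with the real Machine value of a TB2 node
Requirements = (Machine == "pc51.eisc.univalle.edu.co")
Output = hostOut.$(Process)
Error = hostErr.$(Process)
Log = hostLog.$(Cluster)
Queue 1
```

Since no machine in the local pool can match that requirement, the job either flocks or stays idle, which makes it easy to tell whether flocking works at all.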

 

You can also use condor_config_val (for example, condor_config_val FLOCK_TO on the submit machine) to make sure the values you set are actually in effect.

 

JK

 

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Edier Alberto Zapata Hernández
Sent: Wednesday, June 16, 2010 8:33 PM
To: condor-users
Subject: [Condor-users] Not FLOCKing.

 

Good afternoon. Today I set up 2 testbeds to try Condor's flocking. Their structure is:

TB1:

Central Manager pc56
Submit Node pc55
Submit/Execute Node pc54

TB2:

Central Manager with Submit pc50
Execute Node pc51
Execute Node pc52
Execute Node pc53

 

I added:

in the condor_config.local file of PC55

And:

 FLOCK_FROM=*.eisc.univalle.edu.co

in the computer PC50

 

I submitted a job from PC55 with this submit file:

##
# Test Submit File
##
# Use: condor_submit testTask.condor
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT_OR_EVICT
Executable = /bin/hostname
Universe = vanilla
Output = hostOut.$(Process)
Error = hostErr.$(Process)
Queue 50

But all 50 processes ran on PC54 and none flocked to TB2. Can anyone give me an idea why?

 

Thank you very much.

 

P.S. I did restart the nodes after making the changes.

 

Note: Before that, I tried with no execute nodes in TB1 and the jobs went to TB2, but creating the hostOut.[0-49] files on TB2's nodes failed with a Permission Denied error. The Condor owner account is the same in both testbeds, with the same password, but it cannot log in to any node over SSH. Could that be the cause of the failure?

 

----
Edier Alberto Zapata Hernández
Est. Ingeniería de Sistemas
Universidad de Valle

