
Re: [Condor-users] Not FLOCKing.



On Thu, Jun 17, 2010 at 3:57 AM,  <john.kewley@xxxxxxxxxx> wrote:

When in doubt, blame firewalls. Can all the execute nodes see all the submit nodes and vice versa?

Yes, they can.

The firewalls need to be open for your specified high ephemeral port range as well as a few well defined ports.

They are. All the computers are in the same room, and the firewall filters only outside traffic, not internal traffic; the PCs themselves have no firewalls enabled.

From your machine names I am guessing that might not be the case this time round, so maybe check the documentation and see where it says the flock_to and flock_from need to be set.

On the Pool1 submit nodes that flock, FLOCK_TO must point to the Central Manager of Pool2, and FLOCK_FROM must be set on Pool2's Central Manager, listing the submit nodes that may send jobs to it.
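
For reference, a minimal sketch of those two settings as they might look for the testbeds in this thread. The fully qualified host names are assumptions built from the FLOCK_FROM domain quoted further down; they are not values taken from the actual configuration files.

```condor
# On the flocking submit node in TB1 (pc55), in condor_config.local.
# Assumed host name: points at the Central Manager of TB2.
FLOCK_TO = pc50.eisc.univalle.edu.co

# On TB2's Central Manager (pc50), in condor_config.local.
# Assumed host name: lists the submit nodes allowed to send jobs to this pool.
FLOCK_FROM = pc55.eisc.univalle.edu.co
```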

BTW a hint at testing flocking I found useful. Set the Requirements to send a job to specific machine name.

 Thank you, I'll try it
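
A sketch of what that test could look like, reusing the submit file from the original message with one job pinned to a TB2 execute node. The machine name is an assumption built from the FLOCK_FROM domain; the real value is whatever condor_status reports in the Machine attribute for that node.

```condor
##
# Flocking test: one job pinned to a single TB2 execute node
##
# Assumed machine name; replace it with the Machine value from condor_status.
Requirements = (Machine == "pc51.eisc.univalle.edu.co")
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT_OR_EVICT
Executable = /bin/hostname
Universe = vanilla
Output = hostOut.$(Process)
Error = hostErr.$(Process)
Queue 1
```

Because no machine in TB1 can match this requirement, the job can only run if it flocks to TB2, which separates a flocking problem from jobs simply being satisfied by the local pool.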

You can also use condor_config_val to make sure the values you set are there.

I'll try this too :) 

JK 

 Thank you very much.

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Edier Alberto Zapata Hernández
Sent: Wednesday, June 16, 2010 8:33 PM
To: condor-users
Subject: [Condor-users] Not FLOCKing.

 

Good afternoon. Today I set up 2 testbeds to try Condor's flocking. Their structure is:

TB1:

Central Manager pc56
Submit Node pc55
Submit/Execute Node pc54

TB2:

Central Manager with Submit pc50
Execute Node pc51
Execute Node pc52
Execute Node pc53

 

I added:

in the condor_config.local file of PC55

And:

 FLOCK_FROM=*.eisc.univalle.edu.co

in the computer PC50

 

I submitted a job from PC55 with this submit file:

##
# Test Submit File
##
# Use: condor_submit testTask.condor
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT_OR_EVICT
Executable = /bin/hostname
Universe = vanilla
Output = hostOut.$(Process)
Error = hostErr.$(Process)
Queue 50

But all 50 processes ran on PC54 and none flocked to TB2. Can anyone give me an idea why?

 

Thank you very much.

 

P.S. Obviously, I restarted the nodes after the changes.

 

Note: Before that, I tried with no execute nodes in TB1, and the jobs went to TB2, but I got a "Failed to create hostOut.[0-49] file" error on TB2's nodes because of a Permission Denied error. The Condor owner is the same in both testbeds, with the same password, but it can't access any node by SSH. Maybe this was the cause of the failure?

 

----
Edier Alberto Zapata Hernández
Est. Ingeniería de Sistemas
Universidad de Valle





_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/




--
----
Edier Alberto Zapata Hernández
Est. Ingeniería de Sistemas
Universidad de Valle