
Re: [Condor-users] Condor on multiple network interfaces (GCB Problem)



Hi Nicolas,

I was wondering if your GCB setup is working now.

 

I am trying to set up GCB for machines behind NAT, but I cannot see those machines in condor_status on the central manager, nor can I see the GCB machine (where only condor_master is running).

 

Since I cannot see the machines behind NAT, I cannot force a test job to run on them by specifying the machine name in the requirements.

 

If I submit the job from a machine on the private network (behind NAT), I get an error saying that the executable file could not be transferred:

ERROR: failed to transfer executable file test.sh

 

StartLog (in job submitting machine)

**********

7/24 08:23:10 GCB: ERROR "GCB_bind: binding the socket locally failed" errno 98: Address already in use

7/24 08:23:10 GCB: ERROR "GCB_bind: binding the socket locally failed" errno 98: Address already in use

7/24 08:23:10 GCB: ERROR "GCB_bind: binding the socket locally failed" errno 98: Address already in use

 

SchedLog (in job submitting machine)

**********

7/24 07:53:53 (pid:1834) GCB: ERROR "GCB_bind: binding the socket locally failed" errno 98: Address already in use

7/24 07:53:53 (pid:1834) GCB: ERROR "GCB_bind: binding the socket locally failed" errno 98: Address already in use

7/24 08:50:21 (pid:1834) get_file: Zero-length file check failed!

7/24 08:50:21 (pid:1834) Failed to receive file from client in SendSpoolFile.
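
For reference, errno 98 on Linux is EADDRINUSE: bind() was asked for a local address and port that another socket already holds. The short Python sketch below reproduces only that generic condition on the loopback interface. It is not GCB code, the function name second_bind_errno is made up for illustration, and it does not tell you which Condor or GCB port is in conflict here.

```python
import errno
import socket


def second_bind_errno(host: str = "127.0.0.1"):
    """Bind two TCP sockets to the same address and port.

    Returns the errno raised by the second bind, or None if the second
    bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port, still held
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = second_bind_errno()
    print("second bind failed with EADDRINUSE:", err == errno.EADDRINUSE)
```

If the same error shows up on every daemon start, one thing worth checking (an assumption, not a diagnosis) is whether another process on the machine already holds the port the daemon is trying to bind.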

 

It looks like the public network machine (central manager), the GCB broker, and the GCB clients are not communicating properly.

 

Do you know what might be the problem?

 

Thanks,

Senthil

 


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Nicholas Lavigne
Sent: Friday, July 20, 2007 10:20 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Condor on multiple network interfaces

 

I've installed the GCB broker, and from what I can tell from the logs, it is running correctly.  I have also configured the clients as explained in that document, but when trying to run jobs, it seems like I have made absolutely no progress.  The central manager (which is also the network boundary) is able to match the job to the execute machine, but the job does not run, and the execute machine's status stays "Unclaimed".

Is there anyone with a similar setup who would be willing to share their details (condor_config files)?

On 7/19/07, Dan Bradley < dan@xxxxxxxxxxxx> wrote:


Condor requires bidirectional connectivity between the submit node and
the execute node.  In other words, Condor must be able to open up
network connections to the execute machine from the submit machine and
also to the submit machine from the execute machine.

If connections can only be made in one direction in your network (e.g.
from private to public), then you can configure Condor to use GCB to
broker connections in the reverse direction.  There's a section in the
manual about that:

http://www.cs.wisc.edu/condor/manual/v6.8/3_7Networking.html#SECTION00473000000000000000
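
One rough way to see whether connections work in both directions is to try opening a TCP connection from each side to a port the other side is listening on. The Python sketch below is only an illustration of that check and is not part of Condor. The helper name can_connect is made up, and the host and port you pass in are placeholders for a daemon port that is actually open on the peer.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and connects, honoring timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False
```

Run it on the submit machine against the execute machine, then on the execute machine against the submit machine, for example can_connect("submit.example.org", 9618) with your own host name and port. If one direction returns False while the other returns True, that is the one-directional case described above.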

--Dan

Nicholas Lavigne wrote:

> Thanks for the reply.  I now have all of the machines appearing in the
> pool (as reported by condor_status) but I have a new problem.  I
> *think* I understand the problem, but as of yet the solution is
> evading me....
>
> Our network is mostly Windows and so the vanilla universe is my
> primary concern.  Now, submitting a job from the public network, the
> central manager is able to match the job to a machine on the private
> network, but the job does not run, presumably because Condor's file
> transfer mechanism does not know how to transfer the file from the
> public network to the private network.
>
> I know that other pools are using a similar type of setup and so there
> must be a solution to this problem.  I am not currently using a
> network file system, could this be the answer?
>
> -Nicholas
>
>
> On 7/18/07, *Tomas Grigera* <tgrigera@xxxxxxxxxxxxxxxxxx
> <mailto: tgrigera@xxxxxxxxxxxxxxxxxx>> wrote:
>
>     Hi,
>
>     I use a similar setup. I have
>
>     BIND_ALL_INTERFACES = TRUE
>
>
>     But you must make sure the server name resolves to the public IP also
>     for the internal machines.
>
>     Tomas
>
>     On 7/17/07, Nicholas Lavigne < condor.list@xxxxxxxxxxxxxx
>     <mailto:condor.list@xxxxxxxxxxxxxx>> wrote:
>     > Due to a shortage of allocated IP addresses on our university's
>     network, we
>     > have decided to use the central manager machine (running Debian)
>     as a
>     > gateway with two network interfaces and place some compute nodes
>     on a
>     > sub-network behind it.  The router is doing its job correctly,
>     but the
>     > machines on the subnet do not seem to appear in the Condor pool.
>     >
>     > Are there any general rules for having Condor listen on two network
>     > interfaces?  Maybe some modification to the HOSTALLOW_READ and
>     > HOSTALLOW_WRITE variables on the central manager?  Currently,
>     >
>     > HOSTALLOW_READ = *
>     > HOSTALLOW_WRITE = 134.95.*
>     >
>     > But I would like Condor to listen on the 192.168.10.* subnet as
>     well.
>     >
>     > Any suggestions?
>     >
>     >
>     > Thanks,
>     > Nicholas Lavigne
>     > University of Cologne
>     > Graduiertenkolleg Risikomanagement
>     >
>     >
>     > _______________________________________________
>     > Condor-users mailing list
>     > To unsubscribe, send a message to
>     > condor-users-request@xxxxxxxxxxx
>     <mailto: condor-users-request@xxxxxxxxxxx> with a
>     > subject: Unsubscribe
>     > You can also unsubscribe by visiting
>     > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>     >
>     > The archives can be found at:
>     > https://lists.cs.wisc.edu/archive/condor-users/
>     >
>     >
>
>
>
>