
Re: [HTCondor-users] Flocking problems during Schedd negotiation cycle



I modified the initial condor_config.local file as follows:

ALL_DEBUG = D_ALL
BIND_ALL_INTERFACES = False
NETWORK_INTERFACE = 192.168.253.2
FLOCK_TO = master02.demo02.org
FLOCK_FROM = $(FLOCK_TO)
ALLOW_ADVERTISE_SCHEDD = 192.168.*

This is the configuration file for master01; for master02, the 3 in NETWORK_INTERFACE (192.168.253.2) becomes a 4 (192.168.254.2).

Here is the error message that I get now in the NegotiatorLog:

08/12/14 21:08:37 (fd:11) (pid:957) (D_HOSTNAME) Destroying Daemon object:
08/12/14 21:08:37 (fd:11) (pid:957) (D_HOSTNAME) Type: 3 (schedd), Name: vagrant@xxxxxxxxxxxxxxxxxxx, Addr: <192.168.253.2:36999>
08/12/14 21:08:37 (fd:11) (pid:957) (D_HOSTNAME) FullHost: master01.demo01.org, Host: master01, Pool: (null), Port: -1
08/12/14 21:08:37 (fd:11) (pid:957) (D_HOSTNAME) IsLocal: N, IdStr: schedd vagrant@xxxxxxxxxxxxxxxxxxx, Error: (null)
08/12/14 21:08:37 (fd:11) (pid:957) (D_HOSTNAME) --- End of Daemon object info ---
08/12/14 21:08:37 (fd:11) (pid:957) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=155,timeout=30,flags=0,non_blocking=0)
08/12/14 21:08:37 (fd:11) (pid:957) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=13,timeout=30,flags=0,non_blocking=0)
08/12/14 21:08:37 (fd:11) (pid:957) (D_NETWORK) condor_read(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=5,timeout=30,flags=0,non_blocking=0)
08/12/14 21:08:37 (fd:11) (pid:957) (D_NETWORK) Stream::get(int) failed to read padding
08/12/14 21:08:37 (fd:11) (pid:957) (D_ALWAYS) Failed to get reply from schedd
08/12/14 21:08:37 (fd:11) (pid:957) (D_NETWORK) CLOSE <192.168.254.2:35295> fd=8
08/12/14 21:08:37 (fd:8) (pid:957) (D_ALWAYS) Error: Ignoring submitter for this cycle
08/12/14 21:08:37 (fd:8) (pid:957) (D_ALWAYS) negotiateWithGroup resources used scheddAds length 0
08/12/14 21:08:37 (fd:8) (pid:957) (D_ALWAYS) ---------- Finished Negotiation Cycle ----------


What does that [red] message mean?




On 12 August 2014 11:43, john alexander sanabria ordonez <john.sanabria@xxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,

I'm back with my flocking issues. Here is some context.

I have created two virtual machines using Vagrant and for each VM I defined an additional private network interface.

master01[eth1] -> 192.168.253.2
master02[eth1] -> 192.168.254.2

Every virtual machine created with Vagrant also has a NAT interface, eth0 -> 10.0.2.15. This is a Vagrant convention.
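Since each VM has two interfaces, it can help to check which local address the operating system picks for traffic to the other master. The following is only a minimal sketch in plain Python, not anything HTCondor provides; the function name `local_address_for` and the default port are assumptions made for illustration, and the result only shows the kernel's routing choice, which may or may not match what the HTCondor daemons bind to.

```python
import socket


def local_address_for(peer_ip, port=9):
    """Return the local IPv4 address the kernel would use to reach peer_ip.

    Calling connect() on a UDP socket sends no packets; it only makes the
    kernel select a route and a source address for that destination.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))
        return s.getsockname()[0]
```

Running `local_address_for("192.168.253.2")` on master02 would show whether the route to master01 leaves through eth1 or through the NAT interface eth0.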

Both VMs run Ubuntu (Precise64), and on each one I installed HTCondor from the Debian repositories. Both VMs are able to run their own HTCondor jobs.

I then enabled flocking by adding these lines to /etc/condor/condor_config.local:
## master01
ALL_DEBUG = D_ALL
NETWORK_INTERFACE = 192.168.253.2
FLOCK_TO = master02.demo02.org
FLOCK_FROM = $(FLOCK_TO)
#ALLOW_ADVERTISE_SCHEDD = $

## master02
ALL_DEBUG = D_ALL
NETWORK_INTERFACE = 192.168.254.2
FLOCK_TO = master02.demo02.org
FLOCK_FROM = $(FLOCK_TO)
#ALLOW_ADVERTISE_SCHEDD = $(FLOCK_TO) $(CONDOR_HOST)
ALLOW_ADVERTISE_SCHEDD = *

I submitted a job from master01, and in the NegotiatorLog on master02 I found this:

08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) CONNECT bound to <10.0.2.15:39406> fd=8 peer=<192.168.253.2:33532>
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: command 416 NEGOTIATE to schedd vagrant@xxxxxxxxxxxxxxxxxxx from TCP port 39406 (blocking).
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: using session master01:994:1407860162:6 for {<192.168.253.2:33532>,<416>}.
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=625,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: startCommand succeeded.
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) Destroying Daemon object:
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) Type: 3 (schedd), Name: vagrant@xxxxxxxxxxxxxxxxxxx, Addr: <192.168.253.2:33532>
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) FullHost: master01.demo01.org, Host: master01, Pool: (null), Port: -1
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) IsLocal: N, IdStr: schedd vagrant@xxxxxxxxxxxxxxxxxxx, Error: (null)
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) --- End of Daemon object info ---
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=155,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=13,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_read(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=5,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) Stream::get(int) failed to read padding
08/12/14 16:21:02 (fd:9) (pid:877) (D_ALWAYS) Failed to get reply from schedd
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) CLOSE <10.0.2.15:39406> fd=8
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) Error: Ignoring submitter for this cycle
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) negotiateWithGroup resources used scheddAds length 0
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) ---------- Finished Negotiation Cycle ----------

As you can see, the connection to the schedd was made from the NAT interface (10.0.2.15) rather than from eth1, and I believe that is what causes the communication problem.
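To spot this situation without reading the whole log by eye, a small script can pull out every "CONNECT bound to" entry and flag the ones that did not originate from the expected interface. This is a minimal sketch that assumes the log lines look exactly like the D_NETWORK excerpt above; the function names `find_bound_sockets` and `wrong_interface` are hypothetical helpers, not part of HTCondor.

```python
import re

# Matches entries such as:
#   CONNECT bound to <10.0.2.15:39406> fd=8 peer=<192.168.253.2:33532>
BOUND_RE = re.compile(
    r"CONNECT bound to <(\d+\.\d+\.\d+\.\d+):\d+> fd=\d+ "
    r"peer=<(\d+\.\d+\.\d+\.\d+):\d+>"
)


def find_bound_sockets(lines):
    """Return a list of (local_ip, peer_ip) for every CONNECT line found."""
    pairs = []
    for line in lines:
        match = BOUND_RE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def wrong_interface(lines, expected_ip):
    """Return the (local_ip, peer_ip) pairs whose local side is not expected_ip."""
    return [(local, peer) for local, peer in find_bound_sockets(lines)
            if local != expected_ip]
```

On master02 the expected address would be 192.168.254.2, so the function could be called with an open NegotiatorLog file (its location depends on the installation) and that address; any pair it returns points at a connection that left through another interface.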

How can I fix that? How can I force HTCondor to use eth1 and ignore eth0 in this particular case?

Thank you very, very much for your help.

PS: A group of us are working on a national initiative in Colombia that aims to share computational clusters among institutions across the country using HTCondor.