
[HTCondor-users] Flocking problems during Schedd negotiation cycle


I'm back again with my flocking issues. Here is some context.

I have created two virtual machines using Vagrant and for each VM I defined an additional private network interface.

master01[eth1] ->
master02[eth1] ->

Every virtual machine created with Vagrant also has a NAT interface (a Vagrant convention): eth0 ->

Both VMs are running Ubuntu (Precise64). On each VM I installed HTCondor from the Debian repositories. Both VMs are able to run their own HTCondor jobs.

I then enabled flocking by adding these lines to /etc/condor/condor_config.local:
## master01
FLOCK_TO = master02.demo02.org

## master02
FLOCK_TO = master02.demo02.org

I submitted a job from master01 and, checking the NegotiatorLog on master02, I found this:

08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) CONNECT bound to <> fd=8 peer=<>
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: command 416 NEGOTIATE to schedd vagrant@xxxxxxxxxxxxxxxxxxx from TCP port 39406 (blocking).
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: using session master01:994:1407860162:6 for {<>,<416>}.
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=625,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: startCommand succeeded.
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) Destroying Daemon object:
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) Type: 3 (schedd), Name: vagrant@xxxxxxxxxxxxxxxxxxx, Addr: <>
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) FullHost: master01.demo01.org, Host: master01, Pool: (null), Port: -1
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) IsLocal: N, IdStr: schedd vagrant@xxxxxxxxxxxxxxxxxxx, Error: (null)
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) --- End of Daemon object info ---
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=155,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=13,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_read(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=5,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) Stream::get(int) failed to read padding
08/12/14 16:21:02 (fd:9) (pid:877) (D_ALWAYS) Failed to get reply from schedd
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) CLOSE <> fd=8
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) Error: Ignoring submitter for this cycle
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) negotiateWithGroup resources used scheddAds length 0
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) ---------- Finished Negotiation Cycle ----------
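
In a longer NegotiatorLog the failing exchange can be hard to spot. Below is a minimal sketch of a filter that keeps only the lines reporting a failure. It assumes every line carries the same prefix as the excerpt above (date, time, fd, pid, debug category); the function name `failures` and the `sample` list are illustrative, not part of HTCondor.

```python
import re

# Assumed line layout, taken from the excerpt above:
#   date time (fd:N) (pid:N) (D_CATEGORY) message
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) \(fd:\d+\) \(pid:(?P<pid>\d+)\) "
    r"\((?P<cat>D_[A-Z_]+)\)\s*(?P<msg>.*)$"
)

def failures(lines):
    """Yield (time, category, message) for lines that report a failure."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        if "failed" in msg.lower() or msg.startswith("Error:"):
            yield m.group("time"), m.group("cat"), msg

# A few lines copied from the log above, used here as sample input.
sample = [
    "08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: startCommand succeeded.",
    "08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) Stream::get(int) failed to read padding",
    "08/12/14 16:21:02 (fd:9) (pid:877) (D_ALWAYS) Failed to get reply from schedd",
    "08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) Error: Ignoring submitter for this cycle",
]

for time, category, message in failures(sample):
    print(time, category, message)
```

The same function accepts an open file object, so it can be pointed at the actual log file instead of the sample list.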

As you can see, the connection was made through the NAT interface (eth0), and that is what is causing the network communication problems.

How can I fix that? How can I force HTCondor to use eth1 and ignore eth0 in this particular case?

Thank you very, very much for your help.

PS: A group of us are working on a national initiative (in Colombia) to share computational clusters between different institutions in our country using HTCondor.