
Re: [HTCondor-users] condor_submit -addr doesn't work when sched is behind a shared port



Where did you get that address?

When the schedd is behind a shared port, its address will include a shared port id using the keyword sock=<port-id>.

 

Something like this:

 

<172.1.3.3:9618?addrs=172.1.3.3-9618&noUDP&sock=5044_80fc_5>
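A quick way to check whether an address carries a shared port id before passing it to -addr is to split it into its parts. The following is only a rough sketch in Python, not HTCondor's own parser: the function names parse_sinful and has_shared_port_id are made up for illustration, and it assumes the address has the <host:port?key=value&flag> shape shown above.

```python
from urllib.parse import parse_qs


def parse_sinful(sinful: str) -> dict:
    """Split an address of the assumed form <host:port?key=value&flag>.

    Hypothetical helper for illustration; not part of HTCondor.
    """
    body = sinful.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("expected an address wrapped in < and >")
    body = body[1:-1]

    # Everything before the first '?' is host:port, the rest is parameters.
    hostport, _, query = body.partition("?")
    host, _, port = hostport.rpartition(":")

    # keep_blank_values=True preserves bare flags such as noUDP.
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return {"host": host, "port": int(port), "params": params}


def has_shared_port_id(sinful: str) -> bool:
    """Return True if the address names a daemon behind the shared port."""
    return "sock" in parse_sinful(sinful)["params"]


if __name__ == "__main__":
    for addr in (
        "<172.1.3.3:9618>",
        "<172.1.3.3:9618?addrs=172.1.3.3-9618&noUDP&sock=5044_80fc_5>",
    ):
        print(addr, "->", has_shared_port_id(addr))
```

With the two addresses from this thread, the bare <172.1.3.3:9618> has no sock parameter, while the longer form does. The full address is normally taken from the schedd itself (for example, the MyAddress attribute of its ClassAd) rather than built by hand, since the sock value changes when the daemon restarts.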

 

-tj

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of robert smith via HTCondor-users
Sent: Wednesday, July 4, 2018 3:43 PM
To: htcondor-users@xxxxxxxxxxx
Cc: robert smith <robertdavidsmith@xxxxxxxxx>
Subject: [HTCondor-users] condor_submit -addr doesn't work when sched is behind a shared port

 

Hi,

 

I can't get condor_submit -addr to work when condor_schedd is behind a condor_shared_port.

 

Output from condor_submit is below

 

sh-4.2$ condor_submit -debug -addr "<172.1.3.3:9618>" job.sub

Submitting job(s)07/04/18 10:06:49 condor_read() failed: recv(fd=4) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from schedd at <172.1.3.3:9618>.

07/04/18 10:06:49 IO: Failed to read packet header

07/04/18 10:06:49 SECMAN: no classad from server, failing

 

ERROR: Failed to connect to local queue manager

SECMAN:2007:Failed to end classad message.

 

Error message written to /var/log/condor/SharedPortLog in the schedd container is 

 

07/04/18 10:06:49 SharedPortServer: server was busy, failed to connect collector as requested by <172.1.3.3:46528>: primary (7d2cc1f5fc7f6a4e2eb39facb9bb27877fdd809e4b7fa28fd830cd99c77172ee/collector): Connection refused (111); alt (/var/lock/condor/daemon_sock/collector): Connection refused (111)

 

Nothing is written to /var/log/condor/SchedLog

 

Why is condor_submit even trying to access the collector when -addr is meant to tell it to connect straight to the sched? Is there a bug in condor_submit that means it asks the shared_port_daemon to connect to the collector, not the sched, even when the -addr option is set?

 

Everything works fine when sched isn't running behind a condor_shared_port, so I've worked round this issue by simply not using a shared port.

 

Relevant versions are

 

sh-4.2$ condor_version

$CondorVersion: 8.6.11 May 10 2018 BuildID: 440910 $

$CondorPlatform: x86_64_RedHat7 $

 

Relevant files are 

 

sh-4.2$ cat job.sub

should_transfer_files = YES

when_to_transfer_output = ON_EXIT

Universe = vanilla

Executable = /bin/bash

Arguments = test.sh

Log = job.log

Output = job.out

Error = job.error

transfer_input_files = test.sh

Queue

sh-4.2$

sh-4.2$

sh-4.2$ cat test.sh

echo Starting test.sh

whoami

id

hostname

/usr/sbin/ip a

echo Ending test.sh

sh-4.2$

 

I'm running HTCondor in a container on Kubernetes, but doubt that is relevant to this problem.

 

Thanks,

 

Rob