
Re: [HTCondor-users] condor_submit -addr doesn't work when sched is behind a shared port

Where did you get that address?

When the schedd is behind a shared port, its address will include a shared port ID, given with the keyword sock=<port-id>.


something like this.
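As a rough illustration only: the sketch below checks whether an address string carries a sock= entry. The addresses and the ID in it are made up (not taken from this thread), the helper name shared_port_id is hypothetical, and it assumes the part after the "?" is a list of &-separated key=value pairs.

```python
from urllib.parse import parse_qs


def shared_port_id(address):
    """Return the sock= value from an address string, or None if absent.

    Assumes the form <host:port?key=value&key=value>; this is a sketch,
    not HTCondor's own parser.
    """
    inner = address.strip()
    # Drop the surrounding angle brackets if present.
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    # Everything after the first '?' is treated as key=value pairs.
    _, _, query = inner.partition("?")
    values = parse_qs(query).get("sock")
    return values[0] if values else None


# Hypothetical addresses for illustration only.
with_id = "<192.0.2.10:9618?sock=1234_abcd_3>"
without_id = "<192.0.2.10:9618>"

print(shared_port_id(with_id))     # the shared port ID, if one is present
print(shared_port_id(without_id))  # None: nothing names the schedd
```

If the address passed to -addr comes back as None here, it has no shared port ID in it, which would fit the question above about where the address came from.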






From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of robert smith via HTCondor-users
Sent: Wednesday, July 4, 2018 3:43 PM
To: htcondor-users@xxxxxxxxxxx
Cc: robert smith <robertdavidsmith@xxxxxxxxx>
Subject: [HTCondor-users] condor_submit -addr doesn't work when sched is behind a shared port




I can't get condor_submit -addr to work when condor_schedd is behind a condor_shared_port.


Output from condor_submit is below


sh-4.2$ condor_submit -debug -addr "<>" job.sub

Submitting job(s)07/04/18 10:06:49 condor_read() failed: recv(fd=4) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from schedd at <>.

07/04/18 10:06:49 IO: Failed to read packet header

07/04/18 10:06:49 SECMAN: no classad from server, failing


ERROR: Failed to connect to local queue manager

SECMAN:2007:Failed to end classad message.


Error message written to /var/log/condor/SharedPortLog in the schedd container is 


07/04/18 10:06:49 SharedPortServer: server was busy, failed to connect collector as requested by <>: primary (7d2cc1f5fc7f6a4e2eb39facb9bb27877fdd809e4b7fa28fd830cd99c77172ee/collector): Connection refused (111); alt (/var/lock/condor/daemon_sock/collector): Connection refused (111)


Nothing is written to /var/log/condor/SchedLog


Why is condor_submit even trying to access the collector when -addr is meant to tell it to connect straight to the schedd? Is there a bug in condor_submit that means it asks the shared port daemon to connect to the collector, not the schedd, even when the -addr option is set?


Everything works fine when the schedd isn't running behind a condor_shared_port, so I've worked around this issue by simply not using a shared port.


Relevant versions are


sh-4.2$ condor_version

$CondorVersion: 8.6.11 May 10 2018 BuildID: 440910 $

$CondorPlatform: x86_64_RedHat7 $


Relevant files are 


sh-4.2$ cat job.sub

should_transfer_files = YES

when_to_transfer_output = ON_EXIT

Universe = vanilla

Executable = /bin/bash

Arguments = test.sh

Log = job.log

Output = job.out

Error = job.error

transfer_input_files = test.sh




sh-4.2$ cat test.sh

echo Starting test.sh




/usr/sbin/ip a

echo Ending test.sh



I'm running HTCondor in a container on Kubernetes, but doubt that is relevant to this problem.