Re: [HTCondor-users] condor_ssh_to_job to a flocked job
- Date: Thu, 11 Aug 2022 14:00:53 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] condor_ssh_to_job to a flocked job
On 8/9/2022 9:50 AM, Matthias Schnepf wrote:
We have two HTCondor pools and flock jobs from one cluster to
the other. The submit node runs with 9.1.2, while the worker
nodes we flock to run 9.0.13. I tried condor_ssh_to_job on a
running flocked job in the other pool. The jobs run inside a
docker container as user nobody.
When I use condor_ssh_to_job as root user on the submit machine,
it works fine, and I'm inside the docker container, regardless
of who submitted the job.
When an ordinary user tries to ssh into a flocked job, they get,
after a while, "Failed to connect to starter". condor_ssh_to_job
works fine within the cluster where the job was submitted.
I looked at the StarterLog (see below), and it seems that it
gets stuck for ordinary users. After "Created security session
for job owner", the starter queries docker regularly but nothing
else. After "Created security session for job owner" condor runs
a "docker exec -it ..." when the user root runs condor_ssh_to_job.
Could this be a problem with authentication? I did not find any
security message in the logs that looks problematic.
Given the information you provided above, especially the clue that
it works fine if you run condor_ssh_to_job as root, I have a good
guess about what is happening here. I am also guessing that your
submit machine has firewall rules set up to deny incoming ephemeral
ports, and you do not want to change your firewall rules. If so, my
guess is you can get condor_ssh_to_job to work for regular users just
as it does now for root by running the following chmod command on
your submit machine:
sudo chmod 1777 `condor_config_val DAEMON_SOCKET_DIR`
Take a look at the documentation in the Manual for the config knob
DAEMON_SOCKET_DIR for an explanation of why this works.
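If you want to confirm the state of that directory before or after the
chmod, something like the following might help. It is only a rough
sketch: check_socket_dir is a hypothetical helper name (not an HTCondor
tool), and it assumes GNU stat, whose %a format prints the octal mode.
Mode 1777 means world-writable with the sticky bit set, the same mode
/tmp normally has.

```shell
# Hypothetical helper (not part of HTCondor): report whether a directory
# has mode 1777, i.e. world-writable with the sticky bit set.
check_socket_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi
    # Assumes GNU stat; %a prints the octal mode, e.g. 1777
    mode=$(stat -c '%a' "$dir")
    if [ "$mode" = "1777" ]; then
        echo "ok: $dir has mode $mode"
    else
        echo "needs chmod 1777: $dir has mode $mode"
        return 1
    fi
}

# Only query HTCondor when condor_config_val is installed on this machine.
if command -v condor_config_val >/dev/null 2>&1; then
    check_socket_dir "$(condor_config_val DAEMON_SOCKET_DIR)" || true
fi
```

Run it on the submit machine; if it reports "needs chmod 1777", the
chmod command above has not been applied to that directory yet.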
Feel free to follow up with any questions.
Hope this helps,