Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] condor_ssh_to_job to a flocked job

Date: Fri, 12 Aug 2022 13:32:39 +0200
From: Matthias Schnepf <matthias.schnepf@xxxxxxx>
Subject: Re: [HTCondor-users] condor_ssh_to_job to a flocked job

Hi Todd,

Thanks a lot for the answer. After setting DAEMON_SOCKET_DIR to a directory and running sudo chmod 1777 `condor_config_val DAEMON_SOCKET_DIR`, it works fine now.

Best regards,

Matthias

On 8/11/22 21:00, Todd Tannenbaum wrote:

On 8/9/2022 9:50 AM, Matthias Schnepf wrote:

Hi all,
We have two HTCondor pools and flock jobs from one cluster to the other. The submit node runs 9.1.2, while the worker nodes we flock to run 9.0.13. I am trying to use condor_ssh_to_job on a running flocked job in the other pool. The jobs run inside a docker container as user nobody.
When I use condor_ssh_to_job as root on the submit machine, it works fine and I end up inside the docker container, regardless of who submitted the job.
When an ordinary user tries to ssh into a flocked job, they get "Failed to connect to starter" after a while. condor_ssh_to_job works fine within the cluster the job was submitted from.

I looked at the StarterLog (see below), and it seems to get stuck for ordinary users. After "Created security session for job owner", the starter queries docker regularly but does nothing else. When root runs condor_ssh_to_job, condor runs a "docker exec -it ..." after "Created security session for job owner".

Could this be a problem with authentication? I did not find any security message in the logs that looks problematic.

Best regards,
Matthias

Hi Matthias,

Given the information you provided above, especially the clue that it works fine if you run condor_ssh_to_job as root, I have a good guess about what is happening here. I am also guessing that your submit machine has firewall rules set up to deny incoming connections on ephemeral ports, and that you do not want to change your firewall rules. If so, my guess is you can get condor_ssh_to_job to work for regular users just as it does now for root by running the following chmod command on your submit machine:
    sudo chmod 1777 `condor_config_val DAEMON_SOCKET_DIR`

Take a look at the documentation in the Manual for config knob DAEMON_SOCKET_DIR here for an explanation about why this works:
https://htcondor.readthedocs.io/en/v9_0/admin-manual/configuration-macros.html#DAEMON_SOCKET_DIR
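
For readers who want to confirm what the chmod changes, here is a minimal shell sketch. Mode 1777 means read/write/execute for everyone plus the sticky bit, the same mode /tmp normally has. The helper names dir_mode and is_sticky_world_writable are made up for this illustration and are not part of HTCondor; the sketch assumes GNU coreutils stat and runs against a scratch directory rather than the real DAEMON_SOCKET_DIR.

```shell
# Hypothetical helper: print the octal permission bits of a directory.
# Assumes GNU coreutils stat (the -c option is not portable to BSD stat).
dir_mode() {
    stat -c '%a' "$1"
}

# Hypothetical helper: succeed only if the directory has mode 1777,
# i.e. world-writable with the sticky bit set.
is_sticky_world_writable() {
    [ "$(dir_mode "$1")" = "1777" ]
}

# Demonstration on a scratch directory, not on the real DAEMON_SOCKET_DIR.
demo_dir=$(mktemp -d)
chmod 0755 "$demo_dir"
is_sticky_world_writable "$demo_dir" || echo "before chmod: mode $(dir_mode "$demo_dir")"
chmod 1777 "$demo_dir"
is_sticky_world_writable "$demo_dir" && echo "after chmod: mode $(dir_mode "$demo_dir")"
rmdir "$demo_dir"
```

On a submit machine, the same check could presumably be pointed at the real directory with is_sticky_world_writable "$(condor_config_val DAEMON_SOCKET_DIR)", run before and after the chmod.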

Feel free to follow up with any questions.

Hope this helps,
Todd
