
Re: [HTCondor-users] condor_interactive & condor_ssh_to_job & /usr/libexec/condor/condor_ssh_to_job_shell_setup & PID namespaces



On 2/28/2019 3:42 AM, Bert DeKnuydt wrote:
> 
> Hello all people Condorese,
> 

Hi Bert, thank you for sharing your thoughts!  More below inline...

> 1) There is still a thinko in
> /usr/libexec/condor/condor_ssh_to_job_shell_setup, if I understand
> things correctly.
> 
> *) There's a code snippet meant to kill the dummy sleep, killing
>    whatever is in _CONDOR_JOB_PIDS, when the job is 'Interactive'.
>
> *) However, if I launch an Interactive job, and later, possibly much
>    much later, do a 'condor_ssh_to_job' to that Interactive Job, that
>    code is run again and whatever process has a PID listed in
>    _CONDOR_JOB_PIDS is killed. That process can be literally anything,
>    as pids can easily have rolled over by then.
>
>    This is obviously unintentional and can pose a risk to other processes.
>    (Luckily it all runs with the user's credentials only).
>
> *) That could be fixed by ensuring that the target of the kill is indeed
>    that sleep; or alternatively by disallowing ssh_to_job to an
>    interactive job. (But that is really used here though). Or just
>    leave the sleep to die by itself.
> 

You say it is common at your site to ssh_to_job to an interactive
job.  I am curious: what is the use case that motivates this?  (I
could guess, but I'd rather hear your real-world scenario.)

This is indeed a use case we did not anticipate.

Weighing the available options, I am inclined to simply leave the sleep
process to die by itself.  I guess the only downsides of this are a)
potential user confusion: if they do a ps they may wonder why they
see some sleep process running, and b) the slot will remain claimed for
a minimum of 3 minutes by default even if the user quits the interactive
session in 5 seconds.  I think I can live with both of these downsides.
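
For what it is worth, Bert's first alternative (checking that the kill
target is still the sleep) could look roughly like the sketch below.
This is not the code in condor_ssh_to_job_shell_setup; kill_if_sleep is
a hypothetical helper name, and I am assuming a POSIX shell with either
a Linux-style /proc or a ps that supports "-o comm=".

```shell
#!/bin/sh
# Hypothetical helper (not part of HTCondor): send SIGTERM to a PID only
# if that PID currently belongs to a process named "sleep".
kill_if_sleep() {
    kis_pid="$1"

    # Refuse anything that is not a plain decimal PID.
    case "$kis_pid" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Look up the command name: prefer /proc (Linux), fall back to ps.
    if [ -r "/proc/$kis_pid/comm" ]; then
        kis_comm=$(cat "/proc/$kis_pid/comm" 2>/dev/null || true)
    else
        kis_comm=$(ps -p "$kis_pid" -o comm= 2>/dev/null || true)
    fi

    # Some ps implementations print a full path; keep the last component.
    kis_comm=${kis_comm##*/}

    if [ "$kis_comm" = "sleep" ]; then
        kill "$kis_pid"
    else
        # Not a sleep (or already gone): leave it alone.
        return 1
    fi
}

# Assumed usage: _CONDOR_JOB_PIDS holds whitespace-separated PIDs.
for kis_candidate in ${_CONDOR_JOB_PIDS:-}; do
    kill_if_sleep "$kis_candidate" || echo "not killing PID $kis_candidate" >&2
done
```

Note that a name check only narrows the window: a recycled PID could
belong to an unrelated sleep owned by the same user, and there is still
a small gap between the check and the kill.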

> 2) Apart from that, there's another inconsistency: when a startd runs
> with PID isolation (i.e. USE_PID_NAMESPACES = True), neither
> ssh_to_job nor a plain 'Interactive Job' really runs under the PID
> namespace; only the dummy sleep does.
>
> In other words, for practical purposes, you should not allow ssh_to_job
> or interactive jobs if you really need PID isolation.
>

Yes, currently that is the case: batch jobs run in a pid namespace, while
interactive ssh sessions run in the global namespace.  The wisdom behind
this is in this ticket (http://tinyurl.com/y398eugh) for those who 
really are interested.  Of course, there is isolation in the fact that 
each slot runs with the user's credentials, or you can even configure each
slot to run under its own unique uid.
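
If you want to see the namespace behavior for yourself, something like
the following sketch compares the PID namespace of two processes.  It
assumes Linux with /proc/<pid>/ns/pid readable for both PIDs (normally
true for processes you own) and that both PIDs are visible from where
you run it; pid_ns_of and same_pid_ns are hypothetical helper names, not
part of HTCondor.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HTCondor).

# Print the PID-namespace identifier of a process, e.g. "pid:[4026531836]".
# Prints nothing if the link cannot be read.
pid_ns_of() {
    readlink "/proc/$1/ns/pid" 2>/dev/null || true
}

# Exit status: 0 = same PID namespace, 1 = different, 2 = could not tell.
same_pid_ns() {
    spn_a=$(pid_ns_of "$1")
    spn_b=$(pid_ns_of "$2")
    if [ -z "$spn_a" ] || [ -z "$spn_b" ]; then
        return 2
    fi
    [ "$spn_a" = "$spn_b" ]
}
```

From an ssh_to_job shell, running same_pid_ns with $$ and the PID of a
job process and getting status 1 would be consistent with the session
running outside the job's namespace; status 2 only means the check
could not be made.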

Thanks and regards,
Todd