On Wed, Aug 13, 2014 at 1:00 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 8/13/2014 7:13 AM, Keith Brown wrote:
So, what is the point of condor_ssh_to_job? If a user can start hundreds of
processes, he can just ssh into his job and occupy slots indefinitely.
There must be a way for an administrator to control access to

There is a fundamental misunderstanding in this thread...

When you run condor_ssh_to_job, HTCondor starts up an sshd _in the same slot as the job_, and thus the ssh session is subject to all the same administrator policies as the batch job itself. It is NOT an arbitrary unmanaged ssh session; it is fully managed, just as if it were part of the user's job. That is the beauty of it. For instance, the sshd is started underneath the condor_starter, and it will be killed, along with anything started via that ssh session, whenever the policy says to kill the batch job. It is also subject to any slot usage restrictions (memory, CPU, etc.); i.e., if you are enforcing slot memory usage with cgroups, your ssh usage is included in the memory usage for that slot.

Above, Keith worries that a user could just ssh into his job and occupy slots indefinitely. How is this different from a user submitting a batch job that runs forever? In both cases, you as the administrator could set up your PREEMPT expression to kick off jobs after X amount of time, or a PREEMPTION_REQUIREMENTS expression to kick off the job if the resource is desired by someone else, etc.
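As a rough illustration (a toy Python model, not HTCondor code), here is a time-limit PREEMPT policy of the kind described above. It assumes a startd expression along the lines of (Activity == "Busy") && ((time() - EnteredCurrentActivity) > 8 * 3600), with the 8-hour limit chosen only for illustration; the Slot class and its methods are hypothetical. What it shows is that the sshd and the shell it spawned sit in the same slot as the job, so they go down with it.

```python
from dataclasses import dataclass, field

HOUR = 3600  # seconds


@dataclass
class Slot:
    """Toy model of one execute slot (hypothetical; not HTCondor's implementation)."""
    activity: str                  # e.g. "Busy" while a job is running
    entered_current_activity: int  # time (s) at which the slot entered that activity
    processes: list = field(default_factory=list)  # the job plus any sshd/shell from ssh_to_job

    def should_preempt(self, now: int, limit_s: int) -> bool:
        # Mirrors: (Activity == "Busy") && ((time() - EnteredCurrentActivity) > limit)
        return self.activity == "Busy" and (now - self.entered_current_activity) > limit_s

    def enforce(self, now: int, limit_s: int) -> list:
        """If the policy fires, kill every process in the slot and return them."""
        if not self.should_preempt(now, limit_s):
            return []
        killed, self.processes = self.processes, []
        return killed


slot = Slot("Busy", entered_current_activity=0,
            processes=["batch_job", "sshd", "user_shell"])
print(slot.enforce(now=7 * HOUR, limit_s=8 * HOUR))  # [] (still within the limit)
print(slot.enforce(now=9 * HOUR, limit_s=8 * HOUR))  # ['batch_job', 'sshd', 'user_shell']
```

The ssh session gets no special treatment in this model: it is simply one more process in the slot, which is the property Todd describes.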

Think of it this way: imagine if HTCondor did not have condor_ssh_to_job. Do you allow users to provide their own executable for their job? Most likely you do. If so, a user could submit a shell script that forks off their own sshd alongside their job and get the same thing as condor_ssh_to_job.

We disable this feature at Fermilab, and I would strongly suggest that any
other cluster do the same; it is an uncontrollable access hole. If you care
about security at all, don't turn it on.
Steve Timm

Above is one of the very rare instances where I disagree with Steve (99.9% of the time I am in full agreement with him!). But if you feel the same as Steve, you can disable condor_ssh_to_job fully or selectively via the config knob ENABLE_SSH_TO_JOB, as per Dan's post earlier this morning.
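Selective disabling works because ENABLE_SSH_TO_JOB is, as we understand it, a boolean ClassAd expression evaluated by the condor_starter against the job ad (TARGET.) and the machine ad (MY.). The Python below is only a toy model of how one such selective rule would evaluate; the machine attribute AllowSshToJob and the account name "batchonly" are hypothetical, invented for this sketch.

```python
def ssh_to_job_allowed(job_ad: dict, machine_ad: dict) -> bool:
    """Toy evaluation of a hypothetical selective ENABLE_SSH_TO_JOB policy.

    Models an expression such as
        (MY.AllowSshToJob =?= True) && (TARGET.Owner =!= "batchonly")
    where AllowSshToJob (machine ad) and "batchonly" (an account name) are
    made up for this example. A missing key stands in for ClassAd UNDEFINED:
    "=?= True" is then false, and "=!= \"batchonly\"" is then true.
    """
    slot_opts_in = machine_ad.get("AllowSshToJob") is True
    owner_not_blocked = job_ad.get("Owner") != "batchonly"
    return slot_opts_in and owner_not_blocked


print(ssh_to_job_allowed({"Owner": "keith"}, {"AllowSshToJob": True}))  # True
print(ssh_to_job_allowed({"Owner": "keith"}, {}))                        # False
```

Using the strict comparison operators (=?= and =!=) keeps the rule from evaluating to UNDEFINED on slots that do not advertise the attribute.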


I understand you can run arbitrary code on HTCondor when using condor_submit. That wasn't my concern at all. On our relatively large enterprise cluster (6000+ cores), we are constantly battling resource problems. We have users who run jobs, interactive and batch, which take 7-8 hours. I wanted to avoid people submitting thousands of jobs AND then using condor_ssh_to_job to get into those jobs and run more work, taking up the slots indefinitely. When they use condor_ssh_to_job this way, they are circumventing fair scheduling. Ideally, I would like users to be able to run only very basic commands such as 'top' and 'vmstat', and to have the session forcibly time out after they connect with condor_ssh_to_job.

ClientAliveInterval 600
ClientAliveCountMax 3

(30-minute sshd timeout)
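The 30-minute figure is ClientAliveInterval × ClientAliveCountMax (600 s × 3 = 1800 s). The small Python sketch below spells that out and also encodes a caveat from the sshd_config(5) man page: these options disconnect an unresponsive client, but a connected ssh client answers the probes on its own, so they do not by themselves end a session that is merely idle. The function name is made up for illustration.

```python
from typing import Optional


def client_alive_disconnect_after(interval_s: int, count_max: int,
                                  client_responds: bool) -> Optional[int]:
    """Approximate seconds before sshd drops a session through ClientAlive probes.

    After interval_s seconds with no data from the client, sshd sends a probe;
    once count_max probes go unanswered it disconnects. A live ssh client
    answers probes automatically, so a connected but idle session is never
    dropped this way (returns None).
    """
    if client_responds:
        return None
    return interval_s * count_max


print(client_alive_disconnect_after(600, 3, client_responds=False) / 60)  # 30.0
print(client_alive_disconnect_after(600, 3, client_responds=True))        # None
```

In other words, with these two settings a user who keeps a terminal open can stay connected; only a dead or hung client is cut off after roughly 30 minutes.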

Also, how does one monitor whether a user has used condor_ssh_to_job? I would like to generate a report of that.