
Re: [HTCondor-users] condor_ssh_to_job

On 8/14/2014 6:42 AM, Keith Brown wrote:
> I understand you can run arbitrary code on HTCondor when using condor_submit.
> That wasn't my concern at all.

Ok...

> I wanted to avoid people submitting thousands of jobs AND then using
> condor_ssh_to_job to get into the job and run more jobs, taking up the slot
> indefinitely. When they use condor_ssh_to_job, they are circumventing fair
> scheduling.

If I understand you correctly, you are saying that a user could, for instance, submit a job requesting 1 CPU core and then, once the job starts, use condor_ssh_to_job to get into that slot and start 10 more processes, thus using 11 cores when they were fairly scheduled for only 1. Does that capture your concern? If so, condor_ssh_to_job has nothing to do with this issue; after all, the user could simply submit a shell script that starts 11 instances of their program without ever using condor_ssh_to_job. At the core of your concern is users consuming more resources than they were allocated in the execution slot. HTCondor has a wealth of mechanisms you can enable to address that concern. An overview of them can be found in the HTCondor Week presentation at

http://research.cs.wisc.edu/htcondor/HTCondorWeek2013/presentations/ThainG_BoxingUsers.pdf
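To make that point concrete, here is a minimal Python sketch of a hypothetical job payload (the function name launch_workers and the trivial worker command are illustrations, not anything provided by HTCondor). Even if the submit file said request_cpus = 1, the payload itself is free to start 11 CPU-consuming processes; condor_ssh_to_job is not needed for that. Only enforcement on the execute node changes the outcome.

```python
"""Hypothetical job payload: requested one core, but starts several copies
of its work anyway. No condor_ssh_to_job is involved."""
import subprocess
import sys


def launch_workers(n: int) -> int:
    """Start n child Python processes, wait for all of them, and return
    how many exited successfully."""
    # Each child runs a trivial stand-in for "their program".
    procs = [
        subprocess.Popen([sys.executable, "-c", "sum(range(10**5))"])
        for _ in range(n)
    ]
    return sum(1 for p in procs if p.wait() == 0)


if __name__ == "__main__":
    # Nothing in the payload itself honors request_cpus = 1; only limits
    # enforced by the execute node (e.g. cgroups or CPU affinity) would.
    print(launch_workers(11))
```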

One such mechanism is HTCondor's cgroup (Linux control group) support. If you enable it and a user is allocated a slot with 1 CPU core and 1 GB of RAM, that is all they will be able to use, regardless of how many processes they start (via condor_ssh_to_job or not). Even if they use condor_ssh_to_job to start 50 more processes, all of those processes will timeshare the one CPU core allocated to the slot that was scheduled for them; there will be no impact on other users of the system. I recommend using cgroup support in HTCondor if you are running a recent Linux distro (e.g. Red Hat Enterprise Linux 6.5 or equivalent). If you are using an older Linux and cannot upgrade, look at HTCondor's CPU affinity mechanism.
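A rough way to see why the extra processes do not hurt other users is an idealized model: if the slot's process group is confined to a fixed number of cores and every process is single-threaded and CPU-bound, a fair scheduler divides that allotment among them. This is a simplification under stated assumptions; the functions are hypothetical, and depending on configuration the kernel's cgroup CPU controller may apply proportional weights rather than a hard cap, in which case the model describes the behavior when the machine is fully busy.

```python
def per_process_share(slot_cores: float, n_procs: int) -> float:
    """Fraction of one core each CPU-bound process gets when the whole
    process group is confined to slot_cores (idealized fair scheduling)."""
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    # One single-threaded process can never use more than one full core.
    return min(1.0, slot_cores / n_procs)


def group_usage(slot_cores: float, n_procs: int) -> float:
    """Total cores consumed by the group; never exceeds slot_cores."""
    return per_process_share(slot_cores, n_procs) * n_procs


if __name__ == "__main__":
    # The job itself plus 50 extra processes started via condor_ssh_to_job.
    n = 51
    print(f"each process: {per_process_share(1, n):.3f} of a core")
    print(f"whole slot:   {group_usage(1, n):.3f} cores")
```

However many processes are added, the group's total stays at the slot's allotment; only each process's slice shrinks.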

As for taking up the slot indefinitely: as I stated in my post yesterday in this thread, all processes, whether launched by the job or via condor_ssh_to_job, follow the administrator's policy for the slot. In other words, users can take up a slot indefinitely only if the startd policy in your condor_config file allows them to.
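In HTCondor itself, that policy is written as ClassAd expressions in condor_config (for example, the startd's PREEMPT expression), not in Python. The sketch below is only a hypothetical model of one such rule, a maximum runtime per job, to show that the decision depends on the slot's clock and not on which processes happen to be running inside it. The Slot class, its fields, and should_evict are illustrative names, and the 24-hour limit is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical view of a claimed slot, as a policy might see it."""
    job_start: float      # when the current job started (seconds since epoch)
    max_runtime: float    # administrator's limit in seconds (assumed value)
    n_processes: int = 1  # processes inside the slot, from any source


def should_evict(slot: Slot, now: float) -> bool:
    """True once the job has held the slot longer than the admin allows."""
    # n_processes is deliberately ignored: shells opened with
    # condor_ssh_to_job get no extra time on the slot.
    return now - slot.job_start > slot.max_runtime


if __name__ == "__main__":
    slot = Slot(job_start=0.0, max_runtime=24 * 3600, n_processes=51)
    print(should_evict(slot, now=12 * 3600))  # within the limit
    print(should_evict(slot, now=25 * 3600))  # past the limit
```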

If we are still talking past each other and/or I am failing to understand your concern, feel free to send a phone number to my personal email address (tannenba@xxxxxxxxxxx) and I'll give you a call.

regards,
Todd