
Re: [Condor-users] Max Number of Jobs Submission



On 3/22/07, Natarajan, Senthil <senthil@xxxxxxxx> wrote:
> Hi,
> Thanks for the info.
> Here the central manager is the only dedicated submit node, and it also
> runs jobs.


This is a bad idea on many levels, including security.

I strongly suggest never allowing jobs to run on the submit machine. A
runaway job could take out the whole farm in quite a few ways
(starving the machine of file handles, disk space, or memory; the list
goes on, and while you can plug each hole, you don't want to have to).

Secondly, jobs running on that machine would have a variety of ways of
exploiting the increased permissions of the box (again, controllable,
but easy to miss).
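
As a rough sketch only (I haven't seen your configuration, and the file
name below is an assumption about your setup), one way to keep jobs off
the central manager is to leave the startd out of its daemon list in
that machine's local config file:

    # condor_config.local on the central manager / submit machine
    # (assumed location; use whatever local config file your pool uses)
    # STARTD is deliberately omitted, so this machine advertises no
    # slots and never runs jobs.
    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD

If you would rather keep the startd running for some reason, setting
START = FALSE on that machine should have a similar effect, since the
startd will then refuse to start any job.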

Also, I suggest (slightly less strongly, but still pretty strongly)
that if most of your jobs could last several hours or more, you
consider either splitting your submissions across multiple machines or
going for a High Availability solution (if you can handle the increased
complexity, this would provide very solid stability).
An error on this machine or a forced reboot could wipe out all the CPU
cycles expended across 150 nodes.
Consider this cost and the risk of failure/downtime carefully...

Matt