
Re: [HTCondor-users] TasksMax in default unit file sometimes too low



Hi Bert,

We'll probably have to be a lot more aggressive than this: since the schedd uses process separation (one condor_shadow process per running job), the limit will need to be raised far higher on very large schedds.
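To make the schedd point concrete, here is a rough, hypothetical sizing rule. It assumes one condor_shadow per running job, takes a cap on running jobs (for example the value of the schedd's MAX_JOBS_RUNNING setting), and adds an arbitrary base allowance for the other daemons and their threads plus 25% headroom. The function name and all constants are illustrative assumptions, not HTCondor defaults.

```python
def schedd_tasks_max(max_jobs_running: int, base: int = 512, headroom_pct: int = 25) -> int:
    """Hypothetical TasksMax for a submit (schedd) host.

    Assumes one condor_shadow process per running job, plus a fixed
    `base` allowance for the other daemons and their threads, then
    adds `headroom_pct` percent on top. All constants are illustrative.
    """
    if max_jobs_running < 0:
        raise ValueError("max_jobs_running must be non-negative")
    needed = max_jobs_running + base
    # Integer ceiling division, so the result is never rounded below the target.
    return -(-needed * (100 + headroom_pct) // 100)


if __name__ == "__main__":
    for jobs in (1_000, 10_000, 50_000):
        print(jobs, schedd_tasks_max(jobs))
```

Unlike a per-CPU rule, this scales with the number of running jobs, which on a large submit host can far exceed its CPU count.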

Thanks for the report,

Brian

> On Dec 20, 2016, at 2:31 AM, Bert DeKnuydt <Bert.Deknuydt@xxxxxxxxxxxxxxxx> wrote:
> 
> 
> Dear Condor breeders,
> 
> I've seen a problem whereby HTCondor runs out of processes on a fat machine.
> 
> The problem is that the default systemd unit file, 'condor.service', does not
> specify a value for TasksMax, so the system default applies, which (at least on
> Fedora) is 512. TasksMax is the maximum number of tasks (processes and threads)
> a unit may create under systemd; it is useful for preventing fork bombs.
> 
> So on a fat machine with, say, 64 CPUs and dynamic slots, in the worst case
> (every CPU running its own 1-CPU slot) you can only have 512/64 = 8 processes
> per slot. That's really not enough.
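The worst-case arithmetic can be written as a small function: when every CPU is claimed by its own slot, the unit-wide limit is split evenly between the slots. The function name is ours, for illustration. It ignores the HTCondor daemons themselves (including one condor_starter per running job), which also count against the limit, and since systemd counts threads as tasks, the real per-slot budget for processes is lower still.

```python
def tasks_per_slot(tasks_max: int, cpus: int, cpus_per_slot: int = 1) -> int:
    """Worst-case tasks available to each slot when every CPU is claimed
    and the unit-wide TasksMax is split evenly between the slots.

    Ignores the HTCondor daemons, which also count against the limit.
    """
    slots = cpus // cpus_per_slot
    if slots == 0:
        raise ValueError("no slots fit on this machine")
    return tasks_max // slots


print(tasks_per_slot(512, 64))      # 8, the figure quoted above
print(tasks_per_slot(32 * 64, 64))  # 32 under a 32 * nrcpus limit
```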
> 
> So I think TasksMax for HTCondor should be a function of the number of
> configured CPUs. I've set it to 32 * nrcpus, and everyone here is happy.
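One way to apply a rule like this is a systemd drop-in, which overrides a single setting without editing the packaged unit file. The sketch below computes 32 * nrcpus and prints a drop-in body; the factor of 32 is Bert's choice, the helper names are hypothetical, and the path in the comment (/etc/systemd/system/condor.service.d/tasksmax.conf) is only an example file name. Note that os.cpu_count() reports the machine's logical CPUs; if HTCondor is configured with a different NUM_CPUS, pass that value instead. After installing such a file, run `systemctl daemon-reload` and restart the condor service.

```python
import os


def tasks_max_for(cpus: int, per_cpu: int = 32) -> int:
    """TasksMax following the 32 * nrcpus rule of thumb."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return per_cpu * cpus


def render_dropin(tasks_max: int) -> str:
    """Body of a systemd drop-in, e.g. (example path)
    /etc/systemd/system/condor.service.d/tasksmax.conf
    """
    return f"[Service]\nTasksMax={tasks_max}\n"


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to 1 CPU.
    cpus = os.cpu_count() or 1
    print(render_dropin(tasks_max_for(cpus)), end="")
```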
> 
> Midwinter greetings, Bert.
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/