
Re: [condor-users] Fwd: [Medusa-users] ulimit -a



On Monday 29 September 2003 12:10 pm, Alexander Klyubin wrote:
> Actually, looking at the log entries in StarterLog I start to suspect
> that Condor indeed sets resource limits for job processes.
> For example, Windows machine's StarterLog:
>    9/29 15:08:57 Setting resource limits not implemented!
> Linux machine's StarterLog:
>    9/8 20:07:52 Done setting resource limits
>
> I wonder what limits it tries to set and where the settings governing
> this process are located.

Scott & everybody else:

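Yeah.  After thinking about this some more, I know what the problem is.  
Condor, and processes started by Condor, will completely ignore limits.conf.  
Completely.  limits.conf is applied by pam_limits when a user logs in, and
Condor never goes through a PAM login, so neither the daemons nor the jobs
they start ever see those settings.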
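To see which open-file limit a running daemon actually ended up with, later Linux kernels record each process's limits in /proc/PID/limits. The sketch below assumes such a kernel; nofile_of_pid is a hypothetical helper name, not part of Condor.

```sh
#!/bin/sh
# Hypothetical helper: print the "Max open files" row (soft limit, hard
# limit, units) for process $1, as the kernel reports it in
# /proc/PID/limits. Linux only; older kernels do not provide this file.
nofile_of_pid() {
    grep '^Max open files' "/proc/$1/limits"
}
```

For example, `nofile_of_pid "$(pgrep -o condor_master)"` would show the limits of the oldest running condor_master.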

The only solution that I can think of is to start the condor master (which 
you're running as root, I gather) from a script that looks something like:

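#!/bin/sh
# Raise the open-file limit (as root, this raises the hard limit too);
# the master and everything it starts inherit the new value.
ulimit -n 4096
exec /path/to/condor/bin/condor_master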

Does this work?
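One way to answer that from the job's side is to have the job report the limits it actually received, much as condor_run "ulimit -a" does in the message below, and stop early when the soft limit is too low. This is a minimal sketch; require_nofile is a hypothetical name, and the threshold is whatever the job needs.

```sh
#!/bin/sh
# Hypothetical job-side check: report the open-file limits this job
# actually received and fail if the soft limit is below $1.
require_nofile() {
    min=$1
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    echo "open files: soft=$soft hard=$hard (need $min)" >&2
    # An unlimited soft limit always satisfies the requirement.
    if [ "$soft" = unlimited ]; then
        return 0
    fi
    [ "$soft" -ge "$min" ]
}
```

A vanilla-universe job script could then begin with `require_nofile 2048 || exit 1`, so a node that hands out only 1024 descriptors fails immediately with a clear message in the job's stderr instead of partway through the run.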

Another option that might work is to replace your startd with a wrapper
script that raises the limit and then starts the real condor_startd.
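A sketch of such a wrapper, under some assumptions: that your configuration locates the startd through the STARTD setting (stock configurations use STARTD = $(SBIN)/condor_startd), so STARTD can point at the wrapper instead; that run_with_nofile, a hypothetical helper, is acceptable in your environment; and that the wrapper should pass along any arguments the master supplies.

```sh
#!/bin/sh
# Hypothetical helper for a wrapper installed in place of condor_startd.
# run_with_nofile LIMIT CMD [ARGS...] sets the open-file limit of the
# current shell, then execs CMD, so CMD keeps this process's PID and
# CMD and all of its children inherit the limit.
run_with_nofile() {
    limit=$1
    shift
    # Raising the hard limit above its current value requires root; stop
    # here rather than start the daemon with the old limit.
    if ! ulimit -n "$limit"; then
        echo "run_with_nofile: cannot set open-file limit to $limit" >&2
        exit 1
    fi
    exec "$@"
}
```

The wrapper would end with a line such as `run_with_nofile 4096 /path/to/condor/sbin/condor_startd "$@"`, where the path is a placeholder. Using exec rather than running the startd as a child keeps the PID the master started, so the master still signals and monitors the real startd.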

Why does this work?  Because the master and the startd both run as root, so
they can crank up their own limits, and every process they start inherits them.

Problems:  *Any* process started by Condor will have these bigger limits.

Perhaps we should make Condor use PAM, but it currently doesn't.

Hope this helps.  Let me know.

-Nick

> Regards,
> Alexander Klyubin
>
> Scott Koranda wrote:
> > Hi Nick,
> >
> >>On Monday 29 September 2003 8:58 am, Scott Koranda wrote:
> >>>Hi,
> >>>
> >>>Hmmmm. The limit is not being set in /etc/profile. It is being set in
> >>>/etc/security/limits.conf, but perhaps your failure mechanism is the
> >>>correct one.
> >>
> >>Well...  I can think of a couple of things, probably all obvious ones that
> >>you've already thought of, but off the top of my head:
> >>
> >>1. Restart Condor after the change; the condor master process would be
> >> running with the original limits until it's restarted.  Actually, a
> >> condor_restart -master probably wouldn't do the trick, either.  This is
> >> because the last thing the master does on a restart is "exec
> >> condor_master", so the new master would inherit the ulimit from the
> >> original master.  :-(
> >
> > I am pretty sure that Condor has been restarted since the edits to
> > /etc/security/limits.conf were made, but just to be sure I am
> > restarting Condor now.
> >
> >>2. Does the user that's running condor have its own limit set, or is it
> >> set in one of the startup files?
> >
> > The limit is set in /etc/security/limits.conf. In this file
> >
> > /usr/share/doc/pam-0.75/txts/README.pam_limits
> >
> > I read the following:
> >
> > "Also, please note that all limit settings are set PER LOGIN.  They
> > are not global, nor are they permanent (they apply for the session
> > only)."
> >
> > Also I read in there:
> >
> > "No limits are imposed on UID 0 accounts."
> >
> > So I am guessing that since the Condor daemons run as root,
> > /etc/security/limits.conf is ignored, and so the limits are not passed
> > to the vanilla universe job.
> >
> > Sound correct?
> >
> > (I am not by any means a PAM expert so if you have any thoughts please
> > let me know...)
> >
> > Scott
> >
> >>>Thanks, I will dig deeper.
> >>>
> >>>Cheers,
> >>>
> >>>Scott
> >>
> >>-Nick
> >>
> >>>>Could it be that you changed the number of open files in a script that
> >>>>does not get executed for user "nobody"? E.g., /etc/profile gets
> >>>>executed for normal users, but not for user "nobody" under which Condor
> >>>>normally runs jobs?
> >>>>
> >>>>Regards,
> >>>>Alexander Klyubin
> >>>>
> >>>>Scott Koranda wrote:
> >>>>>Hello,
> >>>>>
> >>>>>We recently changed the nodes in our cluster to allow 2048 open file
> >>>>>descriptors rather than the standard 1024. On any node in our cluster
> >>>>>I see the following:
> >>>>>
> >>>>>[skoranda@medusa-slave001 ]$ ulimit -n
> >>>>>2048
> >>>>>
> >>>>>But as the user below points out, when ulimit is run via Condor in
> >>>>>the vanilla universe we always get 1024 and not 2048.
> >>>>>
> >>>>>Any ideas?
> >>>>>
> >>>>>Thanks,
> >>>>>
> >>>>>Scott
> >>>>>
> >>>>>----- Forwarded message from Vladimir Dergachev
> >>>>><volodya@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> -----
> >>>>>
> >>>>>Subject: [Medusa-users] ulimit -a
> >>>>>Date: Sun, 28 Sep 2003 20:15:41 -0400 (EDT)
> >>>>>
> >>>>>
> >>>>>It has been a while since I last needed to run my statistics-generating
> >>>>>program that needs to open many files at once, and for some reason I
> >>>>>cannot do it using condor.
> >>>>>
> >>>>>condor_run "ulimit -a" reports limit of open files as 1024, but when I
> >>>>>rsh to a node and run ulimit -a myself I get 2048 (as it should be
> >>>>>after recent changes).
> >>>>>
> >>>>>Does anyone have a suggestion for how I can tell condor not to
> >>>>>lower the limit?
> >>>>>
> >>>>>                  thank you !
> >>>>>
> >>>>>                       Vladimir Dergachev
> >>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Medusa-users mailing list
>
> Condor Support Information:
> http://www.cs.wisc.edu/condor/condor-support/
> To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
> unsubscribe condor-users <your_email_address>
