Re: [condor-users] Fwd: [Medusa-users] ulimit -a



Hi,

> On Monday 29 September 2003 12:10 pm, Alexander Klyubin wrote:
> > Actually, looking at the log entries in StarterLog I start to suspect
> > that Condor indeed sets resource limits for job processes.
> > For example, Windows machine's StarterLog:
> >    9/29 15:08:57 Setting resource limits not implemented!
> > Linux machine's StarterLog:
> >    9/8 20:07:52 Done setting resource limits
> >
> > I wonder what limits it tries to set and where the settings governing
> > this process are located.
> 
> Scott & everybody else:
> 
> Yeah.  After thinking about this some more, I know what the problem is.  
> Condor, and processes started by Condor will completely ignore limits.conf.  
> Completely.
> 
> The only solution that I can think of is to start the condor master (which 
> you're running as root, I gather), from a script which looks something like:
> 
> #!/bin/sh
> ulimit -n 4096
> /path/to/condor/bin/condor_master
> 
> Does this work?

I will try this later by just editing /etc/init.d/condor. Right now
there are a bunch of jobs running so I have to wait until later. I
will let you know.
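
One way to confirm the result once the master has been restarted is to run a small check as a vanilla universe job. This is only a sketch: the function name check_nofile is made up for illustration, and the threshold passed to it is whatever limit the nodes are supposed to provide.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): report whether the open-file
# limit seen by the current process is at least the requested minimum.
check_nofile() {
    min=$1
    cur=$(ulimit -n)
    if [ "$cur" = "unlimited" ] || [ "$cur" -ge "$min" ]; then
        echo "ok: open-file limit is $cur (wanted at least $min)"
    else
        echo "too low: open-file limit is $cur (wanted at least $min)"
        return 1
    fi
}
```

Calling check_nofile 2048 from the job script and reading the job's output shows whether the job inherited the raised limit or is still seeing 1024.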

> 
> Now, another option that could work would be to replace your startd with a 
> script.
> 
> Why does this work?  Because the master and the startd both run as root, 
> and root can raise its own limits.
> 
> Problems:  *Any* process started by Condor will have these bigger limits.

This is fine for us. We have edited limits.conf for all users anyway.
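
The startd-wrapper idea quoted above might look roughly like the following. This is a sketch under assumptions, not something tested against Condor: the REAL_STARTD path is a placeholder, the raise_nofile helper is invented for illustration, and how the master is pointed at the wrapper is left out.

```shell
#!/bin/sh
# Hypothetical wrapper that would stand in for the startd binary.
# REAL_STARTD is a placeholder path; adjust it to the real installation.
REAL_STARTD=${REAL_STARTD:-/path/to/condor/sbin/condor_startd.real}
NOFILE=${NOFILE:-4096}

# Raise the open-file limit to $1 only if the current limit is lower.
raise_nofile() {
    want=$1
    cur=$(ulimit -n)
    if [ "$cur" != "unlimited" ] && [ "$cur" -lt "$want" ]; then
        ulimit -n "$want" || return 1
    fi
    return 0
}

raise_nofile "$NOFILE" || echo "warning: could not raise open-file limit" >&2

# Hand over to the real startd, which inherits the limit set above.
if [ -x "$REAL_STARTD" ]; then
    exec "$REAL_STARTD" "$@"
fi
```

As with the master script, every process the startd launches inherits the raised limit, so the same caveat about all jobs getting bigger limits applies.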

> 
> Perhaps we should make Condor use PAM, but it currently doesn't.

I personally don't think this is a high priority. In fact I wish our
user would find another way to write his code. He shouldn't need to
open 2048 files all at once...

Thanks,

Scott

> 
> Hope this helps.  Let me know.
> 
> -Nick
> 
> > Regards,
> > Alexander Klyubin
> >
> > Scott Koranda wrote:
> > > Hi Nick,
> > >
> > >>On Monday 29 September 2003 8:58 am, Scott Koranda wrote:
> > >>>Hi,
> > >>>
> > >>>Hmmmm. The limit is not being set in /etc/profile. It is being set in
> > >>>/etc/security/limits.conf, but perhaps your failure mechanism is the
> > >>>correct one.
> > >>
> > >>Well...  I can think of a couple, probably all obvious things that you've
> > >>thought of, but off the top of my head:
> > >>
> > >>1. Restart Condor after the change; the condor master process would be
> > >> running with the original limits until it's restarted.  Actually, a
> > >> condor_restart -master probably wouldn't do the trick, either.  This is
> > >> because the last thing the master does on a restart is "exec
> > >> condor_master", so the new master would inherit the ulimit from the
> > >> original master.  :-(
> > >
> > > I am pretty sure that Condor has been restarted since the edits to
> > > /etc/security/limits.conf were made, but just to be sure I am
> > > restarting Condor now.
> > >
> > >>2. Does the user that's running condor have its own limit set, or is it
> > >> set in one of the startup files?
> > >
> > > The limit is set in /etc/security/limits.conf. In this file
> > >
> > > /usr/share/doc/pam-0.75/txts/README.pam_limits
> > >
> > > I read the following:
> > >
> > > "Also, please note that all limit settings are set PER LOGIN.  They
> > > are not global, nor are they permanent (they apply for the session
> > > only)."
> > >
> > > Also I read in there:
> > >
> > > "No limits are imposed on UID 0 accounts."
> > >
> > > So I am guessing that since the Condor daemons run as root,
> > > /etc/security/limits.conf is ignored, and so the limits are not passed
> > > to the vanilla universe job.
> > >
> > > Sound correct?
> > >
> > > (I am not by any means a PAM expert so if you have any thoughts please
> > > let me know...)
> > >
> > > Scott
> > >
> > >>>Thanks, I will dig deeper.
> > >>>
> > >>>Cheers,
> > >>>
> > >>>Scott
> > >>
> > >>-Nick
> > >>
> > >>>>Can it be that you changed the number of open files in a script, which
> > >>>>does not get executed for user "nobody"? E.g., /etc/profile gets
> > >>>>executed for normal users, but not for user "nobody" under which Condor
> > >>>>normally runs jobs?
> > >>>>
> > >>>>Regards,
> > >>>>Alexander Klyubin
> > >>>>
> > >>>>Scott Koranda wrote:
> > >>>>>Hello,
> > >>>>>
> > >>>>>We recently changed the nodes in our cluster to allow 2048 open file
> > >>>>>descriptors rather than the standard 1024. On any node in our cluster
> > >>>>>I see the following:
> > >>>>>
> > >>>>>[skoranda@medusa-slave001 ]$ ulimit -n
> > >>>>>2048
> > >>>>>
> > >>>>>But as the user below points out, when the ulimit is run via Condor in
> > >>>>>the vanilla universe we always get 1024 and not 2048.
> > >>>>>
> > >>>>>Any ideas?
> > >>>>>
> > >>>>>Thanks,
> > >>>>>
> > >>>>>Scott
> > >>>>>
> > >>>>>----- Forwarded message from Vladimir Dergachev
> > >>>>><volodya@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> -----
> > >>>>>
> > >>>>>Subject: [Medusa-users] ulimit -a
> > >>>>>Date: Sun, 28 Sep 2003 20:15:41 -0400 (EDT)
> > >>>>>
> > >>>>>
> > >>>>>It has been a while since I needed to run my statistics generating program
> > >>>>>that needs to open many files at once, and for some reason I cannot
> > >>>>> do it using condor.
> > >>>>>
> > >>>>>condor_run "ulimit -a" reports limit of open files as 1024, but when I
> > >>>>>rsh to a node and run ulimit -a myself I get 2048 (as it should be
> > >>>>>after recent changes).
> > >>>>>
> > >>>>>Would anyone have a suggestion how I can explain to condor not to
> > >>>>> lower the limit ?
> > >>>>>
> > >>>>>                  thank you !
> > >>>>>
> > >>>>>                       Vladimir Dergachev
> > >>>>>
> > >>>>>
> > >>>>>_______________________________________________
> > >>>>>Medusa-users mailing list
> >
> > Condor Support Information:
> > http://www.cs.wisc.edu/condor/condor-support/
> > To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
> > unsubscribe condor-users <your_email_address>
> 