
Re: [condor-users] Fwd: [Medusa-users] ulimit -a



Actually, looking at the log entries in the StarterLog, I am starting to suspect that Condor does indeed set resource limits for job processes.
For example, Windows machine's StarterLog:
9/29 15:08:57 Setting resource limits not implemented!
Linux machine's StarterLog:
9/8 20:07:52 Done setting resource limits


I wonder what limits it tries to set and where the settings governing this process are located.
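One way to see which limits a running Condor daemon has actually ended up with (a sketch, assuming Linux; the use of the shell's own PID here is only for illustration, and you would substitute the condor_master PID, e.g. from pidof condor_master):

```shell
# On Linux, /proc/<pid>/limits lists every resource limit of a live process.
# $$ is this shell's PID; for Condor, substitute the condor_master PID.
grep -i 'open files' /proc/$$/limits
```

Comparing that line for the master against the same line for a running job's starter would show exactly which limit gets lowered and where.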

Regards,
Alexander Klyubin

Scott Koranda wrote:
Hi Nick,


On Monday 29 September 2003 8:58 am, Scott Koranda wrote:

Hi,

Hmmmm. The limit is not being set in /etc/profile. It is being set in
/etc/security/limits.conf, but perhaps your failure mechanism is the
correct one.

Well... I can think of a couple of things, probably all obvious ones that you've already thought of, but off the top of my head:


1. Restart Condor after the change; the condor master process would be running with the original limits until it's restarted. Actually, a condor_restart -master probably wouldn't do the trick, either. This is because the last thing the master does on a restart is "exec condor_master", so the new master would inherit the ulimit from the original master. :-(
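The inheritance problem described above is just ordinary Unix semantics: resource limits survive both fork() and exec(), which is why a re-exec'd condor_master keeps the original master's ulimit. A minimal sketch (nothing Condor-specific assumed) demonstrating the inheritance:

```shell
# Resource limits are inherited by child processes across fork/exec.
# Lower the soft open-files limit in this shell, then observe that a
# freshly exec'd child reports the same lowered value.
ulimit -S -n 512
bash -c 'ulimit -S -n'
```

This also suggests the usual workaround: raise the limit in the init script itself (e.g. `ulimit -n 2048` immediately before launching condor_master), so the daemon starts with the desired limit rather than inheriting a stale one.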


I am pretty sure that Condor has been restarted since the edits to
/etc/security/limits.conf were made, but just to be sure I am
restarting Condor now.


2. Does the user that's running condor have its own limit set, or is it set in one of the startup files?


The limit is set in /etc/security/limits.conf. In this file

/usr/share/doc/pam-0.75/txts/README.pam_limits

I read the following:

"Also, please note that all limit settings are set PER LOGIN.  They
are not global, nor are they permanent (they apply for the session
only)."

Also I read in there:

"No limits are imposed on UID 0 accounts."

So I am guessing that since the Condor daemons run as root,
/etc/security/limits.conf is ignored, and so the limits are not passed
to the vanilla universe job.
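For reference, the kind of entry in question looks like this (an illustrative fragment only; pam_limits applies these per login session, and as quoted above, not to UID 0):

```
# /etc/security/limits.conf -- illustrative fragment
# <domain>  <type>  <item>   <value>
*           soft    nofile   2048
*           hard    nofile   2048
```

Since Condor daemons started as root never go through a PAM login, entries like these would never be applied to them or to the jobs they spawn.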

Sound correct?

(I am not by any means a PAM expert so if you have any thoughts please
let me know...)

Scott


Thanks, I will dig deeper.

Cheers,

Scott

-Nick



Can it be that you changed the number of open files in a script, which
does not get executed for user "nobody"? E.g., /etc/profile gets
executed for normal users, but not for user "nobody" under which Condor
normally runs jobs?

Regards,
Alexander Klyubin

Scott Koranda wrote:

Hello,

We recently changed the nodes in our cluster to allow 2048 open file
descriptors rather than the standard 1024. On any node in our cluster
I see the following:

[skoranda@medusa-slave001 ]$ ulimit -n
2048

But as the user below points out, when the ulimit is run via Condor in
the vanilla universe we always get 1024 and not 2048.

Any ideas?

Thanks,

Scott

----- Forwarded message from Vladimir Dergachev
<volodya@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> -----

Subject: [Medusa-users] ulimit -a
Date: Sun, 28 Sep 2003 20:15:41 -0400 (EDT)


It has been a while since I needed to run my statistics-generating program, which needs to open many files at once, and for some reason I cannot do it using Condor.

condor_run "ulimit -a" reports limit of open files as 1024, but when I
rsh to a node and run ulimit -a myself I get 2048 (as it should be
after recent changes).

Would anyone have a suggestion for how I can tell Condor not to lower
the limit?

thank you !

Vladimir Dergachev


_______________________________________________ Medusa-users mailing list

Condor Support Information: http://www.cs.wisc.edu/condor/condor-support/ To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with unsubscribe condor-users <your_email_address>