
Re: [Condor-users] OS unable to allocate memory to job when run under condor



I've looked into that, but I'm new to ulimit, so perhaps I'm missing something.

First, about the machines: I'm running on EC2. For these tests I've been running on a single machine. I've tested paravirtual and HVM instances separately.

Logged in via ssh, at the command line:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59623
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 59623
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


When I condor_ssh_to_job:

ulimit -a
core file size          (blocks, -c) 1317074
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59623
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 59623
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

So there are differences (core file size and stack size), but not in max memory size, data seg size, or virtual memory.
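
Comparing the two listings by eye is easy to get wrong, so here is a minimal sketch of diffing two `ulimit -a` outputs. The function names (`parse_ulimit`, `diff_limits`) and the two sample strings are my own for illustration; the sample rows are a subset copied from the listings above, and the regular expression assumes the bash `ulimit -a` layout shown there.

```python
import re

# A few rows copied from the two listings above (subset only).
LOGIN_SHELL = """\
core file size          (blocks, -c) 0
max memory size         (kbytes, -m) unlimited
stack size              (kbytes, -s) 8192
virtual memory          (kbytes, -v) unlimited
"""

CONDOR_JOB = """\
core file size          (blocks, -c) 1317074
max memory size         (kbytes, -m) unlimited
stack size              (kbytes, -s) unlimited
virtual memory          (kbytes, -v) unlimited
"""

# Matches rows such as "pipe size   (512 bytes, -p) 8" and
# "scheduling priority   (-e) 0" (the units part is optional).
ROW = re.compile(r"^(.*?)\s+\((?:[^()]*,\s*)?(-\w)\)\s+(\S+)\s*$")


def parse_ulimit(text):
    """Map each ulimit flag (e.g. '-c') to a (name, value) pair."""
    limits = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, flag, value = match.groups()
            limits[flag] = (name.strip(), value)
    return limits


def diff_limits(first, second):
    """Return {flag: (name, first_value, second_value)} where values differ."""
    changed = {}
    for flag in sorted(first.keys() & second.keys()):
        if first[flag][1] != second[flag][1]:
            changed[flag] = (first[flag][0], first[flag][1], second[flag][1])
    return changed


if __name__ == "__main__":
    differences = diff_limits(parse_ulimit(LOGIN_SHELL), parse_ulimit(CONDOR_JOB))
    for flag, (name, shell_value, job_value) in differences.items():
        print(f"{flag} {name}: shell={shell_value} job={job_value}")
```

In practice the two strings would come from saving the output of `ulimit -a` in each environment to a file and reading those files in.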


I turned on wine debugging, which makes the process run in slow motion, so we can examine /proc/pid/limits for the wine process:

root@master:/proc/11457# cat limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            unlimited            unlimited            bytes     
Max core file size        0                    0                    bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             59623                59623                processes 
Max open files            4096                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       59623                59623                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
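
For a process that is still running, the same check can be scripted instead of read off the table. The following is a rough sketch that parses the `/proc/<pid>/limits` layout shown above and reports only the memory-related rows that are not `unlimited`. The helper names and the `MEMORY_LIMITS` tuple are my own, and the parser assumes columns are separated by at least two spaces, as they are in the listing.

```python
import re

# Rows of /proc/<pid>/limits that bound memory (names as printed by the kernel).
MEMORY_LIMITS = (
    "Max data size",
    "Max stack size",
    "Max resident set",
    "Max address space",
    "Max locked memory",
)


def parse_proc_limits(text):
    """Map each limit name to a (soft, hard, units) tuple of strings."""
    limits = {}
    for line in text.splitlines():
        # Assumes columns are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3 or fields[0] == "Limit":
            continue  # skip blank lines and the header row
        name, soft, hard = fields[:3]
        units = fields[3] if len(fields) > 3 else ""
        limits[name] = (soft, hard, units)
    return limits


def read_limits(pid):
    """Read and parse /proc/<pid>/limits for a live process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_proc_limits(handle.read())


def finite_memory_limits(limits):
    """Return only the memory-related rows whose soft limit is not unlimited."""
    return {
        name: limits[name]
        for name in MEMORY_LIMITS
        if name in limits and limits[name][0] != "unlimited"
    }


if __name__ == "__main__":
    # "self" inspects this script's own process; pass the wine PID instead.
    for name, (soft, hard, units) in finite_memory_limits(read_limits("self")).items():
        print(f"{name}: soft={soft} hard={hard} {units}")
```

Run against the table above, the only memory-related row this would flag is max locked memory (65536 bytes).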


From within the Condor job (from Stata, which launches the wine process via Stata's shell command):


So it would appear that no Linux memory limit applies.
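
One way to double-check this from inside the job itself, rather than from a separate ssh session, would be a small script that prints its own memory-related resource limits. This is only a sketch using Python's standard `resource` module on Linux; the labels and function names are mine, and it covers rlimits only, not any other mechanism that might restrict memory.

```python
import resource

# Memory-related rlimits and the ulimit flag each corresponds to.
MEMORY_RLIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "stack (ulimit -s)": resource.RLIMIT_STACK,
    "resident set (ulimit -m)": resource.RLIMIT_RSS,
    "locked memory (ulimit -l)": resource.RLIMIT_MEMLOCK,
}


def fmt(value):
    """Render an rlimit value (in bytes) the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def memory_limits():
    """Return {label: (soft, hard)} for the current process, as strings."""
    return {
        label: tuple(fmt(v) for v in resource.getrlimit(res))
        for label, res in MEMORY_RLIMITS.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in memory_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

Submitting this as the job's executable and running it by hand on the same machine should give two outputs that can be compared line by line. Note that these values are in bytes, whereas `ulimit -a` reports most of them in kbytes.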


Any additional thoughts?

jason





On Mar 28, 2012, at 3:47 PM, Ian Chesal wrote:

On Monday, 26 March, 2012 at 7:30 PM, jason herman wrote:
So the question is what could be preventing the OS from allocating memory to a process that a condor job forked via shell?
Could be a different set of system limits are being applied when jobs are run via Condor.

What does the following report for the shell from where you run your jobs manually:

ulimit -a

And what does that same command report when you run it as a Condor job the same way you run your application?

Also: are your machines homogeneous? Are the machines where you run your commands by hand the same (RAM, disk, CPU, etc.) as the machines where jobs run under Condor's control?

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

