Perhaps my original post didn't contain enough detail so here goes...
I'm using condor via dagman on Ubuntu 11.10 on a powerful machine with 8GB RAM. The job is a stata program that calls another program via shell. The stata program is well tested and works fine. The condor job created via dagman works fine under a variety of parameterizations. However, when I increase the size of the dataset to 40MB the job fails. If I rerun the job but replace the dataset file, everything works fine.
I have traced the point of failure to a denied request for 300MB of memory made by the executable that stata calls. The weird thing is that if I manually execute the condor job (go into the InitialDir and run the executable with the args straight from the submit file) everything works. It's clear that memory is not an issue, for I can run 4+ of these jobs simultaneously without a hitch (even while condor is attempting to do its thing).
So the question is: what is different about the execution environment when I run the job manually versus when condor executes the job that could affect OS memory allocation in such a limiting way?
The executable that stata calls is a Windows executable being called via wine1.3. That should limit memory to 2GB.
I've tried everything I can think of. The thing is, the condor job itself doesn't fail (stata doesn't return an error value under these circumstances), but there must be some difference from the OS's perspective for it to refuse the memory allocation.
I looked into memory limits (and can't find any):
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 488497
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 488497
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
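The limits above come from an interactive shell, so they may not match what the job sees when condor starts it. A minimal sketch of how the same information could be captured from inside the job follows. The function name dump_limits and the output file names are hypothetical, and it assumes the submit file can be pointed at a small bash wrapper instead of the real executable.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (not part of condor or stata): writes the
# resource limits and environment seen by the current process to a file.
dump_limits() {
  local out="$1"
  {
    echo "== ulimit -a =="
    ulimit -a
    echo "== environment =="
    env | sort
  } > "$out"
}

# Example use inside a wrapper that the submit file names as its executable:
#   dump_limits "limits_condor.txt"
#   exec "$@"    # then run the real command line unchanged
```

Calling dump_limits once from an interactive shell (say into limits_manual.txt) and once from the wrapper under condor, then running diff on the two files, should show any limit or environment variable that differs between the two cases.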
Anyone have any ideas??
[Condor-users] OS unable to allocate memory to job when run under condor
https://lists.cs.wisc.edu/archive/condor-users/2012-March/msg00121.shtml