
Re: [Condor-users] out-of-memory issues in parallel universe



I understand this solution, but not all my users do :->

As I understand your response, these properties will be considered for all nodes on which the job is run ... is that the case?

In addition (or as an alternative), I'm looking for a way to enforce memory limits at runtime.

It looks as if a USER_JOB_WRAPPER with a ulimit line is the solution here. Does that jibe with what others have done?
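
Something along these lines is what I have in mind -- the wrapper path and the limit value are just placeholders, and the wrapper has to exec the real job so Condor can still manage it. In condor_config on the execute nodes:

    USER_JOB_WRAPPER = /usr/local/condor/memory_limit_wrapper.sh

and the wrapper itself:

    #!/bin/sh
    # Cap per-process virtual memory; ulimit -v takes kbytes,
    # so 2097152 is roughly 2 GB.  Then hand control to the job,
    # which Condor passes to the wrapper as its arguments.
    ulimit -v 2097152
    exec "$@"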

rob



On Mar 17, 2008, at 11:36 AM, Greg Thain wrote:


Is there some way of specifying the image size, and restricting jobs
to larger memory compute nodes, for MPI jobs submitted in the parallel
universe?

By default, Condor tries to run jobs only on machines that have enough
memory.  Condor_submit does this by sticking the clause:

((Memory * 1024) >= ImageSize)

into the job's requirements (Memory is reported in megabytes while ImageSize is in kbytes, hence the factor of 1024). The problem is that Condor doesn't know a priori how much memory the job will need (the ImageSize), so it makes an initial guess based on the size of the executable.  This guess is almost always wrong, and almost always too small.  If you have a better guess as to the image size, you can put it in the submit file:

image_size = some_value_in_kbytes

And Condor will only match the job to machines (or slots) with at least
that amount of memory.
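
For example, a submit file along these lines (the executable name, node count, and size are only placeholders) would only match slots with at least 2 GB of memory:

    # Sketch only: executable name, node count, and size are placeholders.
    universe      = parallel
    executable    = my_mpi_wrapper.sh
    machine_count = 8
    # image_size is given in kbytes; 2097152 KB is roughly 2 GB per slot.
    image_size    = 2097152
    queue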

-greg


==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Associate Director, Grid and
       Supercomputing Platforms
Project Manager, CrimsonGrid Initiative
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin  211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045