
Re: [Condor-users] out-of-memory issues in parallel universe



I understand this solution, but not all my users do :->

As I understand your response, these properties will be considered for all nodes on which the job is run ... is that the case?

In addition (or as an alternative), I'm looking for a way to enforce memory limits at runtime.

It looks as if a USER_JOB_WRAPPER with a ulimit line is the solution here. Does that jibe with what others have done?
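
Something along these lines is what I have in mind -- the wrapper path and the limit value are just placeholders, and the wrapper has to exec the real job so Condor can still manage it. In condor_config on the execute nodes:

    USER_JOB_WRAPPER = /usr/local/condor/memory_limit_wrapper.sh

and the wrapper itself:

    #!/bin/sh
    # Cap per-process virtual memory; ulimit -v takes kbytes,
    # so 2097152 is roughly 2 GB.  Then hand control to the job,
    # which Condor passes to the wrapper as its arguments.
    ulimit -v 2097152
    exec "$@"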

rob



On Mar 17, 2008, at 11:36 AM, Greg Thain wrote:


Is there some way of specifying the image size, and restricting jobs
to larger memory compute nodes, for MPI jobs submitted in the parallel
universe?

By default, Condor tries to run jobs only on machines that have enough
memory.  Condor_submit does this by sticking the clause:

((Memory * 1024) >= ImageSize)

into the job's requirements (Memory is reported in megabytes while ImageSize is in kbytes, hence the factor of 1024). The problem is that Condor doesn't know a priori how much memory the job will need (the ImageSize), so it makes an initial guess based on the size of the executable.  This guess is almost always wrong, and almost always too small.  If you have a better guess as to the image size, you can put it in the submit file:

image_size = some_value_in_kbytes

And Condor will only match the job to machines (or slots) with at least
that amount of memory.
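
For example, a submit file along these lines (the executable name, node count, and size are only placeholders) would only match slots with at least 2 GB of memory:

    # Sketch only: executable name, node count, and size are placeholders.
    universe      = parallel
    executable    = my_mpi_wrapper.sh
    machine_count = 8
    # image_size is given in kbytes; 2097152 KB is roughly 2 GB per slot.
    image_size    = 2097152
    queue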

-greg


==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Associate Director, Grid and
       Supercomputing Platforms
Project Manager, CrimsonGrid Initiative
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin  211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045