
Re: [Condor-users] out-of-memory issues in parallel universe




Is there some way of specifying the image size, and restricting jobs to larger memory compute nodes, for MPI jobs submitted in the parallel universe?

By default, Condor tries to run jobs only on machines that have enough memory. Condor_submit does this by sticking the clause:

((Memory * 1024) >= ImageSize)

into the job's requirements. (Memory here is the machine's RAM in megabytes, while ImageSize is the job's footprint in kilobytes, hence the factor of 1024.) The problem is that Condor doesn't know a priori how much memory the job will need (the ImageSize). So it makes an initial guess based on the size of the executable. This guess is almost always wrong, and almost always too small. If you have a better guess as to the image size, you can put it in the submit file:

image_size = some_value_in_kbytes

And Condor will only match the job to machines (or slots) with at least that amount of memory.
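For an MPI job, a bare-bones parallel-universe submit file with an explicit image size might look something like the sketch below. The executable name, machine count, and the 2 GB figure are just placeholders; substitute your own values.

# Sketch of a parallel-universe submit file; names and sizes are placeholders.
universe       = parallel
executable     = my_mpi_wrapper
machine_count  = 8
output         = job.$(NODE).out
error          = job.$(NODE).err
log            = job.log

# Expected image size in kilobytes (here about 2 GB).
# The default clause then becomes ((Memory * 1024) >= 2097152),
# so only machines (or slots) advertising at least 2048 MB of memory will match.
image_size     = 2097152

queue

After submitting, condor_q -l on the job should show the ImageSize attribute and the memory clause in the job's Requirements expression, which is a quick way to confirm the restriction took effect.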

-greg