
[HTCondor-users] condor submission: how to force a job to use specified amount of memory?



Dear all,

I'm running numerical experiments: solving optimization problems and
collecting the log files to compare different algorithms.
The program requires about 12 GB of memory to solve a problem.

The machine I am using is a cluster of 27 nodes, each with 12 slots.
Each slot has 2 GB of memory.
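So each node has 12 x 2 GB = 24 GB in total, and a single 12 GB job needs the memory of six slots on one node.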

The following is my current condor submission:

universe = vanilla
notification = never
should_transfer_files = yes
when_to_transfer_output = always
copy_to_spool = false
requirements = regexp("slot([1-9]|1[0-2])@pedigree-([1-9]|1[0-9]|2[0-7]).*",Name)
request_memory = 12000
executable = limit.sh
output = out
error  = err
log    = log
transfer_input_files = program, input_file
arguments = 22600 12000000 ./program -f input_file --algorithm search
queue
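(limit.sh is a small wrapper script: the first two arguments are the CPU-time cap in seconds and the memory cap in KB that it applies before running the rest of the command line.)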


I am submitting 100-200 jobs at once, hoping that condor schedules the jobs for me.
This worked fine as long as each job used less than 4 GB of memory.

What I am seeing is:
condor packs the jobs onto a few nodes, so more than 2 jobs are assigned to 1 node.
As the program solves the input problem, it takes more and more memory.
At some point, some of the jobs become suspended and eventually go idle.

I guess this is because HTCondor tries to allocate the resources within a single machine rather than using unclaimed slots elsewhere.
I confirmed this by submitting a small number of jobs: HTCondor didn't use the 300+ slots available.
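For reference, each slot's advertised memory can be listed with, e.g.:

condor_status -format "%s\t" Name -format "%d\n" Memory

which should show roughly 2048 MB for every slot on these nodes.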

I changed the above requirements to:

requirements = regexp("slot([1-9]|1[0-2])@pedigree-([1-9]|1[0-9]|2[0-7]).*", Name) && ( Memory >= 12000 )
request_memory = 12000

But it didn't resolve the issue; I suspect that is because Memory is advertised per slot, so with 2 GB slots no single slot can ever satisfy Memory >= 12000.

Could someone suggest a way to modify the condor submission (or the pool configuration) so that a job is guaranteed 12 GB of memory?
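From the manual, it looks as if partitionable slots might let request_memory carve a single 12 GB slot out of a node's 24 GB. Would configuring the execute nodes along these lines be the right approach (a sketch based on the documentation; it would need admin access to the nodes)?

# condor_config on each execute node: one partitionable slot owning
# all CPUs and memory, split on demand according to request_memory
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%, memory=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE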
Thanks in advance.

Best,

Junkyu Lee