
Re: [HTCondor-users] Problem in running parallel program



Hi Carsten

Thank you very much for your email. I have changed the universe to vanilla and tested

request_memory = 6G
request_cpus = 5

The job did not start; it starts only with request_cpus = 1.

With

request_memory = 20G
request_cpus = 1

I submitted 5 jobs. All of them started, even though I have only 20 GB of RAM.
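For reference, the submit description for the first test would look roughly like this (a sketch; the executable name run.sh and the log file name are placeholders, not taken from the original):

```
universe       = vanilla
executable     = run.sh
log            = job.log
request_memory = 6G
request_cpus   = 5
queue
```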

Regards
Rajagopal

On Wed, Sep 22, 2021 at 6:09 PM Carsten Aulbert <carsten.aulbert@xxxxxxxxxx> wrote:
Hi

On 22.09.21 14:35, Rajagopala Reddy Seelam wrote:
> Response to this email: No, I think "dagman" may not help me here. This
> has to do with "request_cpus=1". HTCondor accepts up to 20 jobs and
> immediately runs all 20 calculations. As a result, the memory is
> exhausted and the machine hangs. I am looking into the "hold" possibility
> to manually tell the scheduler to hold a job and release it
> after the earlier job has completed.


I think a partitionable slot will help here, as you can then simply use

request_memory = 6G
request_cpus = 5

and if the machine has 20 cores and 16 GByte of RAM, it would only ever
run two of these at the same time, as condor then only has 4 GByte and
10 CPU cores left for a new job.

There are many more knobs to try to achieve this, but these are the ones
I would try first.
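The arithmetic behind "only ever two" can be sketched like this (a minimal sketch; the machine and job sizes are the example numbers above, and integer division stands in for condor's slot packing):

```python
# How many 5-core / 6 GByte jobs fit on a 20-core / 16 GByte
# partitionable slot? The scarcer resource sets the limit.
machine_cpus, machine_mem_gb = 20, 16
job_cpus, job_mem_gb = 5, 6

by_cpu = machine_cpus // job_cpus        # 20 // 5 = 4 jobs by CPU
by_mem = machine_mem_gb // job_mem_gb    # 16 // 6 = 2 jobs by memory

concurrent_jobs = min(by_cpu, by_mem)
print(concurrent_jobs)  # 2 -> memory is the limiting resource
```

So even though the CPUs could host four such jobs, memory caps it at two, which is exactly the behaviour Carsten describes.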

Cheers

Carsten



--
Rajagopala R. Seelam,
Assistant Professor,
School of Chemical Sciences and Pharmacy,
Central University of Rajasthan,
NH-8, Bandar Sindri, Ajmer-305817,
Rajasthan, India