
Re: [HTCondor-users] Problem in running parallel program



Hi Rajagopal,

Use 'condor_status -l' to get a better idea of what your slot looks like and what resources it provides.

omp_set_num_threads takes an integer argument that defines how many parallel threads should be used.

It cannot be more than the number of virtual cores detected, but that is not purely HTCondor-related ...

Best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Rajagopala Reddy Seelam" <rajagopala.seelam@xxxxxxxxxxx>
To: "Carsten Aulbert" <carsten.aulbert@xxxxxxxxxx>
CC: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 23 September 2021 09:26:54
Subject: Re: [HTCondor-users] Problem in running parallel program

Hi Carsten

I have edited the /etc/condor/config.d/00-htcondor-9.0.config file and added those instructions.
After condor_reconfig, condor_status returns

slot1@Theochem 20.0 23963

I will test further and I will get back to you.

In the vanilla universe, I am getting the following error message:

omp_set_num_threads value (16) is invalid

In the local universe, I do not get this error. Can you please comment on that?

Thank you
Regards
Rajagopal




On Thu, Sep 23, 2021 at 11:20 AM Carsten Aulbert <carsten.aulbert@xxxxxxxxxx> wrote:
Hi

On 23.09.21 07:42, Rajagopala Reddy Seelam wrote:
> Try using a single partition-able slot, e.g.
> NUM_SLOTS_TYPE_1                 = 1
> SLOT_TYPE_1                      = cpus=100%, ram=100%, swap=0%
> SLOT_TYPE_1_PARTITIONABLE        = True
>
> If you don't mind, I need further assistance here. When I include these
> instructions in the submission script, condor returns
>
> WARNING: the line 'SLOT_TYPE_1_PARTITIONABLE = True' was unused by
> condor_submit. Is it a typo?
> WARNING: the line 'SLOT_TYPE_1 = cpus=100%, ram=100%, swap=0%' was
> unused by condor_submit. Is it a typo?
> WARNING: the line 'NUM_SLOTS_TYPE_1 = 1' was unused by condor_submit. Is
> it a typo?
>
> It seems I need to use these instructions at the time of installation. I
> request you to help me.
>

You need to place those settings in the configuration file for the
startd, i.e. on the machine where the jobs should run, e.g. in
/etc/condor/config.d/01_startd.config

In other words, you first need to configure the startd "correctly", run
condor_reconfig and verify that you only have one slot with all the
resources in it, e.g.

condor_status -constraint PartitionableSlot -af Name TotalCpus TotalMemory

should show something like (the final number depends on how much memory
the machine has)

slot1@machinename 20.0 16000

When this is set, you should be able to submit your jobs and they should
no longer overwhelm the machine.
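
For illustration, a minimal submit description file for a multi-threaded
job on such a partitionable slot might look like the sketch below. The
executable name, file names, and resource values are assumptions chosen
for the example, not taken from this thread. The point is that the job
asks for its cores with request_cpus. If that line is left out, a job
normally gets a single core by default, which may be related to
thread-count problems inside the job.

```
# Sketch only: executable, file names and values are hypothetical
universe       = vanilla
executable     = my_openmp_program

# Ask for as many cores as the program will use for its threads
request_cpus   = 16
request_memory = 4 GB

output         = job.out
error          = job.err
log            = job.log

queue
```

The slot settings (NUM_SLOTS_TYPE_1 and so on) stay in the startd
configuration file; only the request_* lines belong in the submit file.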

HTH

Carsten



--
Rajagopala R. Seelam,
Assistant Professor,
School of Chemical Sciences and Pharmacy,
Central University of Rajasthan,
NH-8, Bandar Sindri, Ajmer-305817,
Rajasthan, India
