
Re: [HTCondor-users] Problem in running parallel program



Hi Aulbert

Thank you very much for your quick response.

I am working on your suggestions in the earlier email. I will get back to you on that.

Response to this email: No, I think "dagman" may not help me here. The problem has to do with "request_cpus=1": HTCondor accepts up to 20 jobs and immediately starts all 20 calculations. As a result, the memory is exhausted and the machine hangs. I am looking into the "hold" option, so that I can manually tell the scheduler to hold a job and release it after the earlier job has completed.

Thank you very much
Best Regards
Rajagopal



On Wed, Sep 22, 2021 at 5:54 PM Carsten Aulbert <carsten.aulbert@xxxxxxxxxx> wrote:
Hi again,

separate reply as this is a different topic and may result in a
different thread.

On 22.09.21 13:57, Rajagopala Reddy Seelam wrote:
> Another request: Is it possible to hold a job until its preceding job
> (with job id 123) is completed? PBS has such possibility.
>
> qsub -W depend=afterok:123 m1.sub
>
> I see condor_hold option. I have to manually release the job to queue with
> condor_release.
>

I would look into "dagman" where you can define dependencies between jobs

https://htcondor.readthedocs.io/en/latest/users-manual/dagman-workflows.html

I.e. you can define that jobs A, B and C need to finish before job D can
start.
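
As a minimal sketch of such a dependency, a DAG input file could look
like the one below. The file names (a.sub, b.sub, c.sub, d.sub and
deps.dag) are hypothetical placeholders for submit description files
you already have.

```
# deps.dag (hypothetical file name)
# Each JOB line maps a node name to an existing submit description file.
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub

# D is submitted only after A, B and C have all completed successfully.
PARENT A B C CHILD D
```

This would be submitted with "condor_submit_dag deps.dag". For the
PBS-style case of one job waiting on a single predecessor, a single
line such as "PARENT A CHILD B" should be enough.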

Is this what you are looking for?

Cheers

carsten
--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185




--
Rajagopala R. Seelam,
Assistant Professor,
School of Chemical Sciences and Pharmacy,
Central University of Rajasthan,
NH-8, Bandar Sindri, Ajmer-305817,
Rajasthan, India