
[HTCondor-users] Problem in running parallel program



Hello

I have installed HTCondor on a workstation with 20 cores. The installation went well, and condor_status reports 20 free slots. I could successfully run the examples given in the quick start manual. Thank you very much for the program.

Now I am trying to run calculations with an application called Gaussian, an electronic structure code used by computational chemists. It is a parallel program (Open MPI type), and I intend to use 16 cores for a calculation. The following is my job submission script.

#!/bin/bash
universe  = local
executable = m1.sh
log     = m1.err
output   = m1.err
error    = m1.err
request_cpus  = 16
should_transfer_files  = Yes
when_to_transfer_output = ON_EXIT
queue


The job did not start and stays in "Idle" status for a long time. However, if I change the setting to request_cpus = 1, it runs well, and the calculation uses 16 cores as intended. Even though this workaround allows me to submit jobs, I am unable to manage the jobs efficiently. Please help me set the CPUs correctly.

Another request: Is it possible to hold a job until its preceding job (with job ID 123) has completed? PBS offers this possibility:
qsub -W depend=afterok:123  m1.sub
I see there is a condor_hold option, but then I have to release the job back to the queue manually with condor_release.
Thank you very much
Best Regards
Rajagopal