Re: [HTCondor-users] Condor/SGE cluster
- Date: Tue, 7 Jan 2014 15:55:48 +0100 (CET)
- From: Francesco Prelz <Francesco.Prelz@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Condor/SGE cluster
On Mon, 6 Jan 2014, Lukas Koschmieder wrote:
I'm trying to set up Condor in order to be able to submit jobs to a local SGE cluster. The SGE cluster is already up and running, and I can execute Vanilla universe Condor jobs (e.g. "/usr/bin/condor_run -u vanilla -a periodic_remove=JobStatus==5 /bin/hostname &"). But if I try to submit a Grid universe job (grid_resource=sge), the job always ends up in hold state.
Hold reason: Attempts to submit failed:
 (77.0) blah_job_submit() failed: submission command failed (exit code = 1)
You first have to understand why the 'sge_submit.sh' script
(/usr/libexec/condor/glite/bin/sge_submit.sh) is failing.
The failure means that either the script is unable to find 'qsub' or
the submit file it generates is incorrect.
The script expects to find the SGE root directory and cell name via
the batch_gahp.config (/usr/libexec/condor/glite/etc/batch_gahp.config)
file. These default to the SGE_ROOT and SGE_CELL environment
variables. If these variables are not defined, '/usr/local/sge/pro'
and 'default' are used for the root path and cell name.
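To see which values would apply in your environment, the fallback rule described above can be reproduced in a few lines of shell. This is only a sketch of the rule as stated here, not the actual logic of sge_submit.sh; the function names are made up for illustration, and the 'qsub' check only looks at the PATH of the current shell, which may differ from how the script locates it.

```shell
#!/bin/sh
# Sketch only: mirrors the defaults described above, not the real
# sge_submit.sh code. All function names here are hypothetical.

# Print the SGE root: $SGE_ROOT if set, otherwise the stated fallback.
resolve_sge_root() {
    echo "${SGE_ROOT:-/usr/local/sge/pro}"
}

# Print the SGE cell name: $SGE_CELL if set, otherwise 'default'.
resolve_sge_cell() {
    echo "${SGE_CELL:-default}"
}

# Report what a script started from this environment would see.
check_sge_env() {
    root=$(resolve_sge_root)
    cell=$(resolve_sge_cell)
    echo "SGE root: $root"
    echo "SGE cell: $cell"
    if [ -d "$root/$cell" ]; then
        echo "cell directory exists: $root/$cell"
    else
        echo "cell directory missing: $root/$cell"
    fi
    # Assumption: qsub is looked up on PATH; the script may do otherwise.
    if command -v qsub >/dev/null 2>&1; then
        echo "qsub found: $(command -v qsub)"
    else
        echo "qsub not found on PATH"
    fi
}

check_sge_env
```

Run it as the same user, and with the same environment, that the Condor daemons use; a login shell may have SGE_ROOT set while the daemon environment does not.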
You can set these (sge_root and sge_cellname) in batch_gahp.config.
If these settings are correct and sge_submit.sh is still failing, you can try
to execute it directly by giving a simple command as an argument, say
    sge_submit.sh -c /bin/date
If you wish to inspect the generated submit file you should modify
sge_submit.sh so that $bls_tmp_file is either copied away or not removed
in the script.
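One way to copy the file away is a small helper along these lines. It is a sketch under assumptions: the helper name, the default destination directory /tmp and the file name 'sge_submit_debug.<pid>' are all hypothetical, and only $bls_tmp_file comes from the script itself. Where exactly to call it depends on where your copy of sge_submit.sh removes the temporary file.

```shell
# Hypothetical helper: keep a copy of the generated submit file.
# Arguments: $1 = file to preserve, $2 = destination directory (default /tmp).
# Prints the path of the copy on success, returns non-zero otherwise.
keep_submit_file() {
    src="$1"
    dest_dir="${2:-/tmp}"
    # Nothing to do if the file was never created.
    [ -f "$src" ] || return 1
    # The shell PID keeps copies from separate runs apart.
    dest="$dest_dir/sge_submit_debug.$$"
    cp "$src" "$dest" && echo "$dest"
}

# Inside sge_submit.sh this would be called before the temporary file
# is removed, for example:
#   keep_submit_file "$bls_tmp_file"
```

The copy can then be passed to 'qsub' by hand to see the error message that the wrapper hides.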
I unfortunately have no hands-on experience with SGE. However, if these
scripts contain assumptions that don't make sense in your environment
I can make sure they get fixed.
Hope this helps.