
Re: [HTCondor-users] Condor/SGE cluster



> Hi,

Hi Lukas.

> okay, I've found the error.
> 
> I had to add a line to /usr/libexec/condor/glite/bin/sge_submit.sh which
> includes the location of "qsub" to PATH.

This *should* in principle have been taken care of by sourcing SGE's
'settings.sh'. As I mentioned in my earlier post, you can make sure
that settings.sh is found by setting sge_rootpath and sge_cellname
appropriately in batch_gahp.config:

if [ -z "$sge_rootpath" ]; then sge_rootpath="/usr/local/sge/pro"; fi
if [ -r "$sge_rootpath/${sge_cellname:-default}/common/settings.sh" ]
then
  . "$sge_rootpath/${sge_cellname:-default}/common/settings.sh"
fi

Or is settings.sh not setting the PATH?
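
A quick way to answer that question is to source settings.sh in a
subshell and see whether qsub resolves afterwards. The following is only
a sketch: the function name check_sge_settings is made up for this
example, and the settings file path is whatever your installation uses.

```shell
# Report where qsub resolves after sourcing a given SGE settings.sh.
# The check runs in a subshell, so the caller's PATH is left untouched.
# Usage: check_sge_settings /path/to/settings.sh
check_sge_settings() {
    settings_file="$1"
    (
        if [ ! -r "$settings_file" ]; then
            echo "settings file not readable: $settings_file" >&2
            exit 2
        fi
        . "$settings_file"
        if qsub_path=$(command -v qsub); then
            echo "qsub found at: $qsub_path"
        else
            echo "qsub not on PATH after sourcing $settings_file" >&2
            exit 1
        fi
    )
}
```

Calling it with the same file that the fragment above would source, e.g.
check_sge_settings "$sge_rootpath/${sge_cellname:-default}/common/settings.sh",
shows whether the PATH line in sge_submit.sh is really needed.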

> By the way, there is some pointless code in this script:
>
> jobID=`qsub $bls_tmp_file 2> /dev/null | perl -ne 'print $1 if /^Your job (\d+)/;'` # actual submission
> retcode=$?
> if [ "$retcode" != "0" -o -z "$jobID" ] ; then
>     rm -f $bls_tmp_file
>     exit 1
> fi

Agreed, thanks. This is now fixed in the upstream code.
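
For reference: in the quoted snippet, $? holds the exit status of the
last command in the pipeline (perl), so qsub's own status is never
examined. One possible way to keep the two apart is sketched below. This
is not necessarily what the upstream fix looks like; the function name
and the QSUB_CMD override are hypothetical and exist only so the sketch
can be tried without a real SGE installation.

```shell
# Submit a job script and print the SGE job ID on stdout.
# QSUB_CMD is a hypothetical override (defaults to "qsub").
submit_and_get_jobid() {
    job_file="$1"
    # Capture stdout on its own, so that $? is qsub's exit status
    # and not the status of whatever parses the output.
    qsub_out=$("${QSUB_CMD:-qsub}" "$job_file" 2>/dev/null)
    retcode=$?
    if [ "$retcode" -ne 0 ]; then
        return "$retcode"
    fi
    # Extract the number from a line of the form "Your job <digits> ...".
    jobID=$(printf '%s\n' "$qsub_out" |
        sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p')
    if [ -z "$jobID" ]; then
        return 1
    fi
    printf '%s\n' "$jobID"
}
```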

> And for readability reasons you could use awk '{ print $3 }' instead of perl
> -ne 'print $1 if /^Your job (\d+) /;'.

This depends on what else qsub can output to stdout, and, having no direct
SGE experience, I'd be cautious about changing it.
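
To illustrate the concern: an unanchored awk '{ print $3 }' prints the
third field of every line, while the perl pattern only reacts to lines
starting with "Your job". If awk is preferred, the same anchoring can be
kept. In this sketch the function names are made up, and the extra
warning line in the usage note is invented for illustration; I do not
know whether qsub ever prints such a line to stdout.

```shell
# Read qsub-style output on stdin and print candidate job IDs.

# Unanchored: the third field of every input line is printed.
jobid_unanchored() {
    awk '{ print $3 }'
}

# Anchored: only lines of the form "Your job <digits> ..." contribute.
jobid_anchored() {
    awk '/^Your job [0-9]+ / { print $3 }'
}
```

Fed with a hypothetical line "Warning: some notice here" followed by
'Your job 4242 ("test.sh") has been submitted', the unanchored variant
prints both "notice" and "4242", while the anchored one prints only
"4242".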

> Furthermore, it would be nice if this script would generate some error
> messages or an error log.

I am having trouble contacting the SGE script authors. The main issue here
is whether they had a reason to redirect the STDERR of the qsub command to
/dev/null. Since STDERR is recorded in the logs, it could provide valuable
information. Would it be too much to ask you to test removing the
"2> /dev/null" and see whether you hit any side effects? This may be useful
for other users in the future.
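
If simply dropping the redirection turns out to be too noisy, a middle
ground would be to keep stdout clean for parsing and pass stderr on with
a recognizable prefix. The following is only a sketch under that
assumption: the function name, the SUBMIT_CMD override and the
"sge_submit:" prefix are hypothetical and not part of the existing
script.

```shell
# Run the submit command, print its stdout unchanged, and forward
# anything it wrote to stderr to our own stderr with a prefix.
# SUBMIT_CMD is a hypothetical override (defaults to "qsub").
submit_logging_stderr() {
    err_file=$(mktemp) || return 1
    out=$("${SUBMIT_CMD:-qsub}" "$@" 2>"$err_file")
    rc=$?
    if [ -s "$err_file" ]; then
        sed 's/^/sge_submit: /' "$err_file" >&2
    fi
    rm -f "$err_file"
    printf '%s\n' "$out"
    # Preserve the submit command's own exit status for the caller.
    return "$rc"
}
```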

Thank you for sharing your findings.
Francesco Prelz
INFN Milano