I'm trying to use HTCondor to submit jobs to our Scarf HPC. At
present, this uses Platform LSF, and (following initial work by Andrew
Lahiff) I've managed to get this to work (to some extent). However,
Scarf is replacing Platform LSF with Slurm, and I'm having trouble
getting submission to work with Slurm in the case where the jobscript
is in a directory that is not shared with the worker nodes. (I am
submitting from a custom Scarf node that has Condor
installed. Ultimately, jobs will be submitted to this node from an
HTCondor node that is external to Scarf, so sharing won't be an
option.)

The problem seems to be that the jobscript that is generated by BLAH's
slurm_submit.sh assumes that the original jobscript has been copied to
a (unique) filename in a sandbox folder, but the copy never happens.
The lsf_submit.sh script generates BSUB directives that (I think)
instruct LSF to perform the initial copy, but I see no equivalent in
slurm_submit.sh.

None of this is reflected in the files created by HTCondor: the log
file implies that the job ran OK (but consumed no resources), and the
output and error files are always empty. Only by modifying the BLAH
scripts to log somewhere other than /dev/null (and copying the
generated jobscripts to file) was I able to get more information about
what was going wrong!
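In case it helps anyone reproduce the debugging step, this is roughly the kind of change I made. The paths and the stand-in function are illustrative, not the real blahp install locations or calls:

```shell
#!/bin/sh
# Illustrative sketch (not the actual blahp code): capture output that the
# stock scripts send to /dev/null, and keep a copy of each generated
# jobscript before it is cleaned up.
LOGFILE=/tmp/blah_debug.log
SAVEDIR=/tmp/saved_jobscripts
mkdir -p "$SAVEDIR"

# Stand-in for the step whose output the stock script discards:
submit_step() { echo "slurm_submit.sh: building jobscript"; }

# Stock behaviour:   submit_step > /dev/null 2>&1
# Debug behaviour:
submit_step >> "$LOGFILE" 2>&1

# Stand-in for the generated jobscript; the real one lives in blah's
# temporary area and is normally deleted after submission.
JOBSCRIPT=/tmp/example_jobscript.sh
echo '#!/bin/sh' > "$JOBSCRIPT"
cp "$JOBSCRIPT" "$SAVEDIR/"

cat "$LOGFILE"
```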
batch_gahp.config has many options for defining which directories are
shared, and for overriding default locations for sandboxes etc. I have
tried numerous permutations, to no avail.
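For reference, these are the sorts of settings I have been permuting; I'm quoting the option names from memory, so please treat both the names and the values as illustrative rather than a known-good configuration:

```shell
# Illustrative batch_gahp.config fragment, not a working configuration.
# Directories assumed to be shared between the submit host and the worker
# nodes; my problem case is a jobscript *outside* these paths.
blah_shared_directories=/home:/users

# Location of the Slurm client commands on the submit node.
slurm_binpath=/usr/bin
```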
Is there a better guide to configuration than the comments in batch_gahp.config?
What special considerations are required for Slurm?