
Re: [HTCondor-users] Jobs held on submit node



On 2/3/2016 1:25 AM, Baumeier, B. wrote:
Dear all:

I have built a (still small) condor cluster and so far it works well,
with one exception: when a job is supposed to start on the same machine
it was submitted from, the job is held because it cannot create its
output files. This is odd, because jobs starting on other nodes can
create them without trouble. All clients use the same config file, which
I have attached. Any idea what I did wrong?

Thanks,

Bjoern


That is a little strange.

Do your nodes use a shared filesystem, or are you expecting to have HTCondor transfer your job's files by specifying things like transfer_input_files in your submit description file?

My wild guess, based on the above, is that this has something to do with HTCondor using file transfer when the job runs remotely but not using file transfer when it tries to start on the submit machine. See the man page for condor_submit, specifically the entry for should_transfer_files. Try changing your submit file to explicitly set should_transfer_files = YES (or NO if you always want to use a shared filesystem) and see if that makes a difference. My guess is that it will at least make the job behave the same way whether it runs locally or remotely.
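
For illustration, here is a minimal sketch of a submit description file that forces file transfer. One plausible mechanism behind the guess above, assuming the submit file currently leaves should_transfer_files at IF_NEEDED: with IF_NEEDED, HTCondor skips file transfer when the execute machine has the same FILESYSTEM_DOMAIN as the submit machine, which is always the case when the job runs on the submit machine itself. The executable, input and output file names below are hypothetical placeholders, not taken from this thread.

```
# Hypothetical submit description file; my_job.sh, input.dat and the
# output/error/log names are placeholders, not taken from the thread.
universe                = vanilla
executable              = my_job.sh
arguments               = input.dat
transfer_input_files    = input.dat
output                  = job.out
error                   = job.err
log                     = job.log

# Always use HTCondor file transfer, even when the job lands on the
# submit machine itself, so local and remote runs behave the same way.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

queue
```

If you would rather rely on a shared filesystem, set should_transfer_files = NO and drop transfer_input_files; the input and output paths then have to be reachable and writable at the same location on every execute node, including the submit machine.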

hope the above helps,
Todd