Re: [HTCondor-users] Jobs held on submit node
- Date: Wed, 03 Feb 2016 15:24:15 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
On 2/3/2016 1:25 AM, Baumeier, B. wrote:
> I have built a (still small) condor cluster and so far it works well.
> With one exception. When a job is supposed to start on the same machine
> it was submitted from, the job is being held because it cannot create
> output files. Which is weird because the jobs starting on other nodes
> can readily do that. All clients use the same config file which I
> attach. Any idea what I did wrong?

That is a little strange.
Do your nodes use a shared filesystem, or are you expecting to have
HTCondor transfer your job's files by specifying things like
transfer_input_files in your submit description file?
My wild guess based on the above is this has something to do with
HTCondor using file transfer when the job runs remotely but not using
file transfer when it tries to start on the submit machine. See the man
page for condor_submit, specifically the entry for
should_transfer_files. Try changing your submit file to explicitly have
should_transfer_files = YES (or NO if you want to always use a shared
filesystem) and see if that makes a difference. My guess is that it will
at least give you the same behavior whether the job runs locally or
remotely.
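
For illustration, a submit description file with file transfer forced on
might look roughly like the sketch below. The executable and file names
(my_job.sh, input.dat, and the output/error/log names) are hypothetical
placeholders, not taken from your setup, so adjust them to match your job:

```
# Minimal sketch of a submit description file; all file names are
# hypothetical placeholders.
universe                = vanilla
executable              = my_job.sh

# Always use HTCondor file transfer, even when the job lands on the
# submit machine. Use NO instead to always rely on a shared filesystem.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat

output                  = job.$(Cluster).$(Process).out
error                   = job.$(Cluster).$(Process).err
log                     = job.$(Cluster).log

queue
```

With should_transfer_files set explicitly, the job should be handled the
same way no matter which machine it matches, which makes it easier to tell
whether the hold really comes from the file transfer decision.
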
Hope the above helps,