
[Condor-users] condor_submit hangs with warning on Mac OS 10.4



Hi all,

I am running Condor 7.0.4 on an Intel Mac with Mac OS X 10.4. When I
submit several jobs in a short period of time, condor_submit hangs
and/or produces warnings about files not being writable by condor.

Here is an example job:

universe = vanilla
executable = /bin/hostname
transfer_executable = false
output = test_$(cluster).$(process).out
error = test_$(cluster).$(process).err
log = test_$(cluster).$(process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
notification = NEVER
queue 50

Here is a typical output:

[juve@juve ~]$ condor_submit condor.sub
Submitting job(s)..................................................
Logging submit event(s)..................................................
50 job(s) submitted to cluster 20.

WARNING: File /Users/juve/test_20.26.out is not writable by condor.

WARNING: File /Users/juve/test_20.38.out is not writable by condor.

At that point condor_submit may hang for a long time.

I don't always get the warnings, and when I do they usually appear for
only a few of the output and/or error files. The jobs always run and
produce the correct output. Sometimes condor_submit hangs until I kill
it (it is often still running after all the jobs have finished). If I
submit the same set of jobs on Linux, I don't have any problems.
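
As a stopgap for the hang, a script could run condor_submit under a
timeout so that a stuck process gets killed automatically. This is only
a minimal sketch: run_with_timeout is a hypothetical helper (not part of
Condor), and the 60-second limit in the usage line is an arbitrary
assumption. Note that a killed condor_submit may already have queued the
jobs, so a timeout here does not mean the submission failed.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its combined stdout/stderr as text.

    Returns None if the command did not finish within timeout_s seconds;
    in that case subprocess.run kills the child before raising.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The command hung past the limit and has been killed.
        return None
    return result.stdout
```

Usage would look like run_with_timeout(["condor_submit", "condor.sub"], 60),
with None indicating that condor_submit had to be killed.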

The problem also occurs when jobs are submitted rapidly by DAGMan, for
example at a point in my DAG where a large number of jobs become ready
at once. This causes problems for DAGMan because it treats the warning
as a submission failure and resubmits the job a few seconds later. But
since the original jobs are actually running, the resubmits create
duplicates that can interfere with each other (overwriting working
files, etc.). This also seems to cause DAGMan to hang occasionally.
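
To illustrate the distinction between a warning and a real submission
failure, here is a minimal sketch that parses condor_submit output of
the form shown above. It treats the submission as successful whenever
the "submitted to cluster" line is present and reports the warnings
separately. parse_submit_output is a hypothetical helper, and the
patterns are assumptions based only on the output in this message; this
is not how DAGMan itself parses the output.

```python
import re

# Patterns modeled on the condor_submit output quoted in this message.
CLUSTER_RE = re.compile(r"(\d+) job\(s\) submitted to cluster (\d+)\.")
WARNING_RE = re.compile(r"WARNING: File (\S+) is not writable by condor\.")


def parse_submit_output(text):
    """Return (job_count, cluster_id, unwritable_files).

    job_count and cluster_id are None when no "submitted to cluster"
    line is found, which is the only case treated as a failure here.
    """
    match = CLUSTER_RE.search(text)
    if match:
        job_count, cluster_id = int(match.group(1)), int(match.group(2))
    else:
        job_count, cluster_id = None, None
    unwritable_files = WARNING_RE.findall(text)
    return job_count, cluster_id, unwritable_files
```

With the output above, this would report 50 jobs in cluster 20 plus the
two files named in the warnings, so a caller could log the warnings
without resubmitting.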

Has anyone seen problems like this before?

Cheers,
Gideon