
Re: [Condor-users] condor_submit hangs with warning on Mac OS 10.4



I can confirm I also get this behavior occasionally on Mac OS X 10.4 (Intel
& PowerPC) with Condor 7.0.x and 7.1.0. The hangs and warnings don't stop the
jobs from running successfully, though. The hangs tend to occur when
requesting multiple processes per cluster.
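
Since the jobs run despite the warnings, a wrapper script could look for the
"N job(s) submitted to cluster M." line instead of treating any warning as a
failed submission. The following is only a rough sketch: the function name
parse_submit_output and both patterns are my own assumptions, based solely on
the sample output in the quoted message, and other Condor versions may print
something different. It does not describe how DAGMan itself decides that a
submission failed.

```python
import re

# Both patterns are assumptions derived from the sample output quoted in this
# thread; they are not taken from Condor documentation.
SUBMITTED_RE = re.compile(
    r"^(\d+) job\(s\) submitted to cluster (\d+)\.\s*$", re.MULTILINE
)
WARNING_RE = re.compile(
    r"^WARNING: File (\S+) is not writable by condor\.\s*$", re.MULTILINE
)


def parse_submit_output(text):
    """Return (job_count, cluster_id, warned_files).

    job_count and cluster_id are None when no "submitted to cluster" line is
    found, which this sketch treats as a failed submission.
    """
    match = SUBMITTED_RE.search(text)
    warned = WARNING_RE.findall(text)
    if match is None:
        return None, None, warned
    return int(match.group(1)), int(match.group(2)), warned


# Sample text copied from the quoted message.
sample = """Submitting job(s)..................................................
Logging submit event(s)..................................................
50 job(s) submitted to cluster 20.

WARNING: File /Users/juve/test_20.26.out is not writable by condor.

WARNING: File /Users/juve/test_20.38.out is not writable by condor.
"""

jobs, cluster, warned = parse_submit_output(sample)
print(jobs, cluster, warned)
```

A caller would resubmit only when the cluster ID comes back as None, and
could log the warned files separately. This does nothing about the hang
itself.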

Cheers
Craig

On 25/09/2008 20:54, "Gideon Juve" <juve@xxxxxxx> wrote:

> Hi all,
> 
> I am running Condor 7.0.4 on an Intel Mac with OS 10.4. When I submit
> several jobs in a short period of time, condor_submit hangs and/or
> produces warnings about files not being writable by condor.
> 
> Here is an example job:
> 
> universe = vanilla
> executable = /bin/hostname
> transfer_executable = false
> output = test_$(cluster).$(process).out
> error = test_$(cluster).$(process).err
> log = test_$(cluster).$(process).log
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT
> notification = NEVER
> queue 50
> 
> Here is a typical output:
> 
> [juve@juve ~]$ condor_submit condor.sub
> Submitting job(s)..................................................
> Logging submit event(s)..................................................
> 50 job(s) submitted to cluster 20.
> 
> WARNING: File /Users/juve/test_20.26.out is not writable by condor.
> 
> WARNING: File /Users/juve/test_20.38.out is not writable by condor.
> 
> At that point condor_submit may hang for a long time.
> 
> I don't always get the warnings and they usually only appear for a few
> of the output and/or error files. The jobs always run and produce the
> correct output. Sometimes condor_submit hangs until I kill it (it is
> often still running after all the jobs have finished). If I submit the
> same set of jobs on Linux I don't have any problems.
> 
> The problem also occurs when I submit jobs rapidly using DAGMan, for
> example at a point in my DAG where a large number of jobs become ready
> at once. This causes problems for DAGMan because it sees the warning
> as a submission failure and resubmits the job a few seconds later. But
> since the original jobs are actually running, the resubmits create
> duplicates that can interfere with each other (overwriting working
> files, etc.). This also seems to cause DAGMan to hang occasionally.
> 
> Has anyone seen problems like this before?
> 
> Cheers,
> Gideon

