
Re: [Condor-users] input file locking?



As I said, I've already checked for accessibility of the file from the execute node, and I've checked daemon logs for any signs of NFS trouble. The same executable is being used for all runs, successful and unsuccessful, and the same input file has been used both successfully and unsuccessfully. The submit description queues up about 300 runs of the same program, which is doing some evolutionary simulations.

- dave


Junjun Mao wrote:
Most likely this is not Condor related, since Condor had already started the job. Try running the program directly on the node where it failed to see whether you get the same error. After that, you may want to check whether NFS is unstable.

Junjun

On Friday 10 November 2006 16:18, David A. Kotz wrote:
When Condor opens an input file for a job, does it lock that file?  I
have a user who is submitting hundreds of jobs, all of which refer to
a directory (NFS mounted) of text files with one number in each.  At
any given time, there may be several jobs using the same input file.
Some of the jobs using a given input file run to completion with no
problems while others repeatedly fail to run with errors like the
following in the shadow log:

11/10 15:10:27 (3744.186) (9335):error: Error: Couldn't open standard file 'inputs/in.186'

I've checked the system logs to make sure we aren't having
intermittent automounter issues or any other system failings.  The
jobs that fail to run keep failing to run, returning to the idle
state over and over, even after all of the running jobs have
completed.
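
One way to repeat the accessibility check discussed in this thread is to try opening every expected input file from the execute node and report the ones that fail. The following is only a rough sketch under stated assumptions: the function name find_unreadable_inputs is hypothetical, and the naming scheme (inputs/in.N with N running from 0 to 299) is guessed from the 'inputs/in.186' path in the shadow log and the figure of about 300 runs. It says nothing about whether Condor locks input files; it only shows whether a plain open() succeeds at the moment it is run.

```python
import os


def find_unreadable_inputs(directory, count, prefix="in."):
    """Return (path, reason) pairs for each expected input file that cannot be opened.

    Assumption: files are named <prefix><N> for N in 0..count-1. This naming
    is a guess based on the path seen in the shadow log.
    """
    failures = []
    for i in range(count):
        path = os.path.join(directory, f"{prefix}{i}")
        try:
            # Opening and reading one byte forces a real access over NFS,
            # rather than relying on cached directory metadata.
            with open(path, "rb") as handle:
                handle.read(1)
        except OSError as exc:
            failures.append((path, exc.strerror or str(exc)))
    return failures


if __name__ == "__main__":
    # Hypothetical directory and job count; adjust to the actual submit setup.
    for path, reason in find_unreadable_inputs("inputs", 300):
        print(f"{path}: {reason}")
```

Running this on the failing execute node, from the job's working directory and as the same user the job runs as, gives a closer approximation of what the job sees than checking from the submit machine.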