
Re: [Condor-users] Condor Error No locks available



Hello Kent,

Thanks for the reply.
- What architecture/OS are you running on?
I am using Ubuntu LINUX (Linux cpu02 2.6.12-10-amd64-xeon #1 SMP Thu Dec 22 11:43:32 UTC 2005 x86_64 GNU/Linux)

- Is this problem repeatable or not?
Yes, it is repeatable, and I am not sure how to proceed.

- Would it be possible for you to move the node job log files themselves
  off of NFS?  That's probably the first thing to try.
I tried this. I generated the submit file using the condor_submit_dag -no_submit command, then edited it to point all log and lock files to the local disk on the node. Yet I am still seeing this error. Here is my edited run.dag.condor.sub file:

# Filename: run.dag.condor.sub
# Generated by condor_submit_dag run.dag
universe        = vanilla
executable      = /home/usr1/condor/bin/condor_dagman
getenv          = True
output          = run.dag.lib.out
error           = run.dag.lib.out
log             = run.dag.dagman.log
remove_kill_sig = SIGUSR1
arguments       = -f -l . -Debug 3 -Lockfile /scratch/usr1/run.dag.lock -Dag run.dag -Rescue /scratch/usr1/run.dag.rescue -Condorlog /scratch/usr1/run.dag.dummy_log
environment    = _CONDOR_DAGMAN_LOG=run.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
queue
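As a side note, errno 37 on Linux is ENOLCK ("No locks available"), which usually means the kernel's lock manager (on NFS, the lockd/statd services) refused or could not grant a POSIX lock. Before resubmitting, it may be worth verifying that the directory you moved the lock file to actually supports locking. Here is a small diagnostic sketch (my own, not part of Condor; the function name `locks_available` is just for illustration):

```python
import errno
import fcntl
import os
import sys
import tempfile

def locks_available(directory):
    """Return True if POSIX advisory locks work for files in `directory`.

    An OSError with errno ENOLCK (37, "No locks available") indicates the
    filesystem's lock manager cannot grant the lock -- the same failure
    DAGMan's FileLock::obtain() reports.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Try to take and release a non-blocking exclusive lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError as e:
        if e.errno == errno.ENOLCK:
            return False
        raise
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    if locks_available(target):
        print("locks OK in %s" % target)
    else:
        print("no locks available (ENOLCK) in %s" % target)
```

Running this against /scratch/usr1 (and against the NFS-mounted directory, for comparison) should show whether the lock file location is really the problem.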

-lohit

On 6/13/06, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
On Mon, 12 Jun 2006, lohit wrote:

> I am trying to submit jobs using a DAG file. To test the feature, I have 4
> jobs: 3 defined as PARENT and the fourth as CHILD.
>
> Job sh_loop1 sh_loop1.cmd
> Job sh_loop2 sh_loop2.cmd
> Job sh_loop3 sh_loop3.cmd
> Job sh_loop4 sh_loop4.cmd
> PARENT sh_loop1 sh_loop2 sh_loop3 CHILD sh_loop4
>
> If I submit this .dag file using condor_submit_dag, I am seeing this error
>
> 6/12 23:32:18   assigned Condor ID (22.0.0)
> 6/12 23:32:18 Just submitted 3 jobs this cycle...
> 6/12 23:32:18 FileLock::obtain(1) failed - errno 37 (No locks available)
> 6/12 23:32:18 ERROR "Assertion ERROR on (is_locked)" at line 916 in file
> user_log.C
>
> I searched a previous thread about lock problems on NFS, so I have now
> defined ${LOCK} to be a local directory on the nodes.
> But I am still seeing this error, and the job assigned as CHILD is not being
> submitted.
>
> Am I missing something? Could anyone please explain what the problem is and
> how I can solve it?

A few questions:
- What architecture/OS are you running on?
- Is this problem repeatable or not?
- Would it be possible for you to move the node job log files themselves
  off of NFS?  That's probably the first thing to try.

Kent Wenger
Condor Team
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR