
Re: [Condor-users] condor_submit never return with condor 7.2.1



Frédéric,

> I finally got the output from condor_submit with -debug. I read it, but
> I can't find anything to help me. I hesitate to send it to the mailing
> list as it contains information about the security that we use, so I am
> sending it to you in case you can help me. If not, just tell me.

You actually replied to the list, so I'll answer here. If you ask the list admins, they might be able to remove your attachment from the list archives so that at least the output isn't kept around forever.

> I submit a very small job (echo 1). The condor_submit process is still
> running after 15 minutes. After a few seconds, there is no more output
> from condor_submit. So you have the full output.
>
> Do you have any idea of what could cause this?

These last few lines in your output:

4/6 17:14:14 (fd:2) (pid:841) FileLock object is updating timestamp on: /u/bastienf/testclaude/LOGS.NOBACKUP/echo_1_2009-04-06_17:14:05.784002/condor.log
4/6 17:14:14 (fd:2) (pid:841) PRIV_USER --> PRIV_CONDOR at file_lock.cpp:432
4/6 17:14:14 (fd:2) (pid:841) PRIV_CONDOR --> PRIV_USER at file_lock.cpp:444
4/6 17:14:14 (fd:2) (pid:841) PRIV_USER --> PRIV_UNKNOWN at user_log.cpp:173
4/6 17:14:14 (fd:2) (pid:841) PRIV_UNKNOWN --> PRIV_USER at user_log.cpp:767
4/6 17:14:14 (fd:2) (pid:841) FILE_LOCK_VIA_MUTEX is undefined, using default value of True

They make it look like condor_submit is waiting on a file lock to write something to a shared file.
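One way to check that theory outside of Condor is to try taking a lock on a scratch file in the same directory as the job log and see whether the attempt ever returns. The following is only a minimal sketch: `probe_lock` is a hypothetical helper, and it uses a plain POSIX lock via `fcntl.lockf`, which is an assumption and may not be the exact mechanism condor_submit uses. The lock is taken in a child process so that a hang can be reported as a timeout.

```python
import subprocess
import sys

# Small script run in a child process: take and release an exclusive
# POSIX lock on the file named by the first argument.
_LOCK_SNIPPET = (
    "import fcntl, sys\n"
    "with open(sys.argv[1], 'a') as handle:\n"
    "    fcntl.lockf(handle, fcntl.LOCK_EX)\n"
    "    fcntl.lockf(handle, fcntl.LOCK_UN)\n"
)


def probe_lock(path, timeout=10.0):
    """Return 'ok', 'timeout', or 'error: ...' for a lock attempt on path.

    Hypothetical helper for illustration; path should be a scratch file,
    not the real job log.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", _LOCK_SNIPPET, path],
            timeout=timeout,
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        # The child did not finish in time, which suggests the lock
        # request (or the open itself) is blocking.
        return "timeout"
    if done.returncode != 0:
        return "error: " + done.stderr.strip()
    return "ok"
```

Calling `probe_lock` on a new file name under the directory that holds condor.log and getting "timeout" back would point at the file system rather than at Condor. On a hard NFS mount a stuck child may not be killable, so treat this as a rough probe only.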

Are your logs on a network file system? If so, perhaps the network file system protocol is causing a file locking issue here. There are known problems with logging on NFS-exported file systems in Condor. How are you mounting the /u file system? I usually mount the NFS file systems used by my Condor pools with:

<NAS>:/data   /data       nfs   exec,dev,suid,rw,tcp,hard,vers=3,rsize=32768,wsize=32768,timeo=10,retrans=600    1 1

That works well with the file locking semantics in Condor. I also run with:

        IGNORE_NFS_LOCK_ERRORS = True
        LOG_ON_NFS_IS_ERROR = False

in my condor_config files.
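To answer the mount question for /u, it may help to look up which mount actually holds the log directory and which options it was mounted with. The sketch below assumes a Linux-style mount table (the format of /proc/mounts); `mount_for_path` is a hypothetical helper, and it does not handle mount points containing escaped spaces.

```python
import os


def mount_for_path(path, mounts_text):
    """Return (device, mount_point, fs_type, options) for the mount holding path.

    mounts_text is text in /proc/mounts format; options is a list of strings.
    Returns None if no mount point matches.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # Prefer the longest matching mount point; on a tie, the
            # later entry in the table wins.
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fs_type, options.split(","))
    return best
```

On Linux, passing the contents of /proc/mounts as `mounts_text` together with the path of the directory that holds condor.log should show whether the file system type is nfs and whether options such as tcp, hard, and vers=3 are in effect.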

Hope that helps move along your debugging a bit!

Warm regards,
- Ian
