
Re: [Condor-users] condor_submit never return with condor 7.2.1



Hi,

This doesn't give me any more data on the console. The only output on the console is:
Submitting job(s).
Logging submit event(s)

Where is the debug output supposed to go? Into a log?

Here is the list of Condor processes:

condorr  14978  0.0  0.1  30832  3540 ?        Ss   11:58   0:00 /opt/condor/sbin/condor_master
condorr  14979  0.0  0.2  31188  4980 ?        Ss   11:58   0:00 condor_schedd -f
condorr  14980  1.1  0.2  30632  4592 ?        Ss   11:58   0:04 condor_startd -f
root     14981  0.0  0.1  20040  3276 ?        S    11:58   0:00 condor_procd -A /tmp/condor-lock.atchoum0.780986706101/procd_pipe.SCHEDD -S 60 -C 51860
bastienf 15349  4.0  0.1  29168  3116 pts/6    R+   12:02   0:05 condor_submit -debug LOGS.NOBACKUP/echo_1_2009-03-31_12:02:42.874654/submit_file.condor
bastienf 15577  4.1  0.1  30248  4028 ?        S    12:03   0:03 condor_shadow -f 1.0 --schedd=<132.204.26.92:9601> --xfer-queue=limit=upload,download;addr=<132.204.26.92:9666> <132.204.26.92:9666> -

As you can see, condor_submit has been running for more than 5 minutes. The
condor_shadow has been started, but the stdout, stderr and log files
are empty. Can you confirm that condor_submit should have
finished before the job is matched?
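
To put a number on how long condor_submit has been stuck, the elapsed time of the process can be read from ps. The following is only a minimal sketch, assuming a `ps` that supports the POSIX `-o etime=` output format; the helper names `parse_etime` and `elapsed_seconds` are hypothetical and are not part of Condor.

```python
import subprocess


def parse_etime(etime: str) -> int:
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) into seconds."""
    etime = etime.strip()
    days = 0
    if "-" in etime:
        # Optional leading day count, e.g. "1-02:03:04".
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def elapsed_seconds(pid: int) -> int:
    """Ask ps how long the given process has been running (hypothetical helper)."""
    out = subprocess.run(
        ["ps", "-o", "etime=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_etime(out)
```

For example, `elapsed_seconds(15349)` would report the running time of the condor_submit process from the listing above, as long as that process still exists.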


condor_q tells me that the job is running even though there is nothing on
the compute server.

In the SchedLog on the submit node I have:

3/31 12:03:04 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:04 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Negotiating for owner: bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) AutoCluster:config() significant atttributes changed to OWNER,IOJob,JobUniverse,LastCheckpointPlatform,NumCkpts,slot1_IOJob,slot2_IOJob,slot3_IOJob,slot4_IOJob,slot5_IOJob,slot6_IOJob,slot7_IOJob,slot8_IOJob
3/31 12:03:56 (pid:14979) Checking consistency running and runnable jobs
3/31 12:03:56 (pid:14979) Tables are consistent
3/31 12:03:56 (pid:14979) Rebuilt prioritized runnable job list in 0.000s.
3/31 12:03:56 (pid:14979) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
3/31 12:03:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Sent REQUEST_CLAIM to startd slot3@xxxxxxxxxxxxxxxxxxxxxxx <132.204.27.64:47146> for bastienf@xxxxxxxxxxxxxxxx
3/31 12:03:56 (pid:14979) Starting add_shadow_birthdate(1.0)
3/31 12:03:56 (pid:14979) Started shadow for job 1.0 on slot3@xxxxxxxxxxxxxxxxxxxxxxx <132.204.27.64:47146> for bastienf@xxxxxxxxxxxxxxxx, (shadow pid = 15577)
3/31 12:04:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:04:56 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:04:58 (pid:14979) Activity on stashed negotiator socket
3/31 12:04:58 (pid:14979) Negotiating for owner: bastienf@xxxxxxxxxxxxxxxx
3/31 12:04:58 (pid:14979) Out of servers - 0 jobs matched, 1 jobs idle, 0 jobs rejected
3/31 12:05:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
3/31 12:05:56 (pid:14979) Sent ad to 1 collectors for bastienf@xxxxxxxxxxxxxxxx
3/31 12:06:56 (pid:14979) Sent ad to central manager for bastienf@xxxxxxxxxxxxxxxx
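
When comparing timestamps across SchedLog, ShadowLog and the negotiator logs, it can help to pull out only the relevant events. Below is a rough sketch, assuming the `M/D HH:MM:SS (pid:N) message` layout seen in the excerpt above; `parse_sched_log` and `find_events` are hypothetical helper names, not Condor tools. Lines without a leading timestamp (as produced by mail wrapping) are treated as continuations of the previous entry.

```python
import re

# Matches e.g. "3/31 12:03:56 (pid:14979) Started shadow for job 1.0 on"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) \(pid:(\d+)\) (.*)$")


def parse_sched_log(text):
    """Split a SchedLog excerpt into a list of {time, pid, msg} dicts."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            entries.append({
                "time": match.group(1),
                "pid": int(match.group(2)),
                "msg": match.group(3),
            })
        elif entries and raw.strip():
            # Continuation of a wrapped line: join with a single space.
            # (A wrap in the middle of a token would gain a stray space.)
            entries[-1]["msg"] += " " + raw.strip()
    return entries


def find_events(entries, needle):
    """Return the entries whose message contains the given substring."""
    return [e for e in entries if needle in e["msg"]]
```

For instance, `find_events(parse_sched_log(text), "Started shadow")` would return the entry recording when the shadow for job 1.0 was started.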


In the shadow log:
3/31 12:03:56 Initializing a VANILLA shadow for job 1.0
3/31 12:03:56 (1.0) (15577): Request to run on slot3@xxxxxxxxxxxxxxxxxxxxxxx <132.204.27.64:47146> was ACCEPTED

On the central node, the NegotiatorLog and MatchLog files tell me
that the job has been matched.

I would really appreciate help solving this issue, as it happens more often
than before and I can reproduce it every time on some computers with
some users.

Also, I use FS and FS_REMOTE as authentication methods.

Thanks for your time,

Frédéric Bastien

On Mon, Mar 30, 2009 at 5:13 PM, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
>> Do you have any advice on where I could look to solve this? Which log
>> would help you?
>
> You can try turning on tool debugging:
>
> TOOL_DEBUG = True
>
> in your condor_config file. And then running condor_submit with -debug.
>
> That might help give you a little more information on the tool side as to
> what's going on.
>
> - Ian