Re: [Condor-users] Condor-C delays about 5 minutes!



Hello,

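You can make the condor_gridmanager poll the remote condor_schedd more often by lowering the configuration variable CONDOR_JOB_POLL_INTERVAL, which sets the number of seconds between status polls for Condor-C jobs. Its documented default is 300 seconds, which matches the roughly five-minute delay you are seeing.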

See:  http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#SECTION004320000000000000000

Note the warnings in the documentation: a shorter interval means more network traffic, in exchange for lower latency on job state changes.
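
As a rough sketch, the change would go in the local configuration file on the submit machine (your log lists /opt/condor-7.4.4/local.node2/condor_config.local as a local config source). The value of 60 seconds is only an illustrative assumption, not a recommendation; choose whatever your network and the remote schedd can tolerate.

```
# Local config on the submitting machine.
# Seconds between condor_gridmanager status polls of the remote
# condor_schedd for Condor-C jobs. 60 is an assumed example value.
CONDOR_JOB_POLL_INTERVAL = 60
```

Running condor_reconfig on the submit machine afterwards should make the daemons reread their configuration, and condor_config_val CONDOR_JOB_POLL_INTERVAL should print the value in effect. A condor_gridmanager started after the change reads the new value at startup in any case.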

Derek Weitzel
Graduate Research Assistant
University of Nebraska Holland Computing Center

On Dec 1, 2010, at 11:14 AM, 胡忠想 wrote:

> Hello everyone.
>  
> When I use Condor-C to submit jobs from one central manager to another, the jobs complete successfully.
> However, after the remote execute machine shows the job as finished, the submitting machine takes about five minutes before it also shows the job as finished.
> Is there a setting I should change in /etc/condor_config so that this works correctly?
> The following is the log on the submitting machine:
> 12/02 01:30:41 ******************************************************
> 12/02 01:30:41 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
> 12/02 01:30:41 ** /opt/condor-7.4.4/sbin/condor_gridmanager
> 12/02 01:30:41 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(11) class=DAEMON(1)
> 12/02 01:30:41 ** Configuration: subsystem:GRIDMANAGER local:<NONE> class:DAEMON
> 12/02 01:30:41 ** $CondorVersion: 7.4.4 Oct 14 2010 BuildID: 279383 $
> 12/02 01:30:41 ** $CondorPlatform: I386-LINUX_RHEL5 $
> 12/02 01:30:41 ** PID = 9283
> 12/02 01:30:41 ** Log last touched 12/2 00:44:08
> 12/02 01:30:41 ******************************************************
> 12/02 01:30:41 Using config source: /opt/condor-7.4.4/etc/condor_config
> 12/02 01:30:41 Using local config sources:
> 12/02 01:30:41    /opt/condor-7.4.4/local.node2/condor_config.local
> 12/02 01:30:41 DaemonCore: Command Socket at <10.1.1.27:58032>
> 12/02 01:30:44 [9283] Found job 25.0 --- inserting
> 12/02 01:30:44 [9283] gahp server not up yet, delaying ping
> 12/02 01:30:44 [9283] GAHP server pid = 9287
> 12/02 01:30:44 [9283] (25.0) doEvaluateState called: gmState GM_INIT, remoteState 0
> 12/02 01:30:49 [9283] resource m2m.jsi.cn is now up
> 12/02 01:30:49 [9283] (25.0) doEvaluateState called: gmState GM_SUBMIT, remoteState 0
> 12/02 01:30:54 [9283] (25.0) doEvaluateState called: gmState GM_SUBMIT, remoteState 0
> 12/02 01:30:54 [9283] (25.0) doEvaluateState called: gmState GM_SUBMIT_SAVE, remoteState 0
> 12/02 01:30:59 [9283] (25.0) doEvaluateState called: gmState GM_STAGE_IN, remoteState 0
> 12/02 01:31:04 [9283] (25.0) doEvaluateState called: gmState GM_POLL_ACTIVE, remoteState 0
> 12/02 01:35:49 [9283] (25.0) doEvaluateState called: gmState GM_SUBMITTED, remoteState 1
> 12/02 01:35:54 [9283] (25.0) doEvaluateState called: gmState GM_STAGE_OUT, remoteState 4
> 12/02 01:35:54 [9283] (25.0) doEvaluateState called: gmState GM_DONE_SAVE, remoteState 4
> 12/02 01:35:59 [9283] (25.0) doEvaluateState called: gmState GM_DONE_COMMIT, remoteState 4
> 12/02 01:35:59 [9283] No jobs left, shutting down
> 12/02 01:35:59 [9283] Got SIGTERM. Performing graceful shutdown.
> 12/02 01:35:59 [9283] **** condor_gridmanager (condor_GRIDMANAGER) pid 9283 EXITING WITH STATUS 0
> As you can see, the delay occurs after the GM_POLL_ACTIVE state. Has anybody encountered the same problem and found a solution?