
Re: [Condor-users] It is so long for jobs to get scheduled using condor web service

Date: Tue, 13 Apr 2010 12:56:51 +0800
From: "Zhenghua Xue" <zhxue@xxxxxxx>
Subject: Re: [Condor-users] It is so long for jobs to get scheduled using condor web service

Both of your proposals work. Thanks a lot.

2010-04-13

Zhenghua Xue

From: Matthew Farrellee

Sent: 2010-04-13 09:50:43

To: Zhenghua Xue

Cc: condor-users

Subject: Re: It is so long for jobs to get scheduled using condor web service

On 4/12/10 10:43 AM, Zhenghua Xue wrote:

> Hi all,

> I used Birdbath to submit jobs. It is very strange that a submitted

> job remains idle for so long before it is executed. It seems that the

> submitted job is executed only when Condor reaches a new reschedule cycle.

> Could you tell me the reason? How can I make jobs get scheduled as soon

> as the HTTP request is accepted? Thank you.

> The following is from the SchedLog. The HTTP request is accepted

> at 23:17:47, but the job is not scheduled and executed until 23:27:32. It is

> just a very simple job, so why does it wait so long between being accepted and

> scheduled?

> 4/12 23:17:47 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51227>

> 4/12 23:17:47 (pid:17685) About to serve HTTP request...

> 4/12 23:17:47 (pid:17685) Completed servicing HTTP request

> 4/12 23:17:48 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51228>

> 4/12 23:17:48 (pid:17685) About to serve HTTP request...

> 4/12 23:17:48 (pid:17685) Completed servicing HTTP request

> 4/12 23:17:48 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51229>

> 4/12 23:17:48 (pid:17685) About to serve HTTP request...

> 4/12 23:17:48 (pid:17685) Completed servicing HTTP request

> 4/12 23:17:48 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51230>

> 4/12 23:17:48 (pid:17685) About to serve HTTP request...

> 4/12 23:17:48 (pid:17685) Completed servicing HTTP request

> 4/12 23:17:48 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51231>

> 4/12 23:17:48 (pid:17685) About to serve HTTP request...

> 4/12 23:17:48 (pid:17685) Completed servicing HTTP request

> 4/12 23:17:48 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51232>

> 4/12 23:17:48 (pid:17685) About to serve HTTP request...

> 4/12 23:17:48 (pid:17685) Completed servicing HTTP request

> 4/12 23:17:49 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51233>

> 4/12 23:17:49 (pid:17685) About to serve HTTP request...

> 4/12 23:17:49 (pid:17685) Completed servicing HTTP request

> 4/12 23:17:49 (pid:17685) Received HTTP POST connection from

> <116.69.180.124:51234>

> 4/12 23:17:49 (pid:17685) About to serve HTTP request...

> 4/12 23:17:49 (pid:17685) Timer 785 not found

> 4/12 23:17:49 (pid:17685) Completed servicing HTTP request

> 4/12 23:27:29 (pid:17685) Sent ad to central manager for daemon@xxxxxxxxxxx

> 4/12 23:27:29 (pid:17685) Sent ad to 1 collectors for daemon@xxxxxxxxxxx

> 4/12 23:27:29 (pid:17685) Sent ad to central manager for zhxue@xxxxxxxxxxx

> 4/12 23:27:29 (pid:17685) Sent ad to 1 collectors for zhxue@xxxxxxxxxxx

> 4/12 23:27:29 (pid:17685) Sent ad to central manager for

> Administrator@xxxxxxxxxxx

> 4/12 23:27:29 (pid:17685) Sent ad to 1 collectors for

> Administrator@xxxxxxxxxxx

> 4/12 23:27:29 (pid:17685) Activity on stashed negotiator socket

> 4/12 23:27:29 (pid:17685) Negotiating for owner: zhxue@xxxxxxxxxxx

> 4/12 23:27:29 (pid:17685) Checking consistency running and runnable jobs

> 4/12 23:27:29 (pid:17685) Tables are consistent

> 4/12 23:27:29 (pid:17685) Rebuilt prioritized runnable job list in 0.000s.

> 4/12 23:27:29 (pid:17685) Out of jobs - 1 jobs matched, 0 jobs idle,

> flock level = 0

> 4/12 23:27:31 (pid:17685) Starting add_shadow_birthdate(39664.0)

> 4/12 23:27:31 (pid:17685) Started shadow for job 39664.0 on

> "<192.168.21.201:44824>", (shadow pid = 14044)

> 4/12 23:27:32 (pid:17685) ZKM: setting default map to zhxue@xxxxxxxxxxx

> 4/12 23:27:32 (pid:17685) Shadow pid 14044 for job 39664.0 exited with

> status 100

> 4/12 23:27:32 (pid:17685) Checking consistency running and runnable jobs

> 4/12 23:27:32 (pid:17685) Tables are consistent

> 4/12 23:27:32 (pid:17685) Rebuilt prioritized runnable job list in

> 0.000s. (Expedited rebuild because no match was found)

> 4/12 23:27:32 (pid:17685) match

> (<192.168.21.201:44824>#1271042786#26#...) out of jobs (cluster id

> 39664); relinquishing

> 4/12 23:27:32 (pid:17685) Sent RELEASE_CLAIM to startd at

> <192.168.21.201:44824>

> 4/12 23:27:32 (pid:17685) Match record (<192.168.21.201:44824>, 39664,

> -1) deleted

> 4/12 23:27:32 (pid:17685) Got VACATE_SERVICE from <192.168.21.201:39723>

Try the RequestReschedule SOAP call on the Schedd or execute the

condor_reschedule command after submitting.

Best,

matt
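
For readers following up on the second suggestion, here is a minimal sketch of running condor_reschedule from a submitting program right after the web-service submission succeeds. It assumes the condor_reschedule binary is on the PATH of the submitting host. The helper names build_reschedule_command and request_reschedule and the schedd_name parameter are hypothetical and are not part of Condor or Birdbath. The RequestReschedule SOAP call is not shown, because its request format is not given in this thread.

```python
import shutil
import subprocess
from typing import List, Optional


def build_reschedule_command(schedd_name: Optional[str] = None) -> List[str]:
    """Build the argument list for condor_reschedule.

    schedd_name is a hypothetical parameter of this sketch; when given,
    it is passed with the tool's -name option to address a specific schedd.
    """
    cmd = ["condor_reschedule"]
    if schedd_name:
        cmd += ["-name", schedd_name]
    return cmd


def request_reschedule(schedd_name: Optional[str] = None) -> bool:
    """Run condor_reschedule; return True only if it exits with status 0.

    Returns False without running anything when the binary is not on PATH
    (an assumption of this sketch about how the host is set up).
    """
    cmd = build_reschedule_command(schedd_name)
    if shutil.which(cmd[0]) is None:
        return False
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Call this once the job submission has been committed.
    if not request_reschedule():
        print("condor_reschedule was not found or reported an error")
```

Calling request_reschedule() once per batch of submissions, rather than once per job, should be enough, since a single reschedule request covers all idle jobs in the queue.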

