
Re: [Condor-users] Schedd & Negotiator not considering all jobs in the queue for scheduling



Hi Johnson,

I confirmed that there is a bug in the way auto cluster attributes are determined. This has been fixed in 7.5.1 (not yet released) as a side-effect of the change to a different ClassAd library. However, the bug remains in the next scheduled stable series release 7.4.2.

You can work around the problem by putting all of your requirements in the requirements expression rather than in a secondary attribute that is referenced from the requirements expression (CCP_Requirements in your case). Another workaround would be to manually configure SIGNIFICANT_ATTRIBUTES with a list of all job attributes (including DeferralTime) that are significant for matchmaking purposes.
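To illustrate why the two jobs collided and why either workaround separates them, here is a toy Python model. It is not HTCondor source code; the real schedd parses ClassAd expressions rather than scanning names with a regex. The base attribute list is copied from the AutoCluster:config line in Johnson's schedd log. The job attributes, the CCP_Requirements expression text, the DeferralTime values and the helper names (direct_refs, significant_attrs, assign_autoclusters) are all hypothetical.

```python
import re

# Toy model only, not HTCondor source. Attribute list copied from the
# "AutoCluster:config(...)" line in the schedd log of this thread.
BASE_SIGNIFICANT = ["isControlJob", "JobUniverse", "LastCheckpointPlatform",
                    "NumCkpts", "RequestCpus", "RequestMemory", "RequestDisk"]

# Crude identifier scan; real ClassAd parsing is far more careful.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def direct_refs(expr):
    """Names that appear directly in an expression string (no recursion)."""
    return set(IDENT.findall(expr))


def significant_attrs(job, extra=()):
    # Models the bug described above: only names written directly in
    # Requirements are added, so attributes hidden inside a secondary
    # attribute such as CCP_Requirements are never seen.
    return sorted(set(BASE_SIGNIFICANT) | set(extra)
                  | direct_refs(job.get("Requirements", "")))


def assign_autoclusters(jobs, extra=()):
    """Give jobs with identical significant-attribute values the same id."""
    ids, result = {}, []
    for job in jobs:
        sig = tuple((name, job.get(name)) for name in significant_attrs(job, extra))
        result.append(ids.setdefault(sig, len(ids)))
    return result


# Hypothetical jobs 59.0 and 60.0, identical except for DeferralTime.
CCP = "(Memory >= RequestMemory) && (CurrentTime >= DeferralTime)"
common = {"JobUniverse": 5, "RequestCpus": 1, "RequestMemory": 512,
          "RequestDisk": 1024, "CCP_Requirements": CCP}
job59 = dict(common, DeferralTime=1265815200, Requirements="CCP_Requirements")
job60 = dict(common, DeferralTime=1265815080, Requirements="CCP_Requirements")

# Workaround 1: write the whole expression into Requirements itself.
job59_inline = dict(job59, Requirements=CCP)
job60_inline = dict(job60, Requirements=CCP)

print(assign_autoclusters([job59, job60]))                # -> [0, 0]
print(assign_autoclusters([job59_inline, job60_inline]))  # -> [0, 1]
# Workaround 2: name DeferralTime explicitly, as SIGNIFICANT_ATTRIBUTES would.
print(assign_autoclusters([job59, job60], extra=["DeferralTime"]))  # -> [0, 1]
```

In the model, `extra` is simply added to the base list. The advice above is to give SIGNIFICANT_ATTRIBUTES the complete list of attributes that matter for matchmaking, DeferralTime included.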

--Dan

Johnson koil Raj wrote:
Hi Dan,

Yes, you are correct: the jobs have the same AutoClusterId. I have attached the condor_q -long output. I am submitting the jobs through SOAP, and most of the time I have to submit the same kind of jobs. Is there a configuration parameter that lets the scheduler schedule jobs without taking the AutoClusterId into account?

Bye
Johnson

Dan Bradley wrote:
Hello Johnson,

It sounds like both of the jobs you submitted got assigned to the same AutoClusterId. If you can reproduce the problem, could you please run condor_q -long and send the output of that?
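The schedd log in the original message fits a simple toy model: within one negotiation cycle, once a job is rejected, the remaining jobs that share its autocluster are not sent at all. The Python sketch below assumes exactly that behaviour; it is not schedd source, and the names negotiate and only_60_fits are hypothetical stand-ins (the latter plays the negotiator during the window when only job 60.0's deferral time allows it to start).

```python
def negotiate(jobs, matches):
    """One toy negotiation cycle (hypothetical model, not schedd source).

    jobs    -- (job_id, autocluster_id) pairs in priority order
    matches -- callable standing in for the negotiator: job_id -> bool
    Returns (sent, matched, rejected) lists of job ids.
    """
    rejected_clusters = set()
    sent, matched, rejected = [], [], []
    for job_id, cluster in jobs:
        if cluster in rejected_clusters:
            # Same autocluster as an already rejected job: skipped silently,
            # so this job stays idle without ever being sent.
            continue
        sent.append(job_id)
        if matches(job_id):
            matched.append(job_id)
        else:
            rejected.append(job_id)
            rejected_clusters.add(cluster)
    return sent, matched, rejected


def only_60_fits(job_id):
    # While only 60.0's deferral window is open, it is the one job that matches.
    return job_id == "60.0"


# Shared autocluster, as in the log: 60.0 is never sent.
print(negotiate([("59.0", 0), ("60.0", 0)], only_60_fits))
# -> (['59.0'], [], ['59.0'])

# Separate autoclusters: 60.0 is sent and matches.
print(negotiate([("59.0", 0), ("60.0", 1)], only_60_fits))
# -> (['59.0', '60.0'], ['60.0'], ['59.0'])
```

The first call mirrors the "Sent job 59.0 ... 2 jobs idle, 1 jobs rejected" cycles. The 20:38:08 cycle, where 59.0 matches and 60.0 is still sent afterwards, is also consistent with the model, because only a rejection excludes the rest of the autocluster.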

--Dan

Johnson koil Raj wrote:
Hi,

In my pool I am using one execute machine configured for dynamic slots. I submitted 2 jobs at the same time but with different deferral times; the second job has an earlier deferral time than the first. Both jobs are in the Idle state, but when the second job's deferral window arrived it did not match and start, so it missed its deferral window.

I found in the Schedd log that the first job was considered runnable, but the second job was never taken into consideration.

Why is the Schedd not sending all the jobs for negotiation? Are any configuration changes needed?


Schedd Log
2/10 20:35:08 (pid:23977) Negotiating for owner: idealgrid@xxxxxxxxxxxxxxxxx
2/10 20:35:08 (pid:23977) AutoCluster:config(isControlJob,JobUniverse,LastCheckpointPlatform,NumCkpts,RequestCpus,RequestMemory,RequestDisk) invoked
2/10 20:35:08 (pid:23977) Checking consistency running and runnable jobs
2/10 20:35:08 (pid:23977) Tables are consistent
2/10 20:35:08 (pid:23977) Rebuilt prioritized runnable job list in 0.001s.
2/10 20:35:08 (pid:23977) Sent job 59.0 (autocluster=0)
2/10 20:35:08 (pid:23977) Job 59.0 rejected: no match found
2/10 20:35:08 (pid:23977) Out of servers - 0 jobs matched, 2 jobs idle, 1 jobs rejected


2/10 20:36:08 (pid:23977) Negotiating for owner: idealgrid@xxxxxxxxxxxxxxxxx
2/10 20:36:08 (pid:23977) Reusing prioritized runnable job list because nothing has changed.
2/10 20:36:08 (pid:23977) Job 59.0: is runnable
2/10 20:36:08 (pid:23977) Sent job 59.0 (autocluster=0)
2/10 20:36:08 (pid:23977) Job 59.0 rejected: no match found
2/10 20:36:08 (pid:23977) Out of servers - 0 jobs matched, 2 jobs idle, 1 jobs rejected

2/10 20:36:57 (pid:23977) ============ End clean_shadow_recs =============
2/10 20:37:08 (pid:23977) Activity on stashed negotiator socket
2/10 20:37:08 (pid:23977)
2/10 20:37:08 (pid:23977) *Reusing prioritized runnable job list because nothing has changed.*
2/10 20:37:08 (pid:23977) Job 59.0: is runnable
2/10 20:37:08 (pid:23977) Sent job 59.0 (autocluster=0)
2/10 20:37:08 (pid:23977) Job 59.0 rejected: no match found
2/10 20:37:08 (pid:23977) Out of servers - 0 jobs matched, 2 jobs idle, 1 jobs rejected
2/10 20:37:08 (pid:23977) Activity on stashed negotiator socket
2/10 20:37:08 (pid:23977)

2/10 20:37:08 (pid:23977) Reusing prioritized runnable job list because nothing has changed.
2/10 20:37:08 (pid:23977) Job 59.0: is runnable
2/10 20:37:08 (pid:23977) Sent job 59.0 (autocluster=0)
2/10 20:37:08 (pid:23977) Job 59.0 rejected: no match found
2/10 20:37:08 (pid:23977) Out of servers - 0 jobs matched, 2 jobs idle, 1 jobs rejected

2/10 20:38:08 (pid:23977) Reusing prioritized runnable job list because nothing has changed.
2/10 20:38:08 (pid:23977) Job 59.0: is runnable
2/10 20:38:08 (pid:23977) Sent job 59.0 (autocluster=0)
2/10 20:38:08 (pid:23977) In case PERMISSION_AND_AD
2/10 20:38:08 (pid:23977) Enqueued contactStartd startd=<192.168.111.31:9785>
2/10 20:38:08 (pid:23977) Job 60.0: is runnable
2/10 20:38:08 (pid:23977) Sent job 60.0 (autocluster=0)
2/10 20:38:08 (pid:23977) Job 60.0 rejected: no match found
2/10 20:38:08 (pid:23977) Out of servers - 1 jobs matched, 1 jobs idle, 1 jobs rejected
2/10 20:38:08 (pid:23977) In checkContactQueue(), args = 0xa2a3658, host=<192.168.111.31:9785>
2/10 20:38:08 (pid:23977) In Scheduler::contactStartd()

2/10 20:38:08 (pid:23977) Reusing prioritized runnable job list because nothing has changed.
2/10 20:38:08 (pid:23977) Job already matched
2/10 20:38:08 (pid:23977) Job 60.0: is runnable
2/10 20:38:09 (pid:23977) start next job after 0 sec, JobsThisBurst 0

I am using condor-7.2.3.

Bye
Johnson



Please do not print this email unless it is absolutely necessary.
The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.
www.wipro.com
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
