
Re: [Condor-users] In Dynamic slot to match a new job to slot1.2 need condor_negotiator



Is there a combination of these configs that makes it work for you?

NEGOTIATOR_IGNORE_USER_PRIORITIES = TRUE
NEGOTIATOR_MATCHLIST_CACHING = FALSE

If there was a requirements mismatch, does condor_q -better-analyze give any hints?

Best,


matt

Sateesh Potturu wrote:
Hi Matt,

But why does this job not start, with "no match found" reported in the
negotiator log?

I tested this feature too and I see the same problem, even though there
have been many negotiation cycles.

1/9 21:56:27 ---------- Started Negotiation Cycle ----------
1/9 21:56:27 Phase 1:  Obtaining ads from collector ...
1/9 21:56:27   Getting all public ads ...
1/9 21:56:27 Trying to query collector <192.168.2.100:9618>
1/9 21:56:27   Sorting 5 ads ...
1/9 21:56:27   Getting startd private ads ...
1/9 21:56:27 Trying to query collector <192.168.2.100:9618>
1/9 21:56:27 Got ads: 5 public and 1 private
1/9 21:56:27 Public ads include 1 submitter, 1 startd
1/9 21:56:27 Entering compute_significant_attrs()
1/9 21:56:27 Leaving compute_significant_attrs() - result=JobUniverse,LastCheckp
1/9 21:56:27 Phase 2:  Performing accounting ...
1/9 21:56:27 Phase 3:  Sorting submitter ads by priority ...
1/9 21:56:27 Phase 4.1:  Negotiating with schedds ...
1/9 21:56:27     NumStartdAds = 1
1/9 21:56:27     NormalFactor = 1.000000
1/9 21:56:27     MaxPrioValue = 0.557410
1/9 21:56:27     NumScheddAds = 1
1/9 21:56:27   Negotiating with sateesh@xxxx at <192.168.2.100:3538
1/9 21:56:27 0 seconds so far
1/9 21:56:27   Calculating schedd limit with the following parameters
1/9 21:56:27     ScheddPrio       = 0.557410
1/9 21:56:27     ScheddPrioFactor = 1.000000
1/9 21:56:27     scheddShare      = 0.000000
1/9 21:56:27     scheddAbsShare   = 1.000000
1/9 21:56:27     ScheddUsage      = 3
1/9 21:56:27     scheddLimit      = 0
1/9 21:56:27     userprioCrumbs   = 0 (0)
1/9 21:56:27     MaxscheddLimit   = 0
1/9 21:56:27 Socket to <192.168.2.100:35388> already in cache, reusing
1/9 21:56:27     Over submitter resource limit (0) ... only consider startd rank
1/9 21:56:27     Sending SEND_JOB_INFO/eom
1/9 21:56:27     Getting reply from schedd ...
1/9 21:56:27     Got JOB_INFO command; getting classad/eom
1/9 21:56:27     Request 00129.00000:
1/9 21:56:27       Rejected 129.0 sateesh@xxxx <192.168.2.100:35388
1/9 21:56:27     Sending SEND_JOB_INFO/eom
1/9 21:56:27     Getting reply from schedd ...
1/9 21:56:27     Got NO_MORE_JOBS;  done negotiating
1/9 21:56:27   This schedd hit its scheddlimit.
1/9 21:56:27 ---------- Finished Negotiation Cycle ----------

--
Regards,
Sateesh

On Fri, Jan 9, 2009 at 7:48 PM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
Johnson koil Raj wrote:
Hi,

I am using Condor 7.2.0 and have configured the system for dynamic slots.

When I submit 2 jobs while the status shows Slot1@xxx, only one job is
matched, to Slot1.1@xxx. For the second job the analysis says
"1 match but reject the job for unknown reasons",
and the negotiator log says the following:

1/9 19:03:06 Socket to <192.168.111.5:9661> already in cache, reusing
1/9 19:03:06     Over submitter resource limit (0) ... only consider startd ranks
1/9 19:03:06     Sending SEND_JOB_INFO/eom
1/9 19:03:06     Getting reply from schedd ...
1/9 19:03:06     Got JOB_INFO command; getting classad/eom
1/9 19:03:06     Request 00053.00000:
1/9 19:03:06 Concurrency Limit: ccp is 3.000000
1/9 19:03:06       Rejected 53.0 idealgrid@xxxxxxxxxxxxxxxxx <192.168.111.5:9661>: no match found
1/9 19:03:06     Sending SEND_JOB_INFO/eom
1/9 19:03:06     Getting reply from schedd ...
1/9 19:03:06     Got NO_MORE_JOBS;  done negotiating
1/9 19:03:06   This schedd hit its scheddlimit.
1/9 19:03:06 ---------- Finished Negotiation Cycle ----------
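
When reading through many cycles of output like the excerpt above, it can help to pull out only the rejected requests and their stated reasons. The following is a minimal sketch, not a Condor tool: the function name rejected_jobs and the regular expression are my own, and the line layout is assumed from this excerpt only (with wrapped lines already joined), so other Condor versions may format the line differently.

```python
import re

# Pattern inferred from the "Rejected ..." line in the excerpt above.
# This is an assumption about the log layout, not a documented format.
REJECT_RE = re.compile(
    r"Rejected\s+(?P<job>\d+\.\d+)\s+(?P<submitter>\S+)\s+"
    r"<(?P<addr>[^>]+)>:\s*(?P<reason>.+)$"
)


def rejected_jobs(log_text):
    """Return (job_id, reason) pairs for every rejection line found."""
    results = []
    for line in log_text.splitlines():
        match = REJECT_RE.search(line)
        if match:
            results.append((match.group("job"), match.group("reason").strip()))
    return results


SAMPLE = """\
1/9 19:03:06     Request 00053.00000:
1/9 19:03:06       Rejected 53.0 idealgrid@xxxxxxxxxxxxxxxxx <192.168.111.5:9661>: no match found
1/9 19:03:06     Got NO_MORE_JOBS;  done negotiating
"""

print(rejected_jobs(SAMPLE))  # [('53.0', 'no match found')]
```

Lines that are truncated or that carry no reason after the address are simply skipped by this pattern.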


After restarting the negotiator, the second job matches and is executed
on Slot1.2@xxx. This time the negotiator log says:

1/9 19:12:05 Socket to <192.168.111.5:9661> not in cache, creating one
1/9 19:12:05 SocketCache:  Found unused slot 0
1/9 19:12:05     Sending SEND_JOB_INFO/eom
1/9 19:12:05     Getting reply from schedd ...
1/9 19:12:05     Got JOB_INFO command; getting classad/eom
1/9 19:12:05     Request 00053.00000:
1/9 19:12:05 Concurrency Limit: ccp is 3.000000
1/9 19:12:05       Connecting to startd slot1@xxx at <192.168.111.200:9619>
1/9 19:12:05 File descriptor limits: max 1024, safe 820
1/9 19:12:05       Sending PERMISSION, claim id, startdAd to schedd
1/9 19:12:05       Matched 53.0 idealgrid@xxxxxxxxxxxxxxxxx <192.168.111.5:9661> preempting none <192.168.111.200:9619> slot1@xxx

Why is a negotiator restart required to match the second job? Please
help me with this.

Johnson

It's not required. Slot1 is only split once per negotiation cycle. So
you'll get 1.1 after 1 cycle and 1.2 after a second. Your restart just
forced a second cycle.
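
To make the "one split per cycle" behaviour concrete, here is a toy model of it. This is only an illustrative sketch under my own assumptions, not how the negotiator or startd is implemented: PartitionableSlot, split and negotiate are hypothetical names, and only CPUs are tracked.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    """Toy stand-in for a partitionable slot such as slot1 (hypothetical model)."""
    name: str
    free_cpus: int
    dynamic_slots: list = field(default_factory=list)

    def split(self, cpus):
        """Carve one dynamic slot (slot1.1, slot1.2, ...) out of the free CPUs."""
        if cpus > self.free_cpus:
            return None
        self.free_cpus -= cpus
        child = f"{self.name}.{len(self.dynamic_slots) + 1}"
        self.dynamic_slots.append(child)
        return child


def negotiate(idle_jobs, slot, cpus_per_job=1):
    """Run cycles until no job is idle or the slot is exhausted.

    Each loop iteration is one negotiation cycle, and the slot is split
    at most once per cycle, as described above.
    """
    placements = []
    cycle = 0
    while idle_jobs:
        cycle += 1
        child = slot.split(cpus_per_job)
        if child is None:
            break  # nothing left to carve out; remaining jobs stay idle
        placements.append((cycle, idle_jobs.pop(0), child))
    return placements


slot1 = PartitionableSlot("slot1", free_cpus=4)
for cycle, job, child in negotiate(["1.0", "2.0"], slot1):
    print(f"cycle {cycle}: job {job} -> {child}")
# cycle 1: job 1.0 -> slot1.1
# cycle 2: job 2.0 -> slot1.2
```

In this model, N idle jobs aimed at a single partitionable slot need N cycles, so the wait for the last job grows with the time between cycles (NEGOTIATOR_INTERVAL in the configuration).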

Best,


matt
