
Re: [Condor-users] rooster on linux, take 3



On 11/29/2011 08:41 AM, Dan Bradley wrote:
> 
> After testing to see which machines match the job, the negotiator sorts
> the matching machines and chooses the most desirable one.  If it chooses
> an offline machine, it should inform the collector and update
> MachineLastMatchTime.  Can you confirm from your negotiator log whether
> it is choosing the offline machine or not?  From the log you posted, I
> can only see that the offline machine was selected as a candidate, not
> whether it was actually chosen.

This is unfortunately useless on a real-life pool. I'm getting close to
a hundred meg of


11/29/11 15:34:40 Job 977853.0 does match with slot8@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot9@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot3@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot4@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot3@xxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot4@xxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot3@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot4@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot10@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot5@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot11@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot12@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot6@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot13@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot7@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot14@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot8@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot15@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot16@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot3@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot4@xxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot1@xxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot2@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot3@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot4@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot5@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot6@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40 Job 977853.0 does match with slot7@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40       Rejected 977853.0 bmrbgrid@xxxxxxxxxxxxx <144.92.167.254:9617?sock=13250_c2fa_3>: no match found
11/29/11 15:34:40     Got NO_MORE_JOBS;  done negotiating
11/29/11 15:34:40 Phase 4.2:  Negotiating with schedds ...

... then a job ad, then another list of "does match" slots, and finally:

11/29/11 15:34:40 Job 977853.0 does match with slot7@xxxxxxxxxxxxxxxxxxxxxxx
11/29/11 15:34:40       Rejected 977853.0 bmrbgrid@xxxxxxxxxxxxx <144.92.167.254:9617?sock=13250_c2fa_3>: no match found
11/29/11 15:34:40     Got NO_MORE_JOBS;  done negotiating
11/29/11 15:34:40  negotiateWithGroup resources used scheddAds length 0
11/29/11 15:34:40 ---------- Finished Negotiation Cycle ----------

There is no visible indication of why it ends in "no match found"; the
machines it should be matching, falcon and robin, are the ones that are
off-line.

I did manage to get one hibernating machine (falcon, the first one in
alphabetical order of hostnames) to wake up once today; it ran jobs for
maybe 5-10 minutes and went back to sleep. The other one (robin) never
woke up at all.

If I could log the negotiation for just one specific user, maybe I could
find something in there. As it is, I've already spent more time on this
than I can afford, and I see no light at the end of the tunnel.
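Short of a per-user logging option, one workaround is to post-process the
negotiator log. The sketch below is a hypothetical helper, not a Condor
feature: it assumes the log lines look exactly like the excerpt above
("Job <id> does match with <slot>" and "Rejected <id> <user>@<domain>
<addr>: <reason>"), and it collects, per job, the candidate slots and the
rejection reason, keeping only jobs owned by one user. Because the owner
appears only on the "Rejected" line, a job that never reaches one is
dropped when a user filter is given.

```python
import re
import sys
from collections import defaultdict

# Line shapes ASSUMED from the log excerpt above; any other negotiator
# message is ignored.
MATCH_RE = re.compile(r"Job (\d+\.\d+) does match with (\S+)")
REJECT_RE = re.compile(r"Rejected (\d+\.\d+) (\S+) <[^>]*>: (.+)$")


def summarize(lines, user=None):
    """Map job id -> (owner, sorted candidate slots, rejection reason)."""
    slots = defaultdict(set)  # job id -> every slot it was listed against
    owner = {}                # job id -> "user@domain" from the Rejected line
    reason = {}               # job id -> text after the schedd address
    for line in lines:
        m = MATCH_RE.search(line)
        if m:
            slots[m.group(1)].add(m.group(2))
            continue
        r = REJECT_RE.search(line)
        if r:
            owner[r.group(1)] = r.group(2)
            reason[r.group(1)] = r.group(3)

    summary = {}
    for job in set(slots) | set(owner):
        who = owner.get(job)
        # With a user filter, skip jobs whose owner is unknown or different.
        if user is not None and (who is None or who.split("@")[0] != user):
            continue
        summary[job] = (who, sorted(slots.get(job, ())), reason.get(job))
    return summary


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python negsummary.py /path/to/NegotiatorLog bmrbgrid
    with open(sys.argv[1]) as log:
        for job, (who, cands, why) in sorted(summarize(log, sys.argv[2]).items()):
            print(f"{job} {who}: {len(cands)} candidate slots, {why}")
```

Deduplicating slots per job keeps the repeated "does match" lists from
successive negotiation phases from inflating the count; grepping the
resulting slot list for the sleeping hosts would at least show whether
they were candidates in a given cycle.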

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
