
Re: [Condor-users] condor_rooster failing to crow



Ian,

Sorry to hear you are having difficulties. If it is caused by a bug, I'll have to eat crow. Here are some things to help see where it might be going wrong.

The setting of MachineLastMatchTime is initiated by the negotiator. With D_FULLDEBUG turned on, you should see a line like the following in your NegotiatorLog:

Registering attempt to match offline machine MACHINE by USER.

This results in a MERGE_STARTD_AD command being sent to the collector. If you have D_COMMAND turned on in the collector, you should see that command being received in CollectorLog.
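
As a rough aid for checking the first step, here is a small Python sketch that scans NegotiatorLog lines for the message quoted above. The function name and the regular expression are my own and are built only from that message text. I am assuming that real log lines carry a timestamp prefix and that the machine and user names contain no spaces, so treat this as a starting point rather than a definitive parser.

```python
import re

# Pattern built from the message text quoted above. search() is used so
# that any timestamp prefix on the log line is ignored (an assumption
# about the log layout, not something taken from the Condor sources).
OFFLINE_MATCH = re.compile(
    r"Registering attempt to match offline machine (\S+) by (\S+?)\.?\s*$"
)


def find_offline_matches(log_lines):
    """Return (machine, user) pairs for each offline-match attempt found."""
    hits = []
    for line in log_lines:
        m = OFFLINE_MATCH.search(line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits


# Hypothetical usage; the path to your NegotiatorLog will differ:
#   with open("/path/to/NegotiatorLog") as f:
#       for machine, user in find_offline_matches(f):
#           print(machine, user)
```

If this finds nothing with D_FULLDEBUG enabled, the negotiator is probably not registering the offline match at all, and the problem lies before the collector is ever involved.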

After that command has been received, the machine ad should contain MachineLastMatchTime. You should be able to see that with condor_status -long.
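
To check many machines at once, the output of condor_status -long can be parsed with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes the usual layout of one "Attribute = value" pair per line with a blank line between ads.

```python
import re


def last_match_times(status_long_text):
    """Map each machine Name to its MachineLastMatchTime string,
    or to None when the attribute is missing from that ad."""
    result = {}
    # Assumption: ads are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", status_long_text):
        attrs = {}
        for line in block.splitlines():
            key, sep, value = line.partition(" = ")
            if sep:
                attrs[key.strip()] = value.strip()
        if "Name" in attrs:
            name = attrs["Name"].strip('"')
            result[name] = attrs.get("MachineLastMatchTime")
    return result


# Hypothetical usage, assuming condor_status is on the PATH:
#   import subprocess
#   out = subprocess.run(["condor_status", "-long"],
#                        capture_output=True, text=True).stdout
#   for name, when in last_match_times(out).items():
#       print(name, when)
```

A machine that maps to None here has no MachineLastMatchTime in its ad, either because the negotiator never set it or because something has since overwritten the ad.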

If something overwrites the offline machine ad, then MachineLastMatchTime will go away until the next time the negotiator sets it (i.e. the next negotiation cycle where a job matches the offline machine).

--Dan

Smith, Ian wrote:
Dear All,

I'm trying to use condor_rooster in Condor 7.4 to work with our Windows XP pool
but with only limited success. To keep compatibility with our current power saving
setup I'm trying to avoid using the Condor power saving, and instead I'm publishing
the ClassAds of offline machines via a cron job so that condor_rooster can wake up
the relevant ones.

The crux of the matter seems to be in the UNHIBERNATE expression. In the documentation
(p 216) it states that the default value is MachineLastMatchTime =!= UNDEFINED although
I find that it is actually MY.MachineLastMatchTime =!= UNDEFINED. I've tried both and neither
seems to work, as neither MachineLastMatchTime nor MY.MachineLastMatchTime seems
to be set. The manual says that
"the special attribute MachineLastMatchTime is updated in the ClassAds of offline machines
when the job would have been matched to the machine if it had been online"

but this doesn't seem to be happening. Using condor_q -ana reveals

019.009:  Run analysis summary.  Of 1 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      1 match but are currently offline
      0 are available to run your job

so the matchmaking is definitely working - it just seems that the machine ClassAd isn't
updated. If I set MachineLastMatchTime to some arbitrary value myself then

ROOSTER_UNHIBERNATE=Offline && Unhibernate

seems to evaluate to TRUE and the wake up kicks in.

I've tried D_FULLDEBUG but I still can't track down where the problem is.

Any ideas?

regards,

-ian.


--------------------------------------------
Dr Ian C. Smith,
e-Science Team,
The University of Liverpool,
Computing Services Department
