Re: [Condor-users] condor_rooster failing to crow
- Date: Mon, 11 Jan 2010 09:02:00 -0800
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] condor_rooster failing to crow
Sorry to hear you are having difficulties. If it is caused by a bug,
I'll have to eat crow. Here are some things to help see where it might
be going wrong.
The setting of MachineLastMatchTime is initiated by the negotiator.
With D_FULLDEBUG turned on, you should see a line like the following in
Registering attempt to match offline machine MACHINE by USER.
This results in a MERGE_STARTD_AD command being sent to the collector.
If you have D_COMMAND turned on in the collector, you should see that
command being received in CollectorLog.
After that command has been received, the machine ad should contain
MachineLastMatchTime. You should be able to see that with condor_status
If something overwrites the offline machine ad, then
MachineLastMatchTime will go away until the next time the negotiator
sets it (i.e. the next negotiation cycle where a job matches the offline machine).
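As a rough aid for the first check above, here is a minimal Python sketch that scans log lines for the negotiator message and pulls out the machine and user. The function name and the regular expression are illustrative assumptions based only on the message text quoted above; the exact wording and timestamp prefix in a real log may differ between Condor versions.

```python
import re

# Pattern built from the message quoted above (assumption: real logs use
# the same wording, possibly preceded by a timestamp).
OFFLINE_MATCH = re.compile(
    r"Registering attempt to match offline machine (\S+) by (\S+?)\.?$"
)


def find_offline_match_attempts(lines):
    """Return a list of (machine, user) pairs, one per matching log line."""
    hits = []
    for line in lines:
        m = OFFLINE_MATCH.search(line.rstrip())
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits
```

Feeding it the lines of the negotiator's log (for example via `open(path)`) gives a quick yes/no on whether the negotiator ever tried to match the offline machine. An empty result points at the negotiator side, while hits with no MachineLastMatchTime in the ad point at the collector or at something overwriting the ad.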
Smith, Ian wrote:
I'm trying to use condor_rooster in Condor 7.4 to work with our Windows XP pool
but with only limited success. To keep compatibility with our current power saving
set up I'm trying to avoid using the Condor power saving and instead I'm publishing
the ClassAds of offline machines via a cron job so that condor_rooster can wake up
the relevant ones.
The crux of the matter seems to be in the UNHIBERNATE expression. In the documentation
(p 216) it states that the default value is MachineLastMatchTime =!= UNDEFINED although
I find that it is actually MY.MachineLastMatchTime =!= UNDEFINED. I've tried both and neither
seems to work, as neither MachineLastMatchTime nor MY.MachineLastMatchTime seems
to be set. The manual says that
"the special attribute MachineLastMatchTime is updated in the ClassAds of offline machines
when the job would have been matched to the machine if it had been online"
but this doesn't seem to be happening. Using condor_q -ana reveals
019.009: Run analysis summary. Of 1 machines,
    0 are rejected by your job's requirements
    0 reject your job because of their own requirements
    0 match but are serving users with a better priority in the pool
    0 match but reject the job for unknown reasons
    0 match but will not currently preempt their existing job
    1 match but are currently offline
    0 are available to run your job
so the matchmaking is definitely working - it just seems that the machine ClassAd isn't
updated. If I set MachineLastMatchTime to some arbitrary value myself then
ROOSTER_UNHIBERNATE=Offline && Unhibernate
seems to evaluate to TRUE and the wake up kicks in.
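To make the dependency explicit, here is a simplified Python model of that expression. It is only a sketch: the machine ad is represented as a plain dict, a missing key stands in for UNDEFINED, and the function names are hypothetical. Real ClassAd evaluation is three-valued and is not reproduced here.

```python
# Sentinel standing in for the ClassAd UNDEFINED value (modelling assumption).
UNDEFINED = object()


def unhibernate(ad):
    """Model of UNHIBERNATE: MachineLastMatchTime =!= UNDEFINED."""
    return ad.get("MachineLastMatchTime", UNDEFINED) is not UNDEFINED


def rooster_unhibernate(ad):
    """Model of ROOSTER_UNHIBERNATE: Offline && Unhibernate."""
    return ad.get("Offline", False) is True and unhibernate(ad)


# An offline ad without MachineLastMatchTime is never selected for wake-up.
print(rooster_unhibernate({"Offline": True}))
# Once the attribute is present (set by hand or otherwise), it is selected.
print(rooster_unhibernate({"Offline": True, "MachineLastMatchTime": 1263229320}))
```

In this model the wake-up depends entirely on MachineLastMatchTime being present in the offline ad, which matches the behaviour described: setting the attribute by hand makes the expression true.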
I've tried D_FULLDEBUG but I still can't track down where the problem is.
Any ideas ?
Dr Ian C. Smith,
The University of Liverpool,
Computing Services Department