
Re: [HTCondor-users] power management: ROOSTER_UNHIBERNATE not working



Hi Justin,

The ClassAd of the hibernated machine needs to contain Offline = true.

What does `condor_status <hibernated-host> -af Offline` say?

The second condition for unhibernating is that the machine must have been matched to a job, i.e. its ad must satisfy:

 MachineLastMatchTime =!= UNDEFINED
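
To make the two conditions concrete, here is a rough sketch in plain Python. It is not HTCondor code: the ad is modelled as an ordinary dict, an undefined attribute is modelled as a missing key or None, and the slot name and timestamp are made-up illustration values.

```python
# Illustrative only: a plain-dict stand-in for a machine ClassAd.
# Assumption: an UNDEFINED ClassAd attribute is modelled as None / missing key.
UNDEFINED = None


def is_wake_candidate(ad):
    """Return True when both conditions from this mail hold for the ad."""
    # Condition 1: the ad must carry Offline = true.
    offline = ad.get("Offline") is True
    # Condition 2: MachineLastMatchTime =!= UNDEFINED, i.e. the ad was matched.
    matched = ad.get("MachineLastMatchTime", UNDEFINED) is not UNDEFINED
    return offline and matched


# Hypothetical ad: offline, but never matched, so it is not a wake candidate.
ad = {"Name": "slot1@bench7.example", "Offline": True}
print(is_wake_candidate(ad))

# After a match, MachineLastMatchTime is set (value here is invented).
ad["MachineLastMatchTime"] = 1700063000
print(is_wake_candidate(ad))
```

So an ad that is Offline but has never been matched would not be woken up under these two conditions.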

Best
christoph



-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Justin Killebrew via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
CC: "Justin Killebrew" <jk@xxxxxxx>
Sent: Wednesday, 15 November 2023 16:58:54
Subject: [HTCondor-users] power management: ROOSTER_UNHIBERNATE not working

Hello.

My test machine, bench7, is hibernating as configured but when I submit jobs that should match it, the rooster doesn't try to unhibernate.

Relevant excerpts:

RoosterLog:
11/15/23 06:09:46 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
11/15/23 06:09:46 Will perform unhibernate checks every ROOSTER_INTERVAL=180 seconds.
11/15/23 09:42:12 Cock-a-doodle-doo! (Time to look for machines to wake up.)
11/15/23 09:42:12 Trying to query collector <127.0.1.1:9618?alias=bench12.timehole.org>
11/15/23 09:42:12 Got 0 startd ads matching ROOSTER_UNHIBERNATE=Offline


PersistentAdLog:
103 <slot1@xxxxxxxxxxxxxxxxxxx> Offline true


bench7 config:
# Power management
HIBERNATE_CHECK_INTERVAL = 300
#  (2 * $(HOUR))
TimeToWait  = 300
ShouldHibernate = (    (State == "Unclaimed") \
                    && ($(StateTimer) > $(TimeToWait)) \
                    && (KeyboardIdle > $(TimeToWait)))
# this param is passed to the script so use the string "S5"
HibernateState  = "S5"
# 
HIBERNATE = ifThenElse( $(ShouldHibernate), $(HibernateState), "NONE" )
# point to my hibernation script
use HIBERNATION_PLUGIN = "/home/justin/jkcode/scripts/JKSuspend.sh"


CM config:
COLLECTOR_PERSISTENT_AD_LOG = /var/log/condor/PersistentAdLog
ABSENT_REQUIREMENTS = ( (HibernationLevel?:0) == 0 )
EXPIRE_INVALIDATED_ADS = True
CLASSAD_LIFETIME = 900
# 604800s is 7 days
ABSENT_EXPIRE_ADS_AFTER = 604800
OFFLINE_EXPIRE_ADS_AFTER = 604800
ROOSTER_INTERVAL = 180
ROOSTER_DEBUG = D_FULLDEBUG
ROOSTER_UNHIBERNATE = Offline


Is this the problem:

11/15/23 09:42:12 Got 0 startd ads matching ROOSTER_UNHIBERNATE=Offline

How do I troubleshoot and fix this?

Thanks,
JK


