
Re: [HTCondor-users] Failed to run hibernation plugin



Hi,

The easiest thing is to replace the power plugin with something that is proven to work with your setup. In essence, we use this:

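(begin)
# Requires bash because of the [[ ... ]] tests.
# "ad": print the hibernation attributes for the machine ad.
if [[ $1 == ad ]]
then
    echo "HibernationMethod = \"DESY-utils\""
    HibernationMethod="DESY-utils"
    echo "HibernationRawMask = 8"
    HibernationRawMask="8"
    echo "HibernationSupportedStates = \"S5\""
    HibernationSupportedStates="S5"
fi

# "set S5": power the machine off ("$*" joins all arguments into one string).
if [[ "$*" == "set S5" ]]
then
    sudo /sbin/poweroff
fi
(end)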

Use HIBERNATION_PLUGIN = [ ... ] to point to your replacement script.
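
A slightly fuller sketch of such a replacement plugin, for the case where S3 and S4 should go through systemd as well. It assumes the plugin is called as `<plugin> ad` and `<plugin> set <state>`, which is what the StartLog line and the `condor_power_state ad` output quoted below suggest. The function name `hibernation_plugin`, the method string "systemctl-wrapper", the `POWER_CMD_*` override variables and the mapping of S3/S4 to `systemctl suspend` / `systemctl hibernate` are assumptions made for illustration, not taken from a production setup. The mask 28 and the state list are copied from the stock plugin's output quoted below.

```shell
#!/bin/sh
# Sketch of a replacement hibernation plugin (hypothetical, not tested on a real pool).
# Assumed calling convention: "<plugin> ad" and "<plugin> set <state>".

hibernation_plugin() {
    case "$1" in
        ad)
            # Attributes for the machine ad; mask and state list mirror
            # the stock condor_power_state output for S3,S4,S5.
            echo 'HibernationMethod = "systemctl-wrapper"'
            echo 'HibernationRawMask = 28'
            echo 'HibernationSupportedStates = "S3,S4,S5"'
            ;;
        set)
            # Each command can be overridden through POWER_CMD_* for dry runs.
            case "$2" in
                S3) ${POWER_CMD_S3:-sudo systemctl suspend} ;;
                S4) ${POWER_CMD_S4:-sudo systemctl hibernate} ;;
                S5) ${POWER_CMD_S5:-sudo /sbin/poweroff} ;;
                *)  echo "unsupported state: $2" >&2; return 1 ;;
            esac
            ;;
        *)
            echo "usage: $0 ad | set <S3|S4|S5>" >&2
            return 1
            ;;
    esac
}

# Dispatch only when arguments were given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    hibernation_plugin "$@"
fi
```

For a dry run without touching the power state, override the command, e.g. `POWER_CMD_S4='echo would hibernate' ./plugin.sh set S4` (the file name `plugin.sh` is a placeholder). As with the poweroff example above, the account running the plugin needs permission to call the `sudo` commands without a password prompt.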

best
christoph


-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Justin Killebrew via HTCondor-users" <htcondor-users@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
CC: "Justin Killebrew" <jk@xxxxxxx>
Sent: Wednesday, 8 November 2023 15:24:14
Subject: [HTCondor-users] Failed to run hibernation plugin

Hello.  I'm wrestling with power management on Ubuntu 22.04.  The execution point StartLog shows this error:

11/08/23 08:30:44 ResMgr: This machine is about to enter hibernation
11/08/23 08:30:44 Failed to run hibernation plugin '/usr/libexec/condor/condor_power_state set S3': status = 63744
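
As a possible first debugging step, the `status = 63744` value can be decoded. The sketch below assumes it is a raw wait()-style status word (exit code in the high byte, terminating signal in the low 7 bits); that encoding is an assumption and is not confirmed by the log. The helper name `decode_wait_status` is hypothetical.

```shell
# Split a raw wait()-style status word into exit code and signal number.
# Assumption: the logged status uses the usual encoding (exit code << 8 | signal).
decode_wait_status() {
    status=$1
    echo "exit code $(( (status >> 8) & 255 )), signal $(( status & 127 ))"
}

decode_wait_status 63744   # prints "exit code 249, signal 0"
```

Under that reading the plugin exited on its own with a non-zero code rather than being killed by a signal, which is consistent with the manual `condor_power_state -d set s4` run failing further down.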

Hibernation is supported:

/usr/libexec/condor$ sudo ./condor_power_state ad
HibernationMethod = "/sys"
HibernationRawMask = 28
HibernationSupportedStates = "S3,S4,S5"

I can suspend/hibernate from the command line using:
$ sudo systemctl hibernate

But condor_power_state fails for both S3 and S4:

/usr/libexec/condor$ sudo ./condor_power_state -d set s4
11/08/23 08:24:55 LinuxHibernator: Error writing 'disk' to '/sys/power/state': Input/output error
condor_power_state: failed to switch the machine's power state.

Here's the relevant portion of the EP config:

# Power management
WOL_SUPPORTED = TRUE
HIBERNATE_CHECK_INTERVAL = 20
TimeToWait  = 120
ShouldHibernate = (    (State == "Unclaimed") \
                    && ($(StateTimer) > $(TimeToWait)) \
                    && ($(WOL_SUPPORTED)))
HibernateState  = "RAM"
HIBERNATE = ifThenElse( $(ShouldHibernate), $(HibernateState), "NONE" )


The central manager seems to be correct:
RoosterLog:
11/08/23 06:21:00 Will perform unhibernate checks every ROOSTER_INTERVAL=180 seconds.

And the relevant CM config:

# Rooster wakes nodes up
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, ROOSTER, SHARED_PORT
COLLECTOR_PERSISTENT_AD_LOG = /var/log/condor/PersistentAdLog
ABSENT_REQUIREMENTS = ( (HibernationLevel?:0) == 0 )
EXPIRE_INVALIDATED_ADS = True
CLASSAD_LIFETIME = 900
# 604800s is 7 days
ABSENT_EXPIRE_ADS_AFTER = 604800
OFFLINE_EXPIRE_ADS_AFTER = 604800
ROOSTER_INTERVAL = 180
ROOSTER_UNHIBERNATE = ( Offline && Unhibernate )
ROOSTER_UNHIBERNATE_RANK = buf_cpuindex_avg


How do I debug condor_power_state?  Does condor_power_state support the "systemctl hibernate" method?

Thanks,
JK



