there would be a configuration file /etc/condor/.config.STARTD.nodeonline created with 'NodeOnline' defined in it. We set the default value of NodeOnline to False to avoid accidentally putting a new node online. We also configured our HTCondor pool to allow
the head node/admin node to use condor_config_val and condor_reconfig to change the value of NodeOnline and apply it. You can also modify the file /etc/condor/.config.STARTD.nodeonline directly. Of course, this approach carries security risks, but it makes it easier
for us to manage a large cluster.
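
A minimal sketch of how a setup like this might look, assuming HTCondor's persistent configuration feature is what creates the .config.STARTD.nodeonline file. The START expression, the permission level, and the hostname node001 are illustrative assumptions, not the actual configuration described above.

```
# Sketch only; the values below are assumptions, not the real pool config.

# Default: a freshly installed node stays offline until an admin flips it.
NodeOnline = False

# Accept jobs only when NodeOnline is True. $(NodeOnline) is a config
# macro reference, so START ends up as a literal True or False.
START = $(NodeOnline)

# Allow runtime changes that survive restarts; they are stored as
# files under PERSISTENT_CONFIG_DIR.
ENABLE_PERSISTENT_CONFIG = True
PERSISTENT_CONFIG_DIR = /etc/condor

# Let hosts with ADMINISTRATOR-level access change NodeOnline remotely.
SETTABLE_ATTRS_ADMINISTRATOR = NodeOnline

# From the admin node (node001 is a hypothetical hostname):
#   condor_config_val -name node001 -startd -set "NodeOnline = True"
#   condor_reconfig node001
```

Restricting the settable list to a single attribute keeps the remote-write surface small, which addresses some of the security concern mentioned above.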
In your case, I am not sure whether you need to add MAINTENANCE_MODE to the list of STARTD_ATTRS.
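
For illustration, the two usual ways to make START see a value defined in the configuration can be sketched as follows. This is a hedged example rather than a confirmed diagnosis: a bare MAINTENANCE_MODE inside START is looked up as a ClassAd attribute, and a configuration macro is not automatically part of the machine ad, so the expression may well evaluate to UNDEFINED unless one of these is used.

```
# /etc/condor/config.d/maintenance.conf
MAINTENANCE_MODE = FALSE

# Option A: macro substitution. $(MAINTENANCE_MODE) is replaced by the
# macro's value, so the startd effectively sees "TRUE && ! FALSE".
START = TRUE && ! $(MAINTENANCE_MODE)

# Option B: publish the value in the machine ClassAd so that the bare
# name resolves as an attribute when START is evaluated.
# STARTD_ATTRS = $(STARTD_ATTRS) MAINTENANCE_MODE
# START = TRUE && ! MAINTENANCE_MODE
```

With either option, dropping in a new maintenance.conf and running condor_reconfig should be enough to toggle the node.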
I was wondering if someone can help me trace an issue I'm having, but also maybe let me know if my approach in general is terrible/if there is a better way to accomplish what I am trying to do.
So first off, I want to find a way to set a condor node into a "maintenance mode", basically where the node will stop taking new jobs but let what is already running finish, for example if I need to reboot the nodes of a cluster
and don't want to interrupt running jobs. My thought was that I just need to set START = FALSE in some manner, and for a time, we could do just that and push a config.local file with that change to the startd nodes. However, wanting to make this a bit more
automated, the idea I had was to change START to something like
START = TRUE && ! MAINTENANCE_MODE
Where MAINTENANCE_MODE was a variable defined in /etc/condor/config.d/maintenance.conf like so:
MAINTENANCE_MODE = FALSE
That way I just need to have a script/config management just drop a new maintenance.conf file and not worry about blowing away any settings in the .local config file.
However, I cannot get jobs to run when MAINTENANCE_MODE = FALSE, almost as if the START statement is not getting evaluated correctly. I tried even putting the MAINTENANCE_MODE variable in the same .local file thinking maybe it had something
to do with the external file. But nothing has allowed jobs to run when the node is out of maintenance mode. As soon as I set START = TRUE again and run condor_reconfig, jobs launch.
I confirmed the syntax should be correct with:
classad_eval -file /etc/condor/config.d/maintenance.conf 'TRUE && !MAINTENANCE_MODE'
[ MAINTENANCE_MODE = false ]
So I must be missing something about how START gets evaluated, or...?
For the record, I do know that I can use something like `condor_off -startd -peaceful`; the only reason I don't want to depend on this is that if I am installing updates and need a few reboots, the service will restart after a reboot. If there
is a better way I can accomplish this, I'm happy to scrap what I am working on above.
This is all on condor version 8.8.17
Thanks in advance!
Computational System Analyst
Engineering IT Shared Services
University of Illinois @ Urbana-Champaign