Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[HTCondor-users] freezing Condor tasks for maintenance?

Date: Fri, 30 Nov 2018 17:13:52 +0100
From: Thomas Hartmann <thomas.hartmann@xxxxxxx>
Subject: [HTCondor-users] freezing Condor tasks for maintenance?

Hi all,

does anybody have experience with freezing whole Condor process trees for
node maintenance?

Background: we had to do a bit of manual work on a few jobs where mounts
required some attention. Along the way, it seemed useful to freeze the
jobs so that we could work on the mounts without affecting them, for
example if a mount disappears for a moment.

Playing around on a test node, it seems that one can add the Condor
process tree to a freezer cgroup [1] and suspend it for some time without
affecting the daemons' health (provided the freeze is short enough that
the collector does not assume the node is dead).
But maybe somebody already has experience with whether this works in
real-life scenarios with user jobs, which might be more sensitive to
being frozen, or with how the system reacts if a whole node reappears
with all its jobs after being absent for too long (and the jobs have
already been resubmitted)?

Ideally, frozen processes would survive a reboot, but so far my attempts
with CRIU [https://criu.org] were not very successful (it probably works
better with binaries than with shell scripts started in an active
session...?)
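
For reference, a minimal sketch of what such a CRIU attempt could look
like. CRIU's --shell-job option is meant for process trees that are
attached to a terminal session, which may be relevant to the shell-script
case, though that is not verified here. IMG_DIR, CRIU_RUN, and the two
function names are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Sketch only: IMG_DIR, CRIU_RUN, and the function names are hypothetical.
IMG_DIR="${IMG_DIR:-/var/tmp/criu-images}"
# Set CRIU_RUN=echo to print the commands instead of running them.
CRIU_RUN="${CRIU_RUN:-}"

# Checkpoint the process tree rooted at PID $1 into IMG_DIR.
# --shell-job tells CRIU the tree was started from an interactive shell.
checkpoint_tree() {
    mkdir -p "${IMG_DIR}" || return 1
    ${CRIU_RUN} criu dump --tree "$1" --images-dir "${IMG_DIR}" \
        --shell-job --log-file dump.log
}

# Restore the tree from the images previously written to IMG_DIR.
restore_tree() {
    ${CRIU_RUN} criu restore --images-dir "${IMG_DIR}" \
        --shell-job --log-file restore.log
}
```

Running with CRIU_RUN=echo first shows the exact command lines without
touching any process; the real dump and restore need root.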

Cheers,
  Thomas

[1]
> mkdir /sys/fs/cgroup/freezer/mycondorfreeze/
> while read X; do echo ${X} >> /sys/fs/cgroup/freezer/mycondorfreeze/tasks; done < /sys/fs/cgroup/memory/system.slice/condor.service/tasks
> cat /sys/fs/cgroup/freezer/mycondorfreeze/freezer.state
THAWED
> echo FROZEN > /sys/fs/cgroup/freezer/mycondorfreeze/freezer.state
...wait...
> echo THAWED > /sys/fs/cgroup/freezer/mycondorfreeze/freezer.state
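
The freeze and thaw steps above could be wrapped in small helpers, roughly
as sketched here. With the cgroup v1 freezer, freezer.state can read
FREEZING for a while before it becomes FROZEN, so the sketch polls instead
of assuming the freeze is immediate. FREEZER_DIR and the function names
are hypothetical, and the one-second poll interval is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: FREEZER_DIR and both function names are hypothetical.
FREEZER_DIR="${FREEZER_DIR:-/sys/fs/cgroup/freezer/mycondorfreeze}"

# Request a freeze, then poll freezer.state until it reads FROZEN.
# $1: maximum number of one-second polls (default 10).
freeze_cgroup() {
    tries="${1:-10}"
    echo FROZEN > "${FREEZER_DIR}/freezer.state" || return 1
    while [ "${tries}" -gt 0 ]; do
        state="$(cat "${FREEZER_DIR}/freezer.state")"
        [ "${state}" = "FROZEN" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    echo "cgroup still ${state} after waiting" >&2
    return 1
}

# Resume all tasks in the cgroup.
thaw_cgroup() {
    echo THAWED > "${FREEZER_DIR}/freezer.state"
}
```

A maintenance window would then be: freeze_cgroup 30 && do the work on the
mounts; thaw_cgroup. Keeping the thaw unconditional avoids leaving the
daemons frozen if the maintenance step fails.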


Follow-Ups:
- Re: [HTCondor-users] freezing Condor tasks for maintenance?
  - From: Greg Thain
