
Re: [HTCondor-users] HAS_CVMFS_something ?



On Tue, 2023-05-30 at 16:27:35 +0200, Steffen Grunewald wrote:
> On Mon, 2023-04-24 at 12:33:50 +0000, Bockelman, Brian wrote:
> > Hi Steffan,
> > 
> > Feel free to borrow as needed.  We have a goal of extracting commonly-useful logic from the OSPool and putting it into HTCondor itself (as an option); but that's more of a "TODO this year" rather than "it already exists".
> 
> Hallo Brian,
> 
> I felt free, and I borrowed (and renamed).
> 
> Now I'm getting HAS_CVMFS_... = False (something must have hung the CVMFS automounter),
> and despite the settings just copied from the first section of the script, it doesn't
> get re-run every 30 minutes it seems:
> 
> STARTD_CRON_OSG_ARGS = NONE
> STARTD_CRON_OSG_EXECUTABLE = /etc/condor/modules/osg-node
> STARTD_CRON_OSG_KILL = true
> STARTD_CRON_OSG_MODE = periodic
> STARTD_CRON_OSG_PERIOD = 30m
> STARTD_CRON_OSG_RECONFIG = true
> 
> Except for the time when the whole condor service was restarted days ago, I cannot find
> a single hint in the log files of the machine that the cron task would have been run
> again.

After adding a few "logger" commands to the shell script, I found that the executable had
indeed been run every half hour - but the HAS_CVMFS_* attributes never changed.
(Along the way I found that the script runs with /var/log/condor as its working
directory ... I think this isn't mentioned in the docs?)

Currently it looks like the logic never recovers from a CVMFS malfunction; even
waiting more than 4 hours didn't make a difference. I had to wait for the machine to
become idle and then restart the condor service to fully recover and get the HAS_CVMFS_*
ad attributes set to "true" again.
Is this the intended behaviour?
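
For comparison, here is a minimal sketch of a probe that re-evaluates every
repository from scratch on each run, so a later run can flip an attribute back
to True once CVMFS recovers. This is not the OSG script: the function name
probe_repo, the CVMFS_BASE variable, the 10-second limit and the repository
names passed on the command line are all assumptions made for illustration. It
relies on "timeout" from GNU coreutils so that a hung automounter cannot block
the probe, and it prints plain "Attribute = Value" lines of the kind a startd
cron job writes to stdout.

```shell
#!/bin/sh
# Hypothetical probe, not the OSG script: prints one
# "HAS_CVMFS_<name> = True|False" line per repository given as an argument.
# CVMFS_BASE is an illustrative override of the usual /cvmfs mount point.
CVMFS_BASE="${CVMFS_BASE:-/cvmfs}"

probe_repo() {
    # $1 = repository name (placeholder, e.g. example.org)
    repo="$1"
    # Dots and dashes are not valid in attribute names; map them to underscores.
    attr="HAS_CVMFS_$(printf '%s' "$repo" | tr '.-' '__')"
    # Listing the directory triggers the automounter; "timeout" (GNU coreutils)
    # gives up after 10 seconds (an arbitrary choice) instead of hanging.
    if timeout 10 ls "$CVMFS_BASE/$repo/." >/dev/null 2>&1; then
        echo "$attr = True"
    else
        echo "$attr = False"
    fi
}

# Every run starts from scratch: no state is carried over from earlier runs.
for repo in "$@"; do
    probe_repo "$repo"
done
```

Called as, say, "probe.sh example.org" (a made-up file and repository name), it
prints a single HAS_CVMFS_example_org line each time it runs. Whether the
borrowed script caches an earlier failure somewhere is something I have not
verified.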

Thanks,
 Steffen