
Re: [HTCondor-users] misconfigured node



On Tue, Jun 24, 2014 at 11:14 AM, Michael Di Domenico
<mdidomenico4@xxxxxxxxx> wrote:
> i'm looking at tackling this issue currently and would be interested
> in any scripts or thoughts on how best to do the tests.  i'm a little
> leery, as i understand a bunch of filesystem tests at the linux level
> will effectively hang on the I/O and never return.  if condor cron
> were set up to cycle every 1s and check the mount, i could see a stack
> of processes backing up

You'd probably want some kind of babysitter in your check, so that if
the test doesn't return within some defined time period, the check
process is killed. It is a little messy, though, so if you see this
pretty rarely, it might be more worthwhile to just let the black holes
happen and clean them up as you catch them.
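
As a rough sketch of what such a babysitter might look like in Python
(not something we run as-is): the probe runs in a child process, and
the parent gives up after a timeout. The function name check_path, the
5-second default, and the use of stat as the probe are all assumptions
for illustration. Note that a child blocked in uninterruptible I/O on a
hard mount may not die even after SIGKILL, so the sketch deliberately
avoids waiting on it forever.

```python
import subprocess


def check_path(path, timeout_s=5.0, probe_cmd=None):
    """Return True if the probe exits 0 within timeout_s seconds.

    probe_cmd is a hypothetical override for testing; by default the
    probe is `stat` on the given path.
    """
    cmd = probe_cmd if probe_cmd is not None else ["stat", "--", path]
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        # Normal case: the probe returns quickly and we report its status.
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # The probe is hung: ask the kernel to kill it.
        proc.kill()
        try:
            # Give it a moment to be reaped, but never block indefinitely;
            # a process stuck in uninterruptible I/O may ignore the kill.
            proc.wait(timeout=1.0)
        except subprocess.TimeoutExpired:
            pass
        return False
```

A cron-style check could call check_path("/your/mount") and publish the
result however you like. If hung probes can't be killed, you'd also
want the check to skip launching a new probe while an old one is still
stuck; otherwise the pile-up you describe can still happen.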


Thanks,
BC

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Leader in Utility HPC Software

http://www.cyclecomputing.com
twitter: @cyclecomputing