On Oct 7, 2013, at 4:09 AM, daniel popu <dpopu@xxxxxxxxx> wrote:
Can you provide more details on Condor being "blocked"? Which daemons and/or commands are causing trouble? How often does this happen? Do the affected daemons continue to write to their log files?
If possible, we'd like to diagnose the underlying problem, whether it's a bug in HTCondor or a problem with a resource it's trying to use.
HTCondor has a mechanism to deal with "blocked" daemons. Each daemon sends an "alive" message to the condor_master on a regular basis. If the master doesn't receive any such message from a daemon for an hour, it will kill that daemon and restart it.
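
To make the idea concrete, here is a rough Python sketch of that kind of keepalive watchdog. It is not HTCondor's actual code. The class name, method names, and the injectable clock are all made up for illustration, and the only detail taken from the description above is the one-hour timeout.

```python
import time


class Watchdog:
    """Hypothetical parent-side bookkeeping for child keepalives."""

    def __init__(self, timeout_s=3600.0, clock=time.monotonic):
        # timeout_s: how long a child may stay silent (one hour, as described above).
        # clock: injectable time source so the logic can be exercised without waiting.
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_alive = {}

    def register(self, name):
        # A freshly started child counts as alive at start time.
        self.last_alive[name] = self.clock()

    def alive(self, name):
        # Called whenever an "alive" message arrives from a child.
        self.last_alive[name] = self.clock()

    def hung(self):
        # Children whose last message is older than the timeout.
        now = self.clock()
        return sorted(
            name
            for name, seen in self.last_alive.items()
            if now - seen > self.timeout_s
        )
```

In a real supervisor, a periodic timer would call hung() and then kill and restart each daemon it returns, calling register() again for the new process.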