Re: [Condor-users] condor completely stuck?
- Date: Tue, 22 Jul 2008 12:58:18 +0200
- From: Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
- Subject: Re: [Condor-users] condor completely stuck?
On Tue, Jul 22, 2008 at 09:22:39AM +0200, Steffen Grunewald wrote:
> One of the users, shortly after finding that his jobs weren't doing what
> they were expected to do, did a "condor_rm" of all of them. condor_q didn't
> come back afterwards, and shutting down Condor using the init script doesn't
> work anymore.
> These are the processes still around:
>
> # ps auxw | grep condor
> root 13082 0.0 0.1 15584 2688 ? S Jul21 0:00 condor_preen -m -r
> uglyuser 16836 0.0 0.4 25488 9368 ? Ss Jul19 0:09 condor_schedd -f
> root 16837 0.0 0.1 12220 2904 ? S Jul19 1:12 condor_procd -A /usr/share/condor/local/log/procd_pipe.SCHEDD -C 666
> condor 18150 0.0 0.1 18416 3736 ? Ss Jul03 25:35 /usr/sbin/condor_master
> root 19007 0.0 0.1 15584 2688 ? S Jul20 0:00 condor_preen -m -r
> root 25352 0.0 0.1 15584 2688 ? S Jul19 0:00 condor_preen -m -r
> root 30283 0.0 0.1 16380 3052 pts/3 S+ 09:16 0:00 condor_q -glo
After some time they disappeared; I'm not sure whether they exited gracefully
or because I sent several signals to the condor_master process...
> The last entries in the SchedLog are from a restart of condor_schedd.
I checked /proc/${PID}/fd and ran strace on condor_schedd, and found that it
was accessing a log file in the user's space. I moved that file away, and the
problem disappeared.
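
For anyone wanting to repeat the /proc check, it can be sketched as a small shell function. This is only a sketch: it assumes a Linux /proc filesystem, and the function name list_open_files is made up for illustration (it is not part of Condor). It prints each open file descriptor of a process together with the file, pipe or socket it points to.

```shell
#!/bin/sh
# List what a process currently holds open by reading /proc/<pid>/fd.
# Linux only; needs permission to read the target process's fd directory.
# "list_open_files" is a hypothetical helper name, not a Condor tool.
list_open_files() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to the open file, pipe or socket.
        # Skip entries that vanish or cannot be read.
        target=$(readlink "$fd") || continue
        printf '%s -> %s\n' "${fd##*/}" "$target"
    done
}
```

Run as root, something like list_open_files "$(pgrep -x condor_schedd)" should show any log file under a user's directory that the schedd still has open, and strace -p on the same PID shows which system call it is sitting in.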
There should be a better way to do a "low-level condor_rm" before Condor is
started up; otherwise old jobs stay referenced in hidden places and show
up again.
> This is 7.0.1; is this a known issue of that version, and would upgrading
> fix it? (& when will 7.0.4 be out? :)
Going to check the release notes...
Steffen