
[Condor-users] Condor claims jobs running forever (never terminate)



We're running a Windows pool with a Windows 2003 Server as Central Manager, another Windows 2003 Server as a dedicated submit node, and a number of XP machines for job execution. All nodes run Condor 6.8.4.

I have submitted 351 jobs, each of which should take about 50 minutes. 350 of them executed and terminated properly, but the last one has been "stuck" for over two hours now. The execute node is still in the "claimed" state and the job is marked as running, although all output data has already been transferred back to the submit node and the process is no longer running on the execute node! It seems as if Condor just loves the job and doesn't want to release it :-)

I ran into this problem quite a few times with Condor 6.6.* in the past, which was actually my primary reason for updating our pool to 6.8. I can't imagine I'm the only one running into it, so I would guess it is a fairly well-known problem...

Does anyone have a clue?

Thanks,


Thorsten


