I have a Condor 6.6.9 cluster on W2k3. Several running jobs were removed with condor_rm, but after removal they stayed in the queue in the 'X' state. An analysis of the queue said they were being removed. While they were in this state, the nodes they had been running on were stuck as claimed with idle status. After leaving it for a week I ran condor_rm -forcex. That removed the jobs from the queue, but the nodes are still claimed. Looking in the schedd log, I see this:
Zombie process has not been cleaned up by reaper - pid 1300
How can I get the nodes unclaimed? Later I'll try to figure out how I got into this problem.