
[Condor-users] Condor service failed to stop



Hello!


I'm using several Windows machines for a Condor pool deployment. Two of these machines work fine (versions 7.0.0 and 7.2.1), but two others do not work properly (the result is the same with versions 7.0.0 and 7.2.1), and each fails in a different way.


For the first machine: when I start the Condor service, two processes are started: condor_master and condor_startd. But 10-15 seconds after startup, condor_startd dies and condor_master begins to consume 50% of the CPU. After that I can’t stop the Condor service. When I try, I receive an error message saying that the service could not be stopped because the response time was exceeded. Note that condor_status on the central manager doesn’t show this machine in the list, either while the service is “running” or after my attempt to stop it.


For the second machine: the Condor service starts properly and condor_status shows this machine in the list, but when I submit any job directly to this host (Machine == <hostname>), the job stays idle in the queue permanently. “condor_q -better-analyze” shows that one match exists but the job is rejected for an unknown reason. The corresponding machine is idle, of course.


Thanks for any help in advance,

Pavel