
[Condor-users] jobs remain idle



Hi All,

I have a Condor pool with a Mac OS X Server central manager and a number of WinXP execute nodes running version 7.2.3. We’ve been running this configuration for some time, and I’m not sure whether there have been any changes to the nodes that might affect my problem. The problem is that when I submit Y jobs, X of them start (where X = number of nodes), but the remaining Y-X do not start automatically after the first X have finished; they continue to show as idle in the queue. Running condor_q -better-analyze shows the “X match but rejected for unknown reasons” message, and I don’t see anything in the log files that might give me a clue. I can do a condor_restart -all and then resubmit the jobs, but I get the same problem again whenever the number of jobs exceeds the number of nodes, so something is hanging Condor up.

I have heard that there may be an issue with Condor on the Mac, in the sense that I have to run condor_master by hand after I do a restart on the central node, i.e., condor_restart doesn’t work automatically on the Mac. (We’ve had to reboot the machines a couple of times due to power issues.)

thanx
steve

--  
Stephen C. Upton
Research Associate
SEED (Simulation Experiments & Efficient Designs) Center for Data Farming
Naval Postgraduate School
Cell: 831-402-3888