
[Condor-users] Jobs hang after "condor_exec.exe" completed



Hello All,

We have a small cycle-scavenging set-up running in our lab on OS X and
Linux machines that has been working very well. We wanted to extend the
available cycles by including Windows machines. The majority of our lab
programs in MATLAB, so we have gone with the fully compiled option,
whereby we create an .EXE that is run on the execute node.

On the first few computers that we've tried to set up as Windows
execute nodes, the job seems to hang (according to condor_q) for an
absurd amount of time. Watching the process manager on the execute
node, condor_exec.exe (running as user "condor-reuse-slot1") ends
after an appropriate amount of time (~2.5 minutes), but condor_q does
not list the job as complete until about 9.5 minutes have passed, so a
good portion of the available cycles is wasted. The log, out, and error
files seem to be blank except for noting that the job starts and ends.

Does anyone have any ideas on where to start looking for the source of
the 7-minute hang?

Thanks,
-- Bill