
Re: [Condor-users] Problems with Windows jobs running indefinitely!



> I managed to get a windows Condor environment working fine on
> a simple multi pc isolated network using a common login for all pcs.
> I am now attempting to get Condor to work across a corporate
> network.....!  Well I can see the slots in the pool and can
> successfully submit jobs from one PC to a head node and the
> jobs get assigned to selected slots (aren't ClassADs
> useful!).  However, the jobs run indefinitely - last one I
> stopped after 4 days (the test model run is only a 15 minute
> task!).  Key files are meant to be transferred from (model
> input files) and to (model results file) the local drive of
> the submitting PC, and I have added my windows AD user
> ID/password using condor_store_cred to all machines in
> question (just in case!).  Is this 'hanging' behaviour
> permissions related or possibly something else?  I am using
> Condor version 7.0.1.
> Any help would be gratefully received!

Chris, I can't offer you any direct help but here are some tips for
debugging the problem. Windows makes running batch programs particularly
annoying because of its security model and its insistence that even
batch, command-line programs should generate graphical warnings and
dialog boxes. Keeps us in jobs though! :)

Download Process Explorer from Microsoft and install it on one of the
clients where your jobs are running. You can use it to take a closer
look at the job processes:

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Check to see whether the job is actually using any CPU. My hunch is that
your jobs aren't running indefinitely but are waiting indefinitely for
something.

They might be producing a pop-up window (a missing-DLL error, for
example) that needs to be clicked but that you can't see, because by
default Condor doesn't run jobs on a visible desktop.

To check for the pop-up window problem, set your machines to 'use a
visible desktop'. This tells Condor to run the jobs on the desktop of
the logged-in user. You'll see cmd windows pop up on the desktop when
Condor starts to run the jobs, and you'll be able to see whether they're
producing pop-ups that are causing your software to hang indefinitely.
You can learn more about USE_VISIBLE_DESKTOP here:

http://www.cs.wisc.edu/condor/manual/v7.0/3_3Configuration.html#14350
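A minimal sketch of the change follows. It assumes you put it in the local configuration file on each Windows execute machine you want to test. That file is often condor_config.local, but its exact name and location depend on how Condor was installed:

```
# Windows only: run jobs on the desktop of the currently
# logged-in user so any pop-up dialogs they raise are visible.
USE_VISIBLE_DESKTOP = True
```

After editing, run condor_reconfig (or restart Condor) on that machine so the daemons pick up the new setting. Then watch the desktop from a logged-in session while a job starts. Once you've finished debugging, you'll probably want to set it back to False so jobs stop appearing on users' screens.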

That should get you started. Good luck!

- Ian

