
Re: [Condor-users] Problems with Windows jobs running indefinitely!



Hi folks,

Thanks for all the help.  I started off by following Ian's advice and
turned on the visible desktop setting (thanks for the extra help Ian!).
This then revealed my culprit - a modelling error prompt!!  So my
problems lie elsewhere outside of Condor, because if I clicked the
prompt the process then ran fine and I got my results back on the
submitting machine.  :o)
FYI, if you are a bit of a newbie like me, here is how to turn on
USE_VISIBLE_DESKTOP:

It is a configuration setting that applies to the condor_startd daemon.
Its default value is False and, because it's really only for debugging,
you won't find it in the default condor_config files that your
machines have. Just add it to the condor_config file on the machine(s)
you want to debug on:
 
    USE_VISIBLE_DESKTOP = True
 
Then restart Condor on that machine: type condor_restart at a
DOS prompt.
Again, this info was supplied by Ian. Thanks!
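
If you have several machines to debug on, the edit above could be
scripted. Here is a minimal Python sketch, not anything that ships with
Condor: the helper name enable_visible_desktop is made up, it only works
on the text of a condor_config file (reading the file and writing it
back is left to you), and it assumes the setting name can be matched
case-insensitively. You would still need to run condor_restart
afterwards.

```python
import re

# Hypothetical helper, not part of Condor. Matches an existing, uncommented
# USE_VISIBLE_DESKTOP assignment on its own line (name matched
# case-insensitively, which is an assumption about the config syntax).
_SETTING = re.compile(
    r"^[ \t]*USE_VISIBLE_DESKTOP[ \t]*=[^\r\n]*",
    re.IGNORECASE | re.MULTILINE,
)


def enable_visible_desktop(config_text: str) -> str:
    """Return config_text with USE_VISIBLE_DESKTOP set to True."""
    line = "USE_VISIBLE_DESKTOP = True"
    if _SETTING.search(config_text):
        # Overwrite any existing assignment in place.
        return _SETTING.sub(line, config_text)
    # Otherwise append the setting, making sure it starts on a new line.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + line + "\n"
```

Commented-out lines (starting with #) are left alone, so an old
"# USE_VISIBLE_DESKTOP = False" note in the file stays as it is.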

Chris

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 14 July 2008 14:45
To: Condor-Users Mail List
Subject: Re: [Condor-users] Problems with Windows jobs running
indefinitely!

> I managed to get a windows Condor environment working fine on a simple
> multi pc isolated network using a common login for all pcs.
> I am now attempting to get Condor to work across a corporate 
> network.....!  Well I can see the slots in the pool and can 
> successfully submit jobs from one PC to a head node and the jobs get 
> assigned to selected slots (aren't ClassADs useful!).  However, the 
> jobs run indefinitely - last one I stopped after 4 days (the test 
> model run is only a 15 minute task!).  Key files are meant to be 
> transferred from (model input files) and to (model results file) the 
> local drive of the submitting PC, and I have added my windows AD user 
> ID/password using condor_store_cred to all machines in question (just 
> in case!).  Is this 'hanging' behaviour permissions related or 
> possibly something else?  I am using Condor version 7.0.1.
> Any help would be gratefully received!

Chris, I can't offer you any direct help but here are some tips for
debugging the problem. Windows makes running batch programs particularly
annoying because of its security model and its insistence that even
batch, command line programs should generate graphical warnings and
dialog boxes. Keeps us in jobs though! :)

Download Process Explorer from Microsoft and install it on one of your
clients where your jobs are running. You can use this to take a better
look at the job processes:

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Check to see if the job is actually taking up any CPU. My hunch is your
jobs aren't running indefinitely but waiting indefinitely for something.
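
The check itself is simple: note the job's cumulative CPU time a few
times, some minutes apart, and see whether it moves. As a rough Python
sketch of that comparison (the function name looks_stalled and the 0.1
second threshold are made up for illustration, and how you collect the
samples, e.g. reading the CPU time shown in Process Explorer by hand, is
up to you):

```python
from typing import Sequence


def looks_stalled(cpu_seconds: Sequence[float], min_delta: float = 0.1) -> bool:
    """Return True if cumulative CPU time barely moved across the samples.

    cpu_seconds holds cumulative CPU time readings for one process, oldest
    first. min_delta is an arbitrary illustrative threshold in seconds.
    """
    if len(cpu_seconds) < 2:
        raise ValueError("need at least two samples")
    # A job that is computing accumulates CPU time; one that is blocked
    # waiting on something (a dialog box, say) accumulates almost none.
    return (cpu_seconds[-1] - cpu_seconds[0]) < min_delta
```

A job whose readings go 12.0, 12.0, 12.01 is almost certainly waiting
on something; one that goes from 12.0 to 40.5 is doing real work.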

They might be producing a pop-up Window (like a missing DLL error for
example) that's not visible (because Condor by default doesn't run the
jobs in a visible desktop) that needs to get clicked.

To check for the pop up windows problem set your machines to 'use a
visible desktop' -- this'll tell Condor to run the jobs on the desktop
of the logged in user. You'll see cmd windows pop up on the desktop when
Condor starts to run the jobs and you'll be able to see if they're
producing pop ups that are causing your software to hang indefinitely.
You can learn more about USE_VISIBLE_DESKTOP here:

http://www.cs.wisc.edu/condor/manual/v7.0/3_3Configuration.html#14350

That should get you started. Good luck!

- Ian


