
Re: [Condor-users] Condor losing stored credentials



The problem has just occurred again. I immediately checked SchedLog on the submit machine and found this:

03/02 13:16:07 (pid:3940) Failed to log in wibt@GHOSTVILLE with err=1311

Does anyone know what that error number indicates?
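One way to triage lines like this is sketched below. It assumes the err value is a raw Win32 system error code, which the log itself does not state. Under that assumption, 1311 is ERROR_NO_LOGON_SERVERS in winerror.h (no logon server was available to service the request), which would point at the domain controller rather than at the password. The function name and the small lookup table are illustrative and not part of Condor.

```python
import re

# A few Win32 system error codes from winerror.h that relate to logon.
# Assumption: the "err=" value written to SchedLog is one of these raw codes.
WIN32_LOGON_ERRORS = {
    1311: "ERROR_NO_LOGON_SERVERS",   # no logon server available
    1326: "ERROR_LOGON_FAILURE",      # unknown user name or bad password
    1331: "ERROR_ACCOUNT_DISABLED",   # account currently disabled
}

# Matches the SchedLog message shown above.
LOGIN_FAILURE = re.compile(r"Failed to log in (\S+) with err=(\d+)")


def describe_login_failure(line):
    """Return (account, code, symbolic_name) for a login-failure line, else None."""
    match = LOGIN_FAILURE.search(line)
    if match is None:
        return None
    account = match.group(1)
    code = int(match.group(2))
    return account, code, WIN32_LOGON_ERRORS.get(code, "unknown")


if __name__ == "__main__":
    sample = "03/02 13:16:07 (pid:3940) Failed to log in wibt@GHOSTVILLE with err=1311"
    print(describe_login_failure(sample))
```

Running this over the whole SchedLog and comparing timestamps with the domain controller's event log might show whether the failures line up with a transient outage.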

thanks,
 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of William Brodie-Tyrrell
Sent: Wednesday, 2 March 2011 10:52 AM
To: 'Condor-Users Mail List'
Subject: [Condor-users] Condor losing stored credentials

Hi all,
 
I have a program that produces Condor jobs, waits for some to finish, inspects the results, and produces more jobs until a statistic converges.  It runs in the background for weeks, keeping a cluster occupied until it is done, which means it runs condor_submit about 5000 times per day (half-hour jobs, 100 slots).
 
The problem I'm having is that Condor occasionally loses its stored credentials (we're using Windows authentication but not run-as-owner), which causes condor_submit to fail.  Usually, if I wait half an hour and then run condor_store_cred add, all is good again, but obviously I can't do that in the middle of the night, and I don't want the program to stop running overnight.
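As a stopgap, the job-producing program could wrap each submission in a retry loop, roughly as sketched here. This is only a sketch under assumptions: the function name, the default attempt count, and the restore hook are hypothetical, and it does not address why the credential is lost. How a restore hook would supply the password to condor_store_cred unattended is left open.

```python
import subprocess
import time


def submit_with_retry(submit_file, attempts=5, delay_s=1800, restore=None,
                      run=subprocess.run, sleep=time.sleep):
    """Run condor_submit, pausing and retrying when it exits non-zero.

    restore is an optional, caller-supplied callable (hypothetical) invoked
    after each pause, e.g. to re-store the credential. run and sleep are
    injectable so the loop can be exercised without a Condor install.
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = run(["condor_submit", submit_file])
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # half an hour by default, matching the manual workaround
            if restore is not None:
                restore()
    raise RuntimeError(
        "condor_submit failed %d times for %s" % (attempts, submit_file)
    )
```

The half-hour default mirrors the wait that usually works by hand. A shorter delay with more attempts may suit better if the underlying failure turns out to be brief.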
 
Sometimes condor_store_cred itself fails with a "Bad password" error, even though the password is perfectly good.  I've had a few situations where condor_store_cred refuses to accept the correct password even though authentication in general is working on the rest of the network, i.e. I can log in to machines, etc.
 
Has anyone else seen this problem or know of a solution?
 
Can someone tell me the circumstances under which Condor decides to discard the stored credentials?  If there's a transient issue with Windows authentication on our domain, knowing that might help me look for it.  It looks as though Condor intermittently fails to authenticate, which (I think) is the common cause both of condor_submit discarding the credentials and of condor_store_cred refusing to accept a new password.  And it's only Condor that's having this problem.
 
 
(This is 7.4.2.  I realise that there's now 7.4.4 released but getting an upgrade past the Config Control Board is hard because this is on a classified network.  I can certainly do an upgrade though if this is a known bug that has been fixed).
 
 
thanks,
 
--

William Brodie-Tyrrell, B.E, Ph.D

Systems Engineer                                      

Modelling & Analysis

 

Direct + 61 8 8343 3376

william.brodie-tyrrell@xxxxxxxxxxxxxxxx

 

Saab Systems                                                 

21 Third Avenue, Mawson Lakes

SA 5095 Australia                                             

www.saabsystems.com.au
