
Re: [Condor-users] Condor losing stored credentials



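If this is a Win32 error code, 1311 is ERROR_NO_LOGON_SERVERS ("There are
currently no logon servers available to service the logon request"). That
would suggest the submit machine could not reach a domain controller at
that moment, rather than a bad password.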

Condor Error codes
https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=MagicNumbers

Win32 Error codes
http://msdn.microsoft.com/en-us/library/ms681381%28v=vs.85%29.aspx
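
To see how often this failure shows up, SchedLog can be scanned for these
lines. The following is a minimal Python sketch, assuming the log line
format matches the one quoted in this thread. The function name
parse_login_failure and the small lookup table are illustrative and not
part of Condor; the table covers only a few Win32 codes, so check the MSDN
list above for anything else.

```python
import re

# A few Win32 system error codes relevant to logon failures.
# This table is deliberately incomplete; see the MSDN list for others.
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED",
    1311: "ERROR_NO_LOGON_SERVERS",
    1326: "ERROR_LOGON_FAILURE",
}

# Assumed format, taken from the SchedLog line quoted in this thread.
LOGIN_FAILURE = re.compile(r"Failed to log in (\S+) with err=(\d+)")


def parse_login_failure(line):
    """Return (account, code, name) for a login-failure line, else None."""
    match = LOGIN_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    return match.group(1), code, WIN32_ERRORS.get(code, "unknown")
```

Feeding each SchedLog line through parse_login_failure and keeping the
timestamps of the matches gives a list that can be compared against the
domain controller's own event logs.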

I will dig around some more today and see if I can find anything. 


mike





From: William Brodie-Tyrrell <William.Brodie-Tyrrell@xxxxxxxxxxxxxxxx>
To: "'Condor-Users Mail List'" <condor-users@xxxxxxxxxxx>
Date: 03/01/2011 08:17 PM
Subject: Re: [Condor-users] Condor losing stored credentials
Sent by: condor-users-bounces@xxxxxxxxxxx



It's just occurred again now. I looked immediately in SchedLog on the
submit machine and found this:

    03/02 13:16:07 (pid:3940) Failed to log in wibt@GHOSTVILLE with err=1311

Does anyone know what that error number indicates?

thanks,
 
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of William Brodie-Tyrrell
Sent: Wednesday, 2 March 2011 10:52 AM
To: 'Condor-Users Mail List'
Subject: [Condor-users] Condor losing stored credentials

Hi all,
 
I have a program that produces condor jobs, waits for some to finish, 
inspects results and produces more jobs until convergence of a statistic 
is reached.  It sits in the background and runs for weeks, keeping a 
cluster occupied until it's done, which means it's running condor_submit 
about 5000 times per day (half-hour jobs, 100 slots).
 
The problem I'm having is that occasionally Condor will lose its stored
credentials (we're using Windows auth but not run-as-owner), which causes
condor_submit to fail.  Usually if I wait half an hour and then run
condor_store_cred add, all is good again, but obviously I can't do that in
the middle of the night and I don't want it to stop running overnight.
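
One way to keep a submission loop like this running unattended is to pause
and retry instead of stopping on the first failed submit. The following is
a minimal Python sketch, assuming that condor_submit signals failure with a
non-zero exit status. The name submit_with_retry and its parameters are
hypothetical, not part of Condor. It does not re-store credentials, because
condor_store_cred add needs the password.

```python
import subprocess
import time


def submit_with_retry(submit_file, attempts=5, delay=600,
                      run=subprocess.run, sleep=time.sleep):
    """Run condor_submit, retrying after a pause if it exits non-zero.

    Returns True once a submission succeeds, False if every attempt fails.
    The run and sleep parameters exist so the function can be exercised
    without a Condor installation.
    """
    for attempt in range(1, attempts + 1):
        result = run(["condor_submit", submit_file])
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Wait out what may be a transient authentication failure.
            sleep(delay)
    return False
```

A False return is the point at which a person still has to step in, for
example to run condor_store_cred add by hand.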
 
Sometimes, condor_store_cred itself fails with a "Bad password" error,
even though the password is perfectly good.  I've had a few situations
where store_cred refuses to accept the correct password even though
authentication in general is working on the rest of the network, i.e. I
can log in to machines, etc.
 
Has anyone else seen this problem or know of a solution?
 
Can someone tell me the circumstances under which Condor decides to
discard the stored credentials?  If there's a transient issue with Windows
auth on our domain, knowing that might help me look for it.  It looks like
Condor intermittently fails to authenticate, which (I think) is the common
cause both for condor_submit discarding credentials and for
condor_store_cred refusing to accept a new password.  And it's only Condor
that's having this problem.
 
 
(This is 7.4.2.  I realise that 7.4.4 has now been released, but getting an
upgrade past the Config Control Board is hard because this is on a
classified network.  I can certainly do an upgrade, though, if this is a
known bug that has been fixed.)
 
 
thanks,
 
-- 
William Brodie-Tyrrell, B.E, Ph.D
Systems Engineer 
Modelling & Analysis
 
Direct + 61 8 8343 3376
william.brodie-tyrrell@xxxxxxxxxxxxxxxx
 
Saab Systems 
21 Third Avenue, Mawson Lakes
SA 5095 Australia 
www.saabsystems.com.au
 _______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/