
Re: [Condor-users] Condor losing stored credentials



Hi,

I don't have any core-files that are less than about nine months old and I'm not even running credd anyway, not since I gave up on run-as-owner.

I had a look in the logs on the submit and master machines and didn't find anything around the time of the last failure (about 2 hours ago).

The store_cred manpage says it talks to schedd, which puts the credentials in the local registry, and I see nothing out of the ordinary in SchedLog.  It certainly doesn't look like any schedd has died.
 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael O'Donnell
Sent: Wednesday, 2 March 2011 11:36 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Condor losing stored credentials

Look on the server that has the CRED service running, and navigate to the Condor log files. Check whether there is a core.CREDD.WIN32 file. If so, look at the file and let me know what you see. The issues I had were tied to the CRED service dying. The symptoms were exactly like the ones you are reporting, so the cause may be the same. Sometimes the CREDD would die permanently, and sometimes it would not. The CREDD moving in and out of an operable state might cause the problems you are seeing (I do not think the actual credentials are discarded, at least not in my case, but rather that the service is unable to communicate). Also look at your other log files for errors (e.g., shadow, schedd). Let me know what you find and I can explain in more depth the errors I was having.
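
The checks described above (look for a core file, then scan the other logs for errors) could be scripted roughly as follows. This is only a sketch under stated assumptions: the log directory is whatever your installation uses, the search keywords are illustrative guesses, and the function name scan_log_dir is hypothetical. It is not part of Condor.

```python
from pathlib import Path


def scan_log_dir(log_dir, needles=("ERROR", "credential")):
    """Return (core_files, hits) for a Condor log directory.

    core_files: names of files matching core.* (e.g. core.CREDD.WIN32).
    hits: (file name, line number, line) for every log line containing
          one of the needles, case-insensitively. The needles are an
          assumption, not a list of real Condor error strings.
    """
    log_dir = Path(log_dir)
    core_files = sorted(p.name for p in log_dir.glob("core.*"))
    hits = []
    for path in sorted(log_dir.iterdir()):
        # Skip subdirectories and the core files themselves.
        if not path.is_file() or path.name.startswith("core."):
            continue
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                lowered = line.lower()
                if any(n.lower() in lowered for n in needles):
                    hits.append((path.name, lineno, line.rstrip()))
    return core_files, hits
```

Calling scan_log_dir on the log directory of the submit machine and of the machine running the CRED service gives a quick list of lines worth reading around the time of a failure.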

mike





From: William Brodie-Tyrrell <William.Brodie-Tyrrell@xxxxxxxxxxxxxxxx>
To: "'Condor-Users Mail List'" <condor-users@xxxxxxxxxxx>
Date: 03/01/2011 05:31 PM
Subject: [Condor-users] Condor losing stored credentials
Sent by: condor-users-bounces@xxxxxxxxxxx



Hi all,
 
I have a program that produces condor jobs, waits for some to finish, inspects results and produces more jobs until convergence of a statistic is reached.  It sits in the background and runs for weeks, keeping a cluster occupied until it's done, which means it's running condor_submit about 5000 times per day (half-hour jobs, 100 slots).
 
The problem I'm having is that occasionally, Condor will lose its stored credentials (we're using windows auth but not run-as-owner) and cause condor_submit to fail.  Usually if I wait half an hour then run condor_store_cred add, all is good again but obviously I can't do that in the middle of the night and I don't want it to stop running overnight.
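
Since the failures appear to clear on their own after a while, one way to keep an unattended run alive is to have the generating program retry condor_submit with an increasing delay instead of stopping at the first failure. The following is a minimal sketch, assuming that a non-zero exit status from condor_submit is the failure signal; the function name, retry count and delays are hypothetical, and it does not attempt to re-store the credential.

```python
import subprocess
import time


def submit_with_retry(submit_file, attempts=5, delay_s=60.0, backoff=2.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run condor_submit on submit_file, retrying on a non-zero exit status.

    Waits delay_s seconds after the first failure and multiplies the wait
    by backoff after each further failure. Returns condor_submit's stdout
    on success and raises RuntimeError once all attempts have failed.
    run and sleep are injectable so the logic can be tested without Condor.
    """
    for attempt in range(1, attempts + 1):
        result = run(["condor_submit", submit_file],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if attempt < attempts:
            sleep(delay_s)
            delay_s *= backoff
    raise RuntimeError(
        f"condor_submit failed {attempts} times; last stderr: {result.stderr}"
    )
```

With the defaults this waits 1, 2, 4 and 8 minutes between the five attempts; given that the problem seems to last about half an hour, larger values would probably be needed in practice.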
 
Sometimes, condor_store_cred itself fails with a "Bad password" error, even though the password is perfectly good.  I've had a few situations where store_cred is refusing to accept the correct password though authentication in general is working on the rest of the network, i.e. I can login to machines, etc.
 
Has anyone else seen this problem or know of a solution?
 
Can someone tell me the circumstances under which Condor decides to discard the stored credentials?  If there's some sort of transient issue with windows auth on our domain, that might help me look for it.  It looks like there is some sort of transient thing going on where condor fails to authenticate, which (I think) is the common cause for condor_submit to discard credentials and for condor_store_cred to refuse to accept a new password.  And it's only condor that's having this problem.
 
 
(This is 7.4.2.  I realise that there's now 7.4.4 released but getting an upgrade past the Config Control Board is hard because this is on a classified network.  I can certainly do an upgrade though if this is a known bug that has been fixed).
 
 
thanks,
 
--
William Brodie-Tyrrell, B.E, Ph.D
Systems Engineer
Modelling & Analysis
 
Direct + 61 8 8343 3376
william.brodie-tyrrell@xxxxxxxxxxxxxxxx
 
Saab Systems
21 Third Avenue, Mawson Lakes
SA 5095 Australia
www.saabsystems.com.au
------------------------
This e-mail is private and confidential between the sender and the addressee. 
In the event of misdirection, the recipient is prohibited from using, copying or disseminating it or any information in it. Please notify the above if any misdirection
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


