
Re: [Condor-users] DC_AUTHENTICATE: sent DC_INVALIDATE



Hi Zachary,

Thanks for your reply.

But I already have UPDATE_INTERVAL = 15 set for both the MASTER and the STARTD
(15 seconds, so that the current status shows up quickly).

This setup worked fine when every system in my pool ran Condor 7.0.1.
I restarted only the Collector (Central Manager), and at that time there was no such session issue.

So what went wrong?

by
Johnson




On Fri, 2008-07-18 at 09:29 -0500, Zachary Miller wrote:
> 7/18 18:43:42 DC_AUTHENTICATE: attempt to open invalid session
> scorpio:28204:1216378448:16, failing.
> 7/18 18:43:42 DC_AUTHENTICATE: sent DC_INVALIDATE
> scorpio:28204:1216378448:16 to <10.201.42.234:55636>.
> 
> If I restart the Executor Machine Manually.Then only it is showing up
> while Checking condor Status.


hello,

after restarting your collector, it could take 5 to 10 minutes (using the
default configuration) for the machine to show up again in the collector.
but if you wait, it definitely should.  this is normal behavior.


here's a more detailed explanation for anyone interested:

the collector and startd have established a security session.  when the
collector gets restarted, it loses its side of the session.  the next time the
startd tries to update the collector, it does so using the old session, which
fails (the log message "attempt to open invalid session").  the collector then
sends a UDP message (the log message about DC_INVALIDATE) to the startd to let
it know that it needs to clear the old sessions and start a new one.  after
that, things should work fine.
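
the sequence above can be sketched as a toy model.  this is only an
illustration of the description, not condor's actual code: the Collector and
Startd classes, their method names, and the session id format are all
hypothetical.

```python
# toy model of the session handshake described above; not Condor's real code.
# all class and method names here are hypothetical.

class Collector:
    def __init__(self):
        self.sessions = set()          # session ids this collector knows about

    def restart(self):
        self.sessions.clear()          # a restart loses the collector's side

    def new_session(self, session_id):
        self.sessions.add(session_id)

    def receive_update(self, session_id):
        if session_id not in self.sessions:
            # corresponds to "attempt to open invalid session", followed by
            # "sent DC_INVALIDATE" in the collector log
            return "DC_INVALIDATE"
        return "OK"


class Startd:
    def __init__(self, name):
        self.name = name
        self.counter = 0
        self.session_id = None

    def send_update(self, collector):
        if self.session_id is None:
            # no cached session: establish a fresh one first
            self.counter += 1
            self.session_id = f"{self.name}:{self.counter}"
            collector.new_session(self.session_id)
        reply = collector.receive_update(self.session_id)
        if reply == "DC_INVALIDATE":
            self.session_id = None     # drop the stale session
        return reply


collector = Collector()
startd = Startd("scorpio")
results = [startd.send_update(collector)]       # normal update
collector.restart()
results.append(startd.send_update(collector))   # stale session is rejected
results.append(startd.send_update(collector))   # fresh session works
print(results)  # ['OK', 'DC_INVALIDATE', 'OK']
```

in this model exactly one update is lost after the restart, and the one after
it succeeds.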

as to why the other machine showed up without restarting, it is just a matter
of timing.  startd updates to the collector normally occur every 5 minutes.
i'm guessing the windows machine just happened to get through the above process
first.  eventually (within 10 minutes) all machines should be reporting to the
collector without restarting them.

you can change this default of 5 minutes to something shorter if you want the
machines to show up more quickly after a restart.  you could try setting:
  UPDATE_INTERVAL = 60
  # one minute update interval

then machines will report much more often to the collector.  the upside is
that your pool will be back to normal after 2 minutes instead of 10.  the
downside is increased network traffic to your collector.  if you have a
large enough pool, making the UPDATE_INTERVAL too small will eventually
overload your collector.
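
to make the tradeoff concrete, here is a rough back-of-the-envelope sketch.
it assumes a machine only retries at its next scheduled update (so recovery
takes at most about two intervals, matching the 10-minute and 2-minute figures
above) and that each machine sends one update per interval, which is a
simplification.  the function names and the pool size of 600 machines are
made up for illustration.

```python
# rough estimate only; function names and pool size are hypothetical.

def worst_case_recovery_seconds(update_interval):
    # one update fails on the stale session, the following one succeeds
    return 2 * update_interval

def updates_per_second(num_machines, update_interval):
    # simplification: one update per machine per interval
    return num_machines / update_interval

for interval in (300, 60, 15):
    print(interval,
          worst_case_recovery_seconds(interval),
          updates_per_second(600, interval))
```

the recovery time shrinks linearly with the interval, while the update rate
seen by the collector grows by the same factor.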

i hope this helps, and please let me know if you have more questions.


cheers,
-zach

