Re: [condor-users] 6.6.0 upgrade



On Tuesday, Jan 13, 2004, at 11:14 America/New_York, Mike Smorul wrote:
> We've tried installing 6.6.0 on several RedHat 9 clients. Our central
> manager is running 6.4.7 for now. After running for a few hours, all the
> 6.6.0 hosts will disappear on the central manager even though all the
> daemons are running fine.

We see a very similar situation on a mix of SPARC Solaris systems running Condor 6.6.0. In our case, the machines disappear one hour after the flock is brought online. The central manager is a Solaris 8 MU6 system; the rest of the flock is a mix of Solaris 8 and 9.


> The only messages about their disappearance is in the CollectorLog

I found other errors in our log files; for example, the central manager's negotiator reported:


---------- Started Negotiation Cycle ----------
Phase 1:  Obtaining ads from collector ...
  Getting all public ads ...
Couldn't fetch ads: communication error
Aborting negotiation cycle

I plan on re-upgrading our flock to 6.6.0 (it's currently downgraded back to 6.4.7) to look into this further. As I'm reviewing the logs I captured from our brief 6.6.0 run, I'm seeing other messages that might be red herrings, or might be further symptoms ("DC_AUTHENTICATE attempt to open invalid session..."); I want to see whether those errors happen before or at the one-hour failure point.
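
For what it's worth, here is a minimal sketch of how that before-or-at-one-hour check could be scripted. It is not anything that ships with Condor. It assumes each log line starts with a "M/D HH:MM:SS" timestamp (the excerpts quoted here don't show one, so adjust the pattern to the real files), and the function name `offsets` and its `start` and `year` arguments are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: every log line begins with a "M/D HH:MM:SS" timestamp.
# The excerpts in this thread omit timestamps, so adapt this pattern
# to whatever the real log files contain.
STAMP = re.compile(r"^(\d{1,2})/(\d{1,2}) (\d{1,2}):(\d{2}):(\d{2}) (.*)$")

# Message fragments copied from the errors mentioned in this thread.
PATTERNS = (
    "Couldn't fetch ads: communication error",
    "DC_AUTHENTICATE attempt to open invalid session",
)


def offsets(lines, start, year):
    """Yield (elapsed, message) for each matching log line.

    `start` is the datetime at which the flock was brought online.
    `year` is supplied by the caller because the assumed timestamp
    format carries no year.
    """
    for line in lines:
        m = STAMP.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines without a recognizable timestamp
        month, day, hh, mm, ss = (int(g) for g in m.groups()[:5])
        text = m.group(6)
        if any(p in text for p in PATTERNS):
            when = datetime(year, month, day, hh, mm, ss)
            yield when - start, text
```

Feeding each daemon's log (opened as a file) through `offsets` along with the flock's start time gives the elapsed time for every matching message, which should make it obvious whether they cluster at the one-hour mark or show up earlier.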

\bob


-- \def\bob{Bob Krzaczek, RIT Center for Imaging Science, krz@xxxxxxxxxxx}
