
[Condor-users] problem migrating to 7.4.0 on central manager



Hello All,

I'm running into a spot of bother moving our central manager
to Condor 7.4 on our production pool and could do with some
advice. The central manager runs on Solaris 10 and currently
uses 7.0.2. The execute hosts are all Win XP and currently run
Condor 7.0.2 as well. When I've tried using a test execute host all
is well, but when I go to the production version I see the
symptoms described in

https://lists.cs.wisc.edu/archive/condor-users/2008-February/msg00201.shtml

( "Re: [Condor-users] Collector error - ERROR: receiving new UDP message 
  but found a long message still waiting to be closed" )

where the (SSL) authentication seems to get hung up in a weird state. So
the $64,000 question is how do I get the execute hosts to scrub their
state info and start afresh rather than picking up where they left off -
preferably from my desk rather than going around each classroom!

We have reimaging in place here so I'm hoping that this will scrub all
of the log files and solve the problem, but surely there is a neater way
of doing it (incidentally, power cycling doesn't work - Condor seems
to be far too smart to fall for that).

Any suggestions most welcome,

cheers,

-ian.

--------------------------------------------
Dr Ian C. Smith,
e-Science Team,
The University of Liverpool,
Computing Services Department