[Condor-users] windows network issues and CONDOR


We lost a network router for a couple of hours. After the network was restored, I noticed that the CONDOR daemons across the entire cluster were sitting in various error states. We are running CONDOR 6.6.9 under the Vanilla universe.

Snippet of the MasterLog from the submit machine:

3/15 10:48:49 Send_Signal: ERROR Connect to <> failed.
3/15 10:48:49 ERROR: failed to send signal 15 to pid 2496

3/15 10:48:50 Can't connect to <>:0, errno = 10061
3/15 10:48:50 Will keep trying for 10 seconds...
3/15 10:48:59 Connect failed for 10 seconds; returning FALSE
3/15 10:48:59 ERROR:
SECMAN:2003:TCP connection to <> failed
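
For reference, errno 10061 is the Winsock code WSAECONNREFUSED (connection refused). A minimal sketch of how log lines like the ones above could be scanned for failed connections follows. The function name, the regular expressions, and the sample lines are illustrative assumptions based only on this snippet; this is not a CONDOR utility.

```python
import re

# Winsock error 10061 is WSAECONNREFUSED (connection refused).
WINSOCK_ERRORS = {10061: "WSAECONNREFUSED"}

# Assumption: each log entry starts with "M/D HH:MM:SS " followed by the message,
# as in the snippet above.
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")
ERRNO_RE = re.compile(r"errno = (\d+)")


def find_connect_failures(lines):
    """Return (timestamp, message, errno_name) for lines reporting a failed connection."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines without a timestamp are skipped
        stamp, msg = match.groups()
        failed = (
            "Can't connect" in msg
            or "Connect failed" in msg
            or ("Connect to" in msg and "failed" in msg)
        )
        if failed:
            err = ERRNO_RE.search(msg)
            name = WINSOCK_ERRORS.get(int(err.group(1)), "unknown") if err else None
            hits.append((stamp, msg, name))
    return hits


# Sample lines copied from the snippet above.
SAMPLE = [
    "3/15 10:48:49 Send_Signal: ERROR Connect to <> failed.",
    "3/15 10:48:50 Can't connect to <>:0, errno = 10061",
    "3/15 10:48:50 Will keep trying for 10 seconds...",
    "3/15 10:48:59 Connect failed for 10 seconds; returning FALSE",
    "3/15 10:48:59 ERROR:",
    "SECMAN:2003:TCP connection to <> failed",
]

for stamp, msg, name in find_connect_failures(SAMPLE):
    print(stamp, "|", msg, "|", name)
```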

We were forced to kill the CONDOR services on each individual machine (the services did not respond to a STOP signal). That is not a big deal with only a half-dozen machines, but it will not scale as our cluster grows.

How do other CONDOR Windows users deal with these kinds of issues? Have you built scripts to handle recovery after these kinds of network outages or maintenance? Are there CONDOR utilities (that I'm obviously unaware of) that resolve these kinds of problems? Or do I need to upgrade to a newer version of CONDOR?
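
To make the question concrete, here is a rough sketch of the kind of script meant: it loops over a list of machines, asks the service to stop, force-kills the master process if it is hung, and starts the service again. It relies on the standard Windows `sc` and `taskkill` commands. The service name `condor`, the image name `condor_master.exe`, and the host names are assumptions and would need checking against the actual installation. No wait is inserted between the steps, so a real script would probably need one.

```python
import subprocess

SERVICE_NAME = "condor"             # assumption: name the service is registered under
MASTER_IMAGE = "condor_master.exe"  # assumption: image name of the master daemon


def restart_commands(host):
    """Build the stop / force-kill / start commands for one remote machine."""
    remote = rf"\\{host}"  # sc addresses remote machines as \\hostname
    return [
        ["sc", remote, "stop", SERVICE_NAME],
        # /F forces termination, /T also ends child processes of the master
        ["taskkill", "/S", host, "/F", "/T", "/IM", MASTER_IMAGE],
        ["sc", remote, "start", SERVICE_NAME],
    ]


def restart_host(host, dry_run=True):
    """Print the commands (dry run) or execute them in order; return the command list."""
    commands = restart_commands(host)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: a failed stop must not prevent the kill and restart
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    # Hypothetical host names; replace with the machines in the pool.
    for machine in ["node01", "node02"]:
        restart_host(machine, dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to review them before running anything against the pool.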

Thanks for any and all suggestions,

Tammy Chin
CATHENA Code Development Section
Thermalhydraulics Branch
J.L. Grey Engineering Centre, Stn. E6
Atomic Energy of Canada Ltd
Chalk River, ON  K0J 1P0

Phone: 613.584.8811 x5010
Fax:     613.584.8023

Email: chint@xxxxxxx



