
[condor-users] Windows Condor disappearing/reappearing on grid



No one ever responded to this problem, so I'm asking again:

I seem to be in a situation where machines disappear off the grid even
though they are both idle and continuing to operate. Rebooting each machine
in question seems to relieve the problem. More strangely, it sometimes seems
as if simply logging into the machine and doing a "condor_status" and
waiting a few minutes is all that's needed.

Does anyone have any suggestions?

On a related note: how do I determine or configure the e-mail address that
Condor uses as the "From" address when sending notifications? Our e-mail
system appears to be rejecting mail from Condor because it does not come
from a valid address.

-----Original Message-----
From: Heinz, Michael William [mailto:michael_heinz@xxxxxxxxx] 
Sent: Monday, November 10, 2003 11:48 AM
To: 'condor-users@xxxxxxxxxxx'
Subject: [condor-users] Windows Condor silently hanging after a day or so...


So,

I've got a small (4 node) demo installation of Condor running on XP and W2K
boxes. But I'm noticing that frequently, when I come in in the morning,
condor_status shows only one or two machines. The only way to restore
the other machines is to go to each one and perform a "net stop condor"
followed by a "net start condor".

The master logs on all the machines contain many records like these:

11/8 13:38:19 DaemonCore: Command received via UDP from host <54.14.48.190:4813>
11/8 13:38:19 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
11/8 16:13:19 DaemonCore: Command received via UDP from host <54.14.48.190:1120>
11/8 16:13:19 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
11/8 23:48:23 DaemonCore: Command received via UDP from host <54.14.48.190:1887>
11/8 23:48:23 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
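
To see how often records like those above show up, something along these lines could tally them by command name. It is only a rough sketch: the line format is assumed from the excerpt above, and the names ENTRY, count_commands and sample are made up for illustration, not part of Condor.

```python
import re
from collections import Counter

# Matches the "received command" records; the format is assumed from the
# excerpt in this message only, not from any Condor documentation.
ENTRY = re.compile(
    r"(\d+/\d+ \d+:\d+:\d+) DaemonCore: received command (\d+) \((\w+)\)"
)

def count_commands(log_text):
    """Count DaemonCore commands by name in a master log excerpt."""
    # Collapse whitespace so records wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(name for _ts, _num, name in ENTRY.findall(flat))

# Hypothetical sample: four of the records quoted above.
sample = """11/8 13:38:19 DaemonCore: Command received via UDP from host <54.14.48.190:4813>
11/8 13:38:19 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
11/8 16:13:19 DaemonCore: Command received via UDP from host <54.14.48.190:1120>
11/8 16:13:19 DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())"""

print(count_commands(sample))
```

Running the same function over the whole master log on each machine would show whether these commands cluster around the times the machines drop out of condor_status.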

Are these messages relevant to the problem? What do I need to change to keep
my grid running overnight?

Condor Support Information: http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with unsubscribe
condor-users <your_email_address>

