
[Condor-users] 7.4.2 / 7.4.4: condor_q trouble when pool PCs suddenly are powered off !?!



Hi,

I have a Linux (Fedora 12) condor master running condor version 7.4.2.
The Windows XP pool PCs are all running condor version 7.4.4.

The condor master has trouble producing condor_q output at times when the
pool PCs are switched off:

==============
$ condor_q
11/22 22:04:05 condor_read(): timeout reading 5 bytes from schedd at 
<115.105.120.71:60614>.
11/22 22:04:05 IO: Failed to read packet header
11/22 22:04:05 SECMAN: reconnected to schedd at <115.105.120.71:60614> from port 
52251 to send unauthenticated command 1111 QMGMT_CMD
11/22 22:04:26 condor_read(): timeout reading 5 bytes from schedd at 
<115.105.120.71:60614>.
11/22 22:04:26 IO: Failed to read packet header
11/22 22:04:46 condor_read(): timeout reading 5 bytes from schedd at 
<115.105.120.71:60614>.
11/22 22:04:46 IO: Failed to read packet header

-- Failed to fetch ads from: <115.105.120.71:60614> : condor.dns.org
==============


The pool is a set of over 300 public library Windows XP PCs, which are all
centrally powered off at the same time in the evening. For a while the condor
master keeps hanging on to the PCs' status from before the poweroff
(understandably, as it has no clue what has happened to the "disappeared" PCs).
After a while the condor master then abandons whatever was running on the PCs.
During this transition time, the condor_q command seems to have trouble
producing useful output.

Is this a "feature" or a bug?

Thanks,
Rob.