
Re: [Condor-users] condor_restart and missing machines (enhancement request)

I'm sure I've seen this behaviour too (Windows machines not reporting in): sometimes condor_q -g shows more jobs running than condor_status does. I'm not sure what to do about it. I have just taken the view of trying to grow our pool, and those we can flock to, so that a few "drop outs" don't matter so much.

Kewley, J (John) wrote:

I am still experiencing problems with the reports from Windows PCs
failing to arrive at the central manager. I suspect this problem will go away
when they are all upgraded from 6.6 to 6.8, but in the meantime I have this request:

If you do condor_restart -master -all
all machines in your pool are sent a message to restart (subject to HOSTALLOW
settings of course).

Note that this sends the request even to machines that aren't currently reporting in.

What I would like is to be able to say:
   condor_restart -master -MIA
which would restart all the ones which aren't currently reporting in (missing in action),
but leave the others alone.

Is there any other way of getting at this full list of machines?

I could do this myself if I had a condor_MIA
command, but I can't see how to do this without storing the information myself.
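
As a rough illustration of the "storing the information myself" route, here is a minimal Python sketch of what a condor_MIA helper might look like. It is only a sketch under stated assumptions: pool_hosts.txt is a hypothetical file that you maintain yourself (one hostname per line), the function names are made up for this example, and it assumes that condor_status -master -format "%s\n" Machine prints one hostname per reporting master in your version.

```python
import subprocess


def parse_hostnames(text):
    """Turn one-name-per-line text into a set of lowercase hostnames."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comments in the stored list.
        if line and not line.startswith("#"):
            names.add(line.lower())
    return names


def missing_machines(expected_text, reporting_text):
    """Return expected hosts that are not currently reporting, sorted."""
    return sorted(parse_hostnames(expected_text) - parse_hostnames(reporting_text))


def query_reporting_masters():
    """Ask the collector which masters it currently knows about.

    Assumption: condor_status is on PATH and the Machine attribute of the
    master ad matches the hostnames stored in the expected list.
    """
    result = subprocess.run(
        ["condor_status", "-master", "-format", "%s\n", "Machine"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def main(expected_path="pool_hosts.txt"):  # hypothetical file name
    """Print the machines that are missing in action, one per line."""
    with open(expected_path) as f:
        expected_text = f.read()
    for host in missing_machines(expected_text, query_reporting_masters()):
        print(host)
```

Calling main() prints the hosts that are in the stored list but absent from the collector. Each printed name could then be passed to condor_restart -master as a hostname argument, which would leave the machines that are reporting in alone.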

Another alternative is of course to use TCP for the heartbeats to try and prevent this.




Ian Cottam
Information Systems Manager
Manchester Interdisciplinary Biocentre
The John Garside Building (Room G.002)
The University of Manchester
e: ian.cottam@xxxxxxxxxxxxxxxx
t: 0161 306 5198
m: 07856 849831