
Re: [Condor-users] Matching to not responding machines



Thank you for this idea.
I'll have a look into it.

Regards,
Hermann

On Wed, 2012-03-28 at 12:14 +0200, Rob de Graaf wrote:
> Hi Hermann,
> 
> On 03/28/2012 11:32 AM, Hermann Fuchs wrote:
> > However, I would like to implement some kind of failure detection for
> > the running grid, as network problems will and do occur.
> > Is there a query that is only answered when the machines are actually
> > communicating?
> > condor_status seems to be misleading: machines that have stopped
> > communicating remain listed there in some cases (e.g. the mentioned
> > case).
> 
> You could use INVALIDATE_STARTD_ADS (man condor_advertise) to make the 
> collector forget about specific machines. You would need to know which 
> machines to invalidate. The only way I can think of right now is to ask 
> them directly (condor_status -direct or maybe condor_config_val) and 
> check the exit status of those commands. The downside of this approach 
> is that you will have to endure a timeout for every machine that has the 
> problem. If you have hundreds or thousands of machines, it will quickly 
> become unfeasible.
> 
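
A rough sketch of the probing step described above, in Python. It assumes,
as Rob suggests, that "condor_status -direct <host>" exits non-zero when the
machine cannot be reached. The function names, the 20 second timeout and the
worker count are my own placeholders, not anything Condor prescribes. The
probes run concurrently so that the per-machine timeouts overlap instead of
adding up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def probe(host, timeout=20, run=subprocess.run):
    """Return True if `condor_status -direct <host>` exits 0 within `timeout` seconds.

    `run` is injectable so the logic can be exercised without a Condor pool.
    """
    try:
        result = run(
            ["condor_status", "-direct", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return result.returncode == 0


def find_unresponsive(hosts, timeout=20, workers=32, run=subprocess.run):
    """Probe all hosts in parallel and return those that failed, in input order."""
    hosts = list(hosts)
    if not hosts:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(hosts))) as pool:
        alive = list(pool.map(lambda h: probe(h, timeout, run), hosts))
    return [h for h, ok in zip(hosts, alive) if not ok]
```

Calling find_unresponsive(["node01.example.org", "node02.example.org"])
(hypothetical host names) would return the hosts that failed the probe. Those
are the candidates for INVALIDATE_STARTD_ADS; the invalidation itself is
left out here, see man condor_advertise for the expected input.
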
> Alternatively, you could tweak CLASSAD_LIFETIME on the collector to make 
> it forget about unresponsive machines more quickly, but it might also 
> accidentally invalidate working machines if any updates get lost on the 
> network. See: 
> http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#SECTION004316000000000000000
> 
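
To get a feeling for the trade-off with CLASSAD_LIFETIME, here is a small
back-of-the-envelope calculation in Python. It is a simplified model of my
own: it assumes the startd sends an update exactly every UPDATE_INTERVAL
seconds (a setting not mentioned in this thread) and that the collector
drops an ad exactly CLASSAD_LIFETIME seconds after the last update it
received. The values 900 and 300 are only example numbers; check the manual
for the defaults of your version.

```python
def tolerated_lost_updates(classad_lifetime, update_interval):
    """Number of consecutive lost updates an ad survives in the simplified model.

    Both arguments are in seconds.
    """
    if classad_lifetime <= 0 or update_interval <= 0:
        raise ValueError("both values must be positive")
    # Updates are sent at update_interval, 2 * update_interval, ...
    # Count those that fall strictly before the ad expires.
    updates_in_window = (classad_lifetime - 1) // update_interval
    # At least one of them has to arrive, so all but one may be lost.
    return max(updates_in_window - 1, 0)


if __name__ == "__main__":
    for lifetime in (1800, 900, 600):
        lost = tolerated_lost_updates(lifetime, 300)
        print(f"CLASSAD_LIFETIME={lifetime}: survives {lost} lost update(s)")
```

In this model a lifetime of 900 with an interval of 300 survives one lost
update, while lowering the lifetime to 600 leaves no margin at all, which is
the risk Rob describes.
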
> Regards,
> 
> Rob

-- 
-------------
DI Hermann Fuchs
Christian Doppler Laboratory for Medical Radiation Research for Radiation Oncology
Department of Radiation Oncology
Medical University Vienna
Währinger Gürtel 18-20
A-1090 Wien

Tel.  + 43 / 1 / 40 400 7271
Mail. hermann.fuchs@xxxxxxxxxxxxxxxx