
Re: [HTCondor-users] condor (down) nodes status



Hi Brian,

 

Indeed, that was what I was looking for, thanks.

 

I was able to test things with this config:

ABSENT_REQUIREMENTS = True

ABSENT_EXPIRE_ADS_AFTER = 30*3600*24

COLLECTOR_PERSISTENT_AD_LOG = $(LOG)/AbsentLog

EXPIRE_INVALIDATED_ADS = True
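With this config, a machine that stops reporting should stay in the collector with its `Absent` attribute set to True until `ABSENT_EXPIRE_ADS_AFTER` elapses (30*3600*24 = 2,592,000 seconds, i.e. 30 days). Below is a minimal Python sketch that lists those machines. It assumes `condor_status` is on the PATH, that your version accepts `-constraint` and `-af` (short for `-autoformat`) as the 8.x tools do, and that absent ads carry `Absent = True`; the helper names are hypothetical:

```python
import subprocess


def parse_machines(output: str) -> list[str]:
    """Turn one-name-per-line output into a sorted list of unique machine names."""
    # A multi-slot machine advertises one ad per slot, so the same
    # Machine value can appear several times; a set removes repeats.
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def absent_machines() -> list[str]:
    """Ask the collector for ads flagged Absent (hypothetical helper)."""
    result = subprocess.run(
        ["condor_status", "-constraint", "Absent =?= True", "-af", "Machine"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_machines(result.stdout)


if __name__ == "__main__":
    for name in absent_machines():
        print(name)
```

The `=?=` operator is used rather than `==` so that ads lacking an `Absent` attribute evaluate to False instead of undefined.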

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Brian Bockelman
Sent: Tuesday, July 01, 2014 2:53 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] condor (down) nodes status

 

Hi Frederic,

 

I believe you are looking for the "absent ads" feature:

 

 

I link to the 8.2 manual, but I believe this was introduced in 7.6.

 

Brian

 

On Jul 1, 2014, at 7:33 AM, SCHAER Frederic <frederic.schaer@xxxxxx> wrote:



Hi,

 

Ah, not great… I guess I could work around that with a script parsing the history (though parsing ClassAds might not be that easy for a newbie like me), or even just by building an auto-updated “nodes” file with Puppet...
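The “nodes file” idea boils down to a set difference: hosts you expect minus hosts the collector knows about. A minimal Python sketch, assuming a hypothetical Puppet-managed file with one hostname per line (in the same form as the `Machine` attribute, usually the FQDN) and that `condor_status` accepts `-constraint` and `-af`:

```python
import subprocess


def parse_node_list(text: str) -> set[str]:
    """Parse a hostname list: one name per line, '#' starts a comment."""
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.add(name)
    return names


def reported_machines() -> set[str]:
    """Machine names the collector has live (non-absent) ads for."""
    out = subprocess.run(
        # 'Absent =!= True' also matches ads with no Absent attribute at all.
        ["condor_status", "-constraint", "Absent =!= True", "-af", "Machine"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def missing_nodes(expected: set[str], reported: set[str]) -> list[str]:
    """Nodes that should be in the pool but have no ad in the collector."""
    return sorted(expected - reported)


if __name__ == "__main__":
    import sys

    # Hypothetical default path; pass the real node list as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/condor/expected_nodes"
    with open(path) as f:
        expected = parse_node_list(f.read())
    for name in missing_nodes(expected, reported_machines()):
        print(f"missing from pool: {name}")
```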

I’m wondering, though, how people debug batch issues if they can’t even tell, from the batch system’s point of view, that some nodes are failing?

 

I guess people have monitoring scripts that check for the presence of a startd process (at least), and probably some other trivial things (but which ones?), to be sure the startd processes are correctly registered in the pool?
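A node-side check for the startd process could look like the sketch below. It assumes a Linux `/proc` layout, and the exit code 2 follows the Nagios CRITICAL convention as an assumption about how a monitoring system would consume it. A running process does not prove the ad reached the collector, so it is only half of the picture; a collector-side `condor_status` query covers the other half.

```python
import os


def process_running(name: str, proc_root: str = "/proc") -> bool:
    """True if some process's command name equals `name` (Linux /proc layout)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are PIDs
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                # /proc/<pid>/comm holds the name truncated to 15 characters;
                # "condor_startd" (13 characters) fits.
                if f.read().strip() == name:
                    return True
        except OSError:
            # The process may have exited between listdir() and open().
            continue
    return False


if __name__ == "__main__":
    import sys

    ok = process_running("condor_startd")
    print("condor_startd running" if ok else "condor_startd NOT running")
    sys.exit(0 if ok else 2)
```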

 

Regards

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Marc Volovic
Sent: Tuesday, July 01, 2014 12:42 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] condor (down) nodes status

 

You can see drained nodes with condor_status.
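For example, assuming the nodes were drained with `condor_drain`, which in 8.x moves their slots into the `Drained` state, a query such as `condor_status -constraint 'State == "Drained"'` should list them. The same filter as a small Python sketch over `condor_status -af Machine State Activity` output (the helper name is hypothetical):

```python
import subprocess


def drained_machines(output: str) -> list[str]:
    """Machines with at least one slot in the Drained state.

    `output` holds whitespace-separated "Machine State Activity" lines, as
    printed by: condor_status -af Machine State Activity
    """
    drained = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Drained":
            drained.add(fields[0])
    return sorted(drained)


if __name__ == "__main__":
    out = subprocess.run(
        ["condor_status", "-af", "Machine", "State", "Activity"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    for name in drained_machines(out):
        print(name)
```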

For nodes that are down, that is a more difficult question; I'd do it by external means.

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of SCHAER Frederic
Sent: Tuesday, July 01, 2014 1:14 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] condor (down) nodes status

 

Hi,

 

I’m used to Torque, in which there is a “pbsnodes -l” command that displays nodes that are down or drained.

Strangely, I can’t find how to see this information in Condor: what would be the Condor way of finding it?

 

I’m sure this can become hard when the pool is dynamic, but even then there must be traces of nodes that belonged to the pool at some point, or in the last X days?

 

Thanks

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/