
Re: [HTCondor-users] check which job is running on a wn at specific time

Hello Christoph,
not sure this addresses your need, but I wrote this some time ago:

[root@farm-ops sdalpra]# hjobs.py | head -4
JobId RemoteOwner GlobalJobId JobStart Cpus Machine TotalCpus LoadAvg CPUsUsage
5730832.42 ashish sn-01 2020-07-03:16:35:17 4 wn-204-13-05-05-a 40.0 1.0 0.0
764591.0 alicesgm008 ce04-htc 2020-07-07:02:39:28 1 wn-205-11-39-01-a 38.0 0.99 0.943992672915
1800684.0 belleprd ce02-htc 2020-07-07:20:48:27 1 cn-610-03-12 72.0 1.01 0.999113357875

to imitate the output of bjobs -u all -w -r,
which was one of my most frequently used LSF commands.
It is based on the Python bindings, but it can be reproduced with some amount of condor_status fu.
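
The hjobs.py script itself is not included in this message. As a rough sketch of the formatting step only, assuming the slot ads have already been fetched (for instance via the Python bindings) and are available as plain dictionaries, one output row per claimed slot might be built as follows. The helper names format_row and short_schedd, the shortening of GlobalJobId to the schedd's short host name, and the use of UTC for JobStart are assumptions made for illustration, not a description of the original script.

```python
from datetime import datetime, timezone

# Column order taken from the header line of the sample output.
COLUMNS = ["JobId", "RemoteOwner", "GlobalJobId", "JobStart", "Cpus",
           "Machine", "TotalCpus", "LoadAvg", "CPUsUsage"]


def short_schedd(global_job_id):
    """Reduce '<schedd name>#<cluster.proc>#<timestamp>' to the short host name."""
    # Assumption: the text before the first '#' is the schedd name.
    return global_job_id.split("#", 1)[0].split(".", 1)[0]


def format_row(ad):
    """Build one space-separated row from a slot ad given as a dict."""
    values = dict(ad)
    values["GlobalJobId"] = short_schedd(ad["GlobalJobId"])
    # JobStart is an epoch timestamp; UTC is an assumption here.
    start = datetime.fromtimestamp(ad["JobStart"], tz=timezone.utc)
    values["JobStart"] = start.strftime("%Y-%m-%d:%H:%M:%S")
    return " ".join(str(values[c]) for c in COLUMNS)
```

Feeding format_row one dictionary per claimed slot and printing the results under a header line built from COLUMNS gives a listing in the same shape as the one above.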

On 08/07/20 11:34, Beyer, Christoph wrote:

this seems like an every-day-htc-admin-problem to me, so lateral brain in gear everyone :)

When it comes to certain effects on a workernode I often would like to know if it is job related or not, hence I would like to check quickly which jobs were running on a host at a certain point in time.

I know this does not sound spectacular, but as you need to check the active queue and the history at the same time and get the timestamps right, maybe someone has already scripted something to get a quick result?

As an option I thought about running 'condor_who >> /var/log/condor/who.log' every couple of minutes or so, but I am uncertain whether this would put too much load on the schedd or collector, as the condor_who command seems to run around quite a bit to gather its statistics ...
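
The check described in the question boils down to an interval test: a job was on the host at time T if it started at or before T and had not yet finished. Below is a minimal sketch, assuming the job ads from both the active queue and the history have already been collected as plain dictionaries. The helper name jobs_on_host_at is hypothetical, and the choice of attributes (JobCurrentStartDate, CompletionDate, RemoteHost, LastRemoteHost) is an assumption; real ads may need different attributes, for example for jobs that were evicted or that ran more than once.

```python
def jobs_on_host_at(ads, host, when):
    """Return the ads whose most recent run on `host` covers epoch time `when`."""
    hits = []
    for ad in ads:
        # Assumption: running jobs carry RemoteHost, finished ones LastRemoteHost.
        where = ad.get("RemoteHost") or ad.get("LastRemoteHost") or ""
        # Strip a "slotN@" prefix so only the machine name is compared.
        machine = where.split("@")[-1]
        if machine != host:
            continue
        start = ad.get("JobCurrentStartDate")
        if start is None or start > when:
            continue
        # A missing or zero CompletionDate is treated as "still running".
        end = ad.get("CompletionDate") or None
        if end is None or end >= when:
            hits.append(ad)
    return hits
```

Passing the concatenation of queue and history ads means both sources go through the same timestamp comparison, so the two lookups do not have to be reconciled by hand.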