
Re: [HTCondor-users] check which job is running on a wn at specific time


This may not directly answer your question, but I thought it might help:

I recently came across the following command, which I find very useful for getting the history of jobs that ran on an execute node. You need to run it on the execute node itself. It shows the jobs that ran on the node, including those submitted from different schedulers, which helps when troubleshooting around the time of an issue. It covers history only, not currently running jobs.
condor_history -file `condor_config_val LOG`/startd_history -limit 2 -af remotehost globaljobid
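
To narrow that history down to a specific point in time, a constraint on the job timestamps might work. The following is only a rough sketch: it assumes the ads in startd_history carry JobCurrentStartDate and EnteredCurrentStatus as epoch seconds, which I have not verified for every version, and the function names are made up for illustration. Check which attributes your ads actually contain with `-limit 1 -l` first.

```shell
#!/bin/sh
# Sketch: list jobs from the startd history that overlapped a time window.
# ASSUMPTION: history ads contain JobCurrentStartDate and EnteredCurrentStatus
# (epoch seconds). Verify on your pool with:
#   condor_history -file "$(condor_config_val LOG)/startd_history" -limit 1 -l

# Build a ClassAd constraint: the job started before the window ended
# and left its last state after the window began.
window_constraint() {
    win_start=$1   # epoch seconds
    win_end=$2     # epoch seconds
    printf 'JobCurrentStartDate <= %s && EnteredCurrentStatus >= %s' \
        "$win_end" "$win_start"
}

# Hypothetical helper; run it on the execute node.
jobs_in_window() {
    condor_history -file "$(condor_config_val LOG)/startd_history" \
        -constraint "$(window_constraint "$1" "$2")" \
        -af RemoteHost GlobalJobId JobCurrentStartDate EnteredCurrentStatus
}

# Example call (date -d is GNU date syntax):
#   jobs_in_window "$(date -d '2020-07-08 14:00' +%s)" \
#                  "$(date -d '2020-07-08 15:00' +%s)"
```

If those attributes are missing or mean something different in your ads, the same pattern should still apply with whichever start and end timestamps the `-l` output shows.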
We have clusters of 400+ nodes. We capture condor_who output at 1-minute intervals, and it does not seem to cause any issues for us.
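
For the periodic capture, something along these lines could be run from cron on each node. This is a minimal sketch, not our exact setup: the function name, the default log path (taken from the question below), and the script path in the cron comment are all illustrative assumptions.

```shell
#!/bin/sh
# Sketch: append one timestamped condor_who snapshot to a log file,
# so that each snapshot can later be matched against the time of an incident.

snapshot_who() {
    # Default path is illustrative; pass another file as the first argument.
    logfile=${1:-/var/log/condor/who.log}
    {
        printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        condor_who
    } >> "$logfile"
}

# Hypothetical crontab entry calling a script that runs snapshot_who
# once a minute:
#   * * * * * /usr/local/bin/who_snapshot.sh
```

The timestamp header makes it possible to search the log for the snapshot closest to the time of interest. The log grows without bound, so some form of rotation is probably wanted.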

Thanks & Regards,
Vikrant Aggarwal

On Wed, Jul 8, 2020 at 3:05 PM Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:

this seems like an every-day-htc-admin-problem to me, so lateral brain in gear everyone :)

When I see certain effects on a worker node, I often want to know whether they are job related or not, so I would like to quickly check which jobs were running on a host at a certain point in time.

I know this does not sound spectacular, but since you need to check the active queue and the history at the same time and get the timestamps right, maybe someone has already scripted something to get a quick result?

As an option, I thought about running 'condor_who >> /var/log/condor/who.log' every couple of minutes or so, but I am not sure whether this would put too much load on the schedd or collector, as the condor_who command seems to run around quite a bit to gather its statistics ...


Christoph Beyer
DESY Hamburg

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

mail: christoph.beyer@xxxxxxx