
[HTCondor-users] more information on incorrect RemoteUserCpu and -currentrun



And even more information: I noticed that the dates below are from 2021. There was an admin "oops" with a date command, and this is likely responsible for all the slots having disconnected and reconnected. OTOH, the point still stands: the run time shown is no longer a run time:

> grep 49718.318 ferm.49718.log
000 (49718.318.000) 2022-11-24 11:10:44 Job submitted from host: <145.107.7.239:9618?addrs=145.107.7.239-9618+[2a07-8500-120-e070--3ef]-9618&alias=visar.nikhef.nl&noUDP&sock=schedd_1153_70c9>
001 (49718.318.000) 2022-11-24 11:11:33 Job executing on host: <145.107.5.45:9618?addrs=145.107.5.45-9618+[2a07-8500-120-e070--52d]-9618&alias=wn-lot-045.nikhef.nl&noUDP&sock=startd_58987_c10f>
006 (49718.318.000) 2022-11-24 11:11:42 Image size of job updated: 976780
[ ... ]
006 (49718.318.000) 2022-12-04 09:56:08 Image size of job updated: 976780
006 (49718.318.000) 2022-12-10 22:02:15 Image size of job updated: 976780
022 (49718.318.000) 2021-10-02 12:58:00 Job disconnected, attempting to reconnect
023 (49718.318.000) 2021-10-02 12:58:01 Job reconnected to slot1_26@xxxxxxxxxxxxxxxxxxxx
022 (49718.318.000) 2022-12-14 14:38:05 Job disconnected, attempting to reconnect
023 (49718.318.000) 2022-12-14 14:38:05 Job reconnected to slot1_26@xxxxxxxxxxxxxxxxxxxx
[kiwish-4.2]-(gofact_extendrange_ganymede/log)-[git:master*]-
> condor_q -allusers -nobatch -currentrun -pr $HOME/an1.cpf 49718.318


-- Schedd: visar.nikhef.nl : <145.107.7.239:9618?... @ 12/15/22 15:21:35
JOB_ID    Username CMD                       CPUS MEMREQ   ST    RUN_TIME    WorkerNode
49718.318 templon  ferm.condor 368           1    128.0 MB R      1+00:43:30 wn-lot-045

Definition of RUN_TIME: RemoteUserCpu AS " RUN_TIME" PRINTAS CPU_TIME
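
For anyone who wants to check their own job logs for the same kind of clock jump, here is a rough Python sketch. It only assumes the event header layout visible in the grep output above (three-digit event code, job id in parentheses, then date and time); the name find_backward_jumps and the sample list are made up for illustration and are not part of HTCondor.

```python
import re
from datetime import datetime

# Event header layout as seen in the grep output above, e.g.
# "022 (49718.318.000) 2021-10-02 12:58:00 Job disconnected, ..."
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+\.\d+\.\d+)\) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$"
)


def find_backward_jumps(lines):
    """Return (previous, current, text) for each event dated before its predecessor.

    Hypothetical helper: lines that do not look like an event header are skipped.
    """
    jumps = []
    previous = None
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(3), "%Y-%m-%d %H:%M:%S")
        if previous is not None and stamp < previous:
            jumps.append((previous, stamp, match.group(4)))
        previous = stamp
    return jumps


# Sample lines copied from the log excerpt above.
sample = [
    "006 (49718.318.000) 2022-12-10 22:02:15 Image size of job updated: 976780",
    "022 (49718.318.000) 2021-10-02 12:58:00 Job disconnected, attempting to reconnect",
    "023 (49718.318.000) 2021-10-02 12:58:01 Job reconnected to slot1_26@xxxxxxxxxxxxxxxxxxxx",
    "022 (49718.318.000) 2022-12-14 14:38:05 Job disconnected, attempting to reconnect",
]

for before, after, text in find_backward_jumps(sample):
    print(f"clock went backwards: {before} -> {after} ({text})")
```

To run it over a whole log, pass an open file instead of the sample list, e.g. find_backward_jumps(open("ferm.49718.log")).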

JT