[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [HTCondor-users] Job finished with status 115



115JOB_EXITED_AND_CLAIM_CLOSINGthe job exited (not killed) but the condor_startd
  is not accepting any more jobs on this claim



--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


Von: "Jean-Claude CHEVALEYRE" <jean-claude.chevaleyre@xxxxxxxxxxxxxxxxx>
An: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
CC: "Jean-Claude CHEVALEYRE" <chevaleyre@xxxxxxxxxxxxxxxxx>
Gesendet: Donnerstag, 11. Februar 2021 10:34:19
Betreff: [HTCondor-users] Job finished with status 115

Hello,

I have some Atlas jobs that are failling. I have look in the logs files.
I can see by example for this jobs number 93742.0. This job  finished with a status 115 . What does means exactly this status ?

Bellow are some extract of  logs outputs:

[root@gridarcce01 log]# grep -RH '93742' arc/arex-jobs* | more
arc/arex-jobs.log-20210211:2021-02-10 23:45:00 Finished - job id: 6PwKDm5cYTynOUEdEnzo691oABFKDmABFKDmzcfXDmDBFKDmDTZXHm, unix user: 41000:1307, name: "arc_pilot", owner: "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN
=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1", lrms: condor, queue: grid, lrmsid: 93742.gridarcce01


[root@gridarcce01 log]# grep -RH '93742' condor/EventLog | more

condor/EventLog:        937428  -  ResidentSetSize of job (KB)
condor/EventLog:006 (24968.000.000) 12/18 10:32:49 Image size of job updated: 937424
condor/EventLog:006 (26125.000.000) 12/19 11:22:07 Image size of job updated: 937424
condor/EventLog:006 (26254.000.000) 12/19 16:32:57 Image size of job updated: 937424
condor/EventLog:006 (26254.000.000) 12/19 16:37:57 Image size of job updated: 937424
condor/EventLog:        937424  -  ResidentSetSize of job (KB)
condor/EventLog:        937420  -  ResidentSetSize of job (KB)
condor/EventLog:006 (71776.000.000) 01/21 00:35:38 Image size of job updated: 937428
condor/EventLog:006 (73442.000.000) 01/22 02:29:37 Image size of job updated: 937428
condor/EventLog:        937428  -  ResidentSetSize of job (KB)
condor/EventLog:006 (78058.000.000) 01/26 02:56:24 Image size of job updated: 937428
condor/EventLog:000 (93742.000.000) 02/09 04:12:28 Job submitted from host: <193.55.252.153:9618?addrs=193.55.252.153-9618&noUDP&sock=3115801_e73c_4>
condor/EventLog:001 (93742.000.000) 02/09 19:03:03 Job executing on host: <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3>
condor/EventLog:006 (93742.000.000) 02/09 19:03:11 Image size of job updated: 2304
condor/EventLog:006 (93742.000.000) 02/09 19:08:11 Image size of job updated: 67160
condor/EventLog:006 (93742.000.000) 02/09 19:13:12 Image size of job updated: 110340
condor/EventLog:006 (93742.000.000) 02/09 19:18:13 Image size of job updated: 1410420
condor/EventLog:006 (93742.000.000) 02/09 19:23:13 Image size of job updated: 1887892
condor/EventLog:006 (93742.000.000) 02/09 19:33:15 Image size of job updated: 1887892
condor/EventLog:005 (93742.000.000) 02/10 23:38:21 Job terminated.


condor/ShadowLog.old:02/10/21 11:43:04 (93742.0) (3863434): Time to redelegate short-lived proxy to starter.
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): File transfer completed successfully.
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): Job 93742.0 terminated: exited with status 0
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): WriteUserLog checking for event log rotation, but no lock
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): **** condor_shadow (condor_SHADOW) pid 3863434 EXITING WITH STATUS 115


[root@gridarcce01 log]# grep -RH '93742' condor/SchedLog | more
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Shadow pid 3863434 for job 93742.0 exited with status 115
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3> for group_ATLAS.atlasprd_score.atlasprd, 937
42.0) deleted


Any ideas are welcome.

Thanks
Jean-Caude

------------------------------------------------------------------------
Jean-Claude Chevaleyre < Jean-Claude.Chevaleyre(at)clermont.in2p3.fr >
Laboratoire de Physique Clermont
Campus Universitaire des CÃzeaux
4 Avenue Blaise Pascal
TSA 60026
CS 60026
63178 AubiÃre Cedex

Tel : 04 73 40 73 60

-------------------------------------------------------------------------

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/