
Re: [HTCondor-users] Job finished with status 115



Ah,

OK, sorry for the rude reply :( 

This means the job ended normally (most likely the job script simply finished) and the slot was released (if it was carved out of a partitionable slot, it was probably merged back into its parent slot).

Another scenario: if the run time of the last job had been shorter than 'CLAIM_WORKLIFE' (a condor_startd setting, default 1200 s = 20 min), the slot would not have been freed. Instead, the scheduler would have tried to send another similar job from the same user to this slot.
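
As a rough illustration of that rule, here is a minimal Python sketch. It takes the start and end times of job 93742.0 from the EventLog excerpt quoted further down in this thread and compares the run time against the 20 minute default. The function names and the simple "run time < CLAIM_WORKLIFE" test are assumptions made for illustration, not HTCondor's actual implementation.

```python
from datetime import datetime

# Assumption: 1200 s is the CLAIM_WORKLIFE default mentioned in this thread (20 min).
CLAIM_WORKLIFE_S = 1200


def parse_event_time(stamp, year=2021):
    """Parse an EventLog timestamp such as '02/09 19:03:03' (the log omits the year)."""
    return datetime.strptime(f"{year} {stamp}", "%Y %m/%d %H:%M:%S")


def claim_would_be_reused(run_time_s, worklife_s=CLAIM_WORKLIFE_S):
    """Simplified rule: the claim is reused only if the job ran for less than the worklife."""
    return run_time_s < worklife_s


started = parse_event_time("02/09 19:03:03")  # "Job executing on host"
ended = parse_event_time("02/10 23:38:21")    # "Job terminated."
run_time_s = (ended - started).total_seconds()

print(f"run time: {run_time_s:.0f} s")
if claim_would_be_reused(run_time_s):
    print("claim would be reused for another job")
else:
    print("claim closes after the job (shadow exit status 115)")
```

With a run time of more than a day, the job is far beyond the 20 minute worklife, which fits the status 115 seen in the logs.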


Best
christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Jean-Claude CHEVALEYRE" <jean-claude.chevaleyre@xxxxxxxxxxxxxxxxx>
To: "Christoph Beyer" <christoph.beyer@xxxxxxx>
CC: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 11 February 2021 11:07:52
Subject: Re: [HTCondor-users] Job finished with status 115

Hello Christoph,


Yes, but it's not really clear to me. What does that mean exactly?

Thanks
Jean-Claude

----- Original Message -----
From: "Christoph Beyer" <christoph.beyer@xxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Cc: "Jean-Claude CHEVALEYRE" <chevaleyre@xxxxxxxxxxxxxxxxx>
Sent: Thursday, 11 February 2021 10:57:59
Subject: Re: [HTCondor-users] Job finished with status 115

115  JOB_EXITED_AND_CLAIM_CLOSING  the job exited (not killed) but the condor_startd is not accepting any more jobs on this claim





From: "Jean-Claude CHEVALEYRE" <jean-claude.chevaleyre@xxxxxxxxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
CC: "Jean-Claude CHEVALEYRE" <chevaleyre@xxxxxxxxxxxxxxxxx>
Sent: Thursday, 11 February 2021 10:34:19
Subject: [HTCondor-users] Job finished with status 115

Hello, 

I have some ATLAS jobs that are failing. I have looked in the log files.
For example, job number 93742.0 finished with status 115. What exactly does this status mean?

Below are some extracts from the logs:

[root@gridarcce01 log]# grep -RH '93742' arc/arex-jobs* | more 
arc/arex-jobs.log-20210211:2021-02-10 23:45:00 Finished - job id: 6PwKDm5cYTynOUEdEnzo691oABFKDmABFKDmzcfXDmDBFKDmDTZXHm, unix user: 41000:1307, name: "arc_pilot", owner: "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1", lrms: condor, queue: grid, lrmsid: 93742.gridarcce01


[root@gridarcce01 log]# grep -RH '93742' condor/EventLog | more 

condor/EventLog: 937428 - ResidentSetSize of job (KB) 
condor/EventLog:006 (24968.000.000) 12/18 10:32:49 Image size of job updated: 937424 
condor/EventLog:006 (26125.000.000) 12/19 11:22:07 Image size of job updated: 937424 
condor/EventLog:006 (26254.000.000) 12/19 16:32:57 Image size of job updated: 937424 
condor/EventLog:006 (26254.000.000) 12/19 16:37:57 Image size of job updated: 937424 
condor/EventLog: 937424 - ResidentSetSize of job (KB) 
condor/EventLog: 937420 - ResidentSetSize of job (KB) 
condor/EventLog:006 (71776.000.000) 01/21 00:35:38 Image size of job updated: 937428 
condor/EventLog:006 (73442.000.000) 01/22 02:29:37 Image size of job updated: 937428 
condor/EventLog: 937428 - ResidentSetSize of job (KB) 
condor/EventLog:006 (78058.000.000) 01/26 02:56:24 Image size of job updated: 937428 
condor/EventLog:000 (93742.000.000) 02/09 04:12:28 Job submitted from host: <193.55.252.153:9618?addrs=193.55.252.153-9618&noUDP&sock=3115801_e73c_4> 
condor/EventLog:001 (93742.000.000) 02/09 19:03:03 Job executing on host: <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3> 
condor/EventLog:006 (93742.000.000) 02/09 19:03:11 Image size of job updated: 2304 
condor/EventLog:006 (93742.000.000) 02/09 19:08:11 Image size of job updated: 67160 
condor/EventLog:006 (93742.000.000) 02/09 19:13:12 Image size of job updated: 110340 
condor/EventLog:006 (93742.000.000) 02/09 19:18:13 Image size of job updated: 1410420 
condor/EventLog:006 (93742.000.000) 02/09 19:23:13 Image size of job updated: 1887892 
condor/EventLog:006 (93742.000.000) 02/09 19:33:15 Image size of job updated: 1887892 
condor/EventLog:005 (93742.000.000) 02/10 23:38:21 Job terminated. 


condor/ShadowLog.old:02/10/21 11:43:04 (93742.0) (3863434): Time to redelegate short-lived proxy to starter. 
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): File transfer completed successfully. 
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): Job 93742.0 terminated: exited with status 0 
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): WriteUserLog checking for event log rotation, but no lock 
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): **** condor_shadow (condor_SHADOW) pid 3863434 EXITING WITH STATUS 115 
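
Note that the ShadowLog lines above carry two different numbers: the job's own exit status (0) and the exit status of the condor_shadow process (115). A small Python sketch for pulling the two apart could look like the following. The regular expressions are written against the two lines shown above only, and the function name is hypothetical; other HTCondor versions may format these messages differently.

```python
import re

# Two lines copied from the ShadowLog excerpt above.
SHADOW_LOG = """\
02/10/21 23:38:21 (93742.0) (3863434): Job 93742.0 terminated: exited with status 0
02/10/21 23:38:21 (93742.0) (3863434): **** condor_shadow (condor_SHADOW) pid 3863434 EXITING WITH STATUS 115
"""

# Assumption: these patterns only cover the message wording seen in this thread.
JOB_EXIT_RE = re.compile(r"Job (\d+\.\d+) terminated: exited with status (\d+)")
SHADOW_EXIT_RE = re.compile(r"condor_shadow .* pid (\d+) EXITING WITH STATUS (\d+)")


def summarize(log_text):
    """Return ({job_id: job_exit_status}, {shadow_pid: shadow_exit_status})."""
    job_exits = {}
    shadow_exits = {}
    for line in log_text.splitlines():
        job_match = JOB_EXIT_RE.search(line)
        if job_match:
            job_exits[job_match.group(1)] = int(job_match.group(2))
        shadow_match = SHADOW_EXIT_RE.search(line)
        if shadow_match:
            shadow_exits[int(shadow_match.group(1))] = int(shadow_match.group(2))
    return job_exits, shadow_exits


job_exits, shadow_exits = summarize(SHADOW_LOG)
print("job exit status:   ", job_exits)
print("shadow exit status:", shadow_exits)
```

Here the job itself exited with status 0; the 115 belongs to the shadow and describes what happened to the claim, not a failure of the payload.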


[root@gridarcce01 log]# grep -RH '93742' condor/SchedLog | more 
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Shadow pid 3863434 for job 93742.0 exited with status 115 
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3> for group_ATLAS.atlasprd_score.atlasprd, 93742.0) deleted


Any ideas are welcome. 

Thanks 
Jean-Claude

------------------------------------------------------------------------ 
Jean-Claude Chevaleyre < Jean-Claude.Chevaleyre(at)clermont.in2p3.fr > 
Laboratoire de Physique Clermont 
Campus Universitaire des Cézeaux
4 Avenue Blaise Pascal 
TSA 60026 
CS 60026 
63178 Aubière Cedex

Tel : 04 73 40 73 60 

------------------------------------------------------------------------- 
