
[HTCondor-users] ImageSize increase too big



Hi,

We observed an unexplained jump in the ImageSize of a job.


-- Schedd: atlas2.atlas.local : <10.20.30.2:38705?... @ 05/05/18 12:52:07
 ID         OWNER            SUBMITTED     RUN_TIME ST PRI SIZE   CMD
2728869.0   XXXXX         4/28 11:29   7+00:14:54 R  0   9766.0	  XXXXXX 

But, according to its user log, the job never used that much memory:

000 (2728869.000.000) 04/28 11:29:25 Job submitted from host: <10.20.30.2:38705?addrs=10.20.30.2-38705+[--1]-38705>
001 (2728869.000.000) 04/28 11:29:47 Job executing on host: <10.10.20.16:33435?addrs=10.10.20.16-33435+[--1]-33435>
006 (2728869.000.000) 04/28 11:29:56 Image size of job updated: 48676
006 (2728869.000.000) 04/28 11:34:56 Image size of job updated: 188768
006 (2728869.000.000) 04/28 11:39:56 Image size of job updated: 237552
006 (2728869.000.000) 04/28 11:44:57 Image size of job updated: 272380
006 (2728869.000.000) 04/28 12:19:59 Image size of job updated: 7411552
006 (2728869.000.000) 04/28 12:24:59 Image size of job updated: 7522440
...
006 (2728869.000.000) 05/05 02:16:14 Image size of job updated: 7522984
001 (2728869.000.000) 05/05 02:43:55 Job executing on host: <10.10.17.14:46639?addrs=10.10.17.14-46639+[--1]-46639>
001 (2728869.000.000) 05/05 04:36:51 Job executing on host: <10.10.23.1:46285?addrs=10.10.23.1-46285+[--1]-46285>
007 (2728869.000.000) 05/05 06:32:57 Shadow exception!
001 (2728869.000.000) 05/05 07:00:50 Job executing on host: <10.10.9.13:41637?addrs=10.10.9.13-41637+[--1]-41637>
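To check this against the whole user log instead of by eye, you can extract the 006 (image size update) events and take the maximum. Below is a minimal Python sketch. It assumes the sizes in these events are in KiB and that each event's header line has the same form as in the excerpt above. The names `LOG`, `PATTERN` and `image_size_updates` are hypothetical. The sample data is only the subset shown above, because the excerpt elides some updates.

```python
import re

# Subset of the 006 events from the user log excerpt above. The excerpt
# elides some updates ("..."), so run the parser on the full log to be sure.
LOG = """\
006 (2728869.000.000) 04/28 11:29:56 Image size of job updated: 48676
006 (2728869.000.000) 04/28 11:34:56 Image size of job updated: 188768
006 (2728869.000.000) 04/28 11:39:56 Image size of job updated: 237552
006 (2728869.000.000) 04/28 11:44:57 Image size of job updated: 272380
006 (2728869.000.000) 04/28 12:19:59 Image size of job updated: 7411552
006 (2728869.000.000) 04/28 12:24:59 Image size of job updated: 7522440
006 (2728869.000.000) 05/05 02:16:14 Image size of job updated: 7522984
"""

# Event code 006 is an image size update; the trailing number is assumed to be KiB.
PATTERN = re.compile(r"^006 \(([^)]+)\) (\S+ \S+) Image size of job updated: (\d+)")


def image_size_updates(text):
    """Return (timestamp, size_kib) for every 006 event header line in text."""
    updates = []
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            updates.append((match.group(2), int(match.group(3))))
    return updates


updates = image_size_updates(LOG)
# Largest reported image size and when it was reported.
peak_time, peak_kib = max(updates, key=lambda u: u[1])
print(f"peak ImageSize: {peak_kib} KiB = {peak_kib / 1024:.1f} MiB at {peak_time}")
```

To process the real file, pass its contents instead, e.g. `image_size_updates(open("job.log").read())`, where `job.log` is a placeholder for the job's actual log path. On the subset above, the peak is 7522984 KiB (about 7347 MiB), which is well below the 9766 shown in the SIZE column.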

The job is still running on 10.10.9.13, and its memory usage is within the expected range.

However, condor_q reports these ImageSize attributes:
condor_q 2728869 -l|grep "^Image"
ImageSize = 10000000
ImageSize_RAW = 7522980
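To compare these attributes with the SIZE column of condor_q, you can convert them from KiB to MiB. The following is a minimal sketch. It assumes that both attributes are in KiB and that SIZE is ImageSize divided by 1024. The function name `kib_to_mib` is illustrative only.

```python
# Values copied from the condor_q -l output above; both assumed to be in KiB.
image_size_kib = 10000000
image_size_raw_kib = 7522980


def kib_to_mib(kib):
    """Convert a size in KiB (the unit assumed for ImageSize) to MiB."""
    return kib / 1024


print(f"ImageSize     = {kib_to_mib(image_size_kib):.1f} MiB")
print(f"ImageSize_RAW = {kib_to_mib(image_size_raw_kib):.1f} MiB")
print(f"difference    = {kib_to_mib(image_size_kib - image_size_raw_kib):.1f} MiB")
```

Under these assumptions, ImageSize is about 9766 MiB, which matches the SIZE column. ImageSize_RAW is about 7347 MiB, which agrees with the last 006 update in the log (7522984 KiB). The discrepancy is therefore between the two attributes, not between the log and the raw value.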

The user has not modified ImageSize.

Is this a known issue?

We are running condor 8.6.
Do you need more config or logs?


Cheers,
Henning