
Re: [HTCondor-users] ImageSize increase too big



That should have read "between 1/2 and 5 times"...

Tom

On 5/8/18, 9:22 AM, "Thomas Patrick Downes" <downes@xxxxxxx> wrote:

    Henning:
    
    A couple suggestions.
    
    (1) Use ResidentSetSize rather than ImageSize, which over-counts memory usage. RSS = RAM usage identically. The manual actually suggests ProportionalSetSize, which doesn't seem to exist anymore.
    (2) Understand that both ResidentSetSize and ImageSize are rounded by the schedd. Their unrounded values are ResidentSetSize_RAW and ImageSize_RAW.
    
    According to the manual, the default behavior is to round these values by 25% by "order of magnitude". This is a bit hard to understand so break out your logarithms.
    
    The pattern is that if you're between 1/5 and 5 times a given power of 10, then it rounds up to ceil(value / (25% of that power of 10)) * (25% of that power of 10). Does that make sense?
    
    Here are some corresponding values (rounded value on the left, raw value on the right):
    
    32500 30024
    35000 34492
    37500 35832
    40000 37748
    75000 57688
    100000 76448
    225000 201740
    250000 225504
    275000 261584
    300000 289272
    325000 309824
    350000 328628
    425000 422872
    750000 541520
    1000000 900412
    1250000 1003828
    1500000 1362132
    1750000 1516100
    2000000 1873404
    2250000 2044680
    2500000 2250200
    7500000 5369316
    10000000 7597772
    17500000 15783236
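
The rounding pattern described above can be sketched in a few lines of Python. This is only an illustration inferred from the table, not HTCondor's actual implementation: the function name is made up, and it assumes the power of 10 is picked so that the raw value lies between 1/2 and 5 times that power.

```python
def round_up_quarter_magnitude(raw_kb: int) -> int:
    """Round raw_kb up to a multiple of 25% of a power of ten.

    Assumption (inferred from the table, not from HTCondor source): the
    power of ten P is chosen so that P/2 <= raw_kb < 5*P.
    """
    if raw_kb <= 0:
        return raw_kb
    # Doubling moves the decade boundary from 1*P to P/2, so the digit
    # count of 2*raw_kb selects the power of ten.
    power = 10 ** (len(str(2 * raw_kb)) - 1)
    if power < 100:
        # 25% of the power would not be a whole number; the thread does
        # not cover values this small, so leave them unchanged.
        return raw_kb
    step = power // 4
    return -(-raw_kb // step) * step  # integer ceiling division


if __name__ == "__main__":
    # Three raw values from the table plus the ImageSize_RAW from the
    # original question.
    for raw in (30024, 57688, 422872, 7522980):
        print(raw, round_up_quarter_magnitude(raw))
```

Under that assumption, a raw value of 7522980 falls between 1/2 and 5 times 10^7, so the step is 2500000 and the result is 10000000, which matches the ImageSize reported in the original question.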
    
    Tom
    On 5/7/18, 8:24 AM, "HTCondor-users on behalf of Henning Fehrmann" <htcondor-users-bounces@xxxxxxxxxxx on behalf of henning.fehrmann@xxxxxxxxxx> wrote:
    
        Hi,
        
        we observed an unexplainable jump in the imagesize of a job.
        
        
        -- Schedd: atlas2.atlas.local : <10.20.30.2:38705?... @ 05/05/18 12:52:07
         ID         OWNER            SUBMITTED     RUN_TIME ST PRI SIZE   CMD
        2728869.0   XXXXX         4/28 11:29   7+00:14:54 R  0   9766.0	  XXXXXX 
        
        But it never was using that much memory:
        
        000 (2728869.000.000) 04/28 11:29:25 Job submitted from host: <10.20.30.2:38705?addrs=10.20.30.2-38705+[--1]-38705>
        001 (2728869.000.000) 04/28 11:29:47 Job executing on host: <10.10.20.16:33435?addrs=10.10.20.16-33435+[--1]-33435>
        006 (2728869.000.000) 04/28 11:29:56 Image size of job updated: 48676
        006 (2728869.000.000) 04/28 11:34:56 Image size of job updated: 188768
        006 (2728869.000.000) 04/28 11:39:56 Image size of job updated: 237552
        006 (2728869.000.000) 04/28 11:44:57 Image size of job updated: 272380
        006 (2728869.000.000) 04/28 12:19:59 Image size of job updated: 7411552
        006 (2728869.000.000) 04/28 12:24:59 Image size of job updated: 7522440
        ...
        006 (2728869.000.000) 05/05 02:16:14 Image size of job updated: 7522984
        001 (2728869.000.000) 05/05 02:43:55 Job executing on host: <10.10.17.14:46639?addrs=10.10.17.14-46639+[--1]-46639>
        001 (2728869.000.000) 05/05 04:36:51 Job executing on host: <10.10.23.1:46285?addrs=10.10.23.1-46285+[--1]-46285>
        007 (2728869.000.000) 05/05 06:32:57 Shadow exception!
        001 (2728869.000.000) 05/05 07:00:50 Job executing on host: <10.10.9.13:41637?addrs=10.10.9.13-41637+[--1]-41637>
        
        The job still runs on 10.10.9.13 within the expected memory usage.
        
        The imagesize however is
        condor_q 2728869 -l|grep "^Image"
        ImageSize = 10000000
        ImageSize_RAW = 7522980
        
        Which hasn't been manipulated by the user.
        
        Is this a known issue?
        
        We are running condor 8.6.
        Do you need more config or logs?
        
        
        Cheers,
        Henning