
Re: [HTCondor-users] condor_q shows a huge SIZE and condor job completetime is longer than expected

I think the 50 minutes covers the whole period from when the job sits idle in the queue, through running, to completion.
You can try running "condor_reschedule" after you submit your job.

Adding more "Requirements" may leave fewer matching resources available.


------------------ Original ------------------
From:  "Zhuo Zhang";<zhuo.zhang@xxxxxxxx>;
Date:  Tue, Jul 11, 2017 10:43 PM
To:  "HTCondor-Users Mail List"<htcondor-users@xxxxxxxxxxx>;
Subject:  [HTCondor-users] condor_q shows a huge SIZE and condor job completetime is longer than expected


We have a test case which takes between 20 and 30 minutes to complete locally, but takes around 50 minutes to finish when run as a condor job. We do not see any problem in the log:

    Partitionable Resources :    Usage  Request Allocated
       Cpus                 :                11        11
       Disk (KB)            :     1644     1700     62159
       Memory (MB)          :     2006     2100      2100

However, the condor_q command reports a huge SIZE for the job: 17089.8 MB. The condor_q manual page (http://research.cs.wisc.edu/htcondor/manual/current/condor_q.html) defines SIZE as follows:

(Non-batch mode only) The peak amount of memory in Mbytes consumed by the job; note this value is only refreshed periodically. The actual value reported is taken from the job ClassAd attribute MemoryUsage if this attribute is defined, and from job attribute ImageSize otherwise.

So SIZE should come from MemoryUsage (if defined) or ImageSize (otherwise). condor_q shows the following attributes for this job:

        MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
        ImageSize = 17500000
        ImageSize_RAW = 15226024

Apparently, the SIZE matches the ImageSize attribute of this job. So why does this job have such a huge ImageSize? Based on the manual (http://research.cs.wisc.edu/htcondor/manual/v7.6/7_3Running_Condor.html#SECTION008310000000000000000), I added

Requirements = Memory > 2100

to the submit file, but after this change the job takes more than 6 hours to complete. I hope someone can answer some of my questions or give me some hints on what is going on:
1. Why is the run time of this condor job always about twice the run time on the local machine?
2. How is SIZE calculated?
3. Why does simply adding "Requirements = Memory > 2100" affect the run time so dramatically?

Thank you for your time and help in advance,