
[HTCondor-users] condor_q shows a huge SIZE and condor job complete time is longer than expected



Hi,

We have a test case which takes between 20 and 30 minutes to complete locally, but takes around 50 minutes to finish when run as a condor job. We do not see any problem in the log:

    Partitionable Resources :  Usage  Request  Allocated
       Cpus                 :              11         11
       Disk (KB)            :   1644     1700      62159
       Memory (MB)          :   2006     2100       2100

But the condor_q command displays a huge SIZE for this job: 17089.8 MB. The condor_q manual (http://research.cs.wisc.edu/htcondor/manual/current/condor_q.html) gives the definition of SIZE:

SIZE
(Non-batch mode only) The peak amount of memory in Mbytes consumed by the job; note this value is only refreshed periodically. The actual value reported is taken from the job ClassAd attribute MemoryUsage if this attribute is defined, and from job attribute ImageSize otherwise.
So SIZE should come from MemoryUsage (if defined) or from ImageSize (otherwise). condor_q shows the attributes of this job:

    MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
    ImageSize = 17500000
    ImageSize_RAW = 15226024
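For reference, the arithmetic connecting these attributes to the displayed SIZE can be sketched as below. This is my own reading of the numbers, not anything from the HTCondor source: ImageSize is stored in KiB and condor_q prints it in MB, and the MemoryUsage expression is an integer ceiling conversion from KiB to MiB. The function names are mine.

```python
def size_column_mb(image_size_kib):
    """Convert an ImageSize value (KiB) to the MB figure condor_q prints,
    rounded to one decimal place as in the SIZE column."""
    return round(image_size_kib / 1024, 1)

def memory_usage_mb(resident_set_size_kib):
    """Evaluate the MemoryUsage ClassAd expression from the job ad:
    ( ( ResidentSetSize + 1023 ) / 1024 ), i.e. integer ceiling KiB -> MiB."""
    return (resident_set_size_kib + 1023) // 1024

# The ImageSize attribute above converts exactly to the SIZE condor_q showed:
print(size_column_mb(17500000))   # 17089.8, matching the reported SIZE
# And a resident set of ~2006 MiB (2054144 KiB) gives the 2006 MB Usage figure:
print(memory_usage_mb(2054144))   # 2006
```

Since MemoryUsage here is still an unevaluated expression rather than a concrete number, it would be consistent with the manual's wording for condor_q to fall back to ImageSize, which is why SIZE matches 17500000 KiB / 1024.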

Apparently, the SIZE matches the ImageSize attribute of this job. So why does this job have such a huge ImageSize? Based on the manual (http://research.cs.wisc.edu/htcondor/manual/v7.6/7_3Running_Condor.html#SECTION008310000000000000000), I added

Requirements = Memory > 2100

to the submit file, but after this change the job takes more than 6 hours to complete. I hope someone can answer some of my questions or give me some hints on what is going on:
1. Why is this condor job's run time always about twice the local run time?
2. How is SIZE calculated?
3. Why does simply adding "Requirements = Memory > 2100" affect the run time so dramatically?
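On question 3, one possible explanation (an assumption on my part, not something confirmed from the logs): with partitionable slots, a raw Requirements clause like Memory > 2100 is matched against the slot's Memory attribute, so the job may sit idle waiting for a slot that happens to advertise that much free memory. The usual way to ask for memory is the request_memory submit command, which also drives the Request column in the job log above. A hypothetical submit-file fragment:

```
# Hypothetical fragment of the submit file (an assumed fix, not a confirmed one):
# request memory explicitly instead of adding a raw Requirements expression.
request_memory = 2100
# ... executable, arguments, queue, etc. as in the original submit file
```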

Thank you for your time and help in advance,
Zhuo