
Re: [HTCondor-users] Job Runtime Problem



On 07/16/2013 01:23 PM, Vishal Shah wrote:
Hello,

When submitting N instances of a job, generally N/2 jobs run in the
expected time and the other N/2 jobs take longer to complete. The system
has 10 nodes each with 32 slots and uses a shared filesystem
(GlusterFS). All of the executables and data files are located on the
shared file system; however, the problem does not seem to be an I/O or
network bottleneck.

When submitting 2 instances, the two times are the following:

Instance 1
real 7m13.950s
user 5m36.766s
sys  0m14.436s

Instance 2
real 6m2.555s
user 5m35.747s
sys  0m13.170s

When submitting 22 instances, the difference in times is more drastic.
The times fall into the following two categories:

Category 1:
real 18m28.193s
user 5m39.153s
sys  0m15.111s

Category 2:
real 6m12.578s
user 5m36.433s
sys  0m12.644s

Does anybody have insight into this issue?

Thanks,
Vishal

Share your goal, so we can tell what the issue may be.

FYI, condor_submit <-> condor_schedd communication is very chatty and the condor_schedd is single-threaded. The schedd may have ignored your submit for a period while doing some job maintenance, which resulted in an 18min runtime for the submit.
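
The user and sys figures are nearly identical in both categories, so the extra wall-clock time was not spent on the CPU. As a rough sketch (not part of the original report; the helper names to_seconds and non_cpu_seconds are made up for illustration), the gap can be computed from the quoted `time` output like this:

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '18m28.193s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def non_cpu_seconds(real, user, system):
    """Wall-clock time not accounted for by user + sys CPU time."""
    return to_seconds(real) - (to_seconds(user) + to_seconds(system))


# Figures copied from the 22-instance run quoted above.
runs = {
    "Category 1": ("18m28.193s", "5m39.153s", "0m15.111s"),
    "Category 2": ("6m12.578s", "5m36.433s", "0m12.644s"),
}

for name, (real, user, system) in runs.items():
    gap = non_cpu_seconds(real, user, system)
    print(f"{name}: {gap:.1f}s of wall-clock time outside user+sys")
```

This only shows how much time went to waiting; it does not say what the process was waiting on.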

Best,


matt