
[HTCondor-users] Job Runtime Problem


When submitting N instances of a job, roughly N/2 of the jobs finish in the expected time and the other N/2 take longer to complete. The system has 10 nodes, each with 32 slots, and uses a shared file system (GlusterFS). All of the executables and data files are located on the shared file system; however, the problem does not seem to be an I/O or network bottleneck.

When submitting 2 instances, the two times are the following:

Instance 1
real 7m13.950s
user 5m36.766s
sys 0m14.436s

Instance 2
real 6m2.555s
user 5m35.747s
sys 0m13.170s

When submitting 22 instances, the difference in times is more drastic. The times fall into the following two categories:

Category 1:
real 18m28.193s
user 5m39.153s
sys 0m15.111s

Category 2:
real 6m12.578s
user 5m36.433s
sys 0m12.644s
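
To put a number on the gap, here is a minimal sketch that subtracts CPU time (user + sys) from wall-clock time (real) for each of the measurements above. It assumes the values are in the `XmY.YYYs` format printed by the bash `time` keyword and that each job is single-threaded; the helper names `to_seconds` and `off_cpu_seconds` are made up for illustration. The result is the time the job spent not running on a CPU, which does not by itself say what the job was waiting for.

```python
import re


def to_seconds(stamp):
    """Convert a value such as '7m13.950s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def off_cpu_seconds(real, user, sys):
    """Wall-clock time minus CPU time (user + sys), in seconds."""
    return to_seconds(real) - (to_seconds(user) + to_seconds(sys))


# (real, user, sys) copied from the measurements in this post.
runs = {
    "2 jobs, instance 1": ("7m13.950s", "5m36.766s", "0m14.436s"),
    "2 jobs, instance 2": ("6m2.555s", "5m35.747s", "0m13.170s"),
    "22 jobs, category 1": ("18m28.193s", "5m39.153s", "0m15.111s"),
    "22 jobs, category 2": ("6m12.578s", "5m36.433s", "0m12.644s"),
}

for label, times in runs.items():
    print(f"{label}: {off_cpu_seconds(*times):.1f} s not on CPU")
```

For the 22-instance run this works out to about 754 seconds off CPU for Category 1 versus about 24 seconds for Category 2, while user and sys times are nearly identical in both. So the slow jobs do the same amount of computation and spend the extra time waiting.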

Does anybody have insight into this issue?