[HTCondor-users] Job Runtime Problem

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Hello,

When I submit N instances of a job, roughly N/2 of them finish in the expected time and the other N/2 take longer to complete. The pool has 10 nodes, each with 32 slots, and uses a shared filesystem (GlusterFS). All of the executables and data files are located on the shared filesystem; however, the problem does not seem to be an I/O or network bottleneck.

When submitting 2 instances, the two times are the following:

Instance 1
real 7m13.950s
user 5m36.766s
sys 0m14.436s

Instance 2
real 6m2.555s
user 5m35.747s
sys 0m13.170s

When submitting 22 instances, the difference in times is more drastic. The times fall into the following two categories:

Category 1:
real 18m28.193s
user 5m39.153s
sys 0m15.111s

Category 2:
real 6m12.578s
user 5m36.433s
sys 0m12.644s
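
To make the gap concrete, the wall-clock time that is not accounted for by CPU time (real minus user minus sys) can be worked out from the figures above. The following is only a minimal sketch in Python: the helper names `to_seconds` and `off_cpu_seconds` are made up for illustration, and the subtraction assumes the job is single-threaded, so that user + sys cannot exceed real.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '7m13.950s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def off_cpu_seconds(real: str, user: str, sys_time: str) -> float:
    """Wall-clock time not spent on the CPU (assumes a single-threaded job)."""
    return to_seconds(real) - to_seconds(user) - to_seconds(sys_time)


# Timings copied from the runs reported above: (real, user, sys).
RUNS = {
    "2 instances, instance 1": ("7m13.950s", "5m36.766s", "0m14.436s"),
    "2 instances, instance 2": ("6m2.555s", "5m35.747s", "0m13.170s"),
    "22 instances, category 1": ("18m28.193s", "5m39.153s", "0m15.111s"),
    "22 instances, category 2": ("6m12.578s", "5m36.433s", "0m12.644s"),
}

for label, (real, user, sys_time) in RUNS.items():
    gap = off_cpu_seconds(real, user, sys_time)
    print(f"{label}: {gap:.1f} s not spent on the CPU")
```

The CPU time (user + sys) is nearly the same in all four cases, about 350 seconds, so the slow jobs are not doing more work. They spend roughly 83 seconds (2 instances) and 754 seconds (22 instances) off the CPU, compared with roughly 14 and 24 seconds for the fast jobs. This calculation does not show what the slow jobs are waiting on.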

Does anybody have insight into this issue?

Thanks,
Vishal
