Re: [HTCondor-users] Job Runtime Problem
- Date: Tue, 16 Jul 2013 13:48:49 -0400
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Job Runtime Problem
On 07/16/2013 01:23 PM, Vishal Shah wrote:
When submitting N instances of a job, generally N/2 jobs run in the
expected time and the other N/2 jobs take longer to complete. The system
has 10 nodes each with 32 slots and uses a shared filesystem
(GlusterFS). All of the executables and data files are located on the
shared file system; however, the problem does not seem to be an I/O or
When submitting 2 instances, the two times are the following:
When submitting 22 instances, the difference in times is more drastic.
The two categories that the times fall into are the following:
Does anybody have insight into this issue?
Share your goal, so we can tell what the issue may be.
FYI, condor_submit <-> condor_schedd communication is very chatty and
the condor_schedd is single threaded. The schedd may have ignored your
submit for a period while doing some job maintenance, which resulted in
an 18-minute runtime for submit.
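
One way to check whether the delay is in submission rather than in the job itself is to time the submit command separately from the job's own runtime. The following is only a rough sketch, not part of HTCondor: the helper name time_command and the script name are made up for illustration, and job.sub stands in for whatever submit file you actually use.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (exit_code, elapsed_seconds).

    Uses a monotonic clock so the measurement is not affected by
    wall-clock adjustments while the command is running.
    """
    start = time.monotonic()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical usage: python time_submit.py condor_submit job.sub
    if len(sys.argv) < 2:
        sys.exit("usage: time_submit.py COMMAND [ARGS...]")
    code, elapsed = time_command(sys.argv[1:])
    print(f"exit={code} elapsed={elapsed:.1f}s")
```

If the elapsed time for the submit step accounts for most of the difference, the slow category is probably waiting on the schedd rather than running longer on the execute node.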