Re: [HTCondor-users] Most of the time in Condor jobs gets wasted in I/o

I suspect that your NFS share is not tuned for your deployment or use case, which is likely what slows down reading and writing your files.


If the share is mounted on all machines, make certain you set 'should_transfer_files = NO' in your submit file too. Also, if you still experience long wait times, you can always enforce concurrency limits on your jobs so they don't all hit the same shared resource at one time.
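
A minimal sketch of what such a submit description could look like, generated from Python so it can be reused per job. The limit name nfs_rdata2, the unit count, and the output/error/log file names are assumptions for illustration only; the limit itself would also have to be defined by the pool administrator in the negotiator configuration (for example NFS_RDATA2_LIMIT = 20, a made-up value).

```python
def make_submit_description(executable, limit_name="nfs_rdata2", units=1):
    """Build submit-description text for a job that reads from a shared mount.

    limit_name and units are hypothetical; they must match a limit that the
    pool administrator has actually configured.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # The share is visible on every node, so skip HTCondor file transfer.
        "should_transfer_files = NO",
        # Each running job consumes `units` of the named concurrency limit.
        f"concurrency_limits = {limit_name}:{units}",
        # Hypothetical file names for stdout, stderr and the job event log.
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_description("parallel_90.sh"))
```

With a limit of 20, at most 20 of the 120 jobs would be matched at once, which trades some parallelism for less contention on the NFS server.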

Long term, you may want to look into other distributed filesystems to reduce the load on a single source (e.g. Gluster, HDFS, QFS).


Hello experts,

I am submitting 120 jobs to 120 nodes using Condor. I have approx. 20,000 input files in the /rdata2 directory.

/dev/sdd1              39T   19T   21T  48% /NFSv3exports/rdata2

I have a file (Full2013.list) that lists the names and paths of the 20,000 input files, one per line. I split that file into 120 parts, one per job, so each job has approx. 20,000/120 = 166 files.
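
For illustration, the split described above could be sketched as follows. This is not the script actually used; the function name and the placeholder paths in the usage example are hypothetical.

```python
def split_into_parts(items, n_parts):
    """Split a list into n_parts contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    # Placeholder paths standing in for the lines of Full2013.list.
    paths = [f"/rdata2/file_{i}.root" for i in range(20000)]
    parts = split_into_parts(paths, 120)
    print(len(parts), min(len(p) for p in parts), max(len(p) for p in parts))
```

Since 20,000 is not a multiple of 120, each part ends up with either 166 or 167 files.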

In Condor, it's taking 1 day to finish my jobs.

I ran one job interactively on a single node: it finishes in 40 min.

15126.0   bawa            4/23 04:56   0+03:50:16 R  0   317.4 parallel_90.sh

Statistics for comparison:
real    63m57.321s
user    42m17.957s
sys     1m24.413s

Statistics for Condor node:

condor_q -analyze 15126.0

15126.000:  Request is being serviced

The jobs have been running for 1 day. If I look at the real CPU time of this job, it is:
[bawa@t3nfs Wstar_sin0_NewCalib17]$ condor_q 15126.0 -cputime

 ID      OWNER            SUBMITTED     CPU_TIME ST PRI SIZE CMD              
15126.0   bawa            4/23 04:56   0+00:06:47 R  0   317.4 parallel_90.sh

If I understand correctly, the CPU time (the time the CPU was actually busy) is just 6 min 47 sec out of a run time of 3 hr 50 min. I suspect something serious is going on with data transfer (I/O).
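
To put a number on this, here is a small sketch that turns the figures quoted above into CPU efficiency (CPU time divided by wall-clock time). It assumes the real/user/sys figures come from the interactive run and that the condor_q times use the D+HH:MM:SS format shown; the function names are hypothetical.

```python
def condor_time_to_seconds(text):
    """Convert a condor_q style 'D+HH:MM:SS' string to seconds."""
    days, clock = text.split("+")
    hours, minutes, seconds = (int(x) for x in clock.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time, run_time):
    """Fraction of the wall-clock run time spent on the CPU."""
    return condor_time_to_seconds(cpu_time) / condor_time_to_seconds(run_time)


# Condor job 15126.0: CPU_TIME 0+00:06:47, RUN_TIME 0+03:50:16.
condor = cpu_efficiency("0+00:06:47", "0+03:50:16")

# Interactive run (assumed source of the `time` output): (user + sys) / real.
interactive = (42 * 60 + 17.957 + 1 * 60 + 24.413) / (63 * 60 + 57.321)

print(f"Condor job:  {condor:.1%}")
print(f"Interactive: {interactive:.1%}")
```

That works out to roughly 3% for the Condor job against roughly 68% interactively, so the job spends nearly all of its wall-clock time waiting rather than computing.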

Are there any suggestions on how to debug that?


