Re: [HTCondor-users] Most of the time in Condor jobs gets wasted in I/o
- Date: Wed, 24 Apr 2013 22:01:09 +0200
- From: "Dr. Harinder Singh Bawa" <harinder.singh.bawa@xxxxxxxxx>
- Subject: Re: [HTCondor-users] Most of the time in Condor jobs gets wasted in I/o
Thanks for your reply. I am still getting familiar with the system myself. Answers to a few of your queries:
>>>>To understand whether i/o is a factor it is important to know how much raw data is being processed, i.e. what is the total size of the 20,000 files?
**********The total size of the 20,000 files is 16TB.
What I am doing is this: I have a fulllist.txt file which contains the names and paths of the 20k files. Since my cluster has 120 nodes, I split the 20k files into 120 parts. So instead of one list of 20k input files, I have 120 list files of roughly 20k/120 ≈ 170 files each, one per node.
All 20k files are in the /rdata2 directory. When I submit the 120 jobs to the 120 nodes, each job, now holding its own ~170-file list, reads its input from /rdata2 in parallel with all the others. So each job needs approximately 16TB/120 ≈ 133GB of input from /rdata2.
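(In case the mechanics matter: a rough sketch of how the splitting and submission could be done is below; run_analysis is just a hypothetical stand-in for my real executable, which takes a list file as its only argument.)
# round-robin fulllist.txt into 120 list files: list_0 ... list_119
i=0
while read -r f; do
  echo "$f" >> "list_$(( i % 120 ))"
  i=$(( i + 1 ))
done < fulllist.txt
# one submit file, 120 queued jobs, each getting its own list
cat > filelists.sub <<'EOF'
executable = run_analysis
arguments  = list_$(Process)
output     = job_$(Process).out
error      = job_$(Process).err
log        = filelists.log
queue 120
EOF
condor_submit filelists.sub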
You said: "So if your average file size is 15 Megabytes or more that could explain the throughput limitation." In my case the average file size is 16TB/20,000 ≈ 800 MB per file, which is well above that, so it does point to the limitation you described.
PS: BTW, I am not able to run the following command:
"iostat -dx 10 300"
It says "iostat: command not found". Is this OS-specific? I am using Linux.
Below is the df output. I run Condor jobs in the /disk directory, and /rdata2 is the directory containing all the input files.
[bawa@t3nfs ~]$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 10153988 5656168 3973704 59% /
203147960 20286340 172375860 11% /nfs/t3nfs/share/atlas
203147960 191892 192470308 1% /nfs/t3nfs/share/pilot
50786940 184272 47981228 1% /nfs/t3nfs/share/osg
10157368 154236 9478844 2% /nfs/t3nfs/home
101573920 32019424 64311616 34% /NFSv3exports/opt-csu
253934980 18641784 222185996 8% /NFSv3exports/home
10157368 160592 9472488 2% /tmp
10157368 9293392 339688 97% /var
10157368 1922196 7710884 20% /opt
1032090368 1500980 978162228 1% /NFSv3exports/archive
182833140 191952 173204004 1% /vmsystems
253934980 1567704 239260076 1% /disk
50786940 13433864 34731636 28% /var/cache/cvmfs2
10157368 161428 9471652 2% /var/log/condor
10157368 211448 9421632 3% /var/lib/condor
203147960 191892 192470308 1% /nfs/t3nfs/share/pandat3-output
/dev/sdc1 14640611456 409333600 14231277856 3% /NFSv3exports/rdata1
/dev/sdd1 41012297692 19402289068 21610008624 48% /NFSv3exports/rdata2
tmpfs 12336172 0 12336172 0% /dev/shm
xrootdfs 10153988 7633004 2520984 76% /xrootdfs/atlas
pt3head:/xdata 3565950376 386530112 2998280400 12% /nfs/t3head/xdata
10157368 154256 9478824 2% /nfs/t3head/condor-etc
On Wed, Apr 24, 2013 at 9:20 PM, David Hentchel <dhentchel@xxxxxxxxx> wrote:
I am not a Condor specialist, but I work in the area of performance engineering.
As your subject implies, slow response coupled with low CPU usage often indicates the system is i/o bound, but it is also possible there is some kind of resource lock that is forcing operations to be single-threaded when they don't need to be.
To understand whether i/o is a factor it is important to know how much raw data is being processed, i.e. what is the total size of the 20,000 files? If it takes 40 minutes to process 160 files, that is 15 seconds per file. Typical i/o subsystems can process data at rates ranging from 1 to 100 Megabytes per second (depends on device, read versus write, random versus sequential, block/packet size and other stuff). So if your average file size is 15 Megabytes or more that could explain the throughput limitation.
Next question is what kind of i/o, disk or network? You can start by logging in to the machine where the files are stored and running "iostat -dx 10 300", which reports device read/write throughput every 10 seconds for 300 intervals (about 50 minutes, enough to cover the whole run). You can compare that to the specs for the disk, or simply copy a big file and time it yourself to see whether you are near the limit.
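For example, a crude sequential-read test looks something like this (the path below is just a placeholder for any large file on that disk):
sync; echo 3 > /proc/sys/vm/drop_caches   # optional, needs root; avoids measuring the page cache
time dd if=/path/to/some_big_file of=/dev/null bs=1M
GNU dd prints an MB/s figure in its summary line that you can compare to the disk spec.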
Even if the disk capacity is not the problem there is the possibility that you are network bound. You say this is a single-node test - but is there a chance that within that test the application does some network i/o? If the files you are using are located on a remote disk (use the "df" command to see which filepaths are mounted from remote hosts) it could be the network, not the physical disk, causing the problem. You need to know whether the networks for your system are configured as 100 megabit versus 1 gigabit (high end systems have 10 gigabit networks) - but remember a "bit" is 1/8 of a "byte", so the 1 gigabit network has a limitation around 125 Megabytes per second. While the test is running you can login to the machine and run "sar -n DEV 10 36" to check network traffic.
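To see what speed the link actually negotiated, and to watch the traffic while the job runs, something like this works on most Linux boxes (assuming the interface is eth0):
ethtool eth0 | grep -i speed   # negotiated link speed, e.g. 1000Mb/s
sar -n DEV 10 36               # per-interface rx/tx traffic, every 10 seconds for 6 minutes
(Both sar and iostat come from the sysstat package, so install that if they are missing.)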
The final possibility (since you know you're not CPU bound) is contention on some other kind of system or application resource that slows down operations and/or forces work to be single-threaded across the run. Pay particular attention to any external service (e.g. a web service call that runs slowly), but if the job is self-contained you would have to use a profiling tool to identify and tune the underlying bottleneck.
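A quick first check for serialization is to watch per-thread CPU while the job runs, for example (your_executable is a placeholder for the real program name):
top -H -p "$(pgrep -o -f your_executable)"   # -H shows threads; one busy thread among idle siblings suggests single-threaded work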
More relevant, I think, is whether running Condor jobs in parallel could help you finish the batch faster. If the average job takes 40 minutes, as your sample did, running all 120 jobs one after another would take 120 x 40 minutes = 80 hours. Since you indicated the complete run finishes in 24 hours, it is clear that Condor is succeeding in scheduling the jobs concurrently. You could look into increasing the number of Condor slots per host, or, more likely, figure out whether there is a way to spread the work, and the input files themselves, over multiple machines. For example, is there a good reason why all the files end up in the same location?
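For instance, the standard tools will show how much concurrency you are actually getting:
condor_status -total     # summary of slots and their states across the pool
condor_q -run            # which jobs are running right now, and on which hosts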
It would be desirable for condor to somehow track Disk capacity as it does CPU. Also pretty nifty if condor tools made it easier to manage clusters of hosts, maybe a Group ClassAd that advertised the network speed and aggregate disk resource for a ParallelSchedulingGroup mapping to a physical machine cluster or rack. If anyone knows of Condor features that help in these areas, I'd be keenly interested to learn about them.
Dr. Harinder Singh Bawa
Experimental High Energy Physics