I am not a Condor specialist, but I work in the area of performance engineering.
As your subject implies, slow response coupled with low CPU usage often indicates the system is i/o bound, but it is also possible there is some kind of resource lock that is forcing operations to be single-threaded when they don't need to be.
To understand whether i/o is a factor it is important to know how much raw data is being processed, i.e. what is the total size of the 20,000 files? If it takes 40 minutes to process 160 files, that is 15 seconds per file. Typical i/o subsystems can process data at rates ranging from 1 to 100 Megabytes per second (depends on device, read versus write, random versus sequential, block/packet size and other stuff). So if your average file size is 15 Megabytes or more that could explain the throughput limitation.
Next question is what kind of i/o, disk or network? You can start by logging in to the machine where the files are stored and running "iostat -dx 10 300", which will track the number of bytes being read and written over five minutes. You can compare that to the specs for the disk, or simply do a copy of a big file and time it yourself to determine whether that's near the limit.
Even if the disk capacity is not the problem there is the possibility that you are network bound. You say this is a single-node test - but is there a chance that within that test the application does some network i/o? If the files you are using are located on a remote disk (use the "df" command to see which filepaths are mounted from remote hosts) it could be the network, not the physical disk, causing the problem. You need to know whether the networks for your system are configured as 100 megabit versus 1 gigabit (high end systems have 10 gigabit networks) - but remember a "bit" is 1/8 of a "byte", so the 1 gigabit network has a limitation around 125 Megabytes per second. While the test is running you can login to the machine and run "sar -n DEV 10 36" to check network traffic.
The final possibility (since you know you're not CPU bound) is that there is contention on some other kind of system or application resource that is slowing down operations and/or causing work to be single-threaded across the run. Pay particular attention to any external service (e.g. a web service call that runs slowly) but if the job is self-contained you'd have to use a profiling tool to find any opportunities to tune any underlying bottlenecks.
More relevant, I think, is whether running Condor jobs in parallel could help you finish the batch faster. If the average job job takes 40 minutes, as your sample did, it would take 120 x 40 minutes = 80 hours. Since you indicated the complete run finishes in 24 hours, it is clear that Condor is succeeding in scheduling the jobs concurrently. You could look into whether you could increase the number of condor nodes per host or more likely figure out if there's a way you could spread the work over multiple machines. For example, is there a good reason why all the files end up in the same location?
It would be desirable for condor to somehow track Disk capacity as it does CPU. Also pretty nifty if condor tools made it easier to manage clusters of hosts, maybe a Group ClassAd that advertised the network speed and aggregate disk resource for a ParallelSchedulingGroup mapping to a physical machine cluster or rack. If anyone knows of Condor features that help in these areas, I'd be keenly interested to learn about them.