
Re: [HTCondor-users] Most of the time in Condor jobs gets wasted in I/o



Hi Dimitri Maziuk,

I understand that NFS is not meant for high-volume data transfer. But with the resources I have, I can still test some of the ideas from this thread.


I tried using "should_transfer_files = NO", but I saw no difference.

Second, you suggested copying the files to a local area, i.e. /var/tmp/, and gave a script for that too. I can certainly try this, but I wonder whether the "cp" command will cope with, say, 150GB of data on one node (15TB divided across 100 jobs/nodes). It will certainly work, but it will take a long time to copy. Is there another command I can use instead of cp that is meant to be faster?

-Harinder



On Wed, Apr 24, 2013 at 11:28 PM, Dimitri Maziuk <dmaziuk@xxxxxxxxxxxxx> wrote:
On 04/24/2013 03:36 PM, Dr. Harinder Singh Bawa wrote:
> Hi,
> The idea of transferring files to the worker node also came to my mind, but I am
> not sure how to do this.

One way would be to run your job as a DAG and have a PRE script that
does 'for i in $(cat file.list) ; do cp /scratch/$i /var/tmp ; done'
(the job then reads from /var/tmp): see
http://research.cs.wisc.edu/htcondor/manual/v7.8/2_10DAGMan_Applications.html
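
A minimal sketch of the copy loop as a standalone script, assuming file.list holds one file name per line, relative to the source directory. The function name stage_files and its argument order are made up for illustration. It only shows the staging step itself, not how or where HTCondor would invoke it.

```shell
#!/bin/sh
# Hypothetical staging helper: copy every file named in a list from a
# shared source directory to node-local scratch space.
# Usage: stage_files SRC_DIR DEST_DIR LIST_FILE
stage_files() {
    src_dir=$1
    dest_dir=$2
    list_file=$3

    # Create the destination if it does not exist yet.
    mkdir -p "$dest_dir" || return 1

    # Read one name per line; IFS= and -r keep spaces and backslashes intact.
    # The extra test handles a last line without a trailing newline.
    while IFS= read -r name || [ -n "$name" ]; do
        # Skip blank lines in the list.
        [ -n "$name" ] || continue
        # Stop at the first failed copy so the job does not start on partial data.
        cp -- "$src_dir/$name" "$dest_dir/" || return 1
    done < "$list_file"
}

# Example call with the paths from the message above (commented out so that
# sourcing this file has no side effects):
# stage_files /scratch /var/tmp file.list
```

Unlike the one-line loop, this version also copes with file names containing spaces and returns a non-zero status if any copy fails.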

You could also use 'transfer_input_files', but the above is probably
simpler
(http://research.cs.wisc.edu/htcondor/manual/v7.8/2_5Submitting_Job.html#SECTION00354200000000000000)

> One more thing I don't understand: if I run one such command
> interactively on one such node, it finishes in 40 minutes. The same
> command in Condor doesn't finish in more than 1 day.

It only looks bad if you haven't seen it before. We tried to have BLAST
mmap the same set of 2GB database files over NFS. I think it only took a
couple of dozen nodes, nowhere near 120, to push overall i/o wait times
over 24 hours. It is surprising how bad nfs gets once the filesystem is
over ~75% full, too...

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


--
Dr. Harinder Singh Bawa
Experimental High Energy Physics
ATLAS Experiment
@CERN, Geneva