
Re: [HTCondor-users] Most of the time in Condor jobs gets wasted in I/O



Hi,
The idea of transferring the files to the worker node also came to mind, but I am not sure how to do it. I have one file, full2013.list, which contains only the paths of 20k files. The files themselves are in the common area /scratch. What the Condor job does is split full2013.list into 120 job$i.list files for 120 jobs. Each job$i.list file is copied to its worker node, but its entries still point to /scratch, where the actual data is. If I want to copy the data physically to each worker node, I have to read job$i.list and copy the corresponding files from /scratch. That is the part I am not sure how to do. I can see the files in the list with

xargs -a file.list

but how can I copy the files to the worker node? And even if I find a command, won't the copying also take a very long time?
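A minimal sketch of the manual copy, written as a POSIX shell function. stage_inputs is a hypothetical name, not an HTCondor command. It assumes one path per line in the job list, unique basenames for the input files, and enough local disk on the worker. It reads each file once, sequentially, into a local directory and writes a second list with the local paths for the analysis to read. Whether this beats reading over NFS depends on how the program reads its input, so it is worth timing on a single job first.

```shell
#!/bin/sh
# stage_inputs LIST DEST OUTLIST  (hypothetical helper)
# Copies every file named in LIST (one path per line) into directory DEST,
# then writes OUTLIST containing the corresponding local paths.
# Assumes input basenames are unique; a duplicate name would be overwritten.
stage_inputs() {
    list=$1
    dest=$2
    outlist=$3
    mkdir -p "$dest" || return 1
    : > "$outlist"                        # start with an empty output list
    # "|| [ -n "$f" ]" keeps a last line that has no trailing newline
    while IFS= read -r f || [ -n "$f" ]; do
        [ -n "$f" ] || continue           # skip blank lines
        cp -- "$f" "$dest/" || return 1   # one sequential read per input file
        printf '%s\n' "$dest/${f##*/}" >> "$outlist"
    done < "$list"
}
```

Inside the job one might call it as stage_inputs job1.list "${_CONDOR_SCRATCH_DIR:-/var/tmp}/inputs" local.list and then point the analysis at local.list instead of job1.list. HTCondor sets _CONDOR_SCRATCH_DIR to the job's scratch directory on the execute node and normally removes that directory when the job ends; /var/tmp is only a fallback and would have to be cleaned up by hand.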

One more thing I don't understand: if I run one such command interactively on one of these nodes, it finishes in 40 minutes. The same command under Condor doesn't finish even after more than a day.

thanks
Harinder

On Wed, Apr 24, 2013 at 10:22 PM, Dimitri Maziuk <dmaziuk@xxxxxxxxxxxxx> wrote:
On 04/24/2013 03:01 PM, Dr. Harinder Singh Bawa wrote:

> All 20k files are in the /rdata2 dir. When I submit 120 jobs on 120 nodes, each
> job, which now gets 200 files, takes its input from /rdata2 (in parallel).
> So each job needs approx 16TB/120= 500GB of input from /rdata2.

There's more to it, e.g. exactly how they're reading the input, but in
general if you're trying to read 120x500GB over NFS in parallel, expect
it to be slow.
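A crude back-of-the-envelope model of why one interactive run can finish in 40 minutes while the same job under Condor runs for more than a day: if a single job running alone already saturates the file server, then N jobs reading at once each get roughly 1/N of its bandwidth, so each takes about N times as long. This ignores caching and the extra seeking caused by many concurrent readers (which usually makes things worse); contended_hours is a hypothetical helper, not a standard tool.

```shell
#!/bin/sh
# contended_hours N T  (hypothetical helper)
# Estimated wall time in hours for each of N jobs sharing one bandwidth-bound
# file server, assuming one job running alone takes T minutes and already
# uses all of the server's bandwidth.
contended_hours() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n * t / 60 }'
}
```

With the numbers from this thread, contended_hours 120 40 prints 80.0, i.e. more than three days, which is at least consistent with the jobs not finishing within a day.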

Try both condor's file transfer and manually copying the input files to the
worker hosts (e.g. to /var/tmp), and see what works best.
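For the HTCondor file-transfer option, a minimal sketch that turns one job$i.list into a submit description; make_submit and the executable name run_analysis.sh are hypothetical. It uses the standard submit commands should_transfer_files, when_to_transfer_output and transfer_input_files, the last of which takes a comma-separated list of paths. Because transferred files land in the job's working directory on the execute node, it also writes a second list containing only the basenames, which the job should read instead of the /scratch paths.

```shell
#!/bin/sh
# make_submit LIST EXE  (hypothetical helper)
# Prints an HTCondor submit description that asks HTCondor to copy every
# input file named in LIST (one absolute path per line) to the execute node.
# Side effect: writes NAME.basenames (file names only) in the current
# directory; the job opens the transferred files from its working directory.
make_submit() {
    list=$1
    exe=$2
    name=$(basename "$list" .list)
    # Drop blank lines and strip directories for the job-side list.
    grep -v '^$' "$list" | sed 's#.*/##' > "$name.basenames"
    # transfer_input_files expects a comma-separated list of paths.
    inputs=$(grep -v '^$' "$list" | paste -s -d, -)
    cat <<EOF
universe = vanilla
executable = $exe
arguments = $name.basenames
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = $name.basenames,$inputs
log = $name.log
output = $name.out
error = $name.err
queue
EOF
}
```

A loop such as for f in job*.list; do make_submit "$f" run_analysis.sh > "${f%.list}.sub"; done would produce one submit file per list for condor_submit. Note that with file transfer the input is sent from the submit machine, so its disk and network may simply become the new bottleneck; comparing the two approaches is exactly the measurement suggested above.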

> PS: BTW, I am not able to run the following command:
> "iostat -dx 10 300"
>
> it says iostat: command not found. Is this OS-specific? I am using
> Linux.

It probably isn't installed. On Red Hat and derivatives it's in the 'sysstat'
package.
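Once sysstat is installed (on Red Hat-style systems, typically yum install sysstat as root), iostat -dx 10 300 prints extended per-device statistics every 10 seconds, 300 times, which is about 50 minutes. A small guard that runs it only when it is present might look like this; run_iostat is a hypothetical name.

```shell
#!/bin/sh
# run_iostat  (hypothetical helper)
# Runs iostat with device (-d) and extended (-x) statistics every 10 s,
# 300 reports; if iostat is missing, says where to get it and returns 127,
# the usual "command not found" status.
run_iostat() {
    if command -v iostat >/dev/null 2>&1; then
        iostat -dx 10 300
    else
        echo "iostat not found; on Red Hat and derivatives install the sysstat package" >&2
        return 127
    fi
}
```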

HTH
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Dr. Harinder Singh Bawa
Experimental High Energy Physics
ATLAS Experiment
@CERN, Geneva