UNCLASSIFIED

Hi,

I am new to Condor and would like to know what schemes people in the Condor community use to manage the volume of data and network traffic their jobs produce.

My requirement is to submit jobs that take an input file and an output file as command-line arguments, e.g. MyExecutable.exe -in inputFile -out outputFile. In extreme cases, a single batch may contain 10 million jobs, each creating a 1 GB output file, i.e. roughly 10 PB of output per batch. I would want the output files to be transferred back to the submit machine as the jobs complete, in order to limit network traffic. The submit machines won't have a large amount of direct-attached storage, so each execution machine in the pool would have direct-attached storage for the output files. I would then run a daemon that gathers all the output files into a central location for analysis.

Does this sound like a feasible solution? Is there a better one, and how would it be implemented (e.g. network architecture, ClassAds)? How do other users in the Condor community deal with large data files and network traffic?

PL

IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email.
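The job layout described in the question, with one MyExecutable.exe -in inputFile -out outputFile run per job, maps naturally onto a single HTCondor submit description. In that description the $(Process) macro gives each queued job its own file names. The Python sketch below only generates that text. Several details are hypothetical placeholders: the file names (input_N.dat / output_N.dat), the batch size and the request_disk value. The sketch uses HTCondor's file-transfer mechanism to return each output to the submit machine when the job exits, which is what the question describes.

```python
"""Generate an HTCondor submit description for a batch of MyExecutable.exe jobs.

File names, batch size and the disk request are hypothetical placeholders.
"""

SUBMIT_TEMPLATE = """\
universe                = vanilla
executable              = MyExecutable.exe
# $(Process) expands to 0, 1, 2, ... for each job queued in this cluster
arguments               = -in input_$(Process).dat -out output_$(Process).dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input_$(Process).dat
transfer_output_files   = output_$(Process).dat
# Disk requested on the execute machine, in KiB (~1.1 GB, assumed headroom)
request_disk            = 1100000
log                     = batch_$(Cluster).log
queue {num_jobs}
"""


def make_submit_description(num_jobs: int) -> str:
    """Return submit-description text that queues num_jobs jobs in one cluster."""
    if num_jobs < 1:
        raise ValueError("num_jobs must be positive")
    return SUBMIT_TEMPLATE.format(num_jobs=num_jobs)


if __name__ == "__main__":
    # Hypothetical batch of 1000 jobs; redirect stdout to a .sub file.
    print(make_submit_description(1000), end="")
```

Usage, assuming the script is saved under the hypothetical name make_submit.py: run python make_submit.py > batch.sub, then condor_submit batch.sub. Because request_disk becomes part of each job's ClassAd, only execute machines advertising enough free disk should be matched to these jobs.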
|
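The volume figures in the question imply 10^7 jobs x 1 GB = 10^16 bytes, roughly 10 PB per batch. The Python sketch below turns that into storage per execute node and the time needed for a bulk copy to the central location. The pool size (500 execute nodes) and the 10 Gbit/s link to central storage are hypothetical values chosen only for illustration. The model also ignores protocol overhead, contention and compression.

```python
"""Back-of-the-envelope sizing for one batch of jobs.

JOBS_PER_BATCH and OUTPUT_BYTES_PER_JOB come from the question; the pool
size and link speed passed to batch_sizing() are hypothetical.
"""

JOBS_PER_BATCH = 10_000_000   # up to 10 million jobs per batch
OUTPUT_BYTES_PER_JOB = 10**9  # ~1 GB per output file (decimal units)


def batch_sizing(num_nodes: int, link_gbps: float) -> dict:
    """Return total output (PB), storage per execute node (TB) and the time
    (days) to copy everything over a single link of link_gbps Gbit/s."""
    if num_nodes < 1 or link_gbps <= 0:
        raise ValueError("num_nodes and link_gbps must be positive")
    total_bytes = JOBS_PER_BATCH * OUTPUT_BYTES_PER_JOB
    # Assumes jobs, and therefore output files, are spread evenly over the pool.
    per_node_bytes = total_bytes / num_nodes
    # Ignores protocol overhead, contention and any compression.
    seconds = total_bytes * 8 / (link_gbps * 10**9)
    return {
        "total_pb": total_bytes / 10**15,
        "per_node_tb": per_node_bytes / 10**12,
        "transfer_days": seconds / 86_400,
    }


if __name__ == "__main__":
    # Hypothetical pool: 500 execute nodes, one 10 Gbit/s link to central storage.
    s = batch_sizing(num_nodes=500, link_gbps=10.0)
    print(f"total output:     {s['total_pb']:.1f} PB")
    print(f"storage per node: {s['per_node_tb']:.1f} TB")
    print(f"single-link copy: {s['transfer_days']:.1f} days")
```

Under those assumptions the script prints 10.0 PB in total and 20.0 TB per execute node. It also estimates about 92.6 days to move a full batch over a single 10 Gbit/s link. That last figure is the kind of number that decides whether gathering every output file centrally is practical.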