On Wednesday, April 20, 2011 at 9:30 PM, Lenou, Peter (Contractor) wrote:
I am in the process of setting up a dedicated Condor pool.
The jobs that will be run will summarise results in output files. In order to
reduce network traffic, I don’t want to send these files back to the
submitter, but rather to a network file server. Is this the best way to do
this? At the moment, I wrap the executable I want to run in a batch file that
moves the output files to the file server. Is there any functionality within
Condor, e.g. in the job's ClassAd, that will transfer output files directly to
a dedicated file server?
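For reference, the wrapper approach described above can be sketched as a small shell function. This is an illustration, not the poster's actual script: the `*.out` pattern, the `DEST` destination, and the `COPY` command (scp by default, overridable for local testing) are all assumptions to be replaced with your own details.

```shell
#!/bin/sh
# Hypothetical wrapper: run the real job, then push its output files to a
# file server yourself, so Condor never transfers them back to the submitter.
run_and_ship() {
    DEST="${DEST:-fileserver:/data/results}"  # assumed file-server path
    COPY="${COPY:-scp}"                       # copy command; use cp for local tests

    # Run the real executable with the arguments Condor handed the wrapper.
    "$@" || return 1

    # Ship each summary file, then delete the local copy so Condor finds
    # nothing left to transfer back to the submit machine.
    for f in *.out; do
        [ -e "$f" ] || continue
        "$COPY" "$f" "$DEST" && rm -f "$f"
    done
}
```

In the submit file, `executable` would point at this wrapper and `arguments` would name the real program, so the job's outputs never enter Condor's own transfer path.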
As far as I know, Condor's file transfer mechanisms always use the condor_shadow daemon on the scheduler to proxy transfers to and from the remote execute node. This is definitely the case when the scheduler and the execute node don't share a file system (their FILESYSTEM_DOMAIN settings don't match).

When the domains do match, I believe the job simply runs on the shared file system instead of in the temporary working directory Condor sets up on the execute node. Matching domains also mean Condor will write the job's stdout/stderr streams directly to the shared file system, rather than writing them to local disk and copying them back at the end of the job. But anything that requires a transfer (for example, should_transfer_files = YES combined with transfer_input_files) will still use the shadow to move files to the execute node.
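The transfer-related submit commands mentioned above would look roughly like the fragment below. The file names are placeholders; the command names themselves are the standard ones from the Condor submit description language.

```
# Sketch of a submit description that exercises the shadow-based
# transfer path (input pushed to the execute node by condor_shadow).
universe                = vanilla
executable              = wrapper.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.dat
queue
```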
Your approach, using a wrapper script to do the data transfer, is definitely the best way to move data without having to worry about whether it will end up passing through the scheduler.