
Re: [Condor-users] problems using transfer_output_remaps



Adam,

Two things:

1. transfer_output_remaps only applies when you turn on Condor's file-transfer mode. I will add a warning so that condor_submit lets you know when you use remaps but have not turned on file-transfers. Here is how you turn on transfers:

ShouldTransferFiles = True
WhenToTransferOutput = ON_EXIT
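
For example, a minimal submit file that combines file-transfers with a remap would look something like the following (the executable name, output file, and target path here are just illustrative, not taken from your setup):

Universe                = vanilla
Executable              = myjob.sh
ShouldTransferFiles     = True
WhenToTransferOutput    = ON_EXIT
transfer_output_files   = result.dat
transfer_output_remaps  = "result.dat = /home/user/logs/result-$(Cluster).dat"
Queue

With that in place, result.dat is brought back from the execute directory and written to the remapped path instead of the submit directory.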


2. The output and error files are handled specially for you, so you should never need to explicitly "remap" them. For these files, just specify the final path where you want the files. (And make sure you have turned on file-transfers.) If you look at the resulting ClassAd (with condor_q -l), you will see that your error file will be automatically modified to a temporary filename that will be used in the execute directory, and on download, it will be remapped to the final path that you specified.
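
For instance (the exact attribute names and temporary filenames below are from memory and may differ slightly in your version), with

Error = /home/alathers/condor_matlab/logs/A.err

and file-transfers turned on, condor_q -l would show something along the lines of

Err = "_condor_stderr"
TransferOutputRemaps = "_condor_stderr=/home/alathers/condor_matlab/logs/A.err"

i.e. stderr is written to a temporary file in the execute directory and remapped back to your final path when the output is transferred.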

--Dan

Adam Lathers wrote:

Hi all,

I'm having some issues using the transfer_output_remaps option in a submit file. Specifically, I'm submitting a DAG as a proof of concept to work out the bugs before implementing a similar solution for our big data-processing codes. The layout of our architecture looks something like this: our pool manager host (schedd, collector, negotiator) sits "outside" our trusted realm, so it has no access to our shared filesystem. All the worker nodes sit inside the trusted realm and share a filesystem. (Yes, I know there are some security paradigm issues there, but I can't solve those presently.) What I do need to deal with is that the data we will be working with is "big": total input and output data is currently on the order of 100GB, and it isn't segmented into "small" pieces, so each worker node, were it to ship the input data, would have to grab a 20-50GB dataset before processing started.

My goal in the short term is basically this: I'd like to rely on the shared filesystem and just "mimic" what I need on the submit node. So far this works, but to make it happen I need to duplicate a directory structure on the submit node to look just like the worker nodes. What I'd prefer to do is leverage the transfer_output_remaps option, so that when logs, output, and such get shipped back to the submit machine, they all go into a single large log directory with some sort of intelligent naming scheme.


An example submit file that I've tried looks something like this.
(Note: for transfer_output_remaps, I've also tried just naming A.err and so on. Maybe I just missed the proper permutation?)

Universe        = vanilla
Executable      = /home/alathers/condor_matlab/condor_test/matlab.sh
InitialDir      = /home/alathers/condor_matlab/condor_test
Error           = /home/alathers/condor_matlab/condor_test_submitdir/A.err
Log             = /home/alathers/condor_matlab/condor_test_submitdir/A.log
transfer_output_remaps = "/home/alathers/condor_matlab/condor_test_submitdir/A.err = /home/alathers/condor_matlab/logs/A.err"
GetEnv          = true
Arguments	= A
Requirements 	= FileSystemDomain == "ncmir.ucsd.edu"
Notification    = Error
Notify_user     = alathers@xxxxxxxxxxxxxx
Queue


In the end, when the job finishes, the .log and .err files are sent back to the submit node and put in /home/alathers/condor_matlab/condor_test_submitdir/

I'm sure I'm forgetting some vital piece of info, so please feel free to let me know. Any thoughts, or insight would be REALLY appreciated. As noted, I know there are a LOT of problems with the present approach, but for various reasons my role is to solve this step first, before redesigning the process. Thanx everyone.


_______________________________________________________
Adam Lathers
NCMIR: National Center for Microscopy and Imaging Research
Distributed Systems Engineer
phone: (858) 822-0735
fax:   (858) 822-0828
web:   http://ncmir.ucsd.edu


_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users