
[HTCondor-users] POST script user privileges in DAG



Dear list,

I have an NFS share that I use to pass files between my various Condor workers. My pipeline starts with a submit file that downloads large datasets to NFS; the data is then processed and finally deleted. Something like this:
download (1 core, some memory) -> process (lots of cores and memory) -- POST (remove downloaded files) --> calculate metrics on output files (few cores, little memory)

I don't want all of my workers running 'download' jobs, leaving none free for 'process' jobs and overrunning the space I have. The process jobs are shared with other DAGs that work with smaller datasets that are not deleted, so I don't want to hardcode a delete into the process job itself.
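In DAG terms the structure is roughly this (a simplified sketch; the node names, submit files, script name, and data path are made up for illustration):

    JOB download download.sub
    JOB process  process.sub
    JOB metrics  metrics.sub
    SCRIPT POST process cleanup.sh $RETURN /nfs/data/dataset1
    PARENT download CHILD process
    PARENT process CHILD metrics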

Since NFS could quickly reach capacity given the size of the input files, I created a POST script that removes the input files if $RETURN is 0 (i.e. the job exited successfully). However, the files are not being deleted, and I suspect a permission error: my submit node has access to NFS, but the submitting user does not have permission to remove files created by nobody:nogroup.
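The script boils down to something like this (simplified; the argument layout matches the DAG sketch above):

    #!/bin/bash
    # cleanup.sh: DAGMan POST script.
    # $1 is DAGMan's $RETURN macro (the exit code of the process job),
    # $2 is the directory holding the downloaded input files.
    if [ "$1" -eq 0 ]; then
        # The job succeeded, so the downloaded inputs are safe to delete.
        rm -rf -- "$2"
    fi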

When files are created by my Condor worker VMs, the owner is nobody:nogroup (I am running Ubuntu 12.04), whereas files created by the POST script are owned by the user:group that ran condor_submit_dag. Is it possible to keep the user:group consistent between the jobs and the POST script? And are there any tips for debugging PRE/POST scripts, since their stdout and stderr don't seem to be captured?
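As far as I can tell, DAGMan only records the scripts' exit status in the *.dagman.out file, so the only workaround I can think of is having the script redirect its own output, along these lines (a sketch; the log path is arbitrary):

    #!/bin/bash
    # Capture this script's own output, since DAGMan does not.
    exec >> /nfs/logs/post_debug.log 2>&1
    set -x        # trace each command as it runs
    id            # confirm which user/group the POST script runs as
    ls -ld "$2"   # show the ownership of the directory to be removed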

As a side note, I am open to suggestions on better ways of building pipelines that involve downloading large datasets. Originally I planned to download the datasets into scratch space on each worker VM, but it did not seem straightforward to force all of a DAG's jobs to run on the same worker. Instead I download to NFS and use the -maxidle option of condor_submit_dag to limit the number of datasets on NFS at once, since too many idle jobs would pile up in the queue if only the download nodes were running.
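That is, something like (the throttle value and DAG file name are just examples):

    condor_submit_dag -maxidle 5 pipeline.dag

which, as I understand it, makes DAGMan stop submitting new nodes while 5 or more jobs from this DAG are sitting idle in the queue.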

Thanks,
Ying Wu