
[HTCondor-users] Condor communication between submit machine and workers during a run



I am running Condor on a private network with one machine that acts as submit/master/scheduler for a number of worker machines. All machines run CentOS 7.

I am having a problem understanding/troubleshooting some Condor permission-related issues that can be briefly described as follows:


To clarify: each Condor job consists of several sequential simulations (i.e., NOT one Condor job = one simulation), and each simulation requires an updated input file that depends on the previous simulations performed on all Condor workers. So I cannot package all the required input files and send them to each worker upon condor_submit. Nor can I wait for a Condor job to finish before collecting the output.

Some additional information: I have installed ssh keys across the network so that each worker machine can ssh back to the submit machine without a password, and vice versa. In addition, I am running Condor jobs as the user, and I can successfully use condor_ssh_to_job and manually perform the file transfers between the worker and the submit machine using rsync or scp. But I of course need to automate this.
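To make the goal concrete, here is a minimal sketch of the loop I would like each job to run on the worker. It is only an illustration under assumptions: the host name `submit.example.org`, the `/shared/...` paths, the file names, and the `simulate` callable are all hypothetical placeholders, and it relies on the passwordless ssh setup described above.

```python
import subprocess

# Hypothetical submit-machine host name (placeholder, not my real setup).
SUBMIT_HOST = "submit.example.org"


def rsync_cmd(src, dst):
    """Build an rsync-over-ssh command as an argument list.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so a broken key setup shows up as an error rather than a hang.
    """
    return ["rsync", "-az", "-e", "ssh -o BatchMode=yes", src, dst]


def pull_input(remote_path, local_path, host=SUBMIT_HOST):
    """Command to copy an input file from the submit machine to the worker."""
    return rsync_cmd(f"{host}:{remote_path}", local_path)


def push_output(local_path, remote_path, host=SUBMIT_HOST):
    """Command to copy an output file from the worker to the submit machine."""
    return rsync_cmd(local_path, f"{host}:{remote_path}")


def run_job(n_sims, simulate, run=subprocess.run):
    """Run n_sims sequential simulations inside one Condor job.

    Before each simulation the updated input is pulled from the submit
    machine; after it, the output is pushed back. `simulate` is a
    placeholder callable taking (input_path, output_path).
    """
    for i in range(n_sims):
        # Hypothetical remote layout: /shared/input_<i>.dat, /shared/output_<i>.dat
        run(pull_input(f"/shared/input_{i}.dat", "input.dat"), check=True)
        simulate("input.dat", "output.dat")
        run(push_output("output.dat", f"/shared/output_{i}.dat"), check=True)
```

Passing `check=True` makes a failed transfer raise an exception, so a permission problem stops the job at the step where it occurs instead of letting the next simulation run on stale input.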

Happy to provide more information. Thanks for any help.

Wes Zell