
[Condor-users] dagman: Possible to run a "PRE" script on same node as the main program will run?

Hi all,

a maybe tricky, maybe stupid question:

Quite a few of our users need to read data from our central data servers and, of 
course, would rather not thrash those servers with too many jobs hitting them at 
the same time.

Initially I thought: DAGMan supports PRE scripts, and together with -maxpre 
that should solve the problem. However, the manual states:

"Scripts are optional for each job, and any scripts are executed on the 
machine from which the DAG is submitted; this is not necessarily the same 
machine upon which the node's Condor or Stork job is run. Further, a single 
cluster of Condor jobs may be spread across several machines. "

If I understand this correctly, the PRE script is not guaranteed to run on the 
same compute/worker node as the main task; rather, it runs on the submit host. 
Is that correct?

If so, how should I model this:

I have 20k jobs, each needing to read, say, 5 GB of data from the central file 
servers, and each job will run for a couple of hours on that data. To spare the 
servers too much load, I want to limit the number of nodes reading from the 
data servers at any one time.

Thus, I would like to run a "PRE"-like script which copies data from the data 
servers to the local worker's disk, where the main task can then read it. The 
copy in itself will not reduce the load on the data servers; however, it would 
help if I could limit the number of these "PRE"-like jobs so that only, say, 50 
are reading at any time, while many more compute tasks keep running 
simultaneously.
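To make the setup concrete, here is roughly the DAG structure I have in mind, 
assuming DAGMan's CATEGORY/MAXJOBS throttling is available in our version (all 
node and submit-file names are made up). The open question remains how to force 
each transfer node onto the same worker as its compute node:

```
# Two-node pattern, repeated for each of the 20k tasks:
JOB  xfer0001  transfer.sub    # copies the 5 GB to the worker's local disk
JOB  comp0001  compute.sub     # the actual computation, reading local data
PARENT xfer0001 CHILD comp0001

CATEGORY xfer0001 transfer
# ... likewise for xfer0002 through xfer20000 ...

# At most 50 transfer nodes run at once; compute nodes are unthrottled.
MAXJOBS transfer 50
```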

Any idea how to do that[1]? Is the problem clear enough?



[1] Other than starting to write "lock" files into a "lock" directory to keep 
track of who is allowed to read data at the moment...
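For completeness, a minimal sketch of what I mean by the lock-directory 
workaround, assuming a filesystem shared between the workers; the base path, 
slot count, and sleep interval are all made up. It relies on mkdir being 
atomic, so each "slot" can only be won by one reader:

```shell
#!/bin/sh
# Allow at most MAXSLOTS concurrent readers, each holding one
# atomically created "slot" directory under a shared LOCKBASE.
LOCKBASE="${LOCKBASE:-/tmp/data-read-locks}"   # hypothetical shared dir
MAXSLOTS="${MAXSLOTS:-50}"
mkdir -p "$LOCKBASE"

acquired=""
while [ -z "$acquired" ]; do
    for i in $(seq 1 "$MAXSLOTS"); do
        # mkdir either creates the slot (we win) or fails (slot taken)
        if mkdir "$LOCKBASE/slot$i" 2>/dev/null; then
            acquired="$LOCKBASE/slot$i"
            break
        fi
    done
    [ -z "$acquired" ] && sleep 30   # all slots busy; retry later
done
trap 'rmdir "$acquired"' EXIT

# ... here the real script would copy the data to local disk ...
echo "acquired $acquired"
```

The slot directory is removed again on exit, freeing the slot for the next 
reader, but this obviously involves polling and manual cleanup if a job dies 
hard, which is exactly why I would prefer a DAGMan-level solution.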