
Re: [Condor-users] Restricting the number of jobs copying data




Hello Mark -

We have encountered a similar problem at Notre Dame, and solved it by limiting the maximum number of active clients at the file server.

If you are willing to use our Chirp file server to serve data for your Condor jobs, you can easily set it up to limit the maximum number of clients to, say, 50, and then to disconnect clients that have been idle for, say, 5 seconds.  On the server side, just run "chirp_server -M 50 -t 5".  Then, in your Condor job, use the chirp_get and chirp_put commands to copy data to and from the server.
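As a rough sketch of what the job side could look like, here is a small Python wrapper around the chirp_get and chirp_put commands. The host name, file paths, retry counts, and the process callback are all made up for illustration, and the retry loop only assumes that a transfer may fail with a non-zero exit status when the server is busy; adjust it to whatever behavior you actually see.

```python
import subprocess
import time

# Hypothetical host where "chirp_server -M 50 -t 5" is running.
SERVER = "fileserver.example.edu"


def chirp_get_cmd(host, remote_path, local_path):
    # chirp_get argument order: <hostname[:port]> <remote-file> <local-file>
    return ["chirp_get", host, remote_path, local_path]


def chirp_put_cmd(host, local_path, remote_path):
    # chirp_put argument order: <local-file> <hostname[:port]> <remote-file>
    return ["chirp_put", local_path, host, remote_path]


def run_with_retry(cmd, attempts=5, delay=30.0,
                   runner=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts run out.

    The attempt count and delay are illustrative values, not recommendations.
    """
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return True
        if attempt < attempts:
            sleep(delay)
    return False


def run_job(process, host=SERVER):
    """Copy input in, process it locally, copy the result back."""
    # Copy phase: limited by the server's client cap.
    if not run_with_retry(chirp_get_cmd(host, "/data/input.dat", "input.dat")):
        return 1
    # Processing phase: purely local, so any number of jobs can be here.
    process("input.dat", "output.dat")
    # Write-back phase: again limited by the server's client cap.
    if not run_with_retry(chirp_put_cmd(host, "output.dat", "/data/output.dat")):
        return 1
    return 0
```

The point of the design is that the limit lives at the server, so all the jobs can be submitted and started at once; only the copy steps wait their turn.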

Chirp is quite easy to build and get going. You can get more information here:

http://www.cse.nd.edu/~ccl/software/chirp

Cheers,
Doug


On Wed, Jun 17, 2009 at 3:26 AM, Mark Assad <massad@xxxxxxxxx> wrote:
Hi,

  I have a situation where I'd like to limit the number of nodes
which are able to read files from a file server and would appreciate
any hints on how to go about doing it.

 In summary we have about 500 CPU nodes all trying to read data from
a single file server. The way the jobs are currently set up is that
when they start they copy the data they are going to work on from the
file server to the local node, and then process that data. When the
processing is finished, they copy the data back to the file server.

 What I would like to be able to do is submit all the jobs to Condor,
then restrict the number of jobs that are allowed to be copying at any
one time. As each job finishes copying its data, the processing part of
that job will start, allowing a new job to start copying data.
Each job has three parts, Part A copies data, then Part B processes
the data, then Part C writes the results. I'd like to limit the number
of jobs that are in Part A and C at any one time, while at the same
time allow any number of jobs to be in Part B.

 The 'simple' solution would be to only allow 50 jobs to run. But
I would like the next 50 to start copying data as soon as the first 50
have finished copying and started processing, rather than waiting for
them to finish entirely.

 I'm pretty much at a loss as to where to start to create this
restriction, so any hints would be appreciated.

Thank you,
Mark Assad