Re: [Condor-users] Ways to limit total job output?
Steven Timm wrote:

> On Thu, 28 Dec 2006, Erik Paulson wrote:
>
>> On Thu, Dec 28, 2006 at 02:07:39PM -0600, Steven Timm wrote:
>>> I have a large cluster of execution machines on which I run
>>> five VMs apiece, all of which share the same 250GB staging area.
>>> Is there any mechanism within Condor to limit the total amount
>>> of disk I/O before the job is killed? I don't see anywhere that the
>>> ClassAd is keeping track of this quantity, just an initial check
>>> between the disk that the job claims it needs and the actual disk
>>> available to the VM. Any ideas? Obviously it would be nice
>>> to have such a feature so that one rogue job doesn't kill the other 4.
>>
>> There is a DiskUsage attribute in the job ad, and I believe that it is
>> updated through the lifetime of the job, and that the startd always has
>> the most recent number available to it. You could use it as part of a
>> PREEMPT/KILL expression.
>
> Nope. All the jobs in my queue are showing the same initial DiskUsage
> of 10000 that they started with, whether running or not. And
> these jobs are running in the job's execute directory. This is
> true for several different schedds, both Condor 6.8.1 and Condor 6.8.2.
> What's the next idea?

My (perhaps too quick) reading of the code says that DiskUsage is measured by the starter every STARTER_UPDATE_INTERVAL. This information is then sent to the shadow, so that it gets updated in the job ClassAd when the job exits, but I don't see anywhere that it is sent to the startd for use in policy expressions such as PREEMPT/KILL.

Therefore, I don't think you can enforce DiskUsage limits in the startd policy expressions. You could, however, do so with periodic_hold or periodic_remove in the job's submit description, because these are evaluated by the shadow. It really would be better to do this from the startd side, so, assuming I am right about how DiskUsage information flows, this seems like a good feature request: propagate DiskUsage information to the startd.
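
A submit-description fragment along the lines suggested above might look like the following. It is an untested sketch. It assumes that DiskUsage is reported in kilobytes, as the Condor manual describes the attribute, and that the shadow's copy of the job ad is refreshed from the starter's updates in the way described above. The 2000000 KB (about 2 GB) threshold is an arbitrary example value, not a recommendation.

```
# Example threshold only: put the job on hold once its execute
# directory grows past roughly 2 GB (DiskUsage assumed to be in KB).
periodic_hold = DiskUsage > 2000000

# Alternatively, remove the job from the queue outright:
# periodic_remove = DiskUsage > 2000000
```

Holding rather than removing keeps the job in the queue, so its owner can inspect it and, if appropriate, release it with condor_release.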

--Dan