
Re: [HTCondor-users] POST script user privileges in DAG



On Fri, 6 Feb 2015, Ying W wrote:

I'm running through the example mentally, and I'm not sure how using MAXJOBS
and PRIO will help.
Let's say D requires 1 core, P requires 4 cores, C requires 8 cores, and R and
S both require 1 core free.
And there are two worker node situations: one where I have exclusive access
to 15 x 8-core nodes, and another where I have 4 x 8-core nodes.

With 15 x 4 core VMs what would happen is
  You mean 15 x 8 core VMs, right?
1. 10 x D jobs would start across multiple VMs
2. There will be a mix of D and P jobs, with D+P = 10
3. Once the first P job finishes, a C job would run, and since D+P+C = 10, I
will always be using fewer than my total number of VMs (15), so the C jobs
will always be able to run

However, if the # of VMs is lower than the nfs_limit (4 x 8-core VMs):
1. 10 x D jobs would start, with some VMs having more than one job
2. Same as above, a combination of D and P jobs running
3. The first P job that finishes will try to queue a C job that requires 8
cores (an entire node) to be free. It's possible that all 4 workers are
running D and P jobs, and even though the C job has higher priority, when a D
or P job finishes, another D or P job will be run instead because resources
are not available for the C job. This is the situation I am afraid would
happen.

DAGMan doesn't know anything about how many cores a node job requires. So what will actually happen when a P job finishes is that the corresponding C job will be submitted. It may not run right away, but it will be submitted before any more P jobs are submitted, because of the priority. As more P jobs finish, more C jobs will be submitted, and eventually they'll be able to run.
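
For reference, the node priority is set in the DAG file with the PRIORITY
command; the node names below are just placeholders:

    PRIORITY C1 10
    PRIORITY C2 10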

Ideally, all the D and P jobs should run on the same workers
instead of having, say, 1 D and 1 P job on every worker.

You're saying that all D jobs should run on one machine, and all P jobs should run on one (different) machine, right? You can at least encourage HTCondor to run those jobs on the same machine by doing this in your submit files:

  For D jobs:
    Rank = machine == "...D machine..."
  For P jobs:
    Rank = machine == "...P machine..."
  where you replace "...D machine..." and "...P machine..." with the
  actual machine names.

(You can be more aggressive by using Requirements instead of Rank to force the jobs to only run on the given machine, if that's what you want.)
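
For example (a sketch, using the same placeholder machine names as above):

  For D jobs:
    Requirements = (machine == "...D machine...")
  For P jobs:
    Requirements = (machine == "...P machine...")

Keep in mind that with Requirements the jobs will simply sit idle if that one
machine is full or unavailable, whereas Rank only expresses a preference.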

This is just a simplified example, but I have seen it happen where most of
the jobs that run are the ones that require few resources, while the jobs
that need more resources just stay queued, since jobs are not evicted to make
room for higher-priority jobs. I am running jobs in the vanilla universe and
I do not have the defragmentation daemon enabled.

Okay, I guess I misunderstood a bit what you're trying to achieve. I thought that your big limitation was that you didn't want to have more than 10 (or whatever number) sets of input files on NFS at the same time.

Is your main objective to try to run as many calculate jobs as you have VMs for? If that's the case, you should be able to do it by throttling the D and P jobs, encouraging or forcing them to run on specific machines, and then not throttling the C jobs. The DAG node priorities are also passed along to HTCondor, so if you have some D, some P, and some C jobs in the queue, HTCondor will try to match the C jobs first.
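
As a sketch of the throttling, you could put just the D and P nodes into a
throttled category in the DAG file and leave the C nodes out of it, so only
D and P are held back (the node names and the limit here are placeholders):

    CATEGORY D1 nfs_limit
    CATEGORY P1 nfs_limit
    MAXJOBS nfs_limit 10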

At any rate, if you use a node job instead of a POST script to delete your input files, I think you'll avoid the permissions problem (that was the original question, right?).
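
For instance, a cleanup node along these lines, so the delete runs as a
regular HTCondor node job rather than as a DAGMan POST script (the node
names, file names, and path below are just placeholders):

  In the DAG file:
    JOB    CLEAN1 cleanup.sub
    PARENT C1 CHILD CLEAN1

  cleanup.sub:
    universe   = vanilla
    executable = /bin/rm
    arguments  = -rf ...input directory for this set of files...
    log        = cleanup.log
    queue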

Kent