
Re: [HTCondor-users] POST script user privileges in DAG




With 15 x 4 core VMs what would happen is
 You mean 15 x 8 core VMs, right?

Sorry, yes, I meant 8-core.

You're saying that all D jobs should run on one machine, and all P jobs should run on one (different) machine, right? You can at least encourage HTCondor to run those jobs on the same machine by doing this in your submit files:

 For D jobs:
  Rank = machine == "...D machine..."
 For P jobs:
  Rank = machine == "...P machine..."
 where you replace "...D machine..." and "...P machine..." with the
 actual machine names.

This would be hardcoding the hosts into the submit files, right?

(You can be more aggressive by using Requirements instead of Rank to force the jobs to only run on the given machine, if that's what you want.)
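For example, a forced version of the D case might look like this in the submit file (just a sketch, reusing the same placeholder machine name):

  Requirements = (Machine == "...D machine...")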

This is just a simplified example, but I have seen it happen that most of
the jobs that run are jobs requiring few resources, while the jobs that
need more resources just stay queued, since jobs are not evicted to make
room for higher-priority jobs. I am running jobs in the vanilla universe
and I do not have the condor_defrag daemon enabled.

Okay, I guess I misunderstood a bit what you're trying to achieve. I thought that your big limitation was that you didn't want to have more than 10 (or whatever number) sets of input files on NFS at the same time.

Is your main objective to try to run as many calculate jobs as you have VMs for? If that's the case, you should be able to do it by throttling the D and P jobs, encouraging or forcing them to run on specific machines, and then not throttling the C jobs. The DAG node priorities are also passed along to HTCondor, so if you have some D, some P, and some C jobs in the queue, HTCondor will try to match the C jobs first.
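As a rough sketch, the throttling and priorities could look something like this in the DAG file (the node names and limits here are just placeholders for illustration):

 # Cap concurrent D and P nodes via categories, leave C nodes
 # unthrottled, and give C nodes a higher priority so HTCondor
 # tries to match them first.
 CATEGORY D1 download
 CATEGORY P1 process
 MAXJOBS download 2
 MAXJOBS process 2
 PRIORITY C1 10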

My first priority was to make sure that NFS would have enough space to process the files so the jobs wouldn't fail (I already have concurrency limits in place). I originally set out to achieve this by having a POST script remove the downloaded files after one of the DAG steps.
My next priority was to make sure I was using resources efficiently, which relates to the first priority: not having all my workers running downloads at once.
I think that by adding the delete step as a node in the DAG, and through careful throttling/priority settings, I could probably achieve both.

At any rate, if you use a node job instead of a POST script to delete your input files, I think you'll avoid the permissions problem (that was the original question, right?).
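Something along these lines, for instance (a sketch; the node names, submit file names, and path are just placeholders):

 # In the .dag file: run the cleanup as its own node after the
 # processing node, so it runs as the submitting user.
 JOB P1 process.sub
 JOB RM1 remove_inputs.sub
 PARENT P1 CHILD RM1

 # remove_inputs.sub -- runs on the submit host via the local
 # universe, assuming the NFS area is mounted there:
 universe   = local
 executable = /bin/rm
 arguments  = -rf /path/to/nfs/inputs/job1
 log        = remove_inputs.log
 queue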

Yes, that was the original problem and it is solved. I got a bit sidetracked by this new issue; thanks for your help!

Best,
Ying