
Re: [HTCondor-users] POST script user privileges in DAG






If I'm understanding correctly what you want to do, I think a combination of category throttles and priorities would do what you want. You could do something like this:

 Job D0 download.sub # single threaded
 Job P0 preprocess.sub # requires a lot of memory
 Job C0 calculate.sub # uses lots of cores
 Job R0 remove.sub # cleans up input files
 Job S0 summarize.sub # takes a while, mostly I/O bound

 VARS D0 id="<uuid0>"
 VARS P0 id="<uuid0>"
 VARS C0 id="<uuid0>"
 VARS R0 id="<uuid0>"
 VARS S0 id="<uuid0>"

 PARENT D0 CHILD P0
 PARENT P0 CHILD C0
 PARENT C0 CHILD R0 S0 # remove and summarize can run in parallel?

 MAXJOBS nfs_limit 10
 CATEGORY D0 nfs_limit
 CATEGORY P0 nfs_limit
 CATEGORY C0 nfs_limit
 CATEGORY R0 nfs_limit
 # S0 not here because it doesn't depend on downloaded files

 PRIORITY D0 10
 PRIORITY P0 100
 PRIORITY C0 1000
 PRIORITY R0 10000
 # Not sure about priority for summarize

If you do something like this, your DAG should start out by submitting 10 download jobs. When the first download job finishes, the corresponding preprocess job will be submitted before any more download jobs, because of the higher priority. Then, as you work your way along, calculate jobs will be favored over preprocess jobs, and remove jobs will be the most favored.

I'm running through the example mentally and I'm not sure how using MAXJOBS and PRIORITY will help.
Let's say D requires 1 core, P requires 4 cores, C requires 8 cores, and R and S each require 1 free core.
There are two worker-node situations: one where I have exclusive access to 15 x 8-core nodes, and another where I have only 4 x 8-core nodes.
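
To make the resource requirements concrete, the calculate step's submit file would look roughly like the sketch below. This is only an illustration; the executable name, memory request, and file names are placeholders, not my actual submit file.

 # calculate.sub (sketch only; executable and memory figure are placeholders)
 universe       = vanilla
 executable     = calculate.sh
 arguments      = $(id)
 request_cpus   = 8        # needs an entire 8-core node
 request_memory = 16GB     # placeholder value
 log            = calculate.$(id).log
 output         = calculate.$(id).out
 error          = calculate.$(id).err
 queue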

With 15 x 8-core VMs, what would happen is:
1. 10 x D jobs would start across multiple VMs
2. There will be a mix of D and P jobs running, with D + P = 10
3. Once the first P job finishes, a C job would run, and since D + P + C = 10, I will always be using fewer than my total number of VMs (15), so the C jobs will always be able to run

However, if the number of VMs is lower than the nfs_limit (4 x 8-core VMs):
1. 10 x D jobs would start, with some VMs running more than one job
2. Same as above, a combination of D and P jobs running
3. The first P job that finishes will queue a C job, which requires 8 cores (an entire node) to be free. It's possible that all 4 workers are running D and P jobs, and even though the C job has higher priority, whenever a D or P job finishes another D or P job will start instead, because a whole node is never available for the C job. This is the situation I am afraid of. Ideally, the D and P jobs should be packed onto the same workers instead of having, say, 1 D and 1 P job on every worker.

This is just a simplified example, but I have seen it happen that most of the jobs that run are the ones requiring few resources, while the jobs that need more resources just stay queued, since running jobs are not evicted to make room for higher-priority ones. I am running jobs in the vanilla universe and I do not have the defragment daemon (condor_defrag) enabled.
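
For reference, my understanding is that enabling defragmentation would mean something like the following in the pool configuration. I have not tried this, and the parameter values are placeholders rather than recommendations:

 # Sketch of a condor_defrag configuration (untested; values are placeholders)
 DAEMON_LIST = $(DAEMON_LIST) DEFRAG
 # Drain at most one machine per hour to free up whole-node slots
 DEFRAG_DRAINING_MACHINES_PER_HOUR = 1
 # Stop draining once this many whole machines are idle
 DEFRAG_MAX_WHOLE_MACHINES = 2
 # Only consider the 8-core partitionable-slot workers for draining
 DEFRAG_REQUIREMENTS = PartitionableSlot && TotalCpus == 8

I would rather avoid draining machines, though, if the DAG throttles and priorities can be arranged to prevent the starvation in the first place.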

Thanks,
Ying