
[HTCondor-users] Designing a good Scheduling approach



Hi all,

we are trying to set up HTCondor to schedule our deep-learning research jobs.
While we have a basic setup up and running, it does not yet fully cover our requirements.

First, we would like to have three user groups, ordered by priority (1 = highest):

    1. Deadline
    2. Staff
    3. Students

Each group should receive a different quota and be allowed to accept surplus. After reading the documentation, I believe this can be implemented easily using hierarchical groups with quotas that accept surplus (is there anything non-obvious to consider here?).
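A minimal sketch of such a group setup, assuming three top-level groups with dynamic quotas. The group names (group_deadline, group_staff, group_students) and the quota fractions are hypothetical placeholders; the knobs used are GROUP_NAMES, GROUP_QUOTA_DYNAMIC_<groupname> and GROUP_ACCEPT_SURPLUS_<groupname>. To keep it checkable, it is written as a small Python script that prints the negotiator config lines rather than as a config file.

```python
# Hypothetical group names and quota fractions; adjust to the actual pool.
GROUPS = {
    "group_deadline": 0.5,
    "group_staff": 0.3,
    "group_students": 0.2,
}


def render_group_config(groups: dict) -> str:
    """Render negotiator config lines for dynamic group quotas with surplus sharing."""
    lines = ["GROUP_NAMES = " + ", ".join(groups)]
    for name, fraction in groups.items():
        # Dynamic quota: a fraction of the pool (or of the parent group, if nested)
        lines.append(f"GROUP_QUOTA_DYNAMIC_{name} = {fraction}")
        # Allow this group to use resources that other groups leave idle
        lines.append(f"GROUP_ACCEPT_SURPLUS_{name} = True")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_group_config(GROUPS))
```

Jobs would then select their group in the submit file, e.g. with accounting_group = group_staff (optionally together with accounting_group_user).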


However, we would also like to factor GPU usage into the user priority calculation (see section 3.6.4). Unfortunately, we did not find any way to modify the formula so that it directly accounts for GPUs as a resource. Is this possible at all?
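To illustrate why the weighting matters, here is a simplified Python model of the real user priority update described in the manual: prio(t) = beta * prio(t - dt) + (1 - beta) * rho(t), with beta = 0.5^(dt/h), where rho is the weighted amount of resources the user holds and h is PRIORITY_HALFLIFE (default one day). The Claim class, the one-hour update step, the 0.5 floor and the two weight functions are assumptions for illustration; HTCondor itself charges each claim by the slot's SLOT_WEIGHT (default Cpus).

```python
from dataclasses import dataclass
from typing import Callable, List

PRIORITY_HALFLIFE = 86400.0  # seconds (one day)
MIN_PRIORITY = 0.5  # assumed floor for the real user priority in this model


@dataclass
class Claim:
    """Resources held by one running job (hypothetical, for illustration)."""
    cpus: int
    gpus: int


Weight = Callable[[Claim], float]


def step_priority(prev: float, rho: float, dt: float,
                  halflife: float = PRIORITY_HALFLIFE) -> float:
    """One decay step: beta * prev + (1 - beta) * rho, clamped at MIN_PRIORITY."""
    beta = 0.5 ** (dt / halflife)
    return max(MIN_PRIORITY, beta * prev + (1.0 - beta) * rho)


def simulate(claims: List[Claim], weight: Weight, hours: int) -> float:
    """Priority after `hours` of holding the same claims, updated once per hour."""
    rho = sum(weight(c) for c in claims)
    prio = MIN_PRIORITY
    for _ in range(hours):
        prio = step_priority(prio, rho, dt=3600.0)
    return prio


def by_cpus(c: Claim) -> float:
    return c.cpus  # analogous to the default SLOT_WEIGHT = Cpus


def by_gpus(c: Claim) -> float:
    return c.gpus  # hypothetical GPU-based weight


if __name__ == "__main__":
    cpu_user = [Claim(cpus=16, gpus=0)]
    gpu_user = [Claim(cpus=1, gpus=1) for _ in range(4)]
    for name, weight in (("Cpus", by_cpus), ("Gpus", by_gpus)):
        print(name, round(simulate(cpu_user, weight, 24), 2),
              round(simulate(gpu_user, weight, 24), 2))
```

With these numbers, after one half-life the owner of the single 16-CPU job reaches about 8.25 under CPU weighting versus 2.25 for the owner of four 1-GPU jobs; with GPU weighting the order flips (0.5 versus 2.25).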

As a workaround, we found that while SLOT_WEIGHT may not be set to a custom resource (see p. 257 of the documentation for release 8.8), we should be able to make every slot count as 1 by setting NEGOTIATOR_USE_SLOT_WEIGHTS to FALSE (see p. 298). Furthermore, we can add GPUs as a custom resource to the consumption policy (section 3.7.1, p. 391). However, we are unsure what effect this would have on the resources charged for a job. Would requesting 1 GPU then count as 1 resource (for example, by applying quantize(target.RequestGpus,{1}))?

For example, would a job that costs 2 CPU resources and 1 memory resource then also cost 1 GPU resource once a consumption policy for GPUs is added?
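On the consumption-policy side, the following Python model sketches what such a policy does on a partitionable slot: each matched job's requests are rounded with quantize() and deducted from the slot. This rests on assumptions: quantize() is re-implemented from our reading of the ClassAd function description (numeric step, or a list of steps), the 1024 MB memory step and the slot sizes are hypothetical, and the three calls merely stand in for CONSUMPTION_CPUS, CONSUMPTION_MEMORY and a CONSUMPTION_GPUS expression. It does not model how the negotiator charges the match against user priority.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Union


def quantize(a: float, b: Union[float, Sequence[float]]) -> float:
    """Model of the ClassAd quantize() function (our reading, not HTCondor code).

    Numeric b: round a up to a multiple of b.
    List b: first element >= a; if none, round a up to a multiple of the last element.
    """
    if isinstance(b, (int, float)):
        return math.ceil(a / b) * b
    for step in b:
        if step >= a:
            return step
    return math.ceil(a / b[-1]) * b[-1]


@dataclass
class PartitionableSlot:
    cpus: int
    memory_mb: int
    gpus: int


def consume(slot: PartitionableSlot, req_cpus: int, req_memory_mb: int,
            req_gpus: int) -> Optional[dict]:
    """Deduct one job's quantized request from the slot; None if it does not fit."""
    used = {
        "cpus": quantize(req_cpus, [1]),              # stand-in for CONSUMPTION_CPUS
        "memory_mb": quantize(req_memory_mb, [1024]),  # hypothetical memory step
        "gpus": quantize(req_gpus, [1]),              # hypothetical CONSUMPTION_GPUS
    }
    if any(getattr(slot, k) < v for k, v in used.items()):
        return None  # slot left unchanged
    for k, v in used.items():
        setattr(slot, k, getattr(slot, k) - v)
    return used


if __name__ == "__main__":
    slot = PartitionableSlot(cpus=16, memory_mb=65536, gpus=4)
    print(consume(slot, 2, 3000, 1))
    print(slot)
```

For the job from the example (2 CPUs, 1 GPU, here with a hypothetical 3000 MB memory request), the model deducts 2 CPUs, 3072 MB and 1 GPU, leaving 14 CPUs, 62464 MB and 3 GPUs on a 16-CPU, 64 GB, 4-GPU slot.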

Does this workaround seem feasible to you? Do you have any other ideas on how to make the priority calculation focus on GPU usage?

Best, and thanks for taking the time to read this!
Oliver