3. Don't run any compute-intensive or memory-intensive tasks on the
server acting as your central manager, and run your central manager
on Linux (even if most of your pool consists of Windows or Mac
servers).
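A minimal sketch of that separation, assuming the central manager's local config is the last file to set DAEMON_LIST: limit the daemons condor_master starts there to the collector and negotiator, so no condor_startd runs jobs on it and no condor_schedd accepts submissions on it.

```
# Central manager only: no condor_startd (no jobs run here) and
# no condor_schedd (no submissions handled here)
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
```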
4. You can tell the condor_negotiator to use multiple CPU cores on
your central manager. So if, for example, your central manager has
16 or more CPU cores and is only running the condor_collector and
condor_negotiator, I'd suggest adding to the config:
# Use multiple threads in the negotiator
# to improve performance; default value is 1
NEGOTIATOR_NUM_THREADS = 8
5. Extend CLAIM_WORKLIFE to something longer than 20 minutes; this
controls how long a slot can be claimed by a specific user and
reused for multiple jobs without going back to the negotiator. The
upside of making this longer is less work for the negotiator; the
downside is that the system will be slower to reassign slots from
low-priority users to high-priority users. E.g.
# Extend the amount of time a user can re-use a slot for multiple
# jobs to 200 min (default is 20 min); the value is in seconds
CLAIM_WORKLIFE = 12000
6. Given you are talking about 10k slots and a significant number
of short jobs, your bottleneck may not be the negotiator, but the
schedd's ability to launch jobs at a rate of more than 10 per
second. Thus you may want to run multiple schedds (i.e. scale
horizontally to multiple schedds). Also, do not submit jobs to
the schedds one at a time; e.g. use "queue 500", or use late
materialization if you can submit in batches of more than 500 (see
https://htcondor.readthedocs.io/en/v9_1/users-manual/submitting-a-job.html#submitting-lots-of-jobs).
If you have an overburdened/slow schedd, it will slow down the
negotiation cycle.
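As a sketch of batch submission (the executable, file names and counts below are hypothetical placeholders), a single submit description can queue many jobs in one call. Adding max_materialize turns on late materialization, so the schedd keeps only a bounded number of the cluster's jobs materialized in its queue at any time rather than all of them at once:

```
# Hypothetical submit file for many short jobs; names are placeholders
executable = short_task.sh
arguments  = $(Process)
output     = out/task.$(Cluster).$(Process).out
error      = out/task.$(Cluster).$(Process).err
log        = out/task.$(Cluster).log

# Late materialization: keep at most 2000 jobs of this cluster
# materialized in the schedd's queue at any one time
max_materialize = 2000

# One condor_submit call describes 100000 jobs in a single cluster
queue 100000
```

Depending on your HTCondor version, the schedd may also need SCHEDD_ALLOW_LATE_MATERIALIZE enabled; check the manual section linked above for your release.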
7. Worst case: you can horizontally scale negotiators. For example,
your central manager could have one condor_collector and two
condor_negotiators, where each negotiator is responsible for finding
matches for 50% of the startds in your pool. The trick here is to
use the knob NEGOTIATOR_SLOT_CONSTRAINT to tell each negotiator which
slots it should match. I doubt you will need to go this far, but
it is a technique we've used with pools that have more than 200,000
slots...
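A rough, untested sketch of the two-negotiator setup, modeled on the manual's pattern for running a second instance of a daemon under a local name. The NEGOTIATOR_B instance name, the NegotiatorGroup startd attribute, the 1/2 split and the separate spool directory are all hypothetical choices for illustration; verify each knob against the manual for your version before using it:

```
# --- On the execute nodes (hypothetical attribute) ---
# Tag each startd with the negotiator that should match it:
# 1 on half the nodes, 2 on the other half
NegotiatorGroup = 1
STARTD_ATTRS = $(STARTD_ATTRS) NegotiatorGroup

# --- On the central manager ---
# The default negotiator only fetches slots tagged with group 1
NEGOTIATOR_SLOT_CONSTRAINT = NegotiatorGroup =?= 1

# Second negotiator instance, started by condor_master with a local name
NEGOTIATOR_B = $(NEGOTIATOR)
NEGOTIATOR_B_ARGS = -local-name negotiator_b
NEGOTIATOR.NEGOTIATOR_B.NEGOTIATOR_NAME = negotiator_b
NEGOTIATOR.NEGOTIATOR_B.NEGOTIATOR_LOG = $(LOG)/NegotiatorLog.negotiator_b
# Separate spool so the two instances don't share accounting state
# (assumption; create this directory first)
NEGOTIATOR.NEGOTIATOR_B.SPOOL = $(SPOOL)/negotiator_b
# The second negotiator only fetches slots tagged with group 2
NEGOTIATOR.NEGOTIATOR_B.NEGOTIATOR_SLOT_CONSTRAINT = NegotiatorGroup =?= 2

DAEMON_LIST = $(DAEMON_LIST) NEGOTIATOR_B
DC_DAEMON_LIST = + NEGOTIATOR_B
```

Slots that don't publish NegotiatorGroup satisfy neither constraint, so under this split they would never be matched; tag every startd before turning it on.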