
[HTCondor-users] Scalability of condor_credd



Hi all,

I’ve been experimenting with using condor_credd to allow run_as_owner in submit files.

I’ve tested this successfully on a small test pool: a Linux central manager (8.8.13), Windows submit nodes, a Windows condor_credd node, and Windows execute nodes (all 8.8.12). The submit nodes and the condor_credd node run Windows Server 2016; the submit nodes are 8-core with 32 GB RAM, and the condor_credd node is 4-core with 16 GB RAM (it’s a test credd node; we would probably go to 8-core / 32 GB RAM for production).

All works OK.
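For reference, the relevant pieces of the setup look roughly like this (a sketch only; the hostname is illustrative, and CREDD_HOST, CREDD_CACHE_LOCALLY, STARTER_ALLOW_RUNAS_OWNER and the run_as_owner submit command are the knobs described in the HTCondor manual, so check the details against your version):

    # condor_config on the submit and execute nodes
    CREDD_HOST = credd-node.example.com
    CREDD_CACHE_LOCALLY = True
    STARTER_ALLOW_RUNAS_OWNER = True

    # in the submit file
    run_as_owner = true

Each user also stores their Windows password once with condor_store_cred add on the submit node.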

I was then able to make it work across two pools using the one condor_credd node, by having the condor_credd node report to both pools, i.e. setting

CONDOR_HOST = test-pool-cm, other-pool-cm

in condor_config on the condor_credd node.
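A quick way to check that the credd is actually advertising to both collectors (assuming condor_status in 8.8 behaves as I remember, with the credd publishing a CredD ad):

    condor_status -pool test-pool-cm -any -constraint 'MyType == "CredD"'
    condor_status -pool other-pool-cm -any -constraint 'MyType == "CredD"'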

Our production system has 9 pools (with flocking enabled across all of them), with a total of approximately 2,000+ machines and 10,000+ slots/cores.

We typically have a maximum of ~5,000 cores available at any one time (user activity, machines off overnight, etc.), and therefore a maximum of ~5,000 single-core jobs running simultaneously.

Does anyone have a feel for how the single condor_credd node would handle this?

OK?

Sluggish?

Curl up and die?

Thanks for any help/advice/comments/suggestions.

Cheers

Greg