
[HTCondor-users] Scalability of condor_credd



Hi all,

I’ve been experimenting with using condor_credd to allow run_as_owner in submit files.

I’ve tested this successfully on a small test pool: a Linux central manager (8.8.13), Windows submit nodes, a Windows condor_credd node, and Windows execute nodes (all 8.8.12). The submit nodes and the condor_credd node run Windows Server 2016; the submit nodes are 8-core with 32 GB RAM, and the condor_credd node is 4-core with 16 GB RAM (it’s a test credd node; we would probably go to 8-core / 32 GB RAM for production).

All works OK.
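For reference, the relevant pieces of the setup look roughly like this (a sketch only; the hostname is illustrative, and CREDD_HOST, CREDD_CACHE_LOCALLY, STARTER_ALLOW_RUNAS_OWNER and the run_as_owner submit command are the knobs described in the HTCondor manual, so check the details against your version):

    # condor_config on the submit and execute nodes
    CREDD_HOST = credd-node.example.com
    CREDD_CACHE_LOCALLY = True
    STARTER_ALLOW_RUNAS_OWNER = True

    # in the submit file
    run_as_owner = true

Each user also stores their Windows password once with condor_store_cred add on the submit node.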

I was then able to make it work across two pools using the one condor_credd node, by having the condor_credd node report to both pools, i.e. setting

CONDOR_HOST = test-pool-cm, other-pool-cm

in condor_config on the condor_credd node.
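A quick way to check that the credd is actually advertising to both collectors (assuming condor_status in 8.8 behaves as I remember, with the credd publishing a CredD ad):

    condor_status -pool test-pool-cm -any -constraint 'MyType == "CredD"'
    condor_status -pool other-pool-cm -any -constraint 'MyType == "CredD"'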

Our production system has 9 pools (with flocking enabled across all of them), with a total of approximately 2,000+ machines and 10,000+ slots/cores.

We typically have a maximum of ~5,000 cores available at any one time (user activity, machines off overnight, etc.), and therefore a maximum of ~5,000 single-core jobs running simultaneously.

Does anyone have a feel for how the single condor_credd node would handle this?

OK?

Sluggish?

Curl up and die?

Thanks for any help/advice/comments/suggestions.

Cheers

Greg